Object-RPE: Dense 3D reconstruction and pose estimation with convolutional neural networks

Dinh Cuong Hoang, Achim J. Lilienthal, Todor Stoyanov

Research output: Contribution to journal › Article › peer-review

12 Scopus citations


We present an approach for recognizing objects present in a scene and estimating their full pose by means of an accurate 3D instance-aware semantic reconstruction. Our framework couples convolutional neural networks (CNNs) and a state-of-the-art dense Simultaneous Localization and Mapping (SLAM) system, ElasticFusion (Whelan et al., 2016), to achieve both high-quality semantic reconstruction and robust 6D pose estimation for relevant objects. We use the ElasticFusion pipeline as a backbone and propose a joint geometric and photometric error function with per-pixel adaptive weights. While the main trend in CNN-based 6D pose estimation has been to infer an object's position and orientation from single views of the scene, our approach performs pose estimation from multiple viewpoints, under the conjecture that combining multiple predictions can improve the robustness of an object detection system. The resulting system is capable of producing high-quality instance-aware semantic reconstructions of room-sized environments, as well as accurately detecting objects and their 6D poses. The method has been verified through extensive experiments on different datasets. Experimental results confirm that the proposed system achieves improvements over state-of-the-art methods in terms of surface reconstruction and object pose prediction. Our code and video are available at https://sites.google.com/view/object-rpe.
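The abstract names a joint geometric and photometric error function with per-pixel adaptive weights but does not give its form. The sketch below is only an illustration of the general idea, not the authors' formulation. It assumes that each pixel carries a weight in [0, 1] that blends a squared geometric residual with a squared photometric residual. The function name `joint_cost`, the convex-combination form, and the sample values are all hypothetical.

```python
from typing import Sequence


def joint_cost(
    geometric_residuals: Sequence[float],
    photometric_residuals: Sequence[float],
    weights: Sequence[float],
) -> float:
    """Sum a per-pixel blend of squared geometric and photometric residuals.

    Assumption (not taken from the paper): pixel i contributes
    w_i * r_geo_i**2 + (1 - w_i) * r_photo_i**2, with w_i in [0, 1].
    """
    if not (len(geometric_residuals) == len(photometric_residuals) == len(weights)):
        raise ValueError("all inputs must have one entry per pixel")

    total = 0.0
    for r_geo, r_photo, w in zip(geometric_residuals, photometric_residuals, weights):
        if not 0.0 <= w <= 1.0:
            raise ValueError("weights must lie in [0, 1]")
        # A weight near 1 trusts geometry; a weight near 0 trusts photometry.
        total += w * r_geo ** 2 + (1.0 - w) * r_photo ** 2
    return total


# Hypothetical residuals for three pixels.
geo = [0.5, 0.25, 2.0]
photo = [1.0, 0.5, 0.5]
w = [1.0, 0.5, 0.0]  # geometry only, equal blend, photometry only

cost = joint_cost(geo, photo, w)
print(cost)  # 0.65625
```

In this toy case the third pixel has a large geometric residual, but its weight of 0 means only its photometric term enters the cost. That is the kind of per-pixel trade-off an adaptive weighting scheme could express.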

Original language: English
Article number: 103632
Journal: Robotics and Autonomous Systems
State: Published - Nov 2020
Externally published: Yes


  • 3D reconstruction
  • 3D registration
  • Object pose estimation
  • Semantic mapping

