TY - JOUR
T1 - Object-RPE
T2 - Dense 3D reconstruction and pose estimation with convolutional neural networks
AU - Hoang, Dinh Cuong
AU - Lilienthal, Achim J.
AU - Stoyanov, Todor
N1 - Publisher Copyright:
© 2020 Elsevier B.V.
PY - 2020/11
Y1 - 2020/11
N2 - We present an approach for recognizing objects present in a scene and estimating their full pose by means of an accurate 3D instance-aware semantic reconstruction. Our framework couples convolutional neural networks (CNNs) and a state-of-the-art dense Simultaneous Localization and Mapping (SLAM) system, ElasticFusion (Whelan et al., 2016), to achieve both high-quality semantic reconstruction and robust 6D pose estimation for relevant objects. We leverage the pipeline of ElasticFusion as a backbone, and propose a joint geometric and photometric error function with per-pixel adaptive weights. While the main trend in CNN-based 6D pose estimation has been to infer an object's position and orientation from single views of the scene, our approach explores performing pose estimation from multiple viewpoints, under the conjecture that combining multiple predictions can improve the robustness of an object detection system. The resulting system is capable of producing high-quality instance-aware semantic reconstructions of room-sized environments, as well as accurately detecting objects and their 6D poses. The developed method has been verified through extensive experiments on different datasets. Experimental results confirm that the proposed system achieves improvements over state-of-the-art methods in terms of surface reconstruction and object pose prediction. Our code and video are available at https://sites.google.com/view/object-rpe.
AB - We present an approach for recognizing objects present in a scene and estimating their full pose by means of an accurate 3D instance-aware semantic reconstruction. Our framework couples convolutional neural networks (CNNs) and a state-of-the-art dense Simultaneous Localization and Mapping (SLAM) system, ElasticFusion (Whelan et al., 2016), to achieve both high-quality semantic reconstruction and robust 6D pose estimation for relevant objects. We leverage the pipeline of ElasticFusion as a backbone, and propose a joint geometric and photometric error function with per-pixel adaptive weights. While the main trend in CNN-based 6D pose estimation has been to infer an object's position and orientation from single views of the scene, our approach explores performing pose estimation from multiple viewpoints, under the conjecture that combining multiple predictions can improve the robustness of an object detection system. The resulting system is capable of producing high-quality instance-aware semantic reconstructions of room-sized environments, as well as accurately detecting objects and their 6D poses. The developed method has been verified through extensive experiments on different datasets. Experimental results confirm that the proposed system achieves improvements over state-of-the-art methods in terms of surface reconstruction and object pose prediction. Our code and video are available at https://sites.google.com/view/object-rpe.
KW - 3D reconstruction
KW - 3D registration
KW - Object pose estimation
KW - Semantic mapping
UR - http://www.scopus.com/inward/record.url?scp=85090016097&partnerID=8YFLogxK
U2 - 10.1016/j.robot.2020.103632
DO - 10.1016/j.robot.2020.103632
M3 - Article
AN - SCOPUS:85090016097
SN - 0921-8890
VL - 133
JO - Robotics and Autonomous Systems
JF - Robotics and Autonomous Systems
M1 - 103632
ER -