TY - JOUR
T1 - Panoptic 3D mapping and object pose estimation using adaptively weighted semantic information
AU - Hoang, Dinh Cuong
AU - Lilienthal, Achim J.
AU - Stoyanov, Todor
N1 - Publisher Copyright:
© 2016 IEEE.
PY - 2020/4
Y1 - 2020/4
N2 - We present a system capable of reconstructing highly detailed object-level models and estimating the 6D pose of objects by means of an RGB-D camera. In this work, we integrate deep-learning-based semantic segmentation, instance segmentation, and 6D object pose estimation into a state-of-the-art RGB-D mapping system. We leverage the pipeline of ElasticFusion as a backbone and propose modifications of the registration cost function to make full use of the semantic class labels in the process. The proposed objective function features tunable weights for the depth, appearance, and semantic information channels, which are learned from data. A fast semantic segmentation and registration weight prediction convolutional neural network (Fast-RGBD-SSWP) suited to efficient computation is introduced. In addition, our approach explores performing 6D object pose estimation from multiple viewpoints supported by the high-quality reconstruction system. The developed method has been verified through experimental validation on the YCB-Video dataset and a dataset of warehouse objects. Our results confirm that the proposed system performs favorably in terms of surface reconstruction, segmentation quality, and accurate object pose estimation in comparison to other state-of-the-art systems. Our code and video are available at https://sites.google.com/view/panoptic-mope.
AB - We present a system capable of reconstructing highly detailed object-level models and estimating the 6D pose of objects by means of an RGB-D camera. In this work, we integrate deep-learning-based semantic segmentation, instance segmentation, and 6D object pose estimation into a state-of-the-art RGB-D mapping system. We leverage the pipeline of ElasticFusion as a backbone and propose modifications of the registration cost function to make full use of the semantic class labels in the process. The proposed objective function features tunable weights for the depth, appearance, and semantic information channels, which are learned from data. A fast semantic segmentation and registration weight prediction convolutional neural network (Fast-RGBD-SSWP) suited to efficient computation is introduced. In addition, our approach explores performing 6D object pose estimation from multiple viewpoints supported by the high-quality reconstruction system. The developed method has been verified through experimental validation on the YCB-Video dataset and a dataset of warehouse objects. Our results confirm that the proposed system performs favorably in terms of surface reconstruction, segmentation quality, and accurate object pose estimation in comparison to other state-of-the-art systems. Our code and video are available at https://sites.google.com/view/panoptic-mope.
KW - RGB-D perception
KW - mapping
KW - object detection
KW - segmentation and categorization
UR - http://www.scopus.com/inward/record.url?scp=85079819725&partnerID=8YFLogxK
U2 - 10.1109/LRA.2020.2970682
DO - 10.1109/LRA.2020.2970682
M3 - Article
AN - SCOPUS:85079819725
SN - 2377-3766
VL - 5
SP - 1962
EP - 1969
JO - IEEE Robotics and Automation Letters
JF - IEEE Robotics and Automation Letters
IS - 2
M1 - 8977356
ER -