TY - CONF
T1 - DeepFusion: Real-Time Dense 3D Reconstruction for Monocular SLAM using Single-View Depth and Gradient Predictions
T2 - 2019 International Conference on Robotics and Automation, ICRA 2019
AU - Laidlow, Tristan
AU - Czarnowski, Jan
AU - Leutenegger, Stefan
N1 - Publisher Copyright:
© 2019 IEEE.
PY - 2019/5
AB - While the keypoint-based maps created by sparse monocular Simultaneous Localisation and Mapping (SLAM) systems are useful for camera tracking, dense 3D reconstructions may be desired for many robotic tasks. Solutions involving depth cameras are limited in range and restricted to indoor spaces, and dense reconstruction systems based on minimising the photometric error between frames are typically poorly constrained and suffer from scale ambiguity. To address these issues, we propose a 3D reconstruction system that leverages the output of a Convolutional Neural Network (CNN) to produce fully dense depth maps for keyframes that include metric scale. Our system, DeepFusion, is capable of producing real-time dense reconstructions on a GPU. It fuses the output of a semi-dense multiview stereo algorithm with the depth and gradient predictions of a CNN in a probabilistic fashion, using learned uncertainties produced by the network. While the network only needs to be run once per keyframe, we are able to optimise the depth map with each new frame so as to constantly make use of new geometric constraints. Based on its performance on synthetic and real-world datasets, we demonstrate that DeepFusion performs at least as well as other comparable systems.
UR - http://www.scopus.com/inward/record.url?scp=85071423009&partnerID=8YFLogxK
DO - 10.1109/ICRA.2019.8793527
M3 - Conference contribution
AN - SCOPUS:85071423009
T3 - Proceedings - IEEE International Conference on Robotics and Automation
SP - 4068
EP - 4074
BT - 2019 International Conference on Robotics and Automation, ICRA 2019
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 20 May 2019 through 24 May 2019
ER -