TY - GEN
T1 - Exploring the Capabilities and Limits of 3D Monocular Object Detection - A Study on Simulation and Real World Data
AU - Nobis, Felix
AU - Brunhuber, Fabian
AU - Janssen, Simon
AU - Betz, Johannes
AU - Lienkamp, Markus
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2020/9/20
Y1 - 2020/9/20
N2 - 3D object detection based on monocular camera data is a key enabler for autonomous driving. The task, however, is ill-posed due to the lack of depth information in 2D images. Recent deep learning methods show promising results in recovering depth information from single images by learning priors about the environment. Several competing strategies tackle this problem. Beyond network design, the major difference between these competing approaches lies in whether a supervised or a self-supervised optimization loss function is used; these require different data and ground truth information. In this paper, we evaluate the performance of a 3D object detection pipeline that can be parameterized with different depth estimation configurations. We implement a simple distance calculation approach based on camera intrinsics and 2D bounding box size, as well as a self-supervised and a supervised learning approach for depth estimation. Ground truth depth information cannot be recorded reliably in real-world scenarios, which shifts our training focus to simulation data, where labeling and ground truth generation can be automated. We evaluate the detection pipeline on simulator data and on a real-world sequence from an autonomous vehicle on a race track. The benefit of training on simulation data for applying the network to real-world data is investigated, and the advantages and drawbacks of the different depth estimation strategies are discussed.
AB - 3D object detection based on monocular camera data is a key enabler for autonomous driving. The task, however, is ill-posed due to the lack of depth information in 2D images. Recent deep learning methods show promising results in recovering depth information from single images by learning priors about the environment. Several competing strategies tackle this problem. Beyond network design, the major difference between these competing approaches lies in whether a supervised or a self-supervised optimization loss function is used; these require different data and ground truth information. In this paper, we evaluate the performance of a 3D object detection pipeline that can be parameterized with different depth estimation configurations. We implement a simple distance calculation approach based on camera intrinsics and 2D bounding box size, as well as a self-supervised and a supervised learning approach for depth estimation. Ground truth depth information cannot be recorded reliably in real-world scenarios, which shifts our training focus to simulation data, where labeling and ground truth generation can be automated. We evaluate the detection pipeline on simulator data and on a real-world sequence from an autonomous vehicle on a race track. The benefit of training on simulation data for applying the network to real-world data is investigated, and the advantages and drawbacks of the different depth estimation strategies are discussed.
UR - http://www.scopus.com/inward/record.url?scp=85099644996&partnerID=8YFLogxK
U2 - 10.1109/ITSC45102.2020.9294625
DO - 10.1109/ITSC45102.2020.9294625
M3 - Conference contribution
AN - SCOPUS:85099644996
T3 - 2020 IEEE 23rd International Conference on Intelligent Transportation Systems, ITSC 2020
BT - 2020 IEEE 23rd International Conference on Intelligent Transportation Systems, ITSC 2020
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 23rd IEEE International Conference on Intelligent Transportation Systems, ITSC 2020
Y2 - 20 September 2020 through 23 September 2020
ER -