CamRaDepth: Semantic Guided Depth Estimation Using Monocular Camera and Sparse Radar for Automotive Perception

Florian Sauerbeck, Dan Halperin, Lukas Connert, Johannes Betz

Research output: Contribution to journal › Article › peer-review


Abstract

Our research aims to generate robust, dense 3-D depth maps for robotics, especially autonomous driving applications. Since cameras output 2-D images and active sensors such as LiDAR or radar produce only sparse depth measurements, dense depth maps must be estimated. Recent methods based on visual transformer networks have outperformed conventional deep learning approaches in various computer vision tasks, including depth prediction, but have focused on a single camera image. This article explores the potential of visual transformers applied to the fusion of monocular images, semantic segmentation, and projected sparse radar reflections for robust monocular depth estimation. A semantic segmentation branch adds object-level understanding and is investigated in both supervised and unsupervised settings. We evaluate our new depth estimation approach on the nuScenes dataset, where it outperforms existing state-of-the-art camera-radar depth estimation methods. We show that models can benefit from an additional segmentation branch during training through transfer learning, even without running segmentation at inference. Further studies are needed to investigate the use of 4-D imaging radars and enhanced ground-truth generation in more detail. The related code is available as open-source software at https://github.com/TUMFTM/CamRaDepth.
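
To make the fusion input concrete, the sketch below shows one common way to turn sparse radar reflections into an image-aligned depth channel that can be concatenated with the RGB image before a depth network. This is a minimal sketch assuming nuScenes-style calibrated sensors; the function name, arguments, and nearest-return rasterization policy are illustrative assumptions, not code taken from the CamRaDepth repository.

```python
import numpy as np

def project_radar_to_image(points_radar, T_cam_radar, K, image_shape):
    """Project 3-D radar reflections into the camera image plane,
    producing a sparse depth map aligned with the RGB image.

    points_radar: (N, 3) radar points in the radar frame (x, y, z) [m]
    T_cam_radar:  (4, 4) homogeneous transform, radar frame -> camera frame
    K:            (3, 3) camera intrinsic matrix
    image_shape:  (H, W) of the target image
    """
    H, W = image_shape

    # Transform radar points into the camera frame.
    pts_h = np.hstack([points_radar, np.ones((len(points_radar), 1))])
    pts_cam = (T_cam_radar @ pts_h.T).T[:, :3]

    # Keep only points in front of the camera.
    pts_cam = pts_cam[pts_cam[:, 2] > 0.1]

    # Perspective projection onto the image plane.
    uvw = (K @ pts_cam.T).T
    uv = uvw[:, :2] / uvw[:, 2:3]
    u, v = uv[:, 0].astype(int), uv[:, 1].astype(int)
    depth = pts_cam[:, 2]

    # Discard projections that fall outside the image bounds.
    valid = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    u, v, depth = u[valid], v[valid], depth[valid]

    # Rasterize; where points collide on a pixel, keep the nearest return
    # by painting far points first so near points overwrite them.
    sparse_depth = np.zeros((H, W), dtype=np.float32)
    order = np.argsort(-depth)
    sparse_depth[v[order], u[order]] = depth[order]
    return sparse_depth
```

In a camera-radar fusion pipeline of this kind, the resulting sparse depth map would typically be stacked with the RGB image as an extra input channel; the exact input layout and the segmentation branch used during training are described in the paper.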

Original language: English
Pages (from-to): 28442-28453
Number of pages: 12
Journal: IEEE Sensors Journal
Volume: 23
Issue number: 22
DOIs
State: Published - 15 Nov 2023

Keywords

  • Autonomous driving
  • computer vision
  • depth prediction
  • intelligent vehicles
  • semantic segmentation
  • sensor fusion

