Temp-frustum net: 3D object detection with temporal fusion

Emec Ercelik, Ekim Yurtsever, Alois Knoll

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Scopus citations

Abstract

3D object detection is a core component of automated driving systems. State-of-the-art methods fuse RGB imagery and LiDAR point cloud data frame-by-frame for 3D bounding box regression. However, frame-by-frame 3D object detection suffers from noise, field-of-view obstruction, and sparsity. We propose a novel Temporal Fusion Module (TFM) that uses information from previous time-steps to mitigate these problems. First, a state-of-the-art frustum network extracts point cloud features from raw RGB and LiDAR point cloud data frame-by-frame. Then, our TFM fuses these features with a recurrent neural network. As a result, 3D object detection becomes robust against single-frame failures and transient occlusions. Experiments on the KITTI object tracking dataset show the effectiveness of the proposed TFM, where we obtain 6%, 4%, and 6% improvements on the Car, Pedestrian, and Cyclist classes, respectively, compared to frame-by-frame baselines. Furthermore, ablation studies confirm that the improvement stems from temporal fusion and show the effects of different placements of the TFM in the object detection pipeline. Our code is open-source and available at https://github.com/emecercelik/Temp-Frustum-Net.git.
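The two-stage idea in the abstract (per-frame feature extraction followed by recurrent fusion) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract only says "a recurrent neural network", so the choice of a GRU cell, the class name `GRUFusion`, the feature and hidden dimensions, and the random untrained weights are all assumptions made for illustration. The per-frame features are stand-in random vectors rather than outputs of a real frustum network.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class GRUFusion:
    """Toy GRU cell (hypothetical) that fuses per-frame feature vectors over time."""

    def __init__(self, feat_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        # Index 0: update gate z, index 1: reset gate r, index 2: candidate state n.
        # Weights are random and untrained; a real module would learn them.
        self.W = rng.normal(0.0, 0.1, (3, hidden_dim, feat_dim))
        self.U = rng.normal(0.0, 0.1, (3, hidden_dim, hidden_dim))
        self.b = np.zeros((3, hidden_dim))
        self.hidden_dim = hidden_dim

    def step(self, x, h):
        """Combine the current frame's features x with the running state h."""
        z = sigmoid(self.W[0] @ x + self.U[0] @ h + self.b[0])
        r = sigmoid(self.W[1] @ x + self.U[1] @ h + self.b[1])
        n = np.tanh(self.W[2] @ x + self.U[2] @ (r * h) + self.b[2])
        # z close to 1 keeps the past state; z close to 0 trusts the new frame.
        return (1.0 - z) * n + z * h

    def fuse(self, frame_features):
        """Run over frames in time order and return the final fused state."""
        h = np.zeros(self.hidden_dim)
        for x in frame_features:
            h = self.step(x, h)
        return h


# Stand-in for features of one tracked object over 4 consecutive frames
# (assumed 16-dimensional; a frustum network would produce these in practice).
rng = np.random.default_rng(42)
frames = rng.normal(size=(4, 16))

fusion = GRUFusion(feat_dim=16, hidden_dim=8)
fused = fusion.fuse(frames)
print(fused.shape)
```

In a detection pipeline along these lines, the fused state would replace or augment the single-frame feature vector before bounding box regression, so that one noisy or occluded frame does not fully determine the prediction. Where exactly the fusion is placed is a design choice that the abstract says the ablation studies examine.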

Original language: English
Title of host publication: 32nd IEEE Intelligent Vehicles Symposium, IV 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1095-1101
Number of pages: 7
ISBN (Electronic): 9781728153940
State: Published - 11 Jul 2021
Event: 32nd IEEE Intelligent Vehicles Symposium, IV 2021 - Nagoya, Japan
Duration: 11 Jul 2021 → 17 Jul 2021

Publication series

Name: IEEE Intelligent Vehicles Symposium, Proceedings
Volume: 2021-July

Conference

Conference: 32nd IEEE Intelligent Vehicles Symposium, IV 2021
Country/Territory: Japan
City: Nagoya
Period: 11/07/21 → 17/07/21
