TY - CPAPER
T1 - DeepSTEP - Deep Learning-Based Spatio-Temporal End-To-End Perception for Autonomous Vehicles
AU - Huch, Sebastian
AU - Sauerbeck, Florian
AU - Betz, Johannes
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
AB - Autonomous vehicles demand high accuracy and robustness of perception algorithms. To develop efficient and scalable perception algorithms, as much information as possible should be extracted from the available sensor data. In this work, we present our concept for an end-to-end perception architecture, named DeepSTEP. The deep learning-based architecture processes raw sensor data from the camera, LiDAR, and RaDAR, and combines the extracted data in a deep fusion network. The output of this deep fusion network is a shared feature space, which is used by perception head networks to fulfill several perception tasks, such as object detection or local mapping. DeepSTEP incorporates multiple ideas to advance the state of the art: First, combining detection and localization into a single pipeline enables efficient processing that reduces computational overhead and improves overall performance. Second, the architecture leverages the temporal domain by using a self-attention mechanism that focuses on the most important features. We believe that our concept of DeepSTEP will advance the development of end-to-end perception systems. The network will be deployed on our research vehicle, which will be used as a platform for data collection, real-world testing, and validation. In conclusion, DeepSTEP represents a significant advancement in the field of perception for autonomous vehicles. The architecture's end-to-end design, time-aware attention mechanism, and integration of multiple perception tasks make it a promising solution for real-world deployment. This research is a work in progress and presents a first concept for establishing a novel perception pipeline.
UR - https://www.scopus.com/pages/publications/85167965087
DO - 10.1109/IV55152.2023.10186768
M3 - Conference contribution
AN - SCOPUS:85167965087
T3 - IEEE Intelligent Vehicles Symposium, Proceedings
BT - IV 2023 - IEEE Intelligent Vehicles Symposium, Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 34th IEEE Intelligent Vehicles Symposium, IV 2023
Y2 - 4 June 2023 through 7 June 2023
ER -