TY - GEN
T1 - Transfer Learning from Simulated to Real Scenes for Monocular 3D Object Detection
AU - Mohamed, Sondos
AU - Zimmer, Walter
AU - Greer, Ross
AU - Ghita, Ahmed Alaaeldin
AU - Castrillón-Santana, Modesto
AU - Trivedi, Mohan
AU - Knoll, Alois
AU - Carta, Salvatore Mario
AU - Marras, Mirko
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
PY - 2025
Y1 - 2025
N2 - Accurately detecting 3D objects from monocular images in dynamic roadside scenarios remains a challenging problem due to varying camera perspectives and unpredictable scene conditions. This paper introduces a two-stage training strategy to address these challenges. Our approach initially trains a model on the large-scale synthetic dataset, RoadSense3D, which offers a diverse range of scenarios for robust feature learning. Subsequently, we fine-tune the model on a combination of real-world datasets to enhance its adaptability to practical conditions. Experimental results of the Cube R-CNN model on challenging public benchmarks show a remarkable improvement in detection performance, with a mean average precision rising from 0.26 to 12.76 on the TUM Traffic A9 Highway dataset and from 2.09 to 6.60 on the DAIR-V2X-I dataset, when performing transfer learning. Code, data, and qualitative video results are available at https://roadsense3d.github.io.
KW - Intelligent Transportation Systems
KW - Intelligent Vehicles
KW - Monocular 3D Object Detection
KW - Synthetic Data
KW - Transfer Learning
UR - https://www.scopus.com/pages/publications/105006880501
DO - 10.1007/978-3-031-91813-1_20
M3 - Conference contribution
AN - SCOPUS:105006880501
SN - 9783031918124
T3 - Lecture Notes in Computer Science
SP - 309
EP - 325
BT - Computer Vision – ECCV 2024 Workshops, Proceedings
A2 - Del Bue, Alessio
A2 - Canton, Cristian
A2 - Pont-Tuset, Jordi
A2 - Tommasi, Tatiana
PB - Springer Science and Business Media Deutschland GmbH
T2 - Workshops that were held in conjunction with the 18th European Conference on Computer Vision, ECCV 2024
Y2 - 29 September 2024 through 4 October 2024
ER -