TY - GEN
T1 - Improving Multimodal Object Detection with Individual Sensor Monitoring
AU - Kuhn, Christopher B.
AU - Hofbauer, Markus
AU - Ma, Bowen
AU - Petrovic, Goran
AU - Steinbach, Eckehard
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
AB - Multimodal object detection fuses data from different sensors, such as cameras or LIDAR, to improve detection performance. However, individual sensor inputs can also be detrimental to a system, for example when sun glare hits a camera. In this work, we propose to monitor each sensor individually to predict when an input would lead to incorrect detections. We first train one detection network per sensor, using only that sensor as input. Then, we record the performance of each single-sensor network and train an introspective performance prediction network for each sensor. Finally, we train a multimodal fusion network in which we weight the impact of each sensor by its predicted performance. This allows us to dynamically adapt the fusion to reduce the influence of harmful sensor readings based only on the current data. We apply the proposed concept to the state-of-the-art AVOD architecture and evaluate it on the KITTI data set. The proposed sensor monitoring system improves the mean intersection-over-union performance by 4.6%. For inputs with a low predicted performance, the proposed approach outperforms the state of the art by over 10%, demonstrating the potential of individual sensor monitoring for reacting to problematic inputs. The proposed approach can be applied to any fusion network with two or more sensors and could also be used for classification or segmentation tasks.
KW - Introspection
KW - Object Detection
KW - Sensor Fusion
UR - http://www.scopus.com/inward/record.url?scp=85147552079&partnerID=8YFLogxK
DO - 10.1109/ISM55400.2022.00022
M3 - Conference contribution
AN - SCOPUS:85147552079
T3 - Proceedings - 2022 IEEE International Symposium on Multimedia, ISM 2022
SP - 97
EP - 104
BT - Proceedings - 2022 IEEE International Symposium on Multimedia, ISM 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 24th IEEE International Symposium on Multimedia, ISM 2022
Y2 - 5 December 2022 through 7 December 2022
ER -
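
A minimal sketch of the performance-weighted fusion step described in the abstract: per-sensor introspection networks predict a performance score for the current input, and the fused feature map down-weights sensors with a low predicted score. This is an illustrative reconstruction in PyTorch, not the authors' AVOD-based implementation; the monitor architectures, tensor shapes, and softmax normalization are assumptions.

```python
# Sketch of performance-weighted sensor fusion (assumed design, not the
# authors' code). One introspective monitor per sensor predicts a scalar
# performance score; fusion weights are derived from those scores.
import torch
import torch.nn as nn


class MonitoredFusion(nn.Module):
    """Fuses per-sensor feature maps, weighted by predicted performance."""

    def __init__(self, monitors: nn.ModuleList):
        super().__init__()
        # One performance predictor per sensor; in the paper, each is
        # trained on the recorded performance of a single-sensor detector.
        self.monitors = monitors

    def forward(self, inputs: list, features: list) -> torch.Tensor:
        # Predict a scalar performance score per sensor for the current input.
        scores = torch.stack(
            [m(x).squeeze(-1) for m, x in zip(self.monitors, inputs)], dim=0
        )  # shape: (num_sensors, batch)
        # Normalize across sensors so the weights are positive and sum to
        # one; a sensor with low predicted performance (e.g., a glare-hit
        # camera) then contributes little to the fused features.
        weights = torch.softmax(scores, dim=0)
        fused = torch.zeros_like(features[0])
        for w, f in zip(weights, features):  # f: (batch, C, H, W)
            fused = fused + w.view(-1, 1, 1, 1) * f
        return fused


# Usage example with two dummy sensors (camera image, LIDAR BEV grid).
cam_monitor = nn.Sequential(nn.Flatten(), nn.LazyLinear(1), nn.Sigmoid())
lidar_monitor = nn.Sequential(nn.Flatten(), nn.LazyLinear(1), nn.Sigmoid())
fusion = MonitoredFusion(nn.ModuleList([cam_monitor, lidar_monitor]))

inputs = [torch.rand(2, 3, 64, 64), torch.rand(2, 1, 64, 64)]
features = [torch.rand(2, 32, 16, 16), torch.rand(2, 32, 16, 16)]
out = fusion(inputs, features)  # shape: (2, 32, 16, 16)
```

The softmax here is one plausible way to keep the fusion differentiable while matching the abstract's description of reducing the influence of harmful readings; it also extends directly to more than two sensors, as the abstract notes the approach supports.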