TY - JOUR
T1 - FSFNet
T2 - Foreground Score-Aware Fusion for 3-D Object Detector under Unfavorable Conditions
AU - Lin, Jia
AU - Yin, Huilin
AU - Yan, Jun
AU - Jian, Kaifeng
AU - Lu, Yu
AU - Ge, Wancheng
AU - Zhang, Hao
AU - Rigoll, Gerhard
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023/7/15
Y1 - 2023/7/15
N2 - Multimodal fusion-based 3-D object detectors offer a promising way to resolve the failure cases of single-modality methods. However, current fusion approaches still suffer under unfavorable conditions, e.g., poorly illuminated driving conditions and crowded traffic scenarios, which lead to degraded image quality and object occlusion. To this end, we present FSFNet, a multimodal fusion network for 3-D object detection consisting of a local graph-aware point cloud backbone (LGB), a foreground score-aware fusion network (FSFN), and a proposals' refining loss (PRL). Concretely, a directed graph is built to generate edgewise features for each point, and the point features are supplemented with this graph information in the LGB. To alleviate the impact of degraded image features caused by poor illumination, FSFN produces an adaptive multimodal feature by taking pointwise foreground scores into account. Hence, levelwise point features with high confidence are fully exploited, while imperfect image information is suppressed in the fusion stage. We further introduce the PRL to reduce false positives and false negatives in crowded scenes by optimizing the locations and scores of the predicted 3-D bounding boxes. Extensive experiments on the KITTI benchmark demonstrate that FSFNet is superior to state-of-the-art networks. Moreover, FSFN is verified to be robust to image inputs captured under poor illumination conditions.
AB - Multimodal fusion-based 3-D object detectors offer a promising way to resolve the failure cases of single-modality methods. However, current fusion approaches still suffer under unfavorable conditions, e.g., poorly illuminated driving conditions and crowded traffic scenarios, which lead to degraded image quality and object occlusion. To this end, we present FSFNet, a multimodal fusion network for 3-D object detection consisting of a local graph-aware point cloud backbone (LGB), a foreground score-aware fusion network (FSFN), and a proposals' refining loss (PRL). Concretely, a directed graph is built to generate edgewise features for each point, and the point features are supplemented with this graph information in the LGB. To alleviate the impact of degraded image features caused by poor illumination, FSFN produces an adaptive multimodal feature by taking pointwise foreground scores into account. Hence, levelwise point features with high confidence are fully exploited, while imperfect image information is suppressed in the fusion stage. We further introduce the PRL to reduce false positives and false negatives in crowded scenes by optimizing the locations and scores of the predicted 3-D bounding boxes. Extensive experiments on the KITTI benchmark demonstrate that FSFNet is superior to state-of-the-art networks. Moreover, FSFN is verified to be robust to image inputs captured under poor illumination conditions.
KW - 3-D object detection
KW - LiDAR sensor
KW - automated driving
KW - intelligent transportation systems
KW - multimodal fusion strategy
UR - http://www.scopus.com/inward/record.url?scp=85162617304&partnerID=8YFLogxK
U2 - 10.1109/JSEN.2023.3283018
DO - 10.1109/JSEN.2023.3283018
M3 - Article
AN - SCOPUS:85162617304
SN - 1530-437X
VL - 23
SP - 15988
EP - 16001
JO - IEEE Sensors Journal
JF - IEEE Sensors Journal
IS - 14
ER -