TY - JOUR
T1 - SCAF-Net
T2 - Scene Context Attention-Based Fusion Network for Vehicle Detection in Aerial Imagery
AU - Wang, Minghui
AU - Li, Qingpeng
AU - Gu, Yunchao
AU - Fang, Leyuan
AU - Zhu, Xiao Xiang
N1 - Publisher Copyright:
© 2004-2012 IEEE.
PY - 2022
Y1 - 2022
N2 - In recent years, deep learning methods have achieved great success in vehicle detection tasks in aerial imagery. However, most existing methods focus only on extracting latent vehicle target features and rarely consider the scene context as vital prior knowledge. In this letter, we propose a scene context attention-based fusion network (SCAF-Net) to fuse the scene context of vehicles into an end-to-end vehicle detection network. First, we propose a novel strategy, patch cover, to preserve as much of the original target and scene context information in large-scale raw aerial images as possible. Next, we use an improved YOLO-v3 network as one branch of SCAF-Net to generate vehicle candidates on each patch. Here, a novel scene context branch is utilized to extract the latent scene context of vehicles on each patch without any extra annotations. Then, these two branches are concatenated into a fusion network, and we apply an attention-based model to further extract vehicle candidates in each local scene. Finally, all vehicle candidates from different patches are merged by global non-maximum suppression (g-NMS) to output the detection result for the whole original image. Experimental results demonstrate that our proposed method outperforms the comparison methods in both detection accuracy and speed. Our code is released at https://github.com/minghuicode/SCAF-Net.
AB - In recent years, deep learning methods have achieved great success in vehicle detection tasks in aerial imagery. However, most existing methods focus only on extracting latent vehicle target features and rarely consider the scene context as vital prior knowledge. In this letter, we propose a scene context attention-based fusion network (SCAF-Net) to fuse the scene context of vehicles into an end-to-end vehicle detection network. First, we propose a novel strategy, patch cover, to preserve as much of the original target and scene context information in large-scale raw aerial images as possible. Next, we use an improved YOLO-v3 network as one branch of SCAF-Net to generate vehicle candidates on each patch. Here, a novel scene context branch is utilized to extract the latent scene context of vehicles on each patch without any extra annotations. Then, these two branches are concatenated into a fusion network, and we apply an attention-based model to further extract vehicle candidates in each local scene. Finally, all vehicle candidates from different patches are merged by global non-maximum suppression (g-NMS) to output the detection result for the whole original image. Experimental results demonstrate that our proposed method outperforms the comparison methods in both detection accuracy and speed. Our code is released at https://github.com/minghuicode/SCAF-Net.
KW - attention-based model
KW - deep learning
KW - fusion network
KW - remote sensing
KW - vehicle detection
UR - http://www.scopus.com/inward/record.url?scp=85114713717&partnerID=8YFLogxK
U2 - 10.1109/LGRS.2021.3107281
DO - 10.1109/LGRS.2021.3107281
M3 - Article
AN - SCOPUS:85114713717
SN - 1545-598X
VL - 19
JO - IEEE Geoscience and Remote Sensing Letters
JF - IEEE Geoscience and Remote Sensing Letters
ER -