SCAF-Net: Scene Context Attention-Based Fusion Network for Vehicle Detection in Aerial Imagery

Minghui Wang, Qingpeng Li, Yunchao Gu, Leyuan Fang, Xiao Xiang Zhu

Research output: Contribution to journal › Article › peer-review


Abstract

In recent years, deep learning methods have achieved great success on vehicle detection tasks in aerial imagery. However, most existing methods focus only on extracting latent vehicle target features and rarely consider the scene context as vital prior knowledge. In this letter, we propose a scene context attention-based fusion network (SCAF-Net) that fuses the scene context of vehicles into an end-to-end vehicle detection network. First, we propose a novel strategy, patch cover, to preserve as much of the original target and scene context information in large-scale raw aerial images as possible. Next, we use an improved YOLO-v3 network as one branch of SCAF-Net to generate vehicle candidates on each patch. Here, a novel scene context branch is used to extract the latent scene context of vehicles on each patch without any extra annotations. These two branches are then concatenated into a fusion network, and we apply an attention-based model to further extract vehicle candidates from each local scene. Finally, the vehicle candidates of all patches are merged by global non-maximum suppression (g-NMS) to output the detection result for the whole original image. Experimental results demonstrate that our proposed method outperforms the comparison methods in both detection accuracy and speed. Our code is released at https://github.com/minghuicode/SCAF-Net.
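The final merging step described in the abstract can be illustrated with a short sketch: per-patch detections are shifted into whole-image coordinates and then deduplicated by standard non-maximum suppression. This is an illustrative reconstruction based only on the abstract, not the released SCAF-Net code; all function names and the IoU threshold are assumptions.

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Standard non-maximum suppression; boxes are [x1, y1, x2, y2]."""
    order = scores.argsort()[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        if rest.size == 0:
            break
        # Intersection of the top-scoring box with the remaining boxes.
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        # Drop boxes that overlap the kept box too much; keep the rest.
        order = rest[iou <= iou_thresh]
    return keep

def global_nms(patch_results, iou_thresh=0.5):
    """Merge per-patch detections into whole-image coordinates, then apply NMS.

    patch_results: list of ((offset_x, offset_y), boxes, scores) tuples,
    where boxes are in patch-local [x1, y1, x2, y2] coordinates.
    """
    all_boxes, all_scores = [], []
    for (ox, oy), boxes, scores in patch_results:
        # Translate patch-local boxes by the patch's top-left offset.
        shifted = boxes + np.array([ox, oy, ox, oy], dtype=float)
        all_boxes.append(shifted)
        all_scores.append(scores)
    boxes = np.concatenate(all_boxes)
    scores = np.concatenate(all_scores)
    keep = nms(boxes, scores, iou_thresh)
    return boxes[keep], scores[keep]
```

With overlapping patches, the same vehicle is typically detected in more than one patch; after the coordinate shift, those duplicates overlap strongly in global coordinates and only the highest-scoring one survives the suppression step.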

Original language: English
Journal: IEEE Geoscience and Remote Sensing Letters
Volume: 19
DOIs
State: Published - 2022
Externally published: Yes

Keywords

  • Attention-based model
  • deep learning
  • fusion network
  • remote sensing
  • vehicle detection

