A weakly supervised spatial group attention network for fine-grained visual recognition

Jiangjian Xie, Yujie Zhong, Junguo Zhang, Changchun Zhang, Björn W. Schuller

Research output: Contribution to journal › Article › peer-review

Abstract

Fine-grained visual recognition aims to classify sub-categories that belong to the same basic-level category. It is highly challenging because samples within one sub-category show large variance while different sub-categories show small variance. Previous approaches generally localize the targets or parts first, then determine which sub-category the image belongs to. They depend on target or part annotations, which are labor-intensive to produce and a barrier to practical use. Other methods indirectly extract recognizable areas from the high-level feature maps, ignoring the spatial relationships between the target and its parts, which may cause inaccurate recognition. In this paper, we propose a weakly supervised spatial group attention network (WSSGA-Net) for fine-grained bird recognition. Based on the spatial relationships between the target and its parts, we embed the spatial group attention (SGA) module into the WSSGA-Net to highlight the correct semantic feature regions by establishing a semantic feature space enhancement mechanism. In addition, we apply moment exchange (MoEx) for data augmentation, generating new feature maps by exchanging the feature moments of two input images. Comprehensive experiments indicate that our approach performs significantly better than state-of-the-art approaches on the standard bird image datasets Bird-65 and CUB200-2011 and on the fine-grained dataset Stanford Cars.
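The moment exchange step mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `moment_exchange`, the `(C, H, W)` feature layout, the choice to compute moments across the channel axis at each spatial position, and the `eps` value are all assumptions made for illustration. The sketch normalizes the features of one image with its own mean and standard deviation, then rescales them with the mean and standard deviation of a second image.

```python
import numpy as np


def moment_exchange(feat_a, feat_b, eps=1e-5):
    """Give feat_a the first and second moments of feat_b.

    feat_a, feat_b: feature maps of shape (C, H, W).
    Assumption: moments are taken across the channel axis at each
    spatial position; the abstract does not specify the axis.
    """
    mu_a = feat_a.mean(axis=0, keepdims=True)
    sigma_a = np.sqrt(feat_a.var(axis=0, keepdims=True) + eps)
    mu_b = feat_b.mean(axis=0, keepdims=True)
    sigma_b = np.sqrt(feat_b.var(axis=0, keepdims=True) + eps)

    # Remove feat_a's own moments, then inject feat_b's moments.
    normalized = (feat_a - mu_a) / sigma_a
    return normalized * sigma_b + mu_b


# Hypothetical feature maps standing in for two input images.
rng = np.random.default_rng(0)
feat_a = rng.normal(loc=1.0, scale=2.0, size=(8, 4, 4))
feat_b = rng.normal(loc=-3.0, scale=0.5, size=(8, 4, 4))
mixed = moment_exchange(feat_a, feat_b)
```

The result keeps the normalized structure of the first feature map while carrying the per-position mean and spread of the second. How the training labels of the two images are combined for the augmented sample is not stated in the abstract, so it is left out of this sketch.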

Original language: English
Pages (from-to): 23301-23315
Number of pages: 15
Journal: Applied Intelligence
Volume: 53
Issue number: 20
State: Published - Oct 2023
Externally published: Yes

Keywords

  • Bird recognition
  • Classification
  • Fine-grained image
  • Moment exchange
  • Spatial group attention
  • Weakly supervised network
