Abstract
Fine-grained visual recognition aims to classify images into sub-categories that belong to the same basic-level category. The task is highly challenging because images of the same sub-category can vary widely, whereas different sub-categories may differ only subtly. Previous approaches generally localize the target or its parts first and then determine which sub-category the image belongs to. They depend on target or part annotations, which are labor-intensive to obtain and hinder practical use. Other methods extract discriminative regions indirectly from high-level feature maps, ignoring the spatial relationships between the target and its parts, which may lead to inaccurate recognition. In this paper, we propose a weakly supervised spatial group attention network (WSSGA-Net) for fine-grained bird recognition. Exploiting the spatial relationships between the target and its parts, we embed a spatial group attention (SGA) module into WSSGA-Net that highlights the correct semantic feature regions through a semantic feature space enhancement mechanism. In addition, we apply moment exchange (MoEx) for data augmentation, generating new feature maps by exchanging the feature moments of two input images. Comprehensive experiments show that our approach significantly outperforms state-of-the-art approaches on the standard bird image datasets Bird-65 and CUB200-2011 and on the fine-grained dataset Stanford Cars.
Graphical abstract: [Figure not available: see fulltext.]
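The abstract describes MoEx only as "exchanging two input image feature moments." The following is a minimal sketch of that idea. It follows the generic MoEx recipe with positional normalization (PONO), where the mean and standard deviation are taken across channels at each spatial position. It does not claim to be the WSSGA-Net implementation. Several details are assumptions not stated in the abstract: the choice of PONO moments, the function names `pono_moments`, `moment_exchange` and `moex_loss`, and the fixed label weight `lam = 0.9`. The sketch uses plain Python lists (`feat[channel][row][col]`) so that it stays self-contained.

```python
import math
from typing import List, Tuple

# One image's feature map, indexed as feat[channel][row][col] (illustrative layout).
FeatureMap = List[List[List[float]]]
Grid = List[List[float]]


def _shape(feat: FeatureMap) -> Tuple[int, int, int]:
    return len(feat), len(feat[0]), len(feat[0][0])


def pono_moments(feat: FeatureMap, eps: float = 1e-5) -> Tuple[Grid, Grid]:
    """Mean and std across channels at every spatial position (positional normalization)."""
    num_c, h, w = _shape(feat)
    mean = [[0.0] * w for _ in range(h)]
    std = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            vals = [feat[c][i][j] for c in range(num_c)]
            mu = sum(vals) / num_c
            var = sum((v - mu) ** 2 for v in vals) / num_c
            mean[i][j] = mu
            std[i][j] = math.sqrt(var + eps)  # eps keeps later divisions finite
    return mean, std


def moment_exchange(feat_a: FeatureMap, feat_b: FeatureMap, eps: float = 1e-5) -> FeatureMap:
    """Remove image A's per-position moments and inject image B's in their place."""
    if _shape(feat_a) != _shape(feat_b):
        raise ValueError("feature maps must have the same shape")
    num_c, h, w = _shape(feat_a)
    mean_a, std_a = pono_moments(feat_a, eps)
    mean_b, std_b = pono_moments(feat_b, eps)
    return [
        [
            [
                # normalize with A's moments, then rescale/shift with B's moments
                (feat_a[c][i][j] - mean_a[i][j]) / std_a[i][j] * std_b[i][j] + mean_b[i][j]
                for j in range(w)
            ]
            for i in range(h)
        ]
        for c in range(num_c)
    ]


def moex_loss(loss_a: float, loss_b: float, lam: float = 0.9) -> float:
    """Interpolate the two loss terms: lam weights A's label, 1 - lam weights B's label."""
    return lam * loss_a + (1.0 - lam) * loss_b
```

In a training batch, image B would typically be drawn from a shuffled copy of the same batch. `loss_a` and `loss_b` would be the losses of the same prediction against the labels of A and B. The mixed feature map carries A's normalized structure with B's feature statistics, which is why the target is interpolated rather than left as A's label alone.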
Original language | English |
---|---|
Pages (from-to) | 23301-23315 |
Number of pages | 15 |
Journal | Applied Intelligence |
Volume | 53 |
Issue number | 20 |
DOIs | |
State | Published - Oct 2023 |
Externally published | Yes |
Keywords
- Bird recognition
- Classification
- Fine-grained image
- Moment exchange
- Spatial group attention
- Weakly supervised network