Abstract
With the increasing volume of aerial videos, the demand for automatically parsing these videos is surging. To achieve this, current research mainly focuses on extracting a holistic feature with convolutions along both spatial and temporal dimensions. However, these methods are limited by small temporal receptive fields and cannot adequately capture long-term temporal dependencies, which are important for describing complicated dynamics. In this paper, we propose a novel two-pathway network that models not only holistic features but also temporal relations for aerial video classification. More specifically, our model employs a two-pathway architecture: (1) a holistic representation pathway that learns a general feature of frame appearances and short-term temporal variations, and (2) a temporal relation pathway that captures multi-scale temporal relations across arbitrary frames, providing long-term temporal dependencies. Our model is evaluated on an event recognition dataset, ERA, and achieves state-of-the-art results. This demonstrates its effectiveness and good generalization capacity.
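To make the idea of multi-scale temporal relations across arbitrary frames more concrete, the following is a minimal sketch of one way such frame groupings could be formed. It is not the authors' implementation: the abstract does not specify how frames are selected, so the function name `sample_frame_tuples`, its parameters, and the random sampling scheme are all assumptions made for illustration. For each scale k, it picks a few ordered k-frame tuples from anywhere in the clip, so a relation can span distant frames rather than only adjacent ones.

```python
import itertools
import random


def sample_frame_tuples(num_frames, scales, tuples_per_scale, seed=0):
    """Hypothetical helper: pick ordered frame-index tuples at several scales.

    For each scale k, up to `tuples_per_scale` k-frame tuples are drawn from
    all temporally ordered combinations of frame indices. The sampling
    strategy is an assumption, not taken from the paper.
    """
    rng = random.Random(seed)
    relations = {}
    for k in scales:
        # combinations() over range() yields indices in increasing order,
        # so temporal order inside each tuple is preserved.
        all_tuples = list(itertools.combinations(range(num_frames), k))
        count = min(tuples_per_scale, len(all_tuples))
        relations[k] = sorted(rng.sample(all_tuples, count))
    return relations


# Example: an 8-frame clip with 2-, 3-, and 4-frame relations.
relations = sample_frame_tuples(num_frames=8, scales=(2, 3, 4), tuples_per_scale=3)
for scale, tuples in relations.items():
    print(scale, tuples)
```

In a full model, the features of the frames in each tuple would presumably be combined by a learned function and aggregated across scales, then fused with the holistic pathway's output; those steps are omitted here because the abstract gives no detail on them.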
Original language | English |
---|---|
Pages | 8221-8224 |
Number of pages | 4 |
State | Published - 2021 |
Event | 2021 IEEE International Geoscience and Remote Sensing Symposium, IGARSS 2021 - Brussels, Belgium. Duration: 12 Jul 2021 → 16 Jul 2021 |
Conference
Conference | 2021 IEEE International Geoscience and Remote Sensing Symposium, IGARSS 2021 |
---|---|
Country/Territory | Belgium |
City | Brussels |
Period | 12/07/21 → 16/07/21 |
Keywords
- Aerial video classification
- convolutional neural networks (CNNs)
- holistic features
- temporal relations
- two-pathway
- unmanned aerial vehicles (UAVs)