ERA: A Data Set and Deep Learning Benchmark for Event Recognition in Aerial Videos [Software and Data Sets]

Lichao Mou, Yuansheng Hua, Pu Jin, Xiao Xiang Zhu

Research output: Contribution to journal, Article, peer-review

19 Scopus citations


As a result of the increasing use of unmanned aerial vehicles (UAVs), large volumes of aerial videos have been produced. It is unrealistic for humans to screen such big data and understand the contents. Hence, methodological research on the automatic understanding of UAV videos is of paramount importance (Figure 1). In this article, we introduce to the remote sensing community the novel problem of event recognition in unconstrained aerial videos and present the large-scale, human-annotated Event Recognition in Aerial Videos (ERA) data set. It consists of 2,864 videos, each labeled with one of 25 classes corresponding to an event unfolding over five seconds. All of the videos were collected from YouTube. The ERA data set is designed to have significant intraclass variation and interclass similarity, and it captures dynamic events in varied circumstances and at dramatically different scales. Moreover, to offer a benchmark for this task, we extensively validate existing deep networks. We expect that the ERA data set will facilitate further progress in automatic aerial video comprehension. The data set and trained models can be downloaded from

Original language: English
Article number: 9295448
Pages (from-to): 125-133
Number of pages: 9
Journal: IEEE Geoscience and Remote Sensing Magazine
Issue number: 4
State: Published - Dec 2020
Externally published: Yes

