TY - JOUR
T1 - Cross-corpus open set bird species recognition by vocalization
AU - Xie, Jiangjian
AU - Zhang, Luyang
AU - Zhang, Junguo
AU - Zhang, Yanyun
AU - Schuller, Björn W.
N1 - Publisher Copyright:
© 2023 The Authors
PY - 2023/10
Y1 - 2023/10
N2 - In the wild, vocalizations of the same bird species may differ across populations (e.g., so-called dialects). In addition, the number of species is not known in advance. These two facts make bird species recognition based on vocalization a challenging task. This study treats the task as a cross-corpus open set recognition (OSR) scenario. We propose Instance Frequency Normalization (IFN) to remove instance-specific differences across corpora. Furthermore, we design an x-vector feature extraction model that integrates a Time Delay Neural Network (TDNN) and Long Short-Term Memory (LSTM) to better capture sequence information. Finally, threshold-based Probabilistic Linear Discriminant Analysis (PLDA) is introduced to discriminate among the extracted x-vector features and discover unknown classes. Compared with the best results of the existing method, the average accuracies (ACCs) in both the single-corpus and cross-corpus experiments are improved, suggesting that our method offers a potential solution and improved performance for cross-corpus bird species recognition based on vocalization under open set conditions.
AB - In the wild, vocalizations of the same bird species may differ across populations (e.g., so-called dialects). In addition, the number of species is not known in advance. These two facts make bird species recognition based on vocalization a challenging task. This study treats the task as a cross-corpus open set recognition (OSR) scenario. We propose Instance Frequency Normalization (IFN) to remove instance-specific differences across corpora. Furthermore, we design an x-vector feature extraction model that integrates a Time Delay Neural Network (TDNN) and Long Short-Term Memory (LSTM) to better capture sequence information. Finally, threshold-based Probabilistic Linear Discriminant Analysis (PLDA) is introduced to discriminate among the extracted x-vector features and discover unknown classes. Compared with the best results of the existing method, the average accuracies (ACCs) in both the single-corpus and cross-corpus experiments are improved, suggesting that our method offers a potential solution and improved performance for cross-corpus bird species recognition based on vocalization under open set conditions.
KW - Bird species recognition
KW - Cross-corpus recognition
KW - Instance frequency normalization
KW - Open set
KW - Vocalization
UR - http://www.scopus.com/inward/record.url?scp=85168804899&partnerID=8YFLogxK
U2 - 10.1016/j.ecolind.2023.110826
DO - 10.1016/j.ecolind.2023.110826
M3 - Article
AN - SCOPUS:85168804899
SN - 1470-160X
VL - 154
JO - Ecological Indicators
JF - Ecological Indicators
M1 - 110826
ER -