Improving Bird Vocalization Recognition in Open-Set Cross-Corpus Scenarios with Semantic Feature Reconstruction and Dual Strategy Scoring

Jiangjian Xie, Yingqi Wang, Xinyuan Qian, Junguo Zhang, Bjorn W. Schuller

Research output: Contribution to journal › Article › peer-review

Abstract

Automated recognition of bird vocalizations (BVs) is essential for biodiversity monitoring through passive acoustic monitoring (PAM), yet deep learning (DL) models encounter substantial challenges in open environments. These include difficulties in detecting unknown classes, extracting species-specific features, and achieving robust cross-corpus recognition. To address these challenges, this letter presents a DL-based open-set cross-corpus recognition method for BVs that combines feature construction with open-set recognition (OSR) techniques. We introduce a three-channel spectrogram that integrates both amplitude and phase information to enhance feature representation. To improve the recognition accuracy of known classes across corpora, we employ a class-specific semantic reconstruction model to extract deep features. For unknown class discrimination, we propose a Dual Strategy Coupling Scoring (DSCS) mechanism, which synthesizes the log-likelihood ratio score (LLRS) and reconstruction error score (RES). Our method achieves the highest weighted accuracy among existing approaches on a public dataset, demonstrating its effectiveness for open-set cross-corpus bird vocalization recognition.
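As a rough illustration of two ideas named in the abstract, the following minimal sketch builds a three-channel input from amplitude and phase and combines two scores into one open-set score. The abstract does not specify the channel composition or the coupling rule, so everything here is an assumption: the channels (log-magnitude, cosine of phase, sine of phase), the weighted-difference fusion, and the names `three_channel_spectrogram`, `fuse_scores`, and `weight` are hypothetical and are not the authors' implementation.

```python
import numpy as np


def stft(signal, n_fft=256, hop=128):
    """Hann-windowed STFT; returns an array of shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1).T


def three_channel_spectrogram(signal, n_fft=256, hop=128):
    """Stack amplitude and phase information into a 3-channel array.

    Assumed channels (not confirmed by the source):
    0 = log-magnitude, 1 = cos(phase), 2 = sin(phase).
    """
    spec = stft(signal, n_fft, hop)
    log_mag = np.log1p(np.abs(spec))
    phase = np.angle(spec)
    return np.stack([log_mag, np.cos(phase), np.sin(phase)])


def fuse_scores(llr_score, recon_error, weight=0.5):
    """Hypothetical score coupling: a higher result suggests a known class.

    A large log-likelihood ratio raises the score, while a large
    reconstruction error lowers it. The linear rule is an assumption.
    """
    return weight * llr_score - (1.0 - weight) * recon_error


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    audio = rng.standard_normal(16000)  # 1 s of synthetic noise at 16 kHz
    features = three_channel_spectrogram(audio)
    print(features.shape)  # (channels, frequency bins, frames)
    print(fuse_scores(llr_score=2.0, recon_error=1.0))
```

In such a setup, a sample would be rejected as an unknown class when the fused score falls below a threshold tuned on held-out data; the threshold and the weight would both need to be chosen empirically.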

Original language: English
Journal: IEEE Signal Processing Letters
State: Accepted/In press - 2025

Keywords

  • autoencoder
  • bird vocalization recognition
  • cross-corpus
  • open-set
  • phase characteristics
