Abstract
The primary challenge in unsupervised cross-corpus speech emotion recognition (SER) is that domain shift between the training and testing data undermines the SER model's ability to generalize to unseen testing datasets. In this paper, we propose a straightforward and effective strategy, called Hierarchical Distribution Adaptation (HDA), to address this domain bias. HDA uses a hierarchical emotion representation module based on nested Transformers to extract speech emotion features at different levels (e.g., frame, segment, and utterance level), capturing multi-scale emotion correlations in speech. Furthermore, a hierarchical distribution adaptation module, comprising frame-level distribution adaptation (FDA), segment-level distribution adaptation (SDA), and utterance-level distribution adaptation (UDA), is developed to align the emotion representations of the training and testing speech samples at each level and thereby effectively eliminate the domain discrepancy. Extensive experimental results demonstrate the superiority of the proposed HDA over other state-of-the-art (SOTA) methods.
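The idea of aligning source and target features at three levels can be illustrated with a small sketch. The abstract does not state which discrepancy measure FDA, SDA, and UDA use, so the linear-kernel maximum mean discrepancy (the squared distance between feature means) used below is an assumption, as are the function names, the equal level weights, and the dictionary layout. It is meant only as a minimal illustration of a per-level alignment term summed over levels, not as the authors' implementation.

```python
from statistics import fmean


def mean_vector(features):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [fmean(column) for column in zip(*features)]


def linear_mmd(source, target):
    """Squared distance between source and target feature means.

    Assumption: a linear-kernel MMD stands in for whatever discrepancy
    measure the paper actually uses.
    """
    mu_source = mean_vector(source)
    mu_target = mean_vector(target)
    return sum((s - t) ** 2 for s, t in zip(mu_source, mu_target))


def hierarchical_alignment_loss(source_levels, target_levels, weights=None):
    """Weighted sum of per-level discrepancies (frame, segment, utterance).

    source_levels / target_levels: dicts mapping a level name to a list of
    feature vectors extracted at that level (hypothetical layout).
    """
    levels = ("frame", "segment", "utterance")
    if weights is None:
        # Assumption: every level contributes equally.
        weights = {name: 1.0 for name in levels}
    return sum(
        weights[name] * linear_mmd(source_levels[name], target_levels[name])
        for name in levels
    )


# Toy features: only the utterance-level means differ between the corpora.
source = {
    "frame": [[0.0, 0.0], [2.0, 2.0]],
    "segment": [[1.0, 1.0]],
    "utterance": [[1.0, 0.0]],
}
target = {
    "frame": [[1.0, 1.0]],
    "segment": [[1.0, 1.0]],
    "utterance": [[0.0, 0.0]],
}
loss = hierarchical_alignment_loss(source, target)
```

In a training setup of this kind, such a term would typically be added to the supervised emotion classification loss on the labeled source corpus, so that the feature extractor is pushed toward representations whose statistics match across corpora at every level.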
| Original language | English |
|---|---|
| Pages (from-to) | 3739-3743 |
| Number of pages | 5 |
| Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
| State | Published - 2024 |
| Externally published | Yes |
| Event | 25th Interspeech Conference 2024 - Kos Island, Greece. Duration: 1 Sep 2024 → 5 Sep 2024 |
Keywords
- Cross-corpus
- hierarchical domain adaptation
- speech emotion recognition
- speech representation
Fingerprint
Dive into the research topics of 'Hierarchical Distribution Adaptation for Unsupervised Cross-corpus Speech Emotion Recognition'. Together they form a unique fingerprint.