Hierarchical Distribution Adaptation for Unsupervised Cross-corpus Speech Emotion Recognition

Cheng Lu, Yuan Zong, Yan Zhao, Hailun Lian, Tianhua Qi, Björn Schuller, Wenming Zheng

Research output: Contribution to journal; Conference article; peer-review

Abstract

The primary issue of unsupervised cross-corpus speech emotion recognition (SER) is that domain shift between the training and testing data undermines the SER model's ability to generalize to unknown testing datasets. In this paper, we propose a straightforward and effective strategy, called Hierarchical Distribution Adaptation (HDA), to address the domain bias issue. HDA leverages a hierarchical emotion representation module based on nested Transformers to extract speech emotion features at different levels (e.g., frame/segment/utterance-level) and thereby capture multi-scale emotion correlations in speech. Furthermore, a hierarchical distribution adaptation module, comprising frame-level distribution adaptation (FDA), segment-level distribution adaptation (SDA), and utterance-level distribution adaptation (UDA), is developed to align the hierarchical-level emotion representations of the training and testing speech samples and so effectively eliminate domain discrepancy. Extensive experimental results demonstrate the superiority of our proposed HDA over other state-of-the-art (SOTA) methods.
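To make the idea of aligning distributions at three levels concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not state which discrepancy measure or loss weighting HDA uses, so this sketch assumes a linear-kernel maximum mean discrepancy (the squared distance between source and target feature means), sums the three terms with equal weight, and stands in simple mean pooling for the nested-Transformer representations. All function names and the `segment_size` parameter are hypothetical.

```python
from statistics import fmean


def mean_vector(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [fmean(col) for col in zip(*rows)]


def pool(frames, size):
    """Average consecutive groups of `size` frame vectors into one vector each.

    Assumption: mean pooling replaces the learned segment encoder.
    """
    return [mean_vector(frames[i:i + size]) for i in range(0, len(frames), size)]


def mean_discrepancy(source, target):
    """Squared Euclidean distance between the mean vectors of two feature sets.

    This equals the (biased) squared MMD with a linear kernel; it is an
    assumed stand-in for whatever measure the paper actually uses.
    """
    ms, mt = mean_vector(source), mean_vector(target)
    return sum((a - b) ** 2 for a, b in zip(ms, mt))


def hierarchical_discrepancy(source_utts, target_utts, segment_size=2):
    """Compute frame-, segment-, and utterance-level discrepancies.

    Each utterance is a list of frame feature vectors. The equal-weight
    sum in "total" is an assumption made for illustration.
    """
    def levels(utts):
        frames = [f for u in utts for f in u]
        segments = [s for u in utts for s in pool(u, segment_size)]
        utterances = [mean_vector(u) for u in utts]
        return frames, segments, utterances

    src, tgt = levels(source_utts), levels(target_utts)
    fda, sda, uda = (mean_discrepancy(s, t) for s, t in zip(src, tgt))
    return {"frame": fda, "segment": sda, "utterance": uda, "total": fda + sda + uda}


# Toy data: two utterances of two 2-dimensional frames per corpus.
source = [[[0.0, 0.0], [2.0, 2.0]], [[4.0, 4.0], [6.0, 6.0]]]
target = [[[f + 1.0 for f in frame] for frame in utt] for utt in source]
result = hierarchical_discrepancy(source, target)
```

In a training setup, a quantity like `result["total"]` would typically be added to the emotion classification loss on the labeled source corpus, so that the feature extractor is pushed to produce representations whose statistics match across corpora at every level.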

Original language: English
Pages (from-to): 3739-3743
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
State: Published - 2024
Externally published: Yes
Event: 25th Interspeech Conference 2024 - Kos Island, Greece
Duration: 1 Sep 2024 to 5 Sep 2024

Keywords

  • Cross-corpus
  • hierarchical domain adaptation
  • speech emotion recognition
  • speech representation
