Domain Invariant Feature Learning for Speaker-Independent Speech Emotion Recognition

Cheng Lu, Yuan Zong, Wenming Zheng, Yang Li, Chuangao Tang, Björn W. Schuller

Research output: Contribution to journal › Article › peer-review

49 Scopus citations

Abstract

In this paper, we propose a novel domain-invariant feature learning (DIFL) method for speaker-independent speech emotion recognition (SER). The basic idea of DIFL is to learn speaker-invariant emotion features by eliminating the domain shift between training and testing data caused by different speakers, from the perspective of multi-source unsupervised domain adaptation (UDA). Specifically, we embed a hierarchical alignment layer with a strong-weak distribution alignment strategy into the feature extraction block to first reduce the discrepancy in the feature distributions of speech samples across different speakers as much as possible. Furthermore, multiple discriminators in the discriminator block are used to confuse the speaker information of the emotion features, both within the training data and between the training and testing data. Through these components, a multi-domain invariant representation of emotional speech is gradually and adaptively achieved as the network parameters are updated. We conduct extensive experiments on three public datasets, i.e., Emo-DB, eNTERFACE, and CASIA, to evaluate the SER performance of the proposed method. The experimental results show that the proposed method is superior to the state-of-the-art methods.
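Adversarial confusion of speaker information, as described in the abstract, is commonly implemented by placing a gradient reversal layer (GRL) between the feature extractor and each discriminator. The sketch below is a minimal NumPy illustration of that generic mechanism only; it is not the paper's actual implementation, and all names, shapes, and the `lam` parameter are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of a gradient reversal layer (GRL), the generic
# mechanism behind adversarial domain confusion: the forward pass is
# the identity, while the backward pass negates (and scales) the
# incoming gradient, so the feature extractor is pushed to *maximise*
# the speaker discriminator's loss and thus produce features the
# discriminator cannot classify. Names and shapes are illustrative.

class GradientReversal:
    def __init__(self, lam=1.0):
        self.lam = lam  # reversal strength, often annealed during training

    def forward(self, features):
        # Identity: the discriminator sees the features unchanged.
        return features

    def backward(self, grad_from_discriminator):
        # Reversed, scaled gradient flows back to the feature extractor.
        return -self.lam * grad_from_discriminator

grl = GradientReversal(lam=0.5)
x = np.array([1.0, -2.0, 3.0])   # toy "emotion features"
g = np.array([0.2, 0.4, -0.6])   # toy gradient from a speaker discriminator

print(grl.forward(x))   # unchanged features
print(grl.backward(g))  # negated, scaled gradient
```

In a multi-source setting such as the one the abstract describes, one such reversed pathway would feed each of the multiple speaker discriminators, while the emotion classifier receives the un-reversed features.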

Original language: English
Pages (from-to): 2217-2230
Number of pages: 14
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume: 30
State: Published - 2022
Externally published: Yes

Keywords

  • Speech emotion recognition
  • adversarial learning
  • multi-source domain adaptation
  • speaker independent
  • unsupervised domain adaptation
