E-ODN: An Emotion Open Deep Network for Generalised and Adaptive Speech Emotion Recognition

Liuxian Ma, Lin Shen, Ruobing Li, Haojie Zhang, Kun Qian, Bin Hu, Björn W. Schuller, Yoshiharu Yamamoto

Research output: Contribution to journal › Conference article › peer-review

Abstract

Recognising the widest possible range of emotions is a major challenge in Speech Emotion Recognition (SER), especially for complex and mixed emotions. Due to the limited number of emotion types and the uneven data distribution in existing datasets, current SER models are typically trained on, and applied to, only a narrow range of emotion types. In this paper, we propose the Emotion Open Deep Network (E-ODN) model to address this issue. In addition, we introduce a novel Open-Set Recognition method that maps a sample's emotional features into a three-dimensional emotional space. The method can infer unknown emotions and initialise new type weights, enabling the model to dynamically learn and infer emerging emotional types. Empirical results show that our recognition model outperforms state-of-the-art (SOTA) models in dealing with multi-type, unbalanced data, and that it can also perform finer-grained emotion recognition.
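
The abstract gives no implementation details, but as a rough illustration of the open-set idea it describes, the sketch below shows one hypothetical way a classifier over a three-dimensional emotion space (assumed here to be valence-arousal-dominance coordinates) could flag a sample as an unknown emotion and initialise a new type from it. The class name, the distance threshold, and the centroid-based update are illustrative assumptions, not the authors' E-ODN method.

```python
import numpy as np


class OpenSetEmotionClassifier:
    """Hypothetical open-set classifier over a 3-D emotion space.

    Known emotions are kept as class centroids; a sample whose embedding lies
    farther than `threshold` from every centroid is treated as an unknown
    emotion and used to initialise a new class prototype.
    """

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold
        self.centroids: dict[str, np.ndarray] = {}  # label -> 3-D centroid
        self.counts: dict[str, int] = {}            # samples seen per label
        self._new_id = 0

    def add_known(self, label: str, embeddings: np.ndarray) -> None:
        """Register a known emotion class from its 3-D embeddings (shape [n, 3])."""
        self.centroids[label] = embeddings.mean(axis=0)
        self.counts[label] = len(embeddings)

    def predict(self, embedding: np.ndarray) -> str:
        """Return the nearest known label, or open a new class if none is close."""
        if self.centroids:
            labels = list(self.centroids)
            dists = [np.linalg.norm(embedding - self.centroids[l]) for l in labels]
            best = int(np.argmin(dists))
            if dists[best] <= self.threshold:
                label = labels[best]
                # incrementally update the matched centroid with the new sample
                n = self.counts[label]
                self.centroids[label] = (self.centroids[label] * n + embedding) / (n + 1)
                self.counts[label] = n + 1
                return label
        # no known centroid is close enough: initialise a new emotion type
        label = f"unknown_{self._new_id}"
        self._new_id += 1
        self.centroids[label] = embedding.copy()
        self.counts[label] = 1
        return label


# usage: two seed emotions given as (valence, arousal, dominance) coordinates
clf = OpenSetEmotionClassifier(threshold=0.4)
clf.add_known("happy", np.array([[0.8, 0.6, 0.5], [0.7, 0.7, 0.6]]))
clf.add_known("sad", np.array([[-0.7, -0.4, -0.3], [-0.6, -0.5, -0.4]]))
print(clf.predict(np.array([0.75, 0.65, 0.55])))   # -> "happy"
print(clf.predict(np.array([-0.1, 0.9, -0.8])))    # -> "unknown_0" (new type opened)
```

In the paper the weights of a newly discovered type are presumably learned dynamically inside the network; the simple centroid update above only stands in for that step.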

Original language: English
Pages (from-to): 4293-4297
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
DOIs
State: Published - 2024
Event: 25th Interspeech Conference 2024 - Kos Island, Greece
Duration: 1 Sep 2024 – 5 Sep 2024

Keywords

  • Dynamic Learning
  • Open-set Recognition
  • Speech Emotion Recognition
  • Three-dimensional Emotional Space
