E-ODN: An Emotion Open Deep Network for Generalised and Adaptive Speech Emotion Recognition

Liuxian Ma, Lin Shen, Ruobing Li, Haojie Zhang, Kun Qian, Bin Hu, Björn W. Schuller, Yoshiharu Yamamoto

Publication: Contribution to journal › Conference article › Peer-reviewed

Abstract

Recognising the widest possible range of emotions is a major challenge in Speech Emotion Recognition (SER), especially for complex and mixed emotions. However, due to the limited number of emotional types and the uneven distribution of data within existing datasets, current SER models are typically trained on and applied to only a narrow range of emotional types. In this paper, we propose the Emotion Open Deep Network (E-ODN) model to address this issue. In addition, we introduce a novel Open-Set Recognition method that maps sample emotional features into a three-dimensional emotional space. The method can infer unknown emotions and initialise new type weights, enabling the model to dynamically learn and infer emerging emotional types. The empirical results show that our recognition model outperforms state-of-the-art (SOTA) models in dealing with multi-type unbalanced data, and it can also perform finer-grained emotion recognition.
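Since this record only summarises the approach, the following minimal Python sketch illustrates one way the described open-set step could plausibly work: project each utterance's features into a three-dimensional emotional space, reject samples that lie far from all known emotion prototypes as "unknown", and initialise the weight of a newly emerging emotion type from those rejected samples. All class names, the distance threshold, and the prototype rule are assumptions for illustration, not the authors' E-ODN implementation.

```python
# Hypothetical sketch of open-set recognition in a 3-D emotional space;
# not the published E-ODN code.
import numpy as np


class OpenSetEmotionHead:
    """Keeps one 3-D prototype per known emotion type and rejects outliers."""

    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold  # max distance for a sample to count as "known"
        self.prototypes: dict[str, np.ndarray] = {}  # label -> 3-D prototype

    def add_known_type(self, label: str, features_3d: np.ndarray) -> None:
        # Initialise a type weight (prototype) as the mean of its samples.
        self.prototypes[label] = features_3d.mean(axis=0)

    def predict(self, feature_3d: np.ndarray) -> str:
        # Distance to every known prototype in the 3-D emotional space.
        if not self.prototypes:
            return "unknown"
        labels = list(self.prototypes)
        dists = np.array([np.linalg.norm(feature_3d - self.prototypes[l])
                          for l in labels])
        best = int(dists.argmin())
        # Samples far from all known prototypes are inferred as an unknown emotion.
        return labels[best] if dists[best] <= self.threshold else "unknown"

    def register_unknown(self, label: str, rejected_3d: np.ndarray) -> None:
        # Dynamically initialise the weight of a newly emerging emotion type
        # from the rejected samples, so it can be recognised from now on.
        self.prototypes[label] = rejected_3d.mean(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    head = OpenSetEmotionHead(threshold=0.8)
    # Toy 3-D points standing in for projected speech-emotion features.
    head.add_known_type("happy", rng.normal([0.8, 0.6, 0.5], 0.1, (20, 3)))
    head.add_known_type("sad", rng.normal([-0.7, -0.4, -0.3], 0.1, (20, 3)))

    print(head.predict(np.array([0.75, 0.55, 0.45])))   # -> happy
    print(head.predict(np.array([0.0, 0.9, -0.8])))     # -> unknown

    # Later, the rejected samples can seed a new emotion type.
    head.register_unknown("surprise", rng.normal([0.0, 0.9, -0.8], 0.1, (10, 3)))
    print(head.predict(np.array([0.05, 0.85, -0.75])))  # -> surprise
```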

Original language: English
Pages (from-to): 4293-4297
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
DOIs
Publication status: Published - 2024
Event: 25th Interspeech Conference 2024 - Kos Island, Greece
Duration: 1 Sept. 2024 - 5 Sept. 2024
