TY - JOUR
T1 - E-ODN
T2 - 25th Interspeech Conference 2024
AU - Ma, Liuxian
AU - Shen, Lin
AU - Li, Ruobing
AU - Zhang, Haojie
AU - Qian, Kun
AU - Hu, Bin
AU - Schuller, Björn W.
AU - Yamamoto, Yoshiharu
N1 - Publisher Copyright: © 2024 International Speech Communication Association. All rights reserved.
PY - 2024
Y1 - 2024
N2 - Recognising the widest range of emotions possible is a major challenge in the task of Speech Emotion Recognition (SER), especially for complex and mixed emotions. However, due to the limited number of emotional types and uneven distribution of data within existing datasets, current SER models are typically trained and used in a narrow range of emotional types. In this paper, we propose the Emotion Open Deep Network (E-ODN) model to address this issue. Besides, we introduce a novel Open-Set Recognition method that maps sample emotional features into a three-dimensional emotional space. The method can infer unknown emotions and initialise new type weights, enabling the model to dynamically learn and infer emerging emotional types. The empirical results show that our recognition model outperforms the state-of-the-art (SOTA) models in dealing with multi-type unbalanced data, and it can also perform finer-grained emotion recognition.
KW - Dynamic Learning
KW - Open-set Recognition
KW - Speech Emotion Recognition
KW - Three-dimensional Emotional Space
UR - http://www.scopus.com/inward/record.url?scp=85214822461&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2024-685
DO - 10.21437/Interspeech.2024-685
M3 - Conference article
AN - SCOPUS:85214822461
SN - 2308-457X
SP - 4293
EP - 4297
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Y2 - 1 September 2024 through 5 September 2024
ER -