TY - GEN
T1 - Tendencies regarding the effect of emotional intensity in inter corpus phoneme-level speech emotion modelling
AU - Vlasenko, Bogdan
AU - Schuller, Bjorn
AU - Wendemuth, Andreas
N1 - Publisher Copyright: © 2016 IEEE.
PY - 2016/11/8
Y1 - 2016/11/8
AB - As emotion recognition from speech has matured to a degree where it becomes suitable for real-life applications, it is time to develop techniques for matching different types of emotional data with multi-dimensional and category-based annotations. The categorical approach is usually applied to acted 'full-blown' emotions, and multi-dimensional annotation is often preferred for spontaneous real-life emotions. A particularly realistic task we consider in this contribution is cross-corpus emotion recognition and its evaluation. General and phoneme-level emotional models on acted and spontaneous emotions ('very intense' and 'intense') are used in our experimental study. The emotional models were trained on spontaneous emotions from the complete VAM dataset and subsets with variable emotional intensities and evaluated on acted emotions from the Berlin EMO-DB dataset. We observe a significant classification performance gap for general models trained on very intense spontaneous emotions. As a consequence, we address the importance of collecting large corpora with very intense emotional content for training more reliable phoneme-level emotional models.
KW - cross-corpus evaluation
KW - emotion recognition
KW - emotional intensity
KW - phoneme-level emotional models
KW - turn-level emotional models
UR - http://www.scopus.com/inward/record.url?scp=85002193745&partnerID=8YFLogxK
U2 - 10.1109/MLSP.2016.7738859
DO - 10.1109/MLSP.2016.7738859
M3 - Conference contribution
AN - SCOPUS:85002193745
T3 - IEEE International Workshop on Machine Learning for Signal Processing, MLSP
BT - 2016 IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2016 - Proceedings
A2 - Diamantaras, Kostas
A2 - Uncini, Aurelio
A2 - Palmieri, Francesco A. N.
A2 - Larsen, Jan
PB - IEEE Computer Society
T2 - 26th IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2016 - Proceedings
Y2 - 13 September 2016 through 16 September 2016
ER -