TY - GEN
T1 - Cross-language acoustic emotion recognition
T2 - 2015 International Conference on Affective Computing and Intelligent Interaction, ACII 2015
AU - Feraru, Silvia Monica
AU - Schuller, Dagmar
AU - Schuller, Björn
N1 - Publisher Copyright:
© 2015 IEEE.
PY - 2015/12/2
Y1 - 2015/12/2
N2 - Automatic emotion recognition from speech has matured close to the point where it reaches broader commercial interest. One of the last major limiting factors is the ability to deal with multilingual input, as will be encountered by a real-life operating system in many if not most cases. As speech in real-life scenarios is often mixed across languages, more experience is needed on the performance effects of cross-language recognition. In this contribution we first provide an overview of the languages covered in research on emotion and speech, finding that only roughly two thirds of native speakers' languages have so far been touched upon. We then shed light on mismatched vs. matched condition emotion recognition across a variety of languages. By intention, we include less researched languages of more distant language families, such as Burmese, Romanian, or Turkish. A binary arousal and valence mapping is employed in order to train and test across databases that were originally labelled in diverse categories. In the result - as one may expect - arousal recognition works considerably better across languages than valence recognition, and cross-language recognition falls considerably behind within-language recognition. However, within-language-family recognition seems to provide an 'emergency solution' in case of missing language resources, and the observed notable differences depending on the combination of languages show a number of interesting effects.
KW - Cross-Corpus
KW - Multilinguality
KW - Speech Emotion Recognition
UR - http://www.scopus.com/inward/record.url?scp=84964049090&partnerID=8YFLogxK
U2 - 10.1109/ACII.2015.7344561
DO - 10.1109/ACII.2015.7344561
M3 - Conference contribution
AN - SCOPUS:84964049090
T3 - 2015 International Conference on Affective Computing and Intelligent Interaction, ACII 2015
SP - 125
EP - 131
BT - 2015 International Conference on Affective Computing and Intelligent Interaction, ACII 2015
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 21 September 2015 through 24 September 2015
ER -