TY - GEN
T1 - From hard to soft
T2 - 25th ACM International Conference on Multimedia, MM 2017
AU - Han, Jing
AU - Zhang, Zixing
AU - Schmitt, Maximilian
AU - Pantic, Maja
AU - Schuller, Björn
N1 - Publisher Copyright: © 2017 ACM.
PY - 2017/10/23
Y1 - 2017/10/23
N2 - Over the last decade, automatic emotion recognition has become well established. The gold standard target is thereby usually calculated based on multiple annotations from different raters. All related efforts assume that the emotional state of a human subject can be identified by a 'hard' category or a unique value. This assumption tries to ease the human observer's subjectivity when observing patterns such as the emotional state of others. However, as the number of annotators cannot be infinite, uncertainty remains in the emotion target even if calculated from several, yet few human annotators. The common procedure to use this same emotion target in the learning process thus inevitably introduces noise in terms of an uncertain learning target. In this light, we propose a 'soft' prediction framework to provide a more human-like and comprehensive prediction of emotion. In our novel framework, we provide an additional target to indicate the uncertainty of human perception based on the inter-rater disagreement level, in contrast to the traditional framework which is merely producing one single prediction (category or value). To exploit the dependency between the emotional state and the newly introduced perception uncertainty, we implement a multi-task learning strategy. To evaluate the feasibility and effectiveness of the proposed soft prediction framework, we perform extensive experiments on a time- and value-continuous spontaneous audiovisual emotion database including late fusion results. We show that the soft prediction framework with multitask learning of the emotional state and its perception uncertainty significantly outperforms the individual tasks in both the arousal and valence dimensions.
KW - Emotion recognition
KW - Long short-term memory
KW - Multi-task learning
KW - Perception uncertainty modelling
UR - http://www.scopus.com/inward/record.url?scp=85035203028&partnerID=8YFLogxK
U2 - 10.1145/3123266.3123383
DO - 10.1145/3123266.3123383
M3 - Conference contribution
AN - SCOPUS:85035203028
T3 - MM 2017 - Proceedings of the 2017 ACM Multimedia Conference
SP - 890
EP - 897
BT - MM 2017 - Proceedings of the 2017 ACM Multimedia Conference
PB - Association for Computing Machinery, Inc
Y2 - 23 October 2017 through 27 October 2017
ER -