TY - GEN
T1 - Emotion recognition in the wild
T2 - 16th ACM International Conference on Multimodal Interaction, ICMI 2014
AU - Ringeval, Fabien
AU - Amiriparian, Shahin
AU - Eyben, Florian
AU - Scherer, Klaus
AU - Schuller, Björn
N1 - Publisher Copyright: Copyright 2014 ACM.
PY - 2014/11/12
Y1 - 2014/11/12
N2 - In this paper, we investigate the relevance of using voice and lip activity to improve performance of audiovisual emotion recognition in unconstrained settings, as part of the 2014 Emotion Recognition in the Wild Challenge (EmotiW14). Indeed, the dataset provided by the organisers contains movie excerpts with highly challenging variability in terms of audiovisual content; e.g., speech and/or face of the subject expressing the emotion can be absent in the data. We therefore propose to tackle this issue by incorporating both voice and lip activity as additional features in a decision-level fusion. Results obtained on the blind test set show that the decision-level fusion can improve the best monomodal approach, and that the addition of both voice and lip activity in the feature set leads to the best performance (UAR = 35.27%), with an absolute improvement of 5.36% over the baseline.
AB - In this paper, we investigate the relevance of using voice and lip activity to improve performance of audiovisual emotion recognition in unconstrained settings, as part of the 2014 Emotion Recognition in the Wild Challenge (EmotiW14). Indeed, the dataset provided by the organisers contains movie excerpts with highly challenging variability in terms of audiovisual content; e.g., speech and/or face of the subject expressing the emotion can be absent in the data. We therefore propose to tackle this issue by incorporating both voice and lip activity as additional features in a decision-level fusion. Results obtained on the blind test set show that the decision-level fusion can improve the best monomodal approach, and that the addition of both voice and lip activity in the feature set leads to the best performance (UAR = 35.27%), with an absolute improvement of 5.36% over the baseline.
KW - Decision-level fusion
KW - Emotion recognition
KW - Lip activity detection
KW - Multimedia
KW - Voice activity detection
UR - https://www.scopus.com/pages/publications/84947261353
U2 - 10.1145/2663204.2666271
DO - 10.1145/2663204.2666271
M3 - Conference contribution
AN - SCOPUS:84947261353
T3 - ICMI 2014 - Proceedings of the 2014 International Conference on Multimodal Interaction
SP - 473
EP - 480
BT - ICMI 2014 - Proceedings of the 2014 International Conference on Multimodal Interaction
PB - Association for Computing Machinery, Inc
Y2 - 12 November 2014 through 16 November 2014
ER -