Emotion recognition in the wild: incorporating voice and lip activity in multimodal decision-level fusion

  • Fabien Ringeval
  • Shahin Amiriparian
  • Florian Eyben
  • Klaus Scherer
  • Björn Schuller

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

27 Scopus citations

Abstract

In this paper, we investigate the relevance of using voice and lip activity to improve the performance of audiovisual emotion recognition in unconstrained settings, as part of the 2014 Emotion Recognition in the Wild Challenge (EmotiW14). The dataset provided by the organisers contains movie excerpts with highly challenging variability in audiovisual content; e.g., the speech and/or face of the subject expressing the emotion can be absent from the data. We therefore propose to tackle this issue by incorporating both voice and lip activity as additional features in a decision-level fusion. Results obtained on the blind test set show that decision-level fusion can improve on the best monomodal approach, and that adding both voice and lip activity to the feature set leads to the best performance (UAR = 35.27%), an absolute improvement of 5.36% over the baseline.
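
The UAR reported in the abstract is, assuming the usual definition of unweighted average recall, the mean of the per-class recalls, so each emotion class counts equally however many clips it contains. The following is a minimal sketch of that metric only. The function name and the toy labels are hypothetical and are not taken from the paper or the challenge data.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls (each class weighted equally)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1

    # Average the recalls over the classes that occur in y_true.
    return sum(correct[c] / total[c] for c in total) / len(total)


# Hypothetical, imbalanced toy labels (not from the paper).
y_true = ["angry", "angry", "angry", "angry", "happy", "sad"]
y_pred = ["angry", "angry", "angry", "happy", "happy", "angry"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.4f}, accuracy = {accuracy:.4f}")
```

On imbalanced data such as this toy example, UAR and plain accuracy differ, because a missed minority class lowers UAR by a full class share rather than by a single sample.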

Original language: English
Title of host publication: ICMI 2014 - Proceedings of the 2014 International Conference on Multimodal Interaction
Publisher: Association for Computing Machinery, Inc
Pages: 473-480
Number of pages: 8
ISBN (Electronic): 9781450328852
DOIs
State: Published - 12 Nov 2014
Externally published: Yes
Event: 16th ACM International Conference on Multimodal Interaction, ICMI 2014 - Istanbul, Turkey
Duration: 12 Nov 2014 to 16 Nov 2014

Publication series

Name: ICMI 2014 - Proceedings of the 2014 International Conference on Multimodal Interaction

Conference

Conference: 16th ACM International Conference on Multimodal Interaction, ICMI 2014
Country/Territory: Turkey
City: Istanbul
Period: 12/11/14 to 16/11/14

Keywords

  • Decision-level fusion
  • Emotion recognition
  • Lip activity detection
  • Multimedia
  • Voice activity detection
