Multimodal emotion recognition in audiovisual communication

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

38 Scopus citations

Abstract

This paper discusses innovative techniques to automatically estimate a user's emotional state by analyzing the speech signal and haptic interaction on a touch-screen or via mouse. Knowledge of a user's emotion permits adaptive strategies that strive for a more natural and robust interaction. We classify seven emotional states: surprise, joy, anger, fear, disgust, sadness, and a neutral user state. The user's emotion is extracted by a parallel stochastic analysis of spoken and haptic machine interactions while the desired intention is being understood. The introduced methods are based on the common prosodic speech features pitch and energy, but they also rely on semantic and intention-based features: wording, degree of verbosity, temporal intention and word rate, and finally the history of user utterances. As a further modality, touch-screen or mouse interaction is analyzed. The estimates based on these features are integrated in a multimodal way. The introduced methods are based on the results of user studies, and a realization proved reliable when compared with the subjective impressions of test participants.
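The abstract describes integrating per-modality emotion estimates "in a multimodal way" without giving details. As a rough illustration only (not the paper's actual method), the sketch below fuses hypothetical per-modality posterior distributions over the seven named emotion classes by a weighted log-linear combination; the modality names and weights are assumptions.

```python
# Illustrative sketch only, not taken from the paper: late fusion of
# per-modality emotion estimates over the seven classes named in the
# abstract. Modality names ("speech", "haptic") and weights are hypothetical.
import math

EMOTIONS = ["surprise", "joy", "anger", "fear", "disgust", "sadness", "neutral"]

def fuse(estimates, weights):
    """Weighted log-linear combination of per-modality posteriors,
    renormalized to a probability distribution over EMOTIONS."""
    scores = {
        emo: sum(w * math.log(estimates[m][emo] + 1e-12)
                 for m, w in weights.items())
        for emo in EMOTIONS
    }
    z = sum(math.exp(s) for s in scores.values())
    return {emo: math.exp(s) / z for emo, s in scores.items()}

# Hypothetical speech-based estimate with a strong anger cue.
speech = {e: 1.0 / 7 for e in EMOTIONS}
speech.update({"anger": 0.4, "neutral": 0.1})
total = sum(speech.values())
speech = {e: p / total for e, p in speech.items()}

# Hypothetical uninformative haptic estimate (uniform).
haptic = {e: 1.0 / 7 for e in EMOTIONS}

fused = fuse({"speech": speech, "haptic": haptic},
             {"speech": 0.7, "haptic": 0.3})
print(max(fused, key=fused.get))  # the strong speech cue dominates
```

In this toy setup the uniform haptic estimate leaves the ranking unchanged, so the fused decision follows the speech modality; with conflicting cues, the weights would arbitrate between modalities.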

Original language: English
Title of host publication: Proceedings - 2002 IEEE International Conference on Multimedia and Expo, ICME 2002
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 745-748
Number of pages: 4
ISBN (Electronic): 0780373049
DOIs
State: Published - 2002
Event: 2002 IEEE International Conference on Multimedia and Expo, ICME 2002 - Lausanne, Switzerland
Duration: 26 Aug 2002 to 29 Aug 2002

Publication series

Name: Proceedings - 2002 IEEE International Conference on Multimedia and Expo, ICME 2002
Volume: 1

Conference

Conference: 2002 IEEE International Conference on Multimedia and Expo, ICME 2002
Country/Territory: Switzerland
City: Lausanne
Period: 26/08/02 to 29/08/02

