TY - GEN
T1 - Multimodal emotion recognition in audiovisual communication
AU - Schuller, Björn
AU - Lang, Manfred
AU - Rigoll, Gerhard
N1 - Publisher Copyright:
© 2002 IEEE.
PY - 2002
Y1 - 2002
N2 - This paper discusses innovative techniques for automatically estimating a user's emotional state by analyzing the speech signal and haptic interaction on a touch screen or via mouse. Knowledge of a user's emotion permits adaptive strategies that strive for a more natural and robust interaction. We classify seven emotional states: surprise, joy, anger, fear, disgust, sadness, and a neutral user state. The user's emotion is extracted by a parallel stochastic analysis of the spoken and haptic machine interactions while the desired intention is understood. The introduced methods are based on the common prosodic speech features pitch and energy, but also rely on the semantic and intention-based features of wording, degree of verbosity, temporal intention and word rate, and finally the history of user utterances. As a further modality, touch-screen or mouse interaction is analyzed. The estimates based on these features are integrated in a multimodal way. The introduced methods build on the results of user studies, and a realization proved reliable when compared with the subjective impressions of test participants.
AB - This paper discusses innovative techniques for automatically estimating a user's emotional state by analyzing the speech signal and haptic interaction on a touch screen or via mouse. Knowledge of a user's emotion permits adaptive strategies that strive for a more natural and robust interaction. We classify seven emotional states: surprise, joy, anger, fear, disgust, sadness, and a neutral user state. The user's emotion is extracted by a parallel stochastic analysis of the spoken and haptic machine interactions while the desired intention is understood. The introduced methods are based on the common prosodic speech features pitch and energy, but also rely on the semantic and intention-based features of wording, degree of verbosity, temporal intention and word rate, and finally the history of user utterances. As a further modality, touch-screen or mouse interaction is analyzed. The estimates based on these features are integrated in a multimodal way. The introduced methods build on the results of user studies, and a realization proved reliable when compared with the subjective impressions of test participants.
UR - http://www.scopus.com/inward/record.url?scp=84908319871&partnerID=8YFLogxK
U2 - 10.1109/ICME.2002.1035889
DO - 10.1109/ICME.2002.1035889
M3 - Conference contribution
AN - SCOPUS:84908319871
T3 - Proceedings - 2002 IEEE International Conference on Multimedia and Expo, ICME 2002
SP - 745
EP - 748
BT - Proceedings - 2002 IEEE International Conference on Multimedia and Expo, ICME 2002
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2002 IEEE International Conference on Multimedia and Expo, ICME 2002
Y2 - 26 August 2002 through 29 August 2002
ER -