TY - GEN
T1 - Bimodal fusion of emotional data in an automotive environment
AU - Hoch, S.
AU - Althoff, F.
AU - McGlaun, G.
AU - Rigoll, G.
PY - 2005
Y1 - 2005
N2 - In this work, we present a flexible bimodal approach to person-dependent emotion recognition in an automotive environment by adapting an acoustic and a visual monomodal recognizer and combining the individual results on an abstract decision level. The reference database consists of 840 acted audiovisual examples of seven different speakers, expressing the three emotions positive (joy), negative (anger, irritation), and neutral. For the acoustic module, we calculate the statistics of commonly known low-level features. Facial expressions are evaluated by an SVM classification of Gabor-filtered face regions. At the subsequent integration stage, both monomodal decisions are fused by a weighted linear combination. An evaluation of the recorded examples yields an average recognition rate of 90.7% for the fusion approach. This amounts to a performance gain of nearly 4% compared to the best monomodal recognizer. The system is currently used to improve the usability of automotive infotainment interfaces.
AB - In this work, we present a flexible bimodal approach to person-dependent emotion recognition in an automotive environment by adapting an acoustic and a visual monomodal recognizer and combining the individual results on an abstract decision level. The reference database consists of 840 acted audiovisual examples of seven different speakers, expressing the three emotions positive (joy), negative (anger, irritation), and neutral. For the acoustic module, we calculate the statistics of commonly known low-level features. Facial expressions are evaluated by an SVM classification of Gabor-filtered face regions. At the subsequent integration stage, both monomodal decisions are fused by a weighted linear combination. An evaluation of the recorded examples yields an average recognition rate of 90.7% for the fusion approach. This amounts to a performance gain of nearly 4% compared to the best monomodal recognizer. The system is currently used to improve the usability of automotive infotainment interfaces.
UR - http://www.scopus.com/inward/record.url?scp=33646820042&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2005.1415597
DO - 10.1109/ICASSP.2005.1415597
M3 - Conference contribution
AN - SCOPUS:33646820042
SN - 0780388747
SN - 9780780388741
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 1085
EP - 1088
BT - 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05 - Proceedings - Image and Multidimensional Signal Processing Multimedia Signal Processing
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05
Y2 - 18 March 2005 through 23 March 2005
ER -