TY - JOUR
T1 - Being bored? Recognising natural interest by extensive audiovisual integration for real-life application
AU - Schuller, Björn
AU - Müller, Ronald
AU - Eyben, Florian
AU - Gast, Jürgen
AU - Hörnler, Benedikt
AU - Wöllmer, Martin
AU - Rigoll, Gerhard
AU - Höthker, Anja
AU - Konosu, Hitoshi
PY - 2009/11
Y1 - 2009/11
N2 - Automatic detection of the level of human interest is highly relevant for many technical applications, such as automatic customer care or tutoring systems. However, recognising spontaneous interest in natural conversations, independently of the subject, remains a challenge. Identifying human affective states from a single modality alone is often impossible, even for humans, since different modalities contain partially disjunctive cues. Multimodal approaches to human affect recognition are generally shown to boost recognition performance, yet they are evaluated only in restrictive laboratory settings. Herein we introduce a fully automatic processing combination of Active-Appearance-Model-based facial expression analysis, vision-based eye-activity estimation, acoustic features, linguistic analysis, non-linguistic vocalisations, and temporal context information in an early feature fusion process. We provide detailed subject-independent results for classification and regression of the Level of Interest using Support Vector Machines on an audiovisual interest corpus (AVIC) of spontaneous, conversational speech, demonstrating the "theoretical" effectiveness of the approach. Further, to evaluate the approach with regard to real-life usability, a user study is conducted as proof of "practical" effectiveness.
KW - Affective computing
KW - Audiovisual processing
KW - Interest recognition
UR - http://www.scopus.com/inward/record.url?scp=70349292240&partnerID=8YFLogxK
U2 - 10.1016/j.imavis.2009.02.013
DO - 10.1016/j.imavis.2009.02.013
M3 - Article
AN - SCOPUS:70349292240
SN - 0262-8856
VL - 27
SP - 1760
EP - 1774
JO - Image and Vision Computing
JF - Image and Vision Computing
IS - 12
ER -