TY - GEN
T1 - String-based audiovisual fusion of behavioural events for the assessment of dimensional affect
AU - Eyben, Florian
AU - Wöllmer, Martin
AU - Valstar, Michel F.
AU - Gunes, Hatice
AU - Schuller, Björn
AU - Pantic, Maja
PY - 2011
AB - The automatic assessment of affect is mostly based on feature-level approaches, such as distances between facial points or prosodic and spectral information, when it comes to audiovisual analysis. However, it is known and intuitive that behavioural events such as smiles, head shakes, laughter, and sighs also bear highly relevant information regarding a subject's affective display. Accordingly, we propose a novel string-based prediction approach to fuse such events and to predict human affect in a continuous dimensional space. Extensive analysis and evaluation have been conducted using the newly released SEMAINE database of human-to-agent communication. For a thorough understanding of the obtained results, we provide additional benchmarks based on more conventional feature-level modelling, and compare these and the string-based approach to the fusion of signal-based features and string-based events. Our experimental results show that the proposed string-based approach is the best-performing approach for automatic prediction of the Valence and Expectation dimensions, and improves prediction performance for the other dimensions when combined with at least acoustic signal-based features.
UR - http://www.scopus.com/inward/record.url?scp=79958694881&partnerID=8YFLogxK
DO - 10.1109/FG.2011.5771417
M3 - Conference contribution
AN - SCOPUS:79958694881
SN - 9781424491407
T3 - 2011 IEEE International Conference on Automatic Face and Gesture Recognition and Workshops, FG 2011
SP - 322
EP - 329
BT - 2011 IEEE International Conference on Automatic Face and Gesture Recognition and Workshops, FG 2011
T2 - 2011 IEEE International Conference on Automatic Face and Gesture Recognition and Workshops, FG 2011
Y2 - 21 March 2011 through 25 March 2011
ER -