TY - GEN
T1 - How Good Is Your Model ‘Really’? On ‘Wildness’ of the In-the-Wild Speech-Based Affect Recognisers
AU - Pandit, Vedhas
AU - Schmitt, Maximilian
AU - Cummins, Nicholas
AU - Graf, Franz
AU - Paletta, Lucas
AU - Schuller, Björn
N1 - Publisher Copyright: © 2018, Springer Nature Switzerland AG.
PY - 2018
Y1 - 2018
N2 - We evaluate, for the first time, the generalisability of in-the-wild speech-based affect tracking models using the database used in the ‘Affect Recognition’ sub-challenge of the Audio/Visual Emotion Challenge and Workshop (AVEC 2017), namely the ‘Automatic Sentiment Analysis in the Wild (SEWA)’ corpus, and the ‘Graz Real-life Affect in the Street and Supermarket (GRAS2)’ corpus. The GRAS2 corpus is the only corpus to date featuring audiovisual recordings and time-continuous affect labels of random participants recorded surreptitiously in a public place. The SEWA database was also collected in an in-the-wild paradigm, in that it features spontaneous affect behaviours and real-life acoustic disruptions due to connectivity and hardware problems. The SEWA participants, however, were well aware of being recorded throughout, and thus the data potentially suffers from the ‘observer’s paradox’. In this paper, we evaluate how a model trained on typical data suffering from the observer’s paradox (SEWA) fares on real-life data that is relatively free from such a psychological effect (GRAS2), and vice versa. Because of the drastically different recording conditions and recording equipment, the feature spaces for the two databases differ extremely. The in-the-wild nature of the real-life databases and the extreme disparity between the feature spaces are the key challenges tackled in this paper, a problem of high practical relevance. We extract bag-of-audio-words features using, for the very first time, a randomised database-independent codebook. True to our hypothesis, the Support Vector Regression model trained on GRAS2 had better generalisability, as this model could reasonably predict the SEWA arousal labels.
AB - We evaluate, for the first time, the generalisability of in-the-wild speech-based affect tracking models using the database used in the ‘Affect Recognition’ sub-challenge of the Audio/Visual Emotion Challenge and Workshop (AVEC 2017), namely the ‘Automatic Sentiment Analysis in the Wild (SEWA)’ corpus, and the ‘Graz Real-life Affect in the Street and Supermarket (GRAS2)’ corpus. The GRAS2 corpus is the only corpus to date featuring audiovisual recordings and time-continuous affect labels of random participants recorded surreptitiously in a public place. The SEWA database was also collected in an in-the-wild paradigm, in that it features spontaneous affect behaviours and real-life acoustic disruptions due to connectivity and hardware problems. The SEWA participants, however, were well aware of being recorded throughout, and thus the data potentially suffers from the ‘observer’s paradox’. In this paper, we evaluate how a model trained on typical data suffering from the observer’s paradox (SEWA) fares on real-life data that is relatively free from such a psychological effect (GRAS2), and vice versa. Because of the drastically different recording conditions and recording equipment, the feature spaces for the two databases differ extremely. The in-the-wild nature of the real-life databases and the extreme disparity between the feature spaces are the key challenges tackled in this paper, a problem of high practical relevance. We extract bag-of-audio-words features using, for the very first time, a randomised database-independent codebook. True to our hypothesis, the Support Vector Regression model trained on GRAS2 had better generalisability, as this model could reasonably predict the SEWA arousal labels.
KW - Affective speech analysis
KW - Authentic emotions
KW - In-the-wild
KW - Observer’s paradox
KW - One-way mirror dilemma
KW - Transfer learning
UR - http://www.scopus.com/inward/record.url?scp=85053759215&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-99579-3_51
DO - 10.1007/978-3-319-99579-3_51
M3 - Conference contribution
AN - SCOPUS:85053759215
SN - 9783319995786
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 490
EP - 500
BT - Speech and Computer - 20th International Conference, SPECOM 2018, Proceedings
A2 - Potapova, Rodmonga
A2 - Jokisch, Oliver
A2 - Karpov, Alexey
PB - Springer Verlag
T2 - 20th International Conference on Speech and Computer, SPECOM 2018
Y2 - 18 September 2018 through 22 September 2018
ER -