TY - GEN
T1 - Tracking Authentic and In-the-wild Emotions Using Speech
AU - Pandit, Vedhas
AU - Cummins, Nicholas
AU - Schmitt, Maximilian
AU - Hantke, Simone
AU - Graf, Franz
AU - Paletta, Lucas
AU - Schuller, Björn
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2018/9/21
Y1 - 2018/9/21
N2 - This first-of-its-kind study aims to track authentic affect representations in-the-wild. We use the 'Graz Real-life Affect in the Street and Supermarket (GRAS2)' corpus featuring audiovisual recordings of random participants in non-laboratory conditions. The participants were initially unaware of being recorded. This paradigm enabled us to collect a wide range of authentic, spontaneous and natural affective behaviours. Six raters annotated twenty-eight conversations averaging 2.5 minutes in duration, tracking the arousal and valence levels of the participants. We generate the gold standards through a novel robust Evaluator Weighted Estimator (EWE) formulation. We train Support Vector Regressors (SVR) and Recurrent Neural Networks (RNN) with the low-level descriptors (LLDs) of the ComParE feature set in different derived representations, including bag-of-audio-words. Despite the challenging nature of this database, a fusion system achieved a highly promising concordance correlation coefficient (CCC) of .372 for the arousal dimension, while RNNs achieved a top CCC of .223 in predicting valence, using a bag-of-features representation.
AB - This first-of-its-kind study aims to track authentic affect representations in-the-wild. We use the 'Graz Real-life Affect in the Street and Supermarket (GRAS2)' corpus featuring audiovisual recordings of random participants in non-laboratory conditions. The participants were initially unaware of being recorded. This paradigm enabled us to collect a wide range of authentic, spontaneous and natural affective behaviours. Six raters annotated twenty-eight conversations averaging 2.5 minutes in duration, tracking the arousal and valence levels of the participants. We generate the gold standards through a novel robust Evaluator Weighted Estimator (EWE) formulation. We train Support Vector Regressors (SVR) and Recurrent Neural Networks (RNN) with the low-level descriptors (LLDs) of the ComParE feature set in different derived representations, including bag-of-audio-words. Despite the challenging nature of this database, a fusion system achieved a highly promising concordance correlation coefficient (CCC) of .372 for the arousal dimension, while RNNs achieved a top CCC of .223 in predicting valence, using a bag-of-features representation.
KW - Affective Computing
KW - Affective Speech Analysis
KW - Authentic Emotions
KW - Bag-of-Audio-Words
KW - Gated Recurrent Units
KW - In-the-Wild
UR - http://www.scopus.com/inward/record.url?scp=85053814366&partnerID=8YFLogxK
U2 - 10.1109/ACIIAsia.2018.8470340
DO - 10.1109/ACIIAsia.2018.8470340
M3 - Conference contribution
AN - SCOPUS:85053814366
T3 - 2018 1st Asian Conference on Affective Computing and Intelligent Interaction, ACII Asia 2018
BT - 2018 1st Asian Conference on Affective Computing and Intelligent Interaction, ACII Asia 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 1st Asian Conference on Affective Computing and Intelligent Interaction, ACII Asia 2018
Y2 - 20 May 2018 through 22 May 2018
ER -