How Good Is Your Model ‘Really’? On ‘Wildness’ of the In-the-Wild Speech-Based Affect Recognisers

Vedhas Pandit, Maximilian Schmitt, Nicholas Cummins, Franz Graf, Lucas Paletta, Björn Schuller

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

2 Scopus citations

Abstract

We evaluate, for the first time, the generalisability of in-the-wild speech-based affect tracking models using the database of the ‘Affect Recognition’ sub-challenge of the Audio/Visual Emotion Challenge and Workshop (AVEC 2017), namely the ‘Automatic Sentiment Analysis in the Wild (SEWA)’ database, and the ‘Graz Real-life Affect in the Street and Supermarket (GRAS2)’ corpus. The GRAS2 corpus is the only corpus to date featuring audiovisual recordings and time-continuous affect labels of random participants recorded surreptitiously in a public place. The SEWA database was also collected in an in-the-wild paradigm, in that it features spontaneous affective behaviour and real-life acoustic disruptions due to connectivity and hardware problems. The SEWA participants, however, were aware of being recorded throughout, so the data potentially suffers from the ‘observer’s paradox’. In this paper, we evaluate how a model trained on typical data suffering from the observer’s paradox (SEWA) fares on real-life data that is relatively free from this psychological effect (GRAS2), and vice versa. Because the recording conditions and recording equipment differ drastically, the feature spaces of the two databases differ greatly. The in-the-wild nature of these real-life databases and the extreme disparity between their feature spaces are the key challenges tackled in this paper, a problem of high practical relevance. We extract bag-of-audio-words (BoAW) features using, for the first time, a randomised, database-independent codebook. True to our hypothesis, the Support Vector Regression (SVR) model trained on GRAS2 generalised better, as it could reasonably predict the SEWA arousal labels.
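The central feature-extraction step described in the abstract, bag-of-audio-words (BoAW) with a randomised, database-independent codebook, can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: we assume frame-level low-level descriptors (LLDs) are z-standardised separately for each corpus, that codewords are drawn from a standard normal distribution with a fixed seed (so neither SEWA nor GRAS2 influences the codebook), and that each frame votes for its `n_assign` nearest codewords. All function names, the codebook size, and the synthetic data below are hypothetical.

```python
import numpy as np


def random_codebook(n_words: int, n_dims: int, seed: int = 0) -> np.ndarray:
    """Draw codewords from a standard normal distribution (assumption).

    The codebook never sees either corpus, so it is database-independent;
    this presumes the LLDs are z-standardised before assignment.
    """
    rng = np.random.default_rng(seed)
    return rng.standard_normal((n_words, n_dims))


def standardise(llds: np.ndarray) -> np.ndarray:
    """Z-normalise each LLD dimension using statistics of this corpus only."""
    mean = llds.mean(axis=0)
    std = llds.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant dimensions
    return (llds - mean) / std


def bag_of_audio_words(frames: np.ndarray, codebook: np.ndarray,
                       n_assign: int = 1) -> np.ndarray:
    """Normalised histogram of nearest codewords over one segment of frames.

    Each frame votes for its n_assign closest codewords (Euclidean distance);
    the resulting counts are divided by their total so they sum to 1.
    """
    # Squared Euclidean distances, shape (n_frames, n_words)
    dists = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    # Indices of the n_assign nearest codewords for every frame
    nearest = np.argsort(dists, axis=1)[:, :n_assign]
    counts = np.bincount(nearest.ravel(), minlength=codebook.shape[0])
    return counts / counts.sum()


# Toy usage with synthetic LLDs (placeholders, not SEWA or GRAS2 data):
# two hypothetical "corpora" whose raw feature spaces differ in offset and scale.
rng = np.random.default_rng(1)
corpus_a = rng.normal(loc=5.0, scale=2.0, size=(500, 10))
corpus_b = rng.normal(loc=-3.0, scale=0.5, size=(400, 10))
codebook = random_codebook(n_words=100, n_dims=10, seed=42)
boaw_a = bag_of_audio_words(standardise(corpus_a)[:100], codebook, n_assign=5)
boaw_b = bag_of_audio_words(standardise(corpus_b)[:100], codebook, n_assign=5)
```

Because the same seeded codebook is applied to both corpora, the histograms of both live in one shared space, so a regressor such as an SVR trained on one corpus's histograms can be applied directly to the other's; the regression step itself is not shown here.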

Original language: English
Title of host publication: Speech and Computer - 20th International Conference, SPECOM 2018, Proceedings
Editors: Rodmonga Potapova, Oliver Jokisch, Alexey Karpov
Publisher: Springer Verlag
Pages: 490-500
Number of pages: 11
ISBN (Print): 9783319995786
State: Published - 2018
Externally published: Yes
Event: 20th International Conference on Speech and Computer, SPECOM 2018 - Leipzig, Germany
Duration: 18 Sep 2018 to 22 Sep 2018

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11096 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 20th International Conference on Speech and Computer, SPECOM 2018
Country/Territory: Germany
City: Leipzig
Period: 18/09/18 to 22/09/18

Keywords

  • Affective speech analysis
  • Authentic emotions
  • In-the-wild
  • Observer’s paradox
  • One-way mirror dilemma
  • Transfer learning
