Robust speech emotion recognition under different encoding conditions

Christopher Oates, Andreas Triantafyllopoulos, Ingmar Steiner, Björn Schuller

Research output: Contribution to journalConference articlepeer-review

6 Scopus citations

Abstract

In an era where large speech corpora annotated for emotion are hard to come by, and especially ones where emotion is expressed freely instead of being acted, the importance of using free online sources for collecting such data cannot be overstated. Most of those sources, however, contain encoded audio due to storage and bandwidth constraints, often in very low bitrates. In addition, with the increased industry interest on voice-based applications, it is inevitable that speech emotion recognition (SER) algorithms will soon find their way into production environments, where the audio might be encoded in a different bitrate than the one available during training. Our contribution is threefold. First, we show that encoded audio still contains enough relevant information for robust SER. Next, we investigate the effects of mismatched encoding conditions in the training and test set both for traditional machine learning algorithms built on hand-crafted features and modern end-to-end methods. Finally, we investigate the robustness of those algorithms in the multi-condition scenario, where the training set is augmented with encoded audio, but still differs from the training set. Our results indicate that end-to-end methods are more robust even in the more challenging scenario of mismatched conditions.

Original languageEnglish
Pages (from-to)3935-3939
Number of pages5
JournalProceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume2019-September
DOIs
StatePublished - 2019
Externally publishedYes
Event20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019 - Graz, Austria
Duration: 15 Sep 201919 Sep 2019

Keywords

  • Audio compression acronym
  • Speech
  • Speech emotion recognition

Fingerprint

Dive into the research topics of 'Robust speech emotion recognition under different encoding conditions'. Together they form a unique fingerprint.

Cite this