Discriminatively trained recurrent neural networks for continuous dimensional emotion recognition from audio

Felix Weninger, Fabien Ringeval, Erik Marchi, Bjorn Schuller

Research output: Contribution to journal, Conference article, peer-review

59 Scopus citations

Abstract

Continuous dimensional emotion recognition from audio is a sequential regression problem, where the goal is to maximize correlation between sequences of regression outputs and continuous-valued emotion contours, while minimizing the average deviation. As in other domains, deep neural networks trained on simple acoustic features achieve good performance on this task. Yet, the usual squared error objective functions for neural network training do not fully take this goal into account. Hence, in this paper we introduce a technique for the discriminative training of deep neural networks using the concordance correlation coefficient as the cost function, which unites both correlation and mean squared error in a single differentiable function. Results on the MediaEval 2013 and AV+EC 2015 Challenge data sets show that the proposed method can significantly improve the evaluation criteria compared to standard mean squared error training, both in the music and speech domains.

Original language: English
Pages (from-to): 2196-2202
Number of pages: 7
Journal: IJCAI International Joint Conference on Artificial Intelligence
Volume: 2016-January
State: Published - 2016
Externally published: Yes
Event: 25th International Joint Conference on Artificial Intelligence, IJCAI 2016 - New York, United States
Duration: 9 Jul 2016 - 15 Jul 2016
