Automatically estimating emotion in music with deep long-short term memory recurrent neural networks

Eduardo Coutinho, George Trigeorgis, Stefanos Zafeiriou, Björn Schuller

Research output: Contribution to journal › Conference article › peer-review


Abstract

In this paper we describe our approach for the MediaEval "Emotion in Music" task. Our method uses deep Long Short-Term Memory Recurrent Neural Networks (LSTM-RNN) for dynamic Arousal and Valence regression, based on acoustic and psychoacoustic features extracted from the songs that have previously been shown to be effective for emotion prediction in music. Results on the challenge test set demonstrate an excellent performance for Arousal estimation (r = 0.613 ± 0.278), but not for Valence (r = 0.026 ± 0.500). Issues with the reliability and distribution of the test set annotations are indicated as plausible explanations for these results. Using a subset of the development set held out for performance estimation, we determined that the performance of our approach may be underestimated for Valence (Arousal: r = 0.596 ± 0.386; Valence: r = 0.458 ± 0.551).
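The sketch below illustrates the kind of model the abstract describes: a stacked ("deep") LSTM that maps a song's per-frame feature sequence to continuous Arousal and Valence values, evaluated per song with the Pearson correlation coefficient r (reported above as mean ± standard deviation across songs). This is not the authors' code; the feature dimensionality, network sizes, and all names are illustrative assumptions.

```python
# Minimal sketch (assumed architecture, not the paper's implementation) of
# dynamic emotion regression with a stacked LSTM-RNN in PyTorch.
import torch
import torch.nn as nn


class EmotionLSTM(nn.Module):
    def __init__(self, n_features: int, hidden_size: int = 64, num_layers: int = 2):
        super().__init__()
        # Stacked LSTM over the acoustic/psychoacoustic feature sequence of one song.
        self.lstm = nn.LSTM(n_features, hidden_size,
                            num_layers=num_layers, batch_first=True)
        # Linear readout to two continuous targets: Arousal and Valence.
        self.head = nn.Linear(hidden_size, 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, n_features) -> predictions: (batch, time, 2)
        out, _ = self.lstm(x)
        return self.head(out)


def pearson_r(pred: torch.Tensor, target: torch.Tensor) -> float:
    """Pearson correlation over one song's frame sequence (the challenge metric)."""
    p, t = pred - pred.mean(), target - target.mean()
    return float((p * t).sum() / (p.norm() * t.norm() + 1e-8))


# Toy usage: 8 songs, 60 frames each, 30 assumed features per frame.
model = EmotionLSTM(n_features=30)
x = torch.randn(8, 60, 30)
y_hat = model(x)  # (8, 60, 2); training (e.g. with MSE loss) is omitted here
# Per-song Arousal correlation against (random) targets, then mean ± std,
# mirroring how the results above are reported.
rs = torch.tensor([pearson_r(y_hat[i, :, 0], torch.randn(60)) for i in range(8)])
print(f"Arousal r = {rs.mean():.3f} ± {rs.std():.3f}")
```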

Original language: English
Journal: CEUR Workshop Proceedings
Volume: 1436
State: Published - 2015
Externally published: Yes
Event: Multimedia Benchmark Workshop, MediaEval 2015 - Wurzen, Germany
Duration: 14 Sep 2015 - 15 Sep 2015
