Emotion and theme recognition in music using attention-based methods

Srividya Tirunellai Rajamani, Kumar Rajamani, Björn Schuller

Research output: Contribution to journal, Conference article, peer-review


Emotion and theme recognition in music plays a vital role in music information retrieval and recommendation systems. Deep learning-based techniques have shown great promise in this regard. Finding network configurations that minimise the number of FLOPS and model parameters is of paramount importance for obtaining efficient, deployable models, especially on resource-constrained hardware. Yet little research has been done in this direction, particularly in the context of music emotion recognition. As part of the MediaEval 2020: Emotions and Themes in Music challenge, we (team name: AUGment) propose a novel integration of attention-based techniques for the task of emotion/mood recognition in music. We demonstrate that using stand-alone self-attention in the later layers of a VGG-ish network matches the baseline PR-AUC with 11% fewer FLOPS and 22% fewer parameters. Further, utilising the learnable Attention-based Rectified Linear Unit (AReLU) activation achieves better performance than the baseline. As an additional gain, a late fusion of these two models with the baseline also improved the PR-AUC and ROC-AUC by 1%.
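
The late fusion mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state how the three models' outputs were combined, so the weighted averaging of per-tag scores below is an assumption, and the function name `late_fusion`, its parameters, and the example scores are hypothetical.

```python
from typing import List, Optional, Sequence


def late_fusion(
    model_scores: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Combine per-tag scores from several models for one track.

    Assumption: fusion is a (weighted) mean of the models' scores.
    The abstract does not specify the fusion rule actually used.
    """
    if not model_scores:
        raise ValueError("need scores from at least one model")
    n_tags = len(model_scores[0])
    if any(len(scores) != n_tags for scores in model_scores):
        raise ValueError("all models must score the same number of tags")
    if weights is None:
        # Default: every model contributes equally.
        weights = [1.0] * len(model_scores)
    if len(weights) != len(model_scores):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Weighted mean of the scores, computed tag by tag.
    return [
        sum(w * scores[t] for w, scores in zip(weights, model_scores)) / total
        for t in range(n_tags)
    ]


# Hypothetical scores for three mood/theme tags from three models.
baseline = [0.2, 0.7, 0.1]
self_attention = [0.4, 0.5, 0.3]
arelu = [0.3, 0.6, 0.2]
fused = late_fusion([baseline, self_attention, arelu])
```

With equal weights, each fused score is the plain mean of the three models' scores for that tag. Passing `weights` lets one model count more than the others.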

Original language: English
Journal: CEUR Workshop Proceedings
State: Published - 2020
Externally published: Yes
Event: Multimedia Evaluation Benchmark Workshop 2020, MediaEval 2020 - Virtual, Online
Duration: 14 Dec 2020 to 15 Dec 2020

