TY - JOUR
T1 - Emotion and theme recognition in music using attention-based methods
AU - Rajamani, Srividya Tirunellai
AU - Rajamani, Kumar
AU - Schuller, Björn
N1 - Publisher Copyright:
Copyright © 2020 for this paper by its authors. All Rights Reserved.
PY - 2020
Y1 - 2020
AB - Emotion and theme recognition in music plays a vital role in music information retrieval and recommendation systems. Deep learning-based techniques have shown great promise in this regard. Realising optimal network configurations with the least number of FLOPs and model parameters is of paramount importance for obtaining efficient deployable models, especially on resource-constrained hardware. Yet, little research has been conducted in this direction, especially in the context of music emotion recognition. As part of the MediaEval 2020: Emotions and Themes in Music challenge, we (team name: AUGment) propose a novel integration of attention-based techniques for the task of emotion/mood recognition in music. We demonstrate that using stand-alone self-attention in the later layers of a VGG-ish network matches the baseline PR-AUC with 11% fewer FLOPs and 22% fewer parameters. Further, utilising the learnable Attention-based Rectified Linear Unit (AReLU) activation helps achieve better performance than the baseline. As an additional gain, a late fusion of these two models with the baseline also improved the PR-AUC and ROC-AUC by 1%.
UR - http://www.scopus.com/inward/record.url?scp=85108078535&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85108078535
SN - 1613-0073
VL - 2882
JO - CEUR Workshop Proceedings
JF - CEUR Workshop Proceedings
T2 - Multimedia Evaluation Benchmark Workshop 2020, MediaEval 2020
Y2 - 14 December 2020 through 15 December 2020
ER -