TY - JOUR
T1 - A novel attention-based gated recurrent unit and its efficacy in speech emotion recognition
AU - Rajamani, Srividya Tirunellai
AU - Rajamani, Kumar T.
AU - Mallol-Ragolta, Adria
AU - Liu, Shuo
AU - Schuller, Björn
N1 - Publisher Copyright:
© 2021 IEEE
PY - 2021
Y1 - 2021
AB - Notwithstanding the significant advancements in the field of deep learning, the basic long short-term memory (LSTM) and Gated Recurrent Unit (GRU) cells have largely remained unchanged and unexplored. There are several possibilities for advancing the state of the art by suitably adapting and enhancing the various elements of these units. Activation functions are one such key element. In this work, we explore the use of diverse activation functions within GRU and bi-directional GRU (BiGRU) cells in the context of speech emotion recognition (SER). We also propose a novel Attention ReLU GRU (AR-GRU) that employs the attention-based Rectified Linear Unit (AReLU) activation within GRU and BiGRU cells. We demonstrate the effectiveness of AR-GRU on one exemplary application using a recently proposed network for SER, namely the Interaction-Aware Attention Network (IAAN). Our proposed method utilising AR-GRU within this network yields a significant performance gain, achieving an unweighted accuracy of 68.3% (2% over the baseline) and a weighted accuracy of 66.9% (2.2% absolute over the baseline) in four-class emotion recognition on the IEMOCAP database.
KW - AReLU
KW - Attention mechanism
KW - Gated recurrent unit
KW - ReLU
KW - Speech emotion recognition
UR - http://www.scopus.com/inward/record.url?scp=85115166426&partnerID=8YFLogxK
U2 - 10.1109/ICASSP39728.2021.9414489
DO - 10.1109/ICASSP39728.2021.9414489
M3 - Conference article
AN - SCOPUS:85115166426
SN - 1520-6149
VL - 2021-June
SP - 6294
EP - 6298
JO - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
JF - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
T2 - 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2021
Y2 - 6 June 2021 through 11 June 2021
ER -