Hierarchical Component-attention Based Speaker Turn Embedding for Emotion Recognition

Shuo Liu, Jinlong Jiao, Ziping Zhao, Judith Dineley, Nicholas Cummins, Björn Schuller

Publication: Contribution to book/report/conference proceedings, conference contribution, peer-reviewed

5 citations (Scopus)

Abstract

Traditional discrete-time Speech Emotion Recognition (SER) modelling techniques typically assume that an entire speaker chunk or turn is indicative of its corresponding label. An alternative approach is to assume emotional saliency varies over the course of a speaker turn and use modelling techniques capable of identifying and utilising the most emotionally salient segments, such as those with higher emotional intensity. This strategy has the potential to improve the accuracy of SER systems. Towards this goal, we developed a novel hierarchical recurrent neural network model that produces turn level embeddings for SER. Specifically, we apply two levels of attention to learn to identify salient emotional words in a turn as well as the more informative frames within these words. In a set of experiments on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) database, we demonstrate that component-attention is more effective within our hierarchical framework than both standard soft-attention and conventional local-attention. Our best network, a hierarchical component-attention network with an attention scope of seven, achieved an Unweighted Average Recall (UAR) of 65.0 % and a Weighted Average Recall (WAR) of 66.1 %, outperforming other baseline attention approaches on the IEMOCAP database.
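The two metrics reported above can be illustrated with a minimal sketch. It assumes the usual definitions: UAR is the mean of the per-class recalls, so every emotion class counts equally, and WAR weights each class recall by its share of the samples, which makes it equal to overall accuracy. The function name `recall_metrics` and the toy labels are hypothetical and are not taken from the paper or from IEMOCAP.

```python
from collections import Counter


def recall_metrics(y_true, y_pred):
    """Return (UAR, WAR) for two equal-length label sequences.

    Hypothetical helper for illustration only; not the authors' evaluation code.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    # Number of reference samples per class.
    support = Counter(y_true)
    # Number of correctly predicted samples per class.
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    # Recall of each class that occurs in the reference labels.
    per_class_recall = {c: hits[c] / n for c, n in support.items()}

    # UAR: unweighted mean of the per-class recalls.
    uar = sum(per_class_recall.values()) / len(per_class_recall)
    # WAR: recalls weighted by class frequency, i.e. overall accuracy.
    war = sum(hits.values()) / len(y_true)
    return uar, war


# Made-up, imbalanced toy labels (not real experimental data).
y_true = ["neutral"] * 6 + ["angry"] * 2 + ["sad"] * 2
y_pred = ["neutral"] * 6 + ["angry", "neutral"] + ["neutral", "neutral"]

uar, war = recall_metrics(y_true, y_pred)
print(f"UAR = {uar:.3f}, WAR = {war:.3f}")
```

On imbalanced data the two values can diverge considerably, as in this toy case where the majority class dominates WAR. This is why SER studies commonly report both.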

Original language: English
Title of host publication: 2020 International Joint Conference on Neural Networks, IJCNN 2020 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (electronic): 9781728169262
Publication status: Published - July 2020
Externally published: Yes
Event: 2020 International Joint Conference on Neural Networks, IJCNN 2020 - Virtual, Glasgow, United Kingdom
Duration: 19 July 2020 to 24 July 2020

Publication series

Name: Proceedings of the International Joint Conference on Neural Networks

Conference

Conference: 2020 International Joint Conference on Neural Networks, IJCNN 2020
Country/Territory: United Kingdom
City: Virtual, Glasgow
Period: 19/07/20 to 24/07/20
