Emotion Recognition in Public Speaking Scenarios Utilising An LSTM-RNN Approach with Attention

Alice Baird, Shahin Amiriparian, Manuel Milling, Björn W. Schuller

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

16 Scopus citations

Abstract

Speaking in public can be a cause of fear for many people. Research suggests that physical markers, such as an increased heart rate and vocal tremolo, indicate an individual's state of wellbeing during a public speech. In this study, we explore the advantages of speech-based features for continuous recognition of the emotional dimensions of arousal and valence during a public speaking scenario. Furthermore, we explore fusion with biological signals, and perform cross-language (German and English) analysis by training language-independent models and testing them on speech from various groups of native and non-native speakers. For the emotion recognition task itself, we utilise a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) architecture with a self-attention layer. When utilising audio-only features and testing on non-native German speakers speaking German, we achieve at best a concordance correlation coefficient (CCC) of 0.640 for arousal and 0.491 for valence. These results demonstrate a strong effect from non-native speakers for this task, as well as the promise of deep learning for continuous emotion recognition in the context of public speaking.
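The concordance correlation coefficient (CCC) used above rewards predictions that not only correlate with the gold-standard arousal or valence trace but also match its mean and scale: CCC = 2·cov(x, y) / (σx² + σy² + (μx − μy)²). The following is a minimal sketch of Lin's CCC in Python, assuming population (divide-by-n) statistics; the function name `concordance_cc` is hypothetical and is not taken from the authors' code.

```python
from typing import Sequence


def concordance_cc(pred: Sequence[float], gold: Sequence[float]) -> float:
    """Lin's concordance correlation coefficient between two equal-length traces.

    Assumes population (biased, divide-by-n) variance and covariance.
    """
    if len(pred) != len(gold) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(pred)
    mean_p = sum(pred) / n
    mean_g = sum(gold) / n
    # Population variances of each trace and their covariance.
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    var_g = sum((g - mean_g) ** 2 for g in gold) / n
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(pred, gold)) / n
    # Unlike Pearson's r, the denominator also penalises a shift in means.
    denom = var_p + var_g + (mean_p - mean_g) ** 2
    if denom == 0.0:
        raise ValueError("CCC is undefined for two identical constant traces")
    return 2.0 * cov / denom


if __name__ == "__main__":
    gold = [0.0, 1.0, 2.0, 3.0]
    print(concordance_cc(gold, gold))                  # perfect agreement
    print(concordance_cc([1.0, 2.0, 3.0, 4.0], gold))  # same shape, offset by 1
```

In the usage example, the offset prediction has a Pearson correlation of 1 with the gold trace, yet its CCC drops to 5/7 (about 0.714) because the mean shift enters the denominator. This is why CCC is a stricter choice than correlation alone for continuous emotion traces.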

Original language: English
Title of host publication: 2021 IEEE Spoken Language Technology Workshop, SLT 2021 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 397-402
Number of pages: 6
ISBN (Electronic): 9781728170664
State: Published, 19 Jan 2021
Externally published: Yes
Event: 2021 IEEE Spoken Language Technology Workshop, SLT 2021, Virtual, Shenzhen, China
Duration: 19 Jan 2021 to 22 Jan 2021

Publication series

Name: 2021 IEEE Spoken Language Technology Workshop, SLT 2021 - Proceedings

Conference

Conference: 2021 IEEE Spoken Language Technology Workshop, SLT 2021
Country/Territory: China
City: Virtual, Shenzhen
Period: 19/01/21 to 22/01/21

Keywords

  • affective computing
  • long short-term memory
  • public speaking
  • recurrent neural networks
