Group-level Speech Emotion Recognition Utilising Deep Spectrum Features

Sandra Ottl, Shahin Amiriparian, Maurice Gerczuk, Vincent Karas, Björn Schuller

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

20 Scopus citations

Abstract

The objectives of this challenge paper are twofold: first, we apply a range of neural-network-based transfer learning approaches to cope with the data scarcity in the field of speech emotion recognition, and second, we fuse the obtained representations and predictions in an early and late fusion strategy to check the complementarity of the applied networks. In particular, we use our Deep Spectrum system to extract deep feature representations from the audio content of the 2020 EmotiW group-level emotion prediction challenge data. We evaluate a total of ten ImageNet pre-trained Convolutional Neural Networks, including AlexNet, VGG16, VGG19 and three DenseNet variants, as audio feature extractors. We compare their performance to the ComParE feature set used in the challenge baseline, employing simple logistic regression models trained with Stochastic Gradient Descent as classifiers. With the help of late fusion, our approach improves the performance on the test set from 47.88 % to 62.70 % accuracy.
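The difference between the two fusion strategies named in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact fusion rules used, so the following rests on assumptions: early fusion is shown as concatenation of feature vectors, late fusion as an unweighted average of class probabilities, and all array values and function names are hypothetical toy examples rather than the authors' implementation.

```python
import numpy as np


def early_fusion(feature_sets):
    """Concatenate per-extractor feature matrices along the feature axis.

    Assumption: every matrix has one row per audio clip, in the same order.
    """
    return np.concatenate(feature_sets, axis=1)


def late_fusion(probabilities):
    """Average per-model class probabilities, then pick the top class.

    Assumption: unweighted mean; other rules (e.g. majority vote) are possible.
    """
    mean_probs = np.mean(np.stack(probabilities, axis=0), axis=0)
    return mean_probs.argmax(axis=1)


# Toy features for 2 clips from two hypothetical CNN feature extractors.
feats_a = np.array([[0.1, 0.2], [0.3, 0.4]])
feats_b = np.array([[1.0], [2.0]])
fused = early_fusion([feats_a, feats_b])

# Toy class probabilities (3 classes) from two hypothetical classifiers.
probs_a = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
probs_b = np.array([[0.2, 0.7, 0.1], [0.1, 0.3, 0.6]])
labels = late_fusion([probs_a, probs_b])

print(fused.shape)
print(labels)
```

In early fusion a single classifier is trained on the concatenated representation, whereas in late fusion each representation keeps its own classifier and only the predictions are combined.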

Original language: English
Title of host publication: ICMI 2020 - Proceedings of the 2020 International Conference on Multimodal Interaction
Publisher: Association for Computing Machinery, Inc
Pages: 821-826
Number of pages: 6
ISBN (Electronic): 9781450375818
State: Published - 21 Oct 2020
Externally published: Yes
Event: 22nd ACM International Conference on Multimodal Interaction, ICMI 2020 - Virtual, Online, Netherlands
Duration: 25 Oct 2020 to 29 Oct 2020

Publication series

Name: ICMI 2020 - Proceedings of the 2020 International Conference on Multimodal Interaction

Conference

Conference: 22nd ACM International Conference on Multimodal Interaction, ICMI 2020
Country/Territory: Netherlands
City: Virtual, Online
Period: 25/10/20 to 29/10/20

Keywords

  • deep spectrum
  • early and late fusion
  • emotion recognition
  • emotiw
  • pre-trained cnns
