Hierarchical neural networks and enhanced class posteriors for social signal classification

Raymond Brueckner, Björn Schuller

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

15 Scopus citations

Abstract

With the impressive advances of deep learning in recent years, interest in neural networks has resurged in the fields of automatic speech recognition and emotion recognition. In this paper, we apply neural networks to speaker-independent detection and classification of laughter and filler vocalizations in speech. We first explore modeling class posteriors with standard neural networks and deep stacked autoencoders. We then adopt a hierarchical neural architecture to compute enhanced class posteriors and demonstrate that this approach yields significant and consistent improvements on the Social Signals Sub-Challenge of the Interspeech 2013 Computational Paralinguistics Challenge (ComParE). On this task, we achieve an unweighted average area-under-the-curve, the official competition measure, of 92.4% on the test set. This constitutes an improvement of 9.1% over the baseline and is the best result obtained so far on this task.
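The competition measure named in the abstract can be illustrated with a small Python sketch. It assumes the unweighted average area-under-the-curve is the plain mean of one-vs-rest AUCs for the laughter and filler classes, computed from per-frame class posteriors; the function names, the "garbage" label for all other frames, and the toy data are hypothetical and are not taken from the paper.

```python
from typing import Dict, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    Counts the fraction of (positive, negative) pairs in which the positive
    example receives the higher score; ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def unweighted_average_auc(
    posteriors: Dict[str, Sequence[float]], targets: Sequence[str]
) -> float:
    """Mean of the one-vs-rest AUCs over the classes listed in `posteriors`.

    Assumption: every class contributes equally, regardless of frame count.
    """
    per_class = []
    for cls, scores in posteriors.items():
        labels = [1 if t == cls else 0 for t in targets]
        per_class.append(auc(scores, labels))
    return sum(per_class) / len(per_class)


# Hypothetical toy data: six frames with reference labels and class posteriors.
targets = ["laughter", "filler", "garbage", "garbage", "laughter", "filler"]
posteriors = {
    "laughter": [0.9, 0.2, 0.1, 0.4, 0.3, 0.1],
    "filler": [0.05, 0.7, 0.2, 0.1, 0.1, 0.8],
}
score = unweighted_average_auc(posteriors, targets)
print(f"UAAUC on toy data: {score:.4f}")
```

The pairwise formulation is quadratic in the number of frames, which is fine for illustration; on real data a rank-based implementation such as scikit-learn's `roc_auc_score` would be the more practical choice.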

Original language: English
Title of host publication: 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2013 - Proceedings
Pages: 362-367
Number of pages: 6
State: Published - 2013
Event: 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2013 - Olomouc, Czech Republic
Duration: 8 Dec 2013 to 13 Dec 2013

Publication series

Name: 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2013 - Proceedings

Conference

Conference: 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2013
Country/Territory: Czech Republic
City: Olomouc
Period: 8/12/13 to 13/12/13

Keywords

  • computational paralinguistics challenge
  • deep autoencoder networks
  • enhanced posteriors
  • hierarchical neural networks
