Audiovisual vocal outburst classification in noisy acoustic conditions

Florian Eyben, Stavros Petridis, Björn Schuller, Maja Pantic

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

9 Scopus citations

Abstract

In this study, we investigate an audiovisual approach to the classification of vocal outbursts (non-linguistic vocalisations) in noisy conditions using Long Short-Term Memory (LSTM) Recurrent Neural Networks and Support Vector Machines. Geometric shape features and acoustic low-level descriptors are fused at the feature level. Three types of acoustic noise are considered: babble, office, and street noise. Experiments are conducted for each noise type to assess the benefit of the fusion in each case. The Audiovisual Interest Corpus of human-to-human natural conversation, used in the INTERSPEECH 2010 Paralinguistic Challenge, serves as the evaluation database. The results show that the addition of visual features remains beneficial even when training is performed on noise-corrupted audio that matches the test conditions.
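The feature-level fusion described in the abstract can be illustrated with a minimal sketch: per-frame acoustic low-level descriptors and geometric shape features are concatenated into a single vector before classification. The example below uses synthetic data and a scikit-learn SVM as the static classifier; the feature dimensionalities, the noise-mixing step, and all variable names are assumptions for illustration only and do not reproduce the authors' actual feature sets, the corpus, or the LSTM models evaluated in the paper.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Synthetic stand-ins for per-frame features (dimensions are assumptions):
n_frames = 2000
acoustic_lld = rng.normal(size=(n_frames, 39))   # acoustic low-level descriptors per frame
visual_shape = rng.normal(size=(n_frames, 20))   # geometric facial shape features per frame
labels = rng.integers(0, 2, size=n_frames)       # 1 = vocal outburst (e.g. laughter), 0 = other

# Simulate additive noise corrupting only the audio stream, analogous to the
# babble/office/street conditions considered in the paper.
noisy_acoustic = acoustic_lld + 0.5 * rng.normal(size=acoustic_lld.shape)

# Feature-level fusion: concatenate audio and visual features frame by frame.
fused = np.hstack([noisy_acoustic, visual_shape])

X_train, X_test, y_train, y_test = train_test_split(
    fused, labels, test_size=0.25, random_state=0)

# A static SVM classifier trained on the fused feature vectors.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```

Because the fusion happens before the classifier, the same fused vectors could equally be fed to a sequence model such as an LSTM; the SVM is chosen here only to keep the sketch self-contained.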

Original language: English
Title of host publication: 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012 - Proceedings
Pages: 5097-5100
Number of pages: 4
DOIs
State: Published - 2012
Event: 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012 - Kyoto, Japan
Duration: 25 Mar 2012 - 30 Mar 2012

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Conference

Conference: 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012
Country/Territory: Japan
City: Kyoto
Period: 25/03/12 - 30/03/12

Keywords

  • Audiovisual Processing
  • Laughter
  • Long Short-Term Memory
  • Non-linguistic Vocalisations
