Spectral and cepstral audio noise reduction techniques in speech emotion recognition

Jouni Pohjalainen, Fabien Ringeval, Zixing Zhang, Björn Schuller

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

35 Scopus citations

Abstract

Signal noise reduction can improve the performance of machine learning systems dealing with time signals such as audio. Real-life applicability of these recognition technologies requires the system to uphold its performance level in variable, challenging conditions such as noisy environments. In this contribution, we investigate audio signal denoising methods in cepstral and log-spectral domains and compare them with common implementations of standard techniques. The different approaches are first compared generally using averaged acoustic distance metrics. They are then applied to automatic recognition of spontaneous and natural emotions under simulated smartphone-recorded noisy conditions. Emotion recognition is implemented as support vector regression for continuous-valued prediction of arousal and valence on a realistic multimodal database. In the experiments, the proposed methods are found to generally outperform standard noise reduction algorithms.
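The abstract does not name the standard baselines or the acoustic distance metrics used. As an illustration only, the sketch below shows power spectral subtraction, a widely used standard noise reduction technique. It pairs that with a log-spectral distance of the kind that could be averaged over an utterance to compare denoisers. It is not the authors' cepstral or log-spectral method. The function names, the list-of-frames spectrogram representation, the assumption that the leading frames are noise-only, and the parameters `alpha` (over-subtraction) and `beta` (spectral floor) are all hypothetical choices made for this example.

```python
import math
from typing import List

# Hypothetical representation: a power spectrogram as a list of frames,
# each frame a list of non-negative power values, one per frequency bin.
Spectrogram = List[List[float]]


def estimate_noise(power: Spectrogram, n_init: int) -> List[float]:
    """Average the first n_init frames, assumed here to contain noise only."""
    n_bins = len(power[0])
    return [sum(frame[k] for frame in power[:n_init]) / n_init for k in range(n_bins)]


def spectral_subtraction(power: Spectrogram, noise: List[float],
                         alpha: float = 1.0, beta: float = 0.01) -> Spectrogram:
    """Power spectral subtraction: subtract alpha times the noise estimate in
    each bin and clamp the result to a floor of beta times the noise estimate,
    so that no bin becomes negative."""
    return [[max(p - alpha * n, beta * n) for p, n in zip(frame, noise)]
            for frame in power]


def log_spectral_distance(ref: Spectrogram, est: Spectrogram,
                          eps: float = 1e-12) -> float:
    """Log-spectral distance in dB: per-frame RMS difference between the
    10*log10 power spectra, averaged over frames."""
    total = 0.0
    for r_frame, e_frame in zip(ref, est):
        diffs = [(10 * math.log10(r + eps) - 10 * math.log10(e + eps)) ** 2
                 for r, e in zip(r_frame, e_frame)]
        total += math.sqrt(sum(diffs) / len(diffs))
    return total / len(ref)


if __name__ == "__main__":
    # Toy example: the first frame is silence, so the noisy frame is pure noise.
    clean = [[0.0, 0.0], [4.0, 1.0], [9.0, 2.0]]
    noisy = [[c + 0.5 for c in frame] for frame in clean]
    noise = estimate_noise(noisy, n_init=1)
    denoised = spectral_subtraction(noisy, noise)
    # Compare on the speech frames only; silent reference frames make log distances explode.
    print("noisy LSD:", log_spectral_distance(clean[1:], noisy[1:]))
    print("denoised LSD:", log_spectral_distance(clean[1:], denoised[1:]))
```

In a comparison of this kind, a lower distance to the clean reference indicates that the denoised spectrum is closer to the original. With real recordings the noise estimate is imperfect, so subtraction leaves residual noise and distortion rather than recovering the clean spectrum exactly.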

Original language: English
Title of host publication: MM 2016 - Proceedings of the 2016 ACM Multimedia Conference
Publisher: Association for Computing Machinery, Inc
Pages: 670-674
Number of pages: 5
ISBN (Electronic): 9781450336031
State: Published - 1 Oct 2016
Externally published: Yes
Event: 24th ACM Multimedia Conference, MM 2016 - Amsterdam, Netherlands
Duration: 15 Oct 2016 to 19 Oct 2016

Publication series

Name: MM 2016 - Proceedings of the 2016 ACM Multimedia Conference

Conference

Conference: 24th ACM Multimedia Conference, MM 2016
Country/Territory: Netherlands
City: Amsterdam
Period: 15/10/16 to 19/10/16

Keywords

  • Denoising
  • Noise reduction
  • Speech emotion recognition
