Towards speech robustness for acoustic scene classification

Shuo Liu, Andreas Triantafyllopoulos, Zhao Ren, Björn W. Schuller

Research output: Contribution to journal › Conference article › Peer-review


Abstract

This work discusses the impact of the human voice on acoustic scene classification (ASC) systems. Typically, such systems are trained and evaluated on data sets lacking human speech. We show experimentally that the addition of speech can be detrimental to system performance. Furthermore, we propose two alternative solutions to mitigate that effect in the context of deep neural networks (DNNs). We first utilise data augmentation to make the algorithm robust against the presence of human speech in the data. We also introduce a voice-suppression algorithm that removes human speech from audio recordings, and test the DNN classifier on those denoised samples. Experimental results show that both approaches reduce the negative effects of human voice in ASC systems. Compared to data augmentation, voice suppression achieved better classification accuracy and performed more stably across different speech intensities.
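The data-augmentation approach described above relies on mixing speech into scene recordings at controlled intensities. The paper does not specify its exact mixing procedure; the following is a minimal sketch of one common way to do this, scaling a speech signal so it sits at a chosen signal-to-noise ratio (SNR) relative to the scene audio. The function name `mix_at_snr` and the synthetic stand-in signals are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def mix_at_snr(scene: np.ndarray, speech: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix speech into a scene recording at a target scene-to-speech SNR in dB.

    Illustrative sketch: the scene is treated as the 'signal' and the
    speech as the interfering component whose level is adjusted.
    """
    # Trim both signals to a common length.
    n = min(len(scene), len(speech))
    scene, speech = scene[:n], speech[:n]
    # Average power of each signal.
    p_scene = np.mean(scene ** 2)
    p_speech = np.mean(speech ** 2)
    # Gain that makes 10*log10(p_scene / (gain^2 * p_speech)) == snr_db.
    gain = np.sqrt(p_scene / (p_speech * 10 ** (snr_db / 10)))
    return scene + gain * speech

# Synthetic signals standing in for real recordings (16 kHz, 1 second).
rng = np.random.default_rng(0)
scene = rng.standard_normal(16000)                             # noise-like scene audio
speech = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)    # tonal stand-in for speech
mixed = mix_at_snr(scene, speech, snr_db=10.0)
```

Varying `snr_db` during training exposes the classifier to different speech intensities, which is the dimension along which the paper reports the stability comparison between the two approaches.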

Original language: English
Pages (from-to): 3087-3091
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2020-October
DOIs
State: Published - 2020
Externally published: Yes
Event: 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020 - Shanghai, China
Duration: 25 Oct 2020 - 29 Oct 2020

Keywords

  • Acoustic scene classification
  • Computational auditory scene analysis
  • Speech robustness
  • Voice suppression
