Towards speech robustness for acoustic scene classification

Shuo Liu, Andreas Triantafyllopoulos, Zhao Ren, Björn W. Schuller

Publication: Contribution to journal › Conference article › Peer-reviewed

5 citations (Scopus)

Abstract

This work discusses the impact of the human voice on acoustic scene classification (ASC) systems. Typically, such systems are trained and evaluated on data sets lacking human speech. We show experimentally that the addition of speech can be detrimental to system performance. Furthermore, we propose two alternative solutions to mitigate that effect in the context of deep neural networks (DNNs). We first utilise data augmentation to make the algorithm robust against the presence of human speech in the data. We also introduce a voice-suppression algorithm that removes human speech from audio recordings, and test the DNN classifier on those denoised samples. Experimental results show that both approaches reduce the negative effects of human voice in ASC systems. Compared to data augmentation, voice suppression achieves better classification accuracy and performs more stably across different speech intensities.
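The paper's exact augmentation pipeline is not given here; below is a minimal, illustrative sketch of the general idea of speech-mixing data augmentation, assuming scene and speech clips are NumPy arrays at a shared sample rate. The function name `mix_speech` and the parameter `snr_db` are hypothetical, not from the paper.

```python
import numpy as np

def mix_speech(scene: np.ndarray, speech: np.ndarray, snr_db: float) -> np.ndarray:
    """Overlay a speech clip onto an acoustic-scene clip at a target
    scene-to-speech power ratio (in dB). Illustrative sketch only."""
    # Tile or trim the speech signal to match the scene length.
    if len(speech) < len(scene):
        reps = int(np.ceil(len(scene) / len(speech)))
        speech = np.tile(speech, reps)
    speech = speech[: len(scene)]

    # Scale the speech so that scene power / speech power equals snr_db.
    scene_power = np.mean(scene ** 2)
    speech_power = np.mean(speech ** 2) + 1e-12
    scale = np.sqrt(scene_power / (speech_power * 10 ** (snr_db / 10)))
    mixed = scene + scale * speech

    # Normalise only if the mixture would clip.
    peak = np.max(np.abs(mixed))
    return mixed / peak if peak > 1.0 else mixed
```

Training on such mixtures at varying `snr_db` values is one way to expose the classifier to speech-contaminated scenes.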

Original language: English
Pages (from - to): 3087-3091
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2020-October
DOIs
Publication status: Published - 2020
Published externally: Yes
Event: 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020 - Shanghai, China
Duration: 25 Oct. 2020 - 29 Oct. 2020
