TY - JOUR
T1 - Towards speech robustness for acoustic scene classification
AU - Liu, Shuo
AU - Triantafyllopoulos, Andreas
AU - Ren, Zhao
AU - Schuller, Björn W.
N1 - Publisher Copyright: © 2020 ISCA
PY - 2020
Y1 - 2020
AB - This work discusses the impact of human voice on acoustic scene classification (ASC) systems. Typically, such systems are trained and evaluated on data sets lacking human speech. We show experimentally that the addition of speech can be detrimental to system performance. Furthermore, we propose two alternative solutions to mitigate that effect in the context of deep neural networks (DNNs). We first utilise data augmentation to make the algorithm robust against the presence of human speech in the data. We also introduce a voice-suppression algorithm that removes human speech from audio recordings, and test the DNN classifier on those denoised samples. Experimental results show that both approaches reduce the negative effects of human voice in ASC systems. Compared with data augmentation, voice suppression achieved better classification accuracy and performed more stably across different speech intensities.
KW - Acoustic scene classification
KW - Computational auditory scene analysis
KW - Speech robustness
KW - Voice suppression
UR - http://www.scopus.com/inward/record.url?scp=85098147335&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2020-2365
DO - 10.21437/Interspeech.2020-2365
M3 - Conference article
AN - SCOPUS:85098147335
SN - 2308-457X
VL - 2020-October
SP - 3087
EP - 3091
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
T2 - 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020
Y2 - 25 October 2020 through 29 October 2020
ER -