Deep Scalogram Representations for Acoustic Scene Classification

Zhao Ren, Kun Qian, Yebin Wang, Zixing Zhang, Vedhas Pandit, Alice Baird, Bjorn Schuller

Publication: Contribution to journal › Article › Peer-reviewed

98 citations (Scopus)

Abstract

Spectrogram representations of acoustic scenes have achieved competitive performance for acoustic scene classification. Yet, the spectrogram alone does not take into account a substantial amount of time-frequency information. In this study, we present an approach for exploring the benefits of deep scalogram representations, extracted in segments from an audio stream. The approach presented firstly transforms the segmented acoustic scenes into bump and morse scalograms, as well as spectrograms; secondly, the spectrograms or scalograms are sent into pre-trained convolutional neural networks; thirdly, the features extracted from a subsequent fully connected layer are fed into (bidirectional) gated recurrent neural networks, which are followed by a single highway layer and a softmax layer; finally, predictions from these three systems are fused by a margin sampling value strategy. We then evaluate the proposed approach using the acoustic scene classification data set of 2017 IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE). On the evaluation set, an accuracy of 64.0 % from bidirectional gated recurrent neural networks is obtained when fusing the spectrogram and the bump scalogram, which is an improvement on the 61.0 % baseline result provided by the DCASE 2017 organisers. This result shows that extracted bump scalograms are capable of improving the classification accuracy, when fusing with a spectrogram-based system.
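
The abstract names the final fusion step but does not spell it out. The sketch below illustrates one common reading of a margin sampling value strategy: each system's margin is the gap between its two highest class posteriors, and for each audio sample the prediction is taken from the system with the largest margin. This is an assumption about the fusion rule, not the authors' published implementation; the function names and the example posteriors are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def margin_sampling_value(probs: Sequence[float]) -> float:
    """Gap between the largest and second-largest class probability.

    Assumes at least two classes. A larger gap is read as higher confidence.
    """
    top_two = sorted(probs, reverse=True)[:2]
    return top_two[0] - top_two[1]


def fuse_by_margin(posteriors: Dict[str, Sequence[float]]) -> Tuple[str, int]:
    """Return (chosen system, predicted class index) for one audio sample.

    `posteriors` maps a system name to its softmax output for that sample.
    The system with the largest margin supplies the final prediction.
    """
    best_system = max(
        posteriors, key=lambda name: margin_sampling_value(posteriors[name])
    )
    probs = posteriors[best_system]
    predicted = max(range(len(probs)), key=lambda i: probs[i])
    return best_system, predicted


# Hypothetical posteriors over three scene classes (illustrative values only).
example = {
    "spectrogram": [0.40, 0.35, 0.25],
    "bump_scalogram": [0.10, 0.70, 0.20],
    "morse_scalogram": [0.30, 0.30, 0.40],
}
system, label = fuse_by_margin(example)
print(system, label)
```

In this toy example the bump scalogram system is the most decisive of the three, so its top class becomes the fused prediction. Fusing only two systems, as in the spectrogram plus bump scalogram result reported above, would amount to passing a dictionary with two entries.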

Original language: English
Pages (from-to): 662-669
Number of pages: 8
Journal: IEEE/CAA Journal of Automatica Sinica
Volume: 5
Issue number: 3
Publication status: Published - May 2018
Externally published: Yes
