TY - GEN
T1 - Bag-of-Deep-Features: Noise-Robust Deep Feature Representations for Audio Analysis
T2 - 2018 International Joint Conference on Neural Networks, IJCNN 2018
AU - Amiriparian, Shahin
AU - Gerczuk, Maurice
AU - Ottl, Sandra
AU - Cummins, Nicholas
AU - Pugachevskiy, Sergey
AU - Schuller, Björn
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2018/10/10
Y1 - 2018/10/10
N2 - In the era of deep learning, research into the classification of various components of the acoustic environment, especially in-the-wild recordings, is gaining in popularity. This is due in part to increasing computational capacities and the expanding amount of real-world data available on social multimedia. However, the noisy nature of this data can add additional complexity to already complex deep learning systems. Herein, we tackle this issue by quantising deep feature representations of various in-the-wild audio data sets. The aim of this paper is twofold: 1) to assess the feasibility of the proposed feature quantisation task, and 2) to compare the efficacy of various feature spaces extracted from different fully connected deep neural networks to classify six real-world audio corpora. For the classification, we extract two feature sets: i) DEEP SPECTRUM features, which are derived by forwarding visual representations of the audio instances, in particular mel-spectrograms, through very deep task-independent pre-trained Convolutional Neural Networks (CNNs), and ii) Bag-of-Deep-Features (BoDF), which is the quantisation of the DEEP SPECTRUM features. Using BoDF, we show the suitability of quantising the deep representations for noisy in-the-wild audio data. Finally, we analyse the effect of early and late fusion of the CNN features and models on the classification results.
AB - In the era of deep learning, research into the classification of various components of the acoustic environment, especially in-the-wild recordings, is gaining in popularity. This is due in part to increasing computational capacities and the expanding amount of real-world data available on social multimedia. However, the noisy nature of this data can add additional complexity to already complex deep learning systems. Herein, we tackle this issue by quantising deep feature representations of various in-the-wild audio data sets. The aim of this paper is twofold: 1) to assess the feasibility of the proposed feature quantisation task, and 2) to compare the efficacy of various feature spaces extracted from different fully connected deep neural networks to classify six real-world audio corpora. For the classification, we extract two feature sets: i) DEEP SPECTRUM features, which are derived by forwarding visual representations of the audio instances, in particular mel-spectrograms, through very deep task-independent pre-trained Convolutional Neural Networks (CNNs), and ii) Bag-of-Deep-Features (BoDF), which is the quantisation of the DEEP SPECTRUM features. Using BoDF, we show the suitability of quantising the deep representations for noisy in-the-wild audio data. Finally, we analyse the effect of early and late fusion of the CNN features and models on the classification results.
UR - http://www.scopus.com/inward/record.url?scp=85054970030&partnerID=8YFLogxK
U2 - 10.1109/IJCNN.2018.8489416
DO - 10.1109/IJCNN.2018.8489416
M3 - Conference contribution
AN - SCOPUS:85054970030
T3 - Proceedings of the International Joint Conference on Neural Networks
BT - 2018 International Joint Conference on Neural Networks, IJCNN 2018 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 8 July 2018 through 13 July 2018
ER -