TY - GEN
T1 - Compact Bilinear Deep Features For Environmental Sound Recognition
AU - Demir, Fatih
AU - Sengur, Abdulkadir
AU - Lu, Hao
AU - Amiriparian, Shahin
AU - Cummins, Nicholas
AU - Schuller, Björn
N1 - Publisher Copyright: © 2018 IEEE.
PY - 2019/1/21
Y1 - 2019/1/21
N2 - Environmental sound recognition (ESR) has a wide range of civilian and military applications. Existing ESR methods generally tackle this problem by employing various signal processing and machine learning methods. Herein, we propose an ESR paradigm based on feature extraction from pre-trained deep convolutional neural networks (CNNs), the derivation of higher-order statistics by compact bilinear pooling, and normalisation. In particular, we consider two deep ImageNet architectures for deep feature extraction, and the Random Maclaurin (RM) method to produce the compact bilinear features. A support vector machine (SVM) with homogeneous mapping is used in the classification stage. Two publicly available environmental sound datasets, namely ESC-50 and ESC-10, are used to verify the efficacy of the approach. We compare the proposed method with various previous state-of-the-art methods. The presented results indicate the suitability of the higher-order statistics of Deep Spectrum representations for ESR classification tasks.
AB - Environmental sound recognition (ESR) has a wide range of civilian and military applications. Existing ESR methods generally tackle this problem by employing various signal processing and machine learning methods. Herein, we propose an ESR paradigm based on feature extraction from pre-trained deep convolutional neural networks (CNNs), the derivation of higher-order statistics by compact bilinear pooling, and normalisation. In particular, we consider two deep ImageNet architectures for deep feature extraction, and the Random Maclaurin (RM) method to produce the compact bilinear features. A support vector machine (SVM) with homogeneous mapping is used in the classification stage. Two publicly available environmental sound datasets, namely ESC-50 and ESC-10, are used to verify the efficacy of the approach. We compare the proposed method with various previous state-of-the-art methods. The presented results indicate the suitability of the higher-order statistics of Deep Spectrum representations for ESR classification tasks.
KW - Environmental sound classification
KW - compact bilinear pooling
KW - convolutional neural networks
KW - deep spectrum features
UR - http://www.scopus.com/inward/record.url?scp=85062553395&partnerID=8YFLogxK
U2 - 10.1109/IDAP.2018.8620779
DO - 10.1109/IDAP.2018.8620779
M3 - Conference contribution
AN - SCOPUS:85062553395
T3 - 2018 International Conference on Artificial Intelligence and Data Processing, IDAP 2018
BT - 2018 International Conference on Artificial Intelligence and Data Processing, IDAP 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2018 International Conference on Artificial Intelligence and Data Processing, IDAP 2018
Y2 - 28 September 2018 through 30 September 2018
ER -