TY - GEN
T1 - Large-scale audio feature extraction and SVM for acoustic scene classification
AU - Geiger, Jürgen T.
AU - Schuller, Björn
AU - Rigoll, Gerhard
PY - 2013
Y1 - 2013
N2 - This work describes a system for acoustic scene classification using large-scale audio feature extraction. It is our contribution to the Scene Classification track of the IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (D-CASE). The system classifies 30-second recordings of 10 different acoustic scenes. A large number of spectral, cepstral, energy, and voicing-related audio features are extracted from these highly variable recordings. Using a sliding-window approach, classification is performed on short windows: support vector machines (SVMs) classify these short segments, and a majority voting scheme combines the segment decisions into a single decision for each longer recording. On the official development set of the challenge, an accuracy of 73% is achieved. SVMs are compared with a nearest-neighbour classifier and an approach called Latent Perceptual Indexing, with SVMs achieving the best results. A feature analysis using the t-statistic shows that the most relevant features are mainly Mel spectra.
AB - This work describes a system for acoustic scene classification using large-scale audio feature extraction. It is our contribution to the Scene Classification track of the IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (D-CASE). The system classifies 30-second recordings of 10 different acoustic scenes. A large number of spectral, cepstral, energy, and voicing-related audio features are extracted from these highly variable recordings. Using a sliding-window approach, classification is performed on short windows: support vector machines (SVMs) classify these short segments, and a majority voting scheme combines the segment decisions into a single decision for each longer recording. On the official development set of the challenge, an accuracy of 73% is achieved. SVMs are compared with a nearest-neighbour classifier and an approach called Latent Perceptual Indexing, with SVMs achieving the best results. A feature analysis using the t-statistic shows that the most relevant features are mainly Mel spectra.
KW - Computational auditory scene analysis
KW - acoustic scene recognition
KW - feature extraction
UR - http://www.scopus.com/inward/record.url?scp=84893564246&partnerID=8YFLogxK
U2 - 10.1109/WASPAA.2013.6701857
DO - 10.1109/WASPAA.2013.6701857
M3 - Conference contribution
AN - SCOPUS:84893564246
SN - 9781479909728
T3 - IEEE Workshop on Applications of Signal Processing to Audio and Acoustics
BT - 2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2013
T2 - 2013 14th IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2013
Y2 - 20 October 2013 through 23 October 2013
ER -