Large-scale audio feature extraction and SVM for acoustic scene classification

Jurgen T. Geiger, Bjorn Schuller, Gerhard Rigoll

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

94 Scopus citations

Abstract

This work describes a system for acoustic scene classification based on large-scale audio feature extraction. It is our contribution to the Scene Classification track of the IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (D-CASE). The system classifies 30-second recordings of 10 different acoustic scenes. From these highly variable recordings, a large number of spectral, cepstral, energy, and voicing-related audio features are extracted. Classification is performed on short windows obtained with a sliding-window approach: support vector machines (SVMs) classify each short segment, and a majority voting scheme combines the segment decisions into a decision for the whole recording. On the official development set of the challenge, the system achieves an accuracy of 73%. SVMs are compared with a nearest-neighbour classifier and an approach called Latent Perceptual Indexing, and SVMs achieve the best results. A feature analysis using the t-statistic shows that the most relevant features are mainly Mel spectra.
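The windowing and voting steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the window length, hop size, tie-breaking rule, and the `classify_window` callable (a stand-in for a trained SVM) are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import Counter
from typing import Callable, List, Sequence


def sliding_windows(frames: Sequence, window: int, hop: int) -> List[Sequence]:
    """Split a sequence of frames into overlapping windows of `window` frames."""
    if window <= 0 or hop <= 0:
        raise ValueError("window and hop must be positive")
    if len(frames) < window:
        # Assumption: a recording shorter than one window is used whole.
        return [frames]
    last_start = len(frames) - window
    return [frames[i:i + window] for i in range(0, last_start + 1, hop)]


def majority_vote(labels: Sequence[str]) -> str:
    """Return the most frequent label; ties go to the label seen first."""
    if not labels:
        raise ValueError("no labels to vote on")
    return Counter(labels).most_common(1)[0][0]


def classify_recording(
    frames: Sequence,
    classify_window: Callable[[Sequence], str],
    window: int,
    hop: int,
) -> str:
    """Classify each short window, then vote to label the whole recording.

    `classify_window` is a hypothetical stand-in for a trained per-window
    classifier such as an SVM.
    """
    votes = [classify_window(w) for w in sliding_windows(frames, window, hop)]
    return majority_vote(votes)
```

In practice each window would be represented by a feature vector and `classify_window` would wrap a trained model; the voting step is independent of the classifier, so the same function would apply to the other classifiers mentioned in the abstract.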

Original language: English
Title of host publication: 2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2013
State: Published - 2013
Event: 2013 14th IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2013 - New Paltz, NY, United States
Duration: 20 Oct 2013 to 23 Oct 2013

Publication series

Name: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics

Conference

Conference: 2013 14th IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2013
Country/Territory: United States
City: New Paltz, NY
Period: 20/10/13 to 23/10/13

Keywords

  • Computational auditory scene analysis
  • acoustic scene recognition
  • feature extraction
