Feature selection and stacking for robust discrimination of speech, monophonic singing, and polyphonic music

Björn Schuller, Bernardo José Brüning Schmitt, Dejan Arsić, Stephan Reiter, Manfred Lang, Gerhard Rigoll

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

15 Scopus citations

Abstract

In this work, we seek an optimal set of acoustic features for discriminating speech, monophonic singing, and polyphonic music, in order to robustly segment acoustic media streams for annotation and interaction purposes. Furthermore, we introduce ensemble-based classification approaches for this task. From a basis of 276 attributes, we select the most efficient set by SVM-SFFS. Additionally, we present the relevance of single features as measured by their information gain ratio. As a basis for comparison, we reduce dimensionality by PCA. We present an extensive analysis of different classifiers for this task, among them Kernel Machines, Decision Trees, and Bayesian Classifiers. Moreover, we improve single-classifier performance by Bagging and Boosting, and finally combine the strengths of the classifiers by StackingC. The database comprises 2,114 samples of speech and singing from 58 persons; 1,000 music clips were taken from the MTV-Europe-Top-20, 1980-2000. The outstanding discrimination results of a working, real-time capable implementation underline the practicability of the proposed novel ideas.
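The abstract ranks single features by information gain ratio, IGR(C, A) = (H(C) - H(C | A)) / H(A), where C is the class (speech, singing, music), A is a feature, and H(A) is the split information. The following is a minimal sketch of that measure, not the authors' implementation. It assumes the feature values have already been discretized into bins, since the record does not state how the paper discretizes its continuous acoustic features. The helper names `entropy` and `info_gain_ratio` are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def info_gain_ratio(feature_values, classes):
    """Information gain ratio of an already *discretized* feature (assumption)
    with respect to the class labels: (H(C) - H(C | A)) / H(A)."""
    n = len(classes)
    # Group the class labels by feature bin to obtain H(C | A).
    groups = {}
    for value, label in zip(feature_values, classes):
        groups.setdefault(value, []).append(label)
    h_cond = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(classes) - h_cond
    # Split information H(A) penalizes features with many distinct bins.
    split_info = entropy(feature_values)
    return gain / split_info if split_info > 0 else 0.0


# Usage on toy data (not from the paper's database):
print(info_gain_ratio(["low", "low", "high", "high"],
                      ["speech", "speech", "music", "music"]))
```

A feature whose two equally sized bins separate the classes perfectly scores 1.0, while a constant or class-independent feature scores 0.0. Dividing by the split information is what distinguishes the gain ratio from plain information gain, which would otherwise favor features split into many bins.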

Original language: English
Title of host publication: IEEE International Conference on Multimedia and Expo, ICME 2005
Pages: 840-843
Number of pages: 4
State: Published - 2005
Event: IEEE International Conference on Multimedia and Expo, ICME 2005 - Amsterdam, Netherlands
Duration: 6 Jul 2005 - 8 Jul 2005

Publication series

Name: IEEE International Conference on Multimedia and Expo, ICME 2005
Volume: 2005

Conference

Conference: IEEE International Conference on Multimedia and Expo, ICME 2005
Country/Territory: Netherlands
City: Amsterdam
Period: 6/07/05 - 8/07/05
