TY - GEN
T1 - Feature selection and stacking for robust discrimination of speech, monophonic singing, and polyphonic music
AU - Schuller, Björn
AU - Schmitt, Bernardo José Brüning
AU - Arsić, Dejan
AU - Reiter, Stephan
AU - Lang, Manfred
AU - Rigoll, Gerhard
PY - 2005
Y1 - 2005
N2 - In this work we strive to find an optimal set of acoustic features for the discrimination of speech, monophonic singing, and polyphonic music to robustly segment acoustic media streams for annotation and interaction purposes. Furthermore we introduce ensemble-based classification approaches within this task. From a basis of 276 attributes we select the most efficient set by SVM-SFFS. Additionally relevance of single features by calculation of information gain ratio is presented. As a basis of comparison we reduce dimensionality by PCA. We show extensive analysis of different classifiers within the named task. Among these are Kernel Machines, Decision Trees, and Bayesian Classifiers. Moreover we improve single classifier performance by Bagging and Boosting, and finally combine strengths of classifiers by StackingC. The database is formed by 2,114 samples of speech, and singing of 58 persons. 1,000 Music clips have been taken from the MTV-Europe-Top-20 1980-2000. The outstanding discrimination results of a working real-time capable implementation stress the practicability of the proposed novel ideas.
UR - http://www.scopus.com/inward/record.url?scp=33750573041&partnerID=8YFLogxK
U2 - 10.1109/ICME.2005.1521554
DO - 10.1109/ICME.2005.1521554
M3 - Conference contribution
AN - SCOPUS:33750573041
SN - 0780393325
SN - 9780780393325
T3 - IEEE International Conference on Multimedia and Expo, ICME 2005
SP - 840
EP - 843
BT - IEEE International Conference on Multimedia and Expo, ICME 2005
T2 - IEEE International Conference on Multimedia and Expo, ICME 2005
Y2 - 6 July 2005 through 8 July 2005
ER -