TY - GEN
T1 - Discrimination of speech and monophonic singing in continuous audio streams applying multi-layer support vector machines
AU - Schuller, Björn
AU - Rigoll, Gerhard
AU - Lang, Manfred
PY - 2004
Y1 - 2004
N2 - In this paper we present a novel approach to the discrimination of speech and monophonic singing for use in Music Information Retrieval applications. A working prototype is introduced that uses Multi-Layer Support Vector Machines for the discrimination, together with static high-level features derived from the pitch and energy contours of an acoustic signal. The feature set for the discrimination is presented and ranked according to a Linear Discriminant Analysis. For automatic segmentation of an input signal stream, a further feature set is used to discriminate signal from noise. A corpus for training and evaluation comprising speech and monophonic singing data of nine performers is described in detail. The data has been labeled according to the judgments of a separate set of test subjects. A recognition rate of 99.2% correct assignments was reached, demonstrating the high performance of the proposed methods.
AB - In this paper we present a novel approach to the discrimination of speech and monophonic singing for use in Music Information Retrieval applications. A working prototype is introduced that uses Multi-Layer Support Vector Machines for the discrimination, together with static high-level features derived from the pitch and energy contours of an acoustic signal. The feature set for the discrimination is presented and ranked according to a Linear Discriminant Analysis. For automatic segmentation of an input signal stream, a further feature set is used to discriminate signal from noise. A corpus for training and evaluation comprising speech and monophonic singing data of nine performers is described in detail. The data has been labeled according to the judgments of a separate set of test subjects. A recognition rate of 99.2% correct assignments was reached, demonstrating the high performance of the proposed methods.
UR - http://www.scopus.com/inward/record.url?scp=11244352183&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:11244352183
SN - 0780386035
SN - 9780780386037
T3 - 2004 IEEE International Conference on Multimedia and Expo (ICME)
SP - 1655
EP - 1658
BT - 2004 IEEE International Conference on Multimedia and Expo (ICME)
T2 - 2004 IEEE International Conference on Multimedia and Expo (ICME)
Y2 - 27 June 2004 through 30 June 2004
ER -