TY - GEN
T1 - Combining monaural source separation with long short-term memory for increased robustness in vocalist gender recognition
AU - Weninger, Felix
AU - Durrieu, Jean-Louis
AU - Eyben, Florian
AU - Richard, Gaël
AU - Schuller, Björn
PY - 2011
Y1 - 2011
N2 - We present a novel and unique combination of algorithms to detect the gender of the leading vocalist in recorded popular music. Building on our previous successful approach that enhanced the harmonic parts by means of Non-Negative Matrix Factorization (NMF) for increased accuracy, we integrate on the one hand a new source separation algorithm specifically tailored to extracting the leading voice from monaural recordings. On the other hand, we introduce Bidirectional Long Short-Term Memory Recurrent Neural Networks (BLSTM-RNNs) as context-sensitive classifiers for this scenario, which have lately led to great success in Music Information Retrieval tasks. Through a combination of leading voice separation and BLSTM networks, as opposed to a baseline approach using Hidden Naive Bayes on the original recordings, the accuracy of simultaneous detection of vocal presence and vocalist gender on beat level is improved by up to 10% absolute. Furthermore, using this technique we achieve 91.6% accuracy in determining the gender of the predominant vocalist on song level, which is 4% absolute above our previous best result.
KW - Long Short-Term Memory
KW - Music Information Retrieval
KW - Non-Negative Matrix Factorization
UR - http://www.scopus.com/inward/record.url?scp=80051625764&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2011.5946764
DO - 10.1109/ICASSP.2011.5946764
M3 - Conference contribution
AN - SCOPUS:80051625764
SN - 9781457705397
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 2196
EP - 2199
BT - 2011 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011 - Proceedings
T2 - 36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011
Y2 - 22 May 2011 through 27 May 2011
ER -