Combining monaural source separation with long short-term memory for increased robustness in vocalist gender recognition

Felix Weninger, Jean Louis Durrieu, Florian Eyben, Gaël Richard, Björn Schuller

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

11 Scopus citations

Abstract

We present a novel and unique combination of algorithms to detect the gender of the leading vocalist in recorded popular music. Building on our previous successful approach that enhanced the harmonic parts by means of Non-Negative Matrix Factorization (NMF) for increased accuracy, we integrate on the one hand a new source separation algorithm specifically tailored to extracting the leading voice from monaural recordings. On the other hand, we introduce Bidirectional Long Short-Term Memory Recurrent Neural Networks (BLSTM-RNNs) as context-sensitive classifiers for this scenario, which have lately led to great success in Music Information Retrieval tasks. Through a combination of leading voice separation and BLSTM networks, as opposed to a baseline approach using Hidden Naive Bayes on the original recordings, the accuracy of simultaneous detection of vocal presence and vocalist gender on beat level is improved by up to 10% absolute. Furthermore, using this technique we achieve 91.6% accuracy in determining the gender of the predominant vocalist on song level, which is 4% absolute above our previous best result.

Original language: English
Title of host publication: 2011 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011 - Proceedings
Pages: 2196-2199
Number of pages: 4
State: Published - 2011
Event: 36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011 - Prague, Czech Republic
Duration: 22 May 2011 to 27 May 2011

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Conference

Conference: 36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011
Country/Territory: Czech Republic
City: Prague
Period: 22/05/11 to 27/05/11

Keywords

  • Long Short-Term Memory
  • Music Information Retrieval
  • Non-Negative Matrix Factorization
