Score-informed leading voice separation from monaural audio

Cyril Joder, Björn Schuller

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

11 Scopus citations

Abstract

Separating the leading voice from a musical recording comes naturally to the human ear, yet it remains a difficult problem for automatic systems, particularly in the blind case, where no prior information about the signal is available. When a musical score is available, however, this additional information can be exploited. In this paper, we present a novel application of this idea to leading voice separation that exploits a temporally aligned MIDI score. The model is based on Nonnegative Matrix Factorization (NMF), in which the solo part is represented by a source-filter model. We exploit the score information by constraining the source activations to conform to the aligned MIDI file. Experiments on a database of real popular songs show that these constraints can significantly improve separation quality, in terms of both signal-based and perceptual evaluation metrics.
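
The idea of constraining activations with an aligned score can be illustrated with a minimal sketch. This is not the paper's model: it uses plain Euclidean NMF with multiplicative updates rather than a source-filter representation, and the function name `score_masked_nmf`, the toy piano roll, and all parameter values are assumptions made for illustration. The only point shown is that activations initialised to zero wherever the score marks a pitch as inactive stay at zero under multiplicative updates.

```python
import numpy as np

def score_masked_nmf(V, n_pitches, mask, n_iter=100, eps=1e-9, seed=0):
    """Euclidean NMF V ~ W @ H with H forced to zero where mask == 0.

    Hypothetical helper for illustration only; not the authors' algorithm.
    V:    nonnegative spectrogram, shape (n_bins, n_frames)
    mask: binary piano roll from the aligned score, shape (n_pitches, n_frames)
    """
    rng = np.random.default_rng(seed)
    n_bins, n_frames = V.shape
    W = rng.random((n_bins, n_pitches)) + eps
    # Zeroing the initial activations is the whole constraint:
    # a multiplicative update can never turn an exact zero into a nonzero.
    H = (rng.random((n_pitches, n_frames)) + eps) * mask
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy data (assumed): three pitches played one after another, 10 frames each.
rng = np.random.default_rng(1)
n_bins, n_pitches, n_frames = 20, 3, 30
piano_roll = np.zeros((n_pitches, n_frames))
piano_roll[0, 0:10] = 1
piano_roll[1, 10:20] = 1
piano_roll[2, 20:30] = 1
W_true = rng.random((n_bins, n_pitches))
V = W_true @ piano_roll

W, H = score_masked_nmf(V, n_pitches, piano_roll)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print("relative reconstruction error:", rel_err)
print("activations outside the score are zero:", bool(np.all(H[piano_roll == 0] == 0)))
```

In practice an alignment is rarely exact, so a mask of this kind would typically be widened by some tolerance in time and pitch before use; how much is an application-specific choice not covered here.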

Original language: English
Title of host publication: Proceedings of the 13th International Society for Music Information Retrieval Conference, ISMIR 2012
Pages: 277-282
Number of pages: 6
State: Published - 2012
Event: 13th International Society for Music Information Retrieval Conference, ISMIR 2012 - Porto, Portugal
Duration: 8 Oct 2012 → 12 Oct 2012

Publication series

Name: Proceedings of the 13th International Society for Music Information Retrieval Conference, ISMIR 2012

Conference

Conference: 13th International Society for Music Information Retrieval Conference, ISMIR 2012
Country/Territory: Portugal
City: Porto
Period: 8/10/12 → 12/10/12
