TY - GEN
T1 - Supervised and semi-supervised suppression of background music in monaural speech recordings
AU - Weninger, Felix
AU - Feliu, Jordi
AU - Schuller, Björn
PY - 2012
Y1 - 2012
N2 - In this paper, we propose a semi-supervised algorithm based on sparse non-negative matrix factorization (NMF) to improve separation of speech from background music in monaural signals. In our approach, fixed speech basis vectors are obtained from training data whereas music bases are estimated on-the-fly to cope with spectral variability while preserving small NMF dimensionality for decreased computation effort. In a large-scale experimental evaluation with 168 speakers from the TIMIT database, we compare the semi-supervised method to supervised NMF with an explicit background music model. Our results reveal that the semi-supervised method outperforms supervised NMF at low speech-to-music ratios, and that sparsity constraints on the music spectra to enforce harmonicity can improve separation performance.
KW - non-negative matrix factorization
KW - sparse coding
KW - speech enhancement
KW - supervised source separation
UR - http://www.scopus.com/inward/record.url?scp=84867622240&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2012.6287817
DO - 10.1109/ICASSP.2012.6287817
M3 - Conference contribution
AN - SCOPUS:84867622240
SN - 9781467300469
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 61
EP - 64
BT - 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012 - Proceedings
T2 - 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012
Y2 - 25 March 2012 through 30 March 2012
ER -