TY - GEN
T1 - Non-negative matrix factorization for highly noise-robust ASR
T2 - 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012
AU - Weninger, Felix
AU - Wöllmer, Martin
AU - Geiger, Jürgen
AU - Schuller, Björn
AU - Gemmeke, Jort F.
AU - Hurmalainen, Antti
AU - Virtanen, Tuomas
AU - Rigoll, Gerhard
PY - 2012
Y1 - 2012
N2 - This paper proposes a multi-stream speech recognition system that combines information from three complementary analysis methods in order to improve automatic speech recognition in highly noisy and reverberant environments, as featured in the 2011 PASCAL CHiME Challenge. We integrate word predictions by a bidirectional Long Short-Term Memory recurrent neural network and non-negative sparse classification (NSC) into a multi-stream Hidden Markov Model using convolutive non-negative matrix factorization (NMF) for speech enhancement. Our results suggest that NMF-based enhancement and NSC are complementary despite their overlap in methodology, reaching up to 91.9% average keyword accuracy on the Challenge test set at signal-to-noise ratios from -6 to 9 dB, the best result reported so far on these data.
AB - This paper proposes a multi-stream speech recognition system that combines information from three complementary analysis methods in order to improve automatic speech recognition in highly noisy and reverberant environments, as featured in the 2011 PASCAL CHiME Challenge. We integrate word predictions by a bidirectional Long Short-Term Memory recurrent neural network and non-negative sparse classification (NSC) into a multi-stream Hidden Markov Model using convolutive non-negative matrix factorization (NMF) for speech enhancement. Our results suggest that NMF-based enhancement and NSC are complementary despite their overlap in methodology, reaching up to 91.9% average keyword accuracy on the Challenge test set at signal-to-noise ratios from -6 to 9 dB, the best result reported so far on these data.
KW - Non-Negative Matrix Factorization
KW - Tandem Speech Recognition
UR - http://www.scopus.com/inward/record.url?scp=84867600087&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2012.6288963
DO - 10.1109/ICASSP.2012.6288963
M3 - Conference contribution
AN - SCOPUS:84867600087
SN - 9781467300469
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 4681
EP - 4684
BT - 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012 - Proceedings
Y2 - 25 March 2012 through 30 March 2012
ER -