Non-negative matrix factorization for highly noise-robust ASR: To enhance or to recognize?

Felix Weninger, Martin Wöllmer, Jürgen Geiger, Björn Schuller, Jort F. Gemmeke, Antti Hurmalainen, Tuomas Virtanen, Gerhard Rigoll

Publication: Contribution to book/report/conference proceedings > Conference contribution > Peer-reviewed

34 citations (Scopus)

Abstract

This paper proposes a multi-stream speech recognition system that combines information from three complementary analysis methods in order to improve automatic speech recognition in highly noisy and reverberant environments, as featured in the 2011 PASCAL CHiME Challenge. We integrate word predictions by a bidirectional Long Short-Term Memory recurrent neural network and non-negative sparse classification (NSC) into a multi-stream Hidden Markov Model using convolutive non-negative matrix factorization (NMF) for speech enhancement. Our results suggest that NMF-based enhancement and NSC are complementary despite their overlap in methodology, reaching up to 91.9% average keyword accuracy on the Challenge test set at signal-to-noise ratios from -6 to 9 dB, the best result reported so far on these data.

Original language: English
Title of host publication: 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012 - Proceedings
Pages: 4681-4684
Number of pages: 4
DOIs
Publication status: Published - 2012
Event: 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012 - Kyoto, Japan
Duration: 25 March 2012 to 30 March 2012

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Conference

Conference: 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012
Country/Territory: Japan
City: Kyoto
Period: 25/03/12 to 30/03/12
