Non-negative matrix factorization as noise-robust feature extractor for speech recognition

Björn Schuller, Felix Weninger, Martin Wöllmer, Yang Sun, Gerhard Rigoll

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

40 Scopus citations

Abstract

We introduce a novel approach for noise-robust feature extraction in speech recognition, based on non-negative matrix factorization (NMF). While NMF has previously been used for speech denoising and speaker separation, we directly extract time-varying features from the NMF output. To this end we extend basic unsupervised NMF to a hybrid supervised/unsupervised algorithm. We present a Dynamic Bayesian Network (DBN) architecture that can exploit these features in a Tandem manner together with the maximum likelihood phoneme estimate of a bidirectional long short-term memory (BLSTM) re-current neural network. We show that addition of NMF features to spelling recognition systems can increase word accuracy by up to 7% absolute in a noisy car environment.

Original languageEnglish
Title of host publication2010 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2010 - Proceedings
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages4562-4565
Number of pages4
ISBN (Print)9781424442966
DOIs
StatePublished - 2010
Event2010 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2010 - Dallas, TX, United States
Duration: 14 Mar 201019 Mar 2010

Publication series

NameICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print)1520-6149

Conference

Conference2010 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2010
Country/TerritoryUnited States
CityDallas, TX
Period14/03/1019/03/10

Keywords

  • Dynamic bayesian networks
  • Long short-term memory
  • Noise robustness
  • Non-negative matrix factorization
  • Speech recognition

Fingerprint

Dive into the research topics of 'Non-negative matrix factorization as noise-robust feature extractor for speech recognition'. Together they form a unique fingerprint.

Cite this