Investigating NMF speech enhancement for neural network based acoustic models

Jürgen T. Geiger, Jort F. Gemmeke, Björn Schuller, Gerhard Rigoll

Research output: Contribution to journal › Conference article › peer-review

Abstract

In light of the improvements made in recent years with neural network-based acoustic models, it is an interesting question whether these models are also suited for noise-robust recognition. This has not yet been fully explored, although first experiments point in this direction. Furthermore, preprocessing techniques that improve robustness should be re-evaluated with these new models. In this work, we present experimental results to address these questions. Acoustic models based on Gaussian mixture models (GMMs), deep neural networks (DNNs), and long short-term memory (LSTM) recurrent neural networks (which have an improved ability to exploit context) are evaluated for their robustness after clean or multi-condition training. In addition, the influence of non-negative matrix factorization (NMF) for speech enhancement is investigated. Experiments are performed with the Aurora-4 database, and the results show that DNNs perform slightly better than LSTMs and, as expected, both beat GMMs. Furthermore, speech enhancement is capable of improving the DNN result.
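To make the preprocessing step more concrete, the following is a minimal sketch of supervised NMF-based speech enhancement in the general spirit described above, not the exact system used in the paper. It assumes pre-trained speech and noise basis matrices (here replaced by random placeholders), fixes them, estimates activations on a noisy magnitude spectrogram with multiplicative updates, and applies a Wiener-style mask; the function name `nmf_enhance` and all array shapes are illustrative assumptions.

```python
import numpy as np

def nmf_enhance(V, W_speech, W_noise, n_iter=100, eps=1e-10):
    """Illustrative NMF-based enhancement of a noisy magnitude spectrogram.

    V:        non-negative magnitude spectrogram, shape (n_freq, n_frames)
    W_speech: speech basis spectra (normally learned from clean speech)
    W_noise:  noise basis spectra (normally learned from noise recordings)
    Returns a Wiener-filtered estimate of the clean magnitude spectrogram.
    """
    W = np.hstack([W_speech, W_noise])   # concatenated, fixed bases
    k_speech = W_speech.shape[1]

    # Random non-negative initialisation of the activations H.
    rng = np.random.default_rng(0)
    H = rng.random((W.shape[1], V.shape[1])) + eps

    # Multiplicative updates for H only (W stays fixed),
    # minimising the Frobenius norm ||V - W H||.
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)

    # Wiener-style mask: fraction of modelled energy explained by speech.
    speech_part = W_speech @ H[:k_speech, :]
    total_part = W @ H + eps
    return (speech_part / total_part) * V


if __name__ == "__main__":
    # Toy data standing in for real spectrograms and trained bases.
    rng = np.random.default_rng(1)
    V = rng.random((257, 200))    # e.g. 512-point FFT, 200 frames
    W_s = rng.random((257, 32))   # placeholder speech basis
    W_n = rng.random((257, 8))    # placeholder noise basis
    print(nmf_enhance(V, W_s, W_n).shape)  # (257, 200)
```

In a real pipeline the enhanced magnitudes would be recombined with the noisy phase and passed on to feature extraction for the GMM, DNN, or LSTM acoustic model.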

Keywords

  • Long short-term memory
  • Robust speech recognition
  • Speech enhancement
