Discriminatively trained recurrent neural networks for single-channel speech separation

Felix Weninger, John R. Hershey, Jonathan Le Roux, Bjorn Schuller

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

273 Scopus citations

Abstract

This paper describes an in-depth investigation of training criteria, network architectures and feature representations for regression-based single-channel speech separation with deep neural networks (DNNs). We use a generic discriminative training criterion corresponding to optimal source reconstruction from time-frequency masks, and introduce its application to speech separation in a reduced feature space (Mel domain). A comparative evaluation of time-frequency mask estimation by DNNs, recurrent DNNs and non-negative matrix factorization on the 2nd CHiME Speech Separation and Recognition Challenge shows consistent improvements by discriminative training, whereas long short-term memory recurrent DNNs obtain the overall best results. Furthermore, our results confirm the importance of fine-tuning the feature representation for DNN training.
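The abstract does not spell out the training criterion, so the following is only a minimal sketch of one common way to contrast a mask-matching objective with an objective that scores the reconstructed source directly. The function names, the squared-error form, the ratio-mask target, and the toy data are all assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np


def ideal_ratio_mask(speech_mag, noise_mag, eps=1e-8):
    """Hypothetical target mask: fraction of each time-frequency bin due to speech."""
    return speech_mag / (speech_mag + noise_mag + eps)


def mask_approximation_loss(mask, target_mask):
    """Squared error between an estimated mask and a target mask."""
    return float(np.sum((mask - target_mask) ** 2))


def signal_approximation_loss(mask, mixture_mag, speech_mag):
    """Squared error between the masked mixture and the clean speech magnitudes.

    This scores the reconstruction itself rather than the mask, which is the
    general idea of a reconstruction-oriented criterion (assumed form).
    """
    return float(np.sum((mask * mixture_mag - speech_mag) ** 2))


# Toy magnitudes with shape (frequency bins, frames); random values, not real data.
rng = np.random.default_rng(0)
speech_mag = rng.random((4, 6))
noise_mag = rng.random((4, 6))
# Assumption: magnitudes add in the mixture, which holds only approximately in practice.
mixture_mag = speech_mag + noise_mag

target = ideal_ratio_mask(speech_mag, noise_mag)
all_pass = np.ones_like(target)  # a mask that leaves the mixture unchanged

loss_target = signal_approximation_loss(target, mixture_mag, speech_mag)
loss_all_pass = signal_approximation_loss(all_pass, mixture_mag, speech_mag)
```

Under the additive assumption, the ratio mask reconstructs the speech almost exactly, while the all-pass mask leaves the full noise energy as error. In a reduced feature space such as the Mel domain, the same losses would be applied to Mel-filtered magnitudes instead of full-resolution spectra.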

Original language: English
Title of host publication: 2014 IEEE Global Conference on Signal and Information Processing, GlobalSIP 2014
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 577-581
Number of pages: 5
ISBN (Electronic): 9781479970889
DOIs
State: Published - 5 Feb 2014
Event: 2014 IEEE Global Conference on Signal and Information Processing, GlobalSIP 2014 - Atlanta, United States
Duration: 3 Dec 2014 to 5 Dec 2014

Publication series

Name: 2014 IEEE Global Conference on Signal and Information Processing, GlobalSIP 2014

Conference

Conference: 2014 IEEE Global Conference on Signal and Information Processing, GlobalSIP 2014
Country/Territory: United States
City: Atlanta
Period: 3/12/14 to 5/12/14

Keywords

  • Deep neural networks
  • Discriminative training
  • Speech enhancement
