Single-channel speech separation with memory-enhanced recurrent neural networks

Felix Weninger, Florian Eyben, Björn Schuller

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


Abstract

In this paper we propose the use of Long Short-Term Memory recurrent neural networks for speech enhancement. Networks are trained to predict clean speech as well as noise features from noisy speech features, and a magnitude-domain soft mask is constructed from these features. Extensive tests are run on 73k noisy and reverberated utterances from the Audio-Visual Interest Corpus of spontaneous, emotionally colored speech, degraded by several hours of real noise recordings comprising stationary and non-stationary sources and convolutive noise from the Aachen Room Impulse Response database. As a result, the proposed method is shown to provide superior noise reduction at low signal-to-noise ratios while introducing very few artifacts at higher signal-to-noise ratios, thereby outperforming unsupervised magnitude-domain spectral subtraction by a large margin in terms of source-to-distortion ratio.
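
For illustration, a minimal sketch of the mask-based enhancement step described in the abstract, assuming a Wiener-like soft mask |ŝ| / (|ŝ| + |n̂|) computed from the network's predicted speech and noise magnitudes; the function and variable names below are hypothetical, and the exact mask construction used in the paper is not specified here.

    import numpy as np

    def soft_mask_enhance(noisy_mag, speech_mag_est, noise_mag_est, eps=1e-8):
        # noisy_mag, speech_mag_est, noise_mag_est: (frames, bins) magnitude
        # spectrograms; the two estimates would come from the LSTM's speech
        # and noise output streams (assumed interface, not the paper's API).
        mask = speech_mag_est / (speech_mag_est + noise_mag_est + eps)
        # Enhanced magnitude; resynthesis would reuse the noisy-speech phase.
        return mask * noisy_mag

    # Hypothetical usage with random placeholder spectrograms:
    frames, bins = 100, 257
    noisy = np.abs(np.random.randn(frames, bins))
    s_hat = np.abs(np.random.randn(frames, bins))
    n_hat = np.abs(np.random.randn(frames, bins))
    enhanced = soft_mask_enhance(noisy, s_hat, n_hat)
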

Original language: English
Title of host publication: 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 3709-3713
Number of pages: 5
ISBN (Print): 9781479928927
DOIs
State: Published - 2014
Event: 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014 - Florence, Italy
Duration: 4 May 2014 – 9 May 2014

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Conference

Conference: 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014
Country/Territory: Italy
City: Florence
Period: 4/05/14 – 9/05/14

Keywords

  • Long Short-Term Memory
  • Speech enhancement
  • recurrent neural networks
  • speech separation
