Feature enhancement by bidirectional LSTM networks for conversational speech recognition in highly non-stationary noise

Martin Wöllmer, Zixing Zhang, Felix Weninger, Björn Schuller, Gerhard Rigoll

Publication: Contribution in book/report/conference proceedings, conference paper, peer-reviewed

41 citations (Scopus)

Abstract

The recognition of spontaneous speech in highly variable noise is known to be a challenge, especially at low signal-to-noise ratios (SNR). In this paper, we investigate the effect of applying bidirectional Long Short-Term Memory (BLSTM) recurrent neural networks for speech feature enhancement in noisy conditions. BLSTM networks tend to prevail over conventional neural network architectures, whenever the recognition or regression task relies on an intelligent exploitation of temporal context information. We show that BLSTM networks are well-suited for mapping from noisy to clean speech features and that the obtained recognition performance gain is partly complementary to improvements via additional techniques such as speech enhancement by non-negative matrix factorization and probabilistic feature generation by Bottleneck-BLSTM networks. Compared to simple multi-condition training or feature enhancement via standard recurrent neural networks, our BLSTM-based feature enhancement approach leads to remarkable gains in word accuracy in a highly challenging task of recognizing spontaneous speech at SNR levels between -6 and 9 dB.
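The central idea of the abstract is a BLSTM network that maps each noisy feature frame to an estimate of the corresponding clean frame, using context from both earlier and later frames. The following is a minimal illustrative sketch in plain Python, not the authors' implementation. The class and function names (`LSTMLayer`, `BLSTMEnhancer`, `mse`) are hypothetical, and the layer sizes, the plain LSTM cell without peephole connections, the linear output layer and the random, untrained weights are all assumptions. A real system would train the weights to minimize a regression loss such as the mean squared error against parallel clean features.

```python
import math
import random

# Illustrative sketch only (assumptions: sizes, init, no peepholes, untrained
# random weights). A BLSTM maps noisy feature frames to enhanced frames.


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Matrix-vector product for lists of lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rng, rows, cols, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class LSTMLayer:
    """Unidirectional LSTM: input (i), forget (f), output (o) gates, cell input (g)."""

    def __init__(self, n_in, n_hidden, rng):
        self.n_hidden = n_hidden
        # One weight matrix per gate, applied to the concatenation [x_t; h_{t-1}].
        self.W = {k: rand_matrix(rng, n_hidden, n_in + n_hidden) for k in "ifog"}
        self.b = {k: [0.0] * n_hidden for k in "ifog"}

    def step(self, x, h, c):
        z = list(x) + h
        pre = {k: [s + b for s, b in zip(matvec(self.W[k], z), self.b[k])] for k in "ifog"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        g = [math.tanh(v) for v in pre["g"]]
        # Cell state: keep part of the old memory, add the gated new input.
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h, c

    def run(self, frames):
        h, c = [0.0] * self.n_hidden, [0.0] * self.n_hidden
        states = []
        for x in frames:
            h, c = self.step(x, h, c)
            states.append(h)
        return states


class BLSTMEnhancer:
    """Hypothetical BLSTM regressor: noisy features in, clean-feature estimates out."""

    def __init__(self, n_feat, n_hidden, seed=0):
        rng = random.Random(seed)
        self.fwd = LSTMLayer(n_feat, n_hidden, rng)  # reads the utterance left to right
        self.bwd = LSTMLayer(n_feat, n_hidden, rng)  # reads it right to left
        # Linear output layer on the concatenated forward and backward states.
        self.W_out = rand_matrix(rng, n_feat, 2 * n_hidden)
        self.b_out = [0.0] * n_feat

    def enhance(self, noisy):
        h_fwd = self.fwd.run(noisy)
        # Run the backward layer on the reversed sequence, then re-align with time.
        h_bwd = self.bwd.run(noisy[::-1])[::-1]
        return [
            [s + b for s, b in zip(matvec(self.W_out, hf + hb), self.b_out)]
            for hf, hb in zip(h_fwd, h_bwd)
        ]


def mse(pred, target):
    """Mean squared error between estimated and clean frames (a typical training loss)."""
    diffs = [(a - b) ** 2 for p, t in zip(pred, target) for a, b in zip(p, t)]
    return sum(diffs) / len(diffs)


# Usage: 10 frames of 4-dimensional synthetic "noisy" features.
data_rng = random.Random(1)
noisy = [[data_rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(10)]
model = BLSTMEnhancer(n_feat=4, n_hidden=8)
enhanced = model.enhance(noisy)
print(len(enhanced), len(enhanced[0]))  # 10 4
```

Because the backward layer has already read the whole utterance when it produces its state for the first frame, perturbing the last noisy frame also changes the enhanced first frame, which a forward-only recurrent network cannot do. This two-sided temporal context is what the abstract credits for the advantage of BLSTM networks over conventional architectures.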

Original language: English
Title: 2013 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Proceedings
Pages: 6822-6826
Number of pages: 5
Publication status: Published - 18 Oct 2013
Event: 2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Vancouver, BC, Canada
Duration: 26 May 2013 to 31 May 2013

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Conference

Conference: 2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013
Country/Territory: Canada
City: Vancouver, BC
Period: 26/05/13 to 31/05/13
