A multi-stream ASR framework for BLSTM modeling of conversational speech

Martin Wöllmer, Florian Eyben, Björn Schuller, Gerhard Rigoll

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

34 Scopus citations

Abstract

We propose a novel multi-stream framework for continuous conversational speech recognition which employs bidirectional Long Short-Term Memory (BLSTM) networks for phoneme prediction. The BLSTM architecture allows recurrent neural nets to model long-range context, which led to improved ASR performance when combined with conventional triphone modeling in a Tandem system. In this paper, we extend the principle of joint BLSTM and triphone modeling to a multi-stream system which uses MFCC features and BLSTM predictions as observations originating from two independent data streams. Using the COSINE database, we show that this technique prevails over a recently proposed single-stream Tandem system as well as over a conventional HMM recognizer.

Original languageEnglish
Title of host publication2011 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011 - Proceedings
Pages4860-4863
Number of pages4
DOIs
StatePublished - 2011
Event36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011 - Prague, Czech Republic
Duration: 22 May 201127 May 2011

Publication series

NameICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print)1520-6149

Conference

Conference36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011
Country/TerritoryCzech Republic
CityPrague
Period22/05/1127/05/11

Keywords

  • Context Modeling
  • Conversational Speech Recognition
  • Long Short-Term Memory
  • Recurrent Neural Networks

Fingerprint

Dive into the research topics of 'A multi-stream ASR framework for BLSTM modeling of conversational speech'. Together they form a unique fingerprint.

Cite this