A multi-stream ASR framework for BLSTM modeling of conversational speech

Martin Wöllmer, Florian Eyben, Björn Schuller, Gerhard Rigoll

Publication: Contribution to book/report/conference proceedings, conference contribution, peer-reviewed

34 citations (Scopus)

Abstract

We propose a novel multi-stream framework for continuous conversational speech recognition which employs bidirectional Long Short-Term Memory (BLSTM) networks for phoneme prediction. The BLSTM architecture allows recurrent neural nets to model long-range context, which led to improved ASR performance when combined with conventional triphone modeling in a Tandem system. In this paper, we extend the principle of joint BLSTM and triphone modeling to a multi-stream system which uses MFCC features and BLSTM predictions as observations originating from two independent data streams. Using the COSINE database, we show that this technique prevails over a recently proposed single-stream Tandem system as well as over a conventional HMM recognizer.
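
The abstract does not say how the two streams are combined. The following is only a minimal sketch, assuming the common multi-stream HMM formulation in which each state's emission score is a weighted sum of per-stream log-likelihoods. The BLSTM stream is assumed to be a discrete phoneme label per frame, and a single diagonal Gaussian stands in for the MFCC stream model. All names, weights, and toy values are hypothetical and are not taken from the paper.

```python
import math


def gaussian_log_pdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian (stand-in for the MFCC stream model)."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def multistream_log_emission(mfcc, blstm_phoneme, state, weights=(1.0, 1.0)):
    """Weighted sum of the two stream log-likelihoods for one HMM state.

    Assumption: the streams are treated as independent given the state, so
    their (weighted) log-likelihoods simply add.
    """
    w_mfcc, w_blstm = weights
    log_p_mfcc = gaussian_log_pdf(mfcc, state["mean"], state["var"])
    log_p_blstm = math.log(state["phoneme_probs"][blstm_phoneme])
    return w_mfcc * log_p_mfcc + w_blstm * log_p_blstm


# Hypothetical toy state: 2-dimensional "MFCC" Gaussian plus a discrete
# distribution over the phoneme labels the BLSTM might predict.
state = {
    "mean": [0.0, 1.0],
    "var": [1.0, 2.0],
    "phoneme_probs": {"ah": 0.7, "s": 0.2, "t": 0.1},
}

score = multistream_log_emission([0.1, 0.9], "ah", state, weights=(1.0, 1.5))
```

Under these assumptions, raising the second weight makes the decoder rely more on the BLSTM prediction. Setting it to zero reduces the score to the MFCC-only score of a conventional HMM state.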

Original language: English
Title: 2011 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011 - Proceedings
Pages: 4860-4863
Number of pages: 4
Publication status: Published - 2011
Event: 36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011 - Prague, Czech Republic
Duration: 22 May 2011 to 27 May 2011

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Conference

Conference: 36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011
Country/Territory: Czech Republic
Location: Prague
Period: 22/05/11 to 27/05/11
