TY - GEN
T1 - Enhancing spontaneous speech recognition with BLSTM features
AU - Wöllmer, Martin
AU - Schuller, Björn
PY - 2011
Y1 - 2011
N2 - This paper introduces a novel context-sensitive feature extraction approach for spontaneous speech recognition. As bidirectional Long Short-Term Memory (BLSTM) networks are known to enable improved phoneme recognition accuracies by incorporating long-range contextual information into speech decoding, we integrate the BLSTM principle into a Tandem front-end for probabilistic feature extraction. Unlike previously proposed approaches which exploit BLSTM modeling by generating a discrete phoneme prediction feature, our feature extractor merges continuous high-level probabilistic BLSTM features with low-level features. Evaluations on challenging spontaneous, conversational speech recognition tasks show that this concept prevails over recently published architectures for feature-level context modeling.
KW - bidirectional neural networks
KW - context modeling
KW - probabilistic features
KW - speech recognition
UR - http://www.scopus.com/inward/record.url?scp=81155123235&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-25020-0_3
DO - 10.1007/978-3-642-25020-0_3
M3 - Conference contribution
AN - SCOPUS:81155123235
SN - 9783642250194
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 17
EP - 24
BT - Advances in Nonlinear Speech Processing - 5th International Conference on Nonlinear Speech Processing, NOLISP 2011, Proceedings
T2 - 5th International Conference on Nonlinear Speech Processing, NOLISP 2011
Y2 - 7 November 2011 through 9 November 2011
ER -