TY - GEN
T1 - Probabilistic asr feature extraction applying context-sensitive connectionist temporal classification networks
AU - Wollmer, Martin
AU - Schuller, Bjorn
AU - Rigoll, Gerhard
PY - 2013/10/18
Y1 - 2013/10/18
N2 - This paper proposes a novel automatic speech recognition (ASR) front-end that unites the principles of bidirectional Long Short-Term Memory (BLSTM), Connectionist Temporal Classification (CTC), and Bottleneck (BN) feature generation. BLSTM networks are known to produce better probabilistic ASR features than conventional multilayer perceptrons since they are able to exploit a self-learned amount of temporal context for phoneme estimation. Combining BLSTM networks with a CTC output layer implies the advantage that the network can be trained on unsegmented data so that the quality of phoneme prediction does not rely on potentially error-prone forced alignment segmentations of the training set. In challenging ASR scenarios involving highly spontaneous, disfluent, and noisy speech, our BN-CTC front-end leads to remarkable word accuracy improvements and prevails over a series of previously introduced BLSTM-based ASR systems.
AB - This paper proposes a novel automatic speech recognition (ASR) front-end that unites the principles of bidirectional Long Short-Term Memory (BLSTM), Connectionist Temporal Classification (CTC), and Bottleneck (BN) feature generation. BLSTM networks are known to produce better probabilistic ASR features than conventional multilayer perceptrons since they are able to exploit a self-learned amount of temporal context for phoneme estimation. Combining BLSTM networks with a CTC output layer implies the advantage that the network can be trained on unsegmented data so that the quality of phoneme prediction does not rely on potentially error-prone forced alignment segmentations of the training set. In challenging ASR scenarios involving highly spontaneous, disfluent, and noisy speech, our BN-CTC front-end leads to remarkable word accuracy improvements and prevails over a series of previously introduced BLSTM-based ASR systems.
KW - Automatic Speech Recognition
KW - Connectionist Temporal Classification
KW - Long Short-Term Memory
KW - Tandem Features
UR - http://www.scopus.com/inward/record.url?scp=84890464853&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2013.6639045
DO - 10.1109/ICASSP.2013.6639045
M3 - Conference contribution
AN - SCOPUS:84890464853
SN - 9781479903566
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 7125
EP - 7129
BT - 2013 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Proceedings
T2 - 2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013
Y2 - 26 May 2013 through 31 May 2013
ER -