TY - GEN
T1 - Spoken term detection with connectionist temporal classification
T2 - 2010 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2010
AU - Wöllmer, Martin
AU - Eyben, Florian
AU - Schuller, Björn
AU - Rigoll, Gerhard
PY - 2010
Y1 - 2010
N2 - This paper proposes a novel system for robust keyword detection in continuous speech. Our decoder is composed of a bidirectional Long Short-Term Memory recurrent neural network using a Connectionist Temporal Classification (CTC) output layer, and a Dynamic Bayesian Network (DBN). The CTC network exploits bidirectional context information to reliably identify phonemes, whereas the DBN is able to discriminate between keywords and arbitrary speech while explicitly modeling substitutions, deletions, and insertions in the CTC phoneme output string. Our technique is vocabulary independent and does not require an explicit garbage model. Experiments show that our system architecture prevails over a standard Hidden Markov Model approach.
KW - Connectionist temporal classification
KW - Dynamic Bayesian networks
KW - Keyword spotting
KW - Spoken term detection
UR - http://www.scopus.com/inward/record.url?scp=78049359820&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2010.5494980
DO - 10.1109/ICASSP.2010.5494980
M3 - Conference contribution
AN - SCOPUS:78049359820
SN - 9781424442966
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 5274
EP - 5277
BT - 2010 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2010 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 14 March 2010 through 19 March 2010
ER -