TY - GEN
T1 - Improving keyword spotting with a tandem BLSTM-DBN architecture
AU - Wöllmer, Martin
AU - Eyben, Florian
AU - Graves, Alex
AU - Schuller, Björn
AU - Rigoll, Gerhard
PY - 2010
Y1 - 2010
N2 - We propose a novel architecture for keyword spotting which is composed of a Dynamic Bayesian Network (DBN) and a bidirectional Long Short-Term Memory (BLSTM) recurrent neural net. The DBN uses a hidden garbage variable as well as the concept of switching parents to discriminate between keywords and arbitrary speech. Contextual information is incorporated by a BLSTM network, providing a discrete phoneme prediction feature for the DBN. Together with continuous acoustic features, the discrete BLSTM output is processed by the DBN which detects keywords. Due to the flexible design of our Tandem BLSTM-DBN recognizer, new keywords can be added to the vocabulary without having to re-train the model. Further, our concept does not require the training of an explicit garbage model. Experiments on the TIMIT corpus show that incorporating a BLSTM network into the DBN architecture can increase true positive rates by up to 10%.
KW - Dynamic Bayesian Networks
KW - Keyword spotting
KW - Long Short-Term Memory
UR - http://www.scopus.com/inward/record.url?scp=77951442059&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-11509-7_9
DO - 10.1007/978-3-642-11509-7_9
M3 - Conference contribution
AN - SCOPUS:77951442059
SN - 364211508X
SN - 9783642115080
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 68
EP - 75
BT - Advances in Nonlinear Speech Processing - International Conference on Nonlinear Speech Processing, NOLISP 2009, Revised Selected Papers
T2 - International Conference on Nonlinear Speech Processing, NOLISP 2009
Y2 - 25 June 2009 through 27 June 2009
ER -