TY - JOUR
T1 - Keyword spotting exploiting Long Short-Term Memory
AU - Wöllmer, Martin
AU - Schuller, Björn
AU - Rigoll, Gerhard
PY - 2013/2
Y1 - 2013/2
AB - We investigate various techniques for keyword spotting which are exclusively based on acoustic modeling and do not presume the existence of an in-domain language model. Since adequate context modeling is nevertheless necessary for word spotting, we show how the principle of Long Short-Term Memory (LSTM) can be incorporated into the decoding process. We propose a novel technique that exploits LSTM in combination with Connectionist Temporal Classification in order to improve performance by using a self-learned amount of contextual information. All considered approaches are evaluated on read speech as contained in the TIMIT corpus as well as on the SEMAINE database which consists of spontaneous and emotionally colored speech. As further evidence for the effectiveness of LSTM modeling for keyword spotting, results on the CHiME task are shown.
KW - Dynamic Bayesian Networks
KW - Keyword spotting
KW - Long Short-Term Memory
KW - Recurrent neural networks
UR - http://www.scopus.com/inward/record.url?scp=84870240802&partnerID=8YFLogxK
U2 - 10.1016/j.specom.2012.08.006
DO - 10.1016/j.specom.2012.08.006
M3 - Article
AN - SCOPUS:84870240802
SN - 0167-6393
VL - 55
SP - 252
EP - 265
JO - Speech Communication
JF - Speech Communication
IS - 2
ER -