Bidirectional LSTM Networks for Context-Sensitive Keyword Detection in a Cognitive Virtual Agent Framework

Martin Wöllmer, Florian Eyben, Alex Graves, Björn Schuller, Gerhard Rigoll

Research output: Contribution to journal › Article › peer-review


Abstract

Robustly detecting keywords in human speech is an important precondition for cognitive systems, which aim at intelligently interacting with users. Conventional techniques for keyword spotting usually show good performance when evaluated on well-articulated read speech. However, modeling natural, spontaneous, and emotionally colored speech is challenging for today's speech recognition systems and thus requires novel approaches with enhanced robustness. In this article, we propose a new architecture for vocabulary-independent keyword detection as needed for cognitive virtual agents such as the SEMAINE system. Our word spotting model is composed of a Dynamic Bayesian Network (DBN) and a bidirectional Long Short-Term Memory (BLSTM) recurrent neural network. The BLSTM network uses a self-learned amount of contextual information to provide a discrete phoneme prediction feature for the DBN, which is able to distinguish between keywords and arbitrary speech. We evaluate our Tandem BLSTM-DBN technique on both read speech and spontaneous emotional speech and show that our method significantly outperforms conventional Hidden Markov Model-based approaches for both application scenarios.
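The abstract describes a tandem design in which a BLSTM supplies a framewise discrete phoneme prediction that the DBN consumes alongside the acoustic features. The sketch below is not the authors' implementation; it only illustrates, under assumed settings (39-dimensional MFCC frames, 128 hidden units, a 41-symbol phoneme inventory, PyTorch as the framework), how a bidirectional LSTM can turn a frame sequence into the kind of discrete phoneme stream such a DBN could decode.

```python
# Minimal sketch (hypothetical, not the paper's code): a bidirectional LSTM
# that maps framewise acoustic features to phoneme posteriors; the per-frame
# argmax is the discrete phoneme prediction feature a downstream DBN-style
# keyword spotter could condition on. All dimensions are illustrative.
import torch
import torch.nn as nn

class BLSTMPhonemePredictor(nn.Module):
    def __init__(self, num_features=39, hidden_size=128, num_phonemes=41):
        super().__init__()
        # Bidirectional LSTM: each frame's output depends on both past and
        # future context, with the amount of context learned from data.
        self.blstm = nn.LSTM(num_features, hidden_size,
                             batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_size, num_phonemes)

    def forward(self, frames):
        # frames: (batch, time, num_features), e.g. MFCCs per 10 ms frame
        context, _ = self.blstm(frames)
        logits = self.classifier(context)   # (batch, time, num_phonemes)
        return logits.argmax(dim=-1)        # discrete phoneme label per frame

if __name__ == "__main__":
    model = BLSTMPhonemePredictor()
    mfcc = torch.randn(1, 200, 39)          # 2 s of dummy 39-dim MFCC frames
    phoneme_ids = model(mfcc)
    print(phoneme_ids.shape)                # torch.Size([1, 200])
```

In the paper's tandem setup this discrete stream is one observed variable of the DBN, which then distinguishes keyword phoneme sequences from arbitrary (garbage) speech; the code above only covers the BLSTM front end.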

Original language: English
Pages (from-to): 180-190
Number of pages: 11
Journal: Cognitive Computation
Volume: 2
Issue number: 3
DOIs
State: Published - 2010

Keywords

  • Cognitive systems
  • Dynamic Bayesian networks
  • Keyword spotting
  • Long short-term memory
  • Virtual agents

