Spoken term detection with connectionist temporal classification: A novel hybrid CTC-DBN decoder

Martin Wöllmer, Florian Eyben, Björn Schuller, Gerhard Rigoll

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

7 Scopus citations

Abstract

This paper proposes a novel system for robust keyword detection in continuous speech. Our decoder combines a bidirectional Long Short-Term Memory (BLSTM) recurrent neural network with a Connectionist Temporal Classification (CTC) output layer and a Dynamic Bayesian Network (DBN). The CTC network exploits bidirectional context information to reliably identify phonemes, whereas the DBN discriminates between keywords and arbitrary speech while explicitly modeling substitutions, deletions, and insertions in the phoneme string output by the CTC network. Our technique is vocabulary-independent and does not require an explicit garbage model. Experiments show that our system architecture outperforms a standard Hidden Markov Model approach.
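
The two-stage idea in the abstract (a CTC network emits a phoneme string, and a second stage tolerates substitutions, deletions, and insertions when searching for a keyword) can be illustrated with a minimal sketch. It is not the authors' decoder: the DBN is swapped for a plain edit-distance approximate substring match, and the blank symbol `_`, the phoneme labels, and the helper names `ctc_best_path`, `min_keyword_edit_cost`, and `spot_keyword` are hypothetical. The sketch assumes per-frame argmax labels from the CTC network are already available.

```python
from typing import List, Optional, Sequence

BLANK = "_"  # hypothetical CTC blank symbol


def ctc_best_path(frame_labels: Sequence[str], blank: str = BLANK) -> List[str]:
    """CTC best-path decoding: merge consecutive repeats, then drop blanks."""
    out: List[str] = []
    prev: Optional[str] = None
    for label in frame_labels:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return out


def min_keyword_edit_cost(phones: Sequence[str], keyword: Sequence[str]) -> int:
    """Smallest edit distance between the keyword and any contiguous
    substring of the phoneme string (approximate substring matching).

    Stand-in for the DBN: every substitution, deletion (keyword phoneme
    missing from the output), or insertion (extra output phoneme) costs 1.
    """
    m = len(keyword)
    # prev[i]: cost of matching keyword[:i] against a substring ending
    # at the previous phone; with no phones consumed, that is i deletions.
    prev = list(range(m + 1))
    best = prev[m]
    for p in phones:
        cur = [0] * (m + 1)  # cur[0] = 0: a match may start at any phone
        for i in range(1, m + 1):
            substitute = prev[i - 1] + (0 if keyword[i - 1] == p else 1)
            insert = prev[i] + 1      # phone p is an extra (inserted) phone
            delete = cur[i - 1] + 1   # keyword[i - 1] was deleted
            cur[i] = min(substitute, insert, delete)
        best = min(best, cur[m])
        prev = cur
    return best


def spot_keyword(frame_labels: Sequence[str], keyword: Sequence[str],
                 max_edits: int = 1) -> bool:
    """Hypothetical detector: accept if the keyword matches within max_edits."""
    phones = ctc_best_path(frame_labels)
    return min_keyword_edit_cost(phones, keyword) <= max_edits
```

For example, frames `["_", "k", "k", "_", "eh", "t", "_"]` collapse to `["k", "eh", "t"]`, which matches the keyword `["k", "ae", "t"]` with one substitution. A fixed edit threshold is a crude substitute for the probabilistic keyword-versus-arbitrary-speech decision the abstract attributes to the DBN.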

Original language: English
Title of host publication: 2010 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2010 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 5274-5277
Number of pages: 4
ISBN (Print): 9781424442966
State: Published - 2010
Event: 2010 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2010 - Dallas, TX, United States
Duration: 14 Mar 2010 to 19 Mar 2010

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Conference

Conference: 2010 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2010
Country/Territory: United States
City: Dallas, TX
Period: 14/03/10 to 19/03/10

Keywords

  • Connectionist temporal classification
  • Dynamic Bayesian networks
  • Keyword spotting
  • Spoken term detection

