Automatic speech recognition with neural spike trains

Marcus Holmberg, David Gelbart, Ulrich Ramacher, Werner Hemmert

Research output: Contribution to conference › Paper › peer-review

15 Scopus citations

Abstract

A major difference between the human auditory system and automatic speech recognition (ASR) lies in their representation of sound signals: whereas ASR uses a smoothed, low-dimensional temporal and spectral representation, our hearing system relies on extremely high-dimensional but temporally sparse spike trains. A strength of the latter representation is its inherent coding of time, which is exploited by neuronal networks along the auditory pathway. We demonstrate ASR results using features derived purely from simulated spike trains of auditory nerve fibers (ANFs) and a layer of octopus neurons. Octopus neurons, located in the cochlear nucleus, are known for their distinct temporal processing: they not only reject steady-state excitation and fire on signal onsets but also enhance the amplitude modulations of voiced speech. With multi-condition training, our features did not reach the performance of conventional mel-frequency cepstral coefficient (MFCC) features. With clean training, however, our spike-based features performed similarly to MFCCs. Further, recognition scores in noise improved when features derived from ANFs, which mainly represent spectral characteristics of speech signals, were combined with features derived from spike trains of octopus neurons. This result is promising given the relatively small number of neurons we used and the limitations in how the auditory model was interfaced to the ASR back end.
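To make the idea of spike-derived features concrete, the following is a minimal illustrative sketch and not the authors' auditory model or feature pipeline. It assumes a spike train is given as a list of spike times in seconds, bins it into 10 ms frames (an assumed frame length), and applies a half-wave rectified frame-to-frame difference as a crude stand-in for a unit that responds to onsets and ignores steady-state input. The function names `spike_counts` and `onset_emphasis` are hypothetical.

```python
from typing import List, Sequence


def spike_counts(spike_times: Sequence[float], duration: float,
                 frame: float = 0.010) -> List[int]:
    """Count spikes in consecutive non-overlapping frames of `frame` seconds."""
    n_frames = int(round(duration / frame))
    counts = [0] * n_frames
    for t in spike_times:
        idx = int(t / frame)
        if 0 <= idx < n_frames:  # ignore spikes outside the analysed interval
            counts[idx] += 1
    return counts


def onset_emphasis(counts: Sequence[int]) -> List[int]:
    """Half-wave rectified difference between successive frames.

    Increases in spike count pass through; constant or falling counts
    map to zero. This is a toy approximation, not an octopus-neuron model.
    """
    out = []
    prev = 0
    for c in counts:
        out.append(max(c - prev, 0))
        prev = c
    return out


# Toy spike train: silence for 20 ms, then steady firing (3 spikes per frame).
spikes = [0.021, 0.022, 0.023, 0.031, 0.032, 0.033, 0.041, 0.042, 0.043]
rate_features = spike_counts(spikes, duration=0.05)
onset_features = onset_emphasis(rate_features)
```

In this toy input the rate features stay high for as long as the firing continues, while the onset features are non-zero only in the frame where firing begins. This mirrors, very loosely, the contrast the abstract draws between ANF-derived and octopus-neuron-derived features.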

Original language: English
Pages: 1253-1256
Number of pages: 4
State: Published - 2005
Externally published: Yes
Event: 9th European Conference on Speech Communication and Technology - Lisbon, Portugal
Duration: 4 Sep 2005 → 8 Sep 2005

Conference

Conference: 9th European Conference on Speech Communication and Technology
Country/Territory: Portugal
City: Lisbon
Period: 4/09/05 → 8/09/05
