Automatic speech recognition with an adaptation model motivated by auditory processing

Marcus Holmberg, David Gelbart, Werner Hemmert

Research output: Contribution to journal › Article › peer-review

50 Scopus citations

Abstract

The mel-frequency cepstral coefficient (MFCC) or perceptual linear prediction (PLP) feature extraction typically used for automatic speech recognition (ASR) employs several principles which have known counterparts in the cochlea and auditory nerve: frequency decomposition, mel- or bark-warping of the frequency axis, and compression of amplitudes. It seems natural to ask if one can profitably employ a counterpart of the next physiological processing step, synaptic adaptation. We, therefore, incorporated a simplified model of short-term adaptation into MFCC feature extraction. We evaluated the resulting ASR performance on the AURORA 2 and AURORA 3 tasks, in comparison to ordinary MFCCs, MFCCs processed by RASTA, and MFCCs processed by cepstral mean subtraction (CMS), and both in comparison to and in combination with Wiener filtering. The results suggest that our approach offers a simple, causal robustness strategy which is competitive with RASTA, CMS, and Wiener filtering and performs well in combination with Wiener filtering. Compared to the structurally related RASTA, our adaptation model provides superior performance on AURORA 2 and, if Wiener filtering is used prior to both approaches, on AURORA 3 as well.
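
To make the distinction between a whole-utterance normalization such as CMS and a causal strategy concrete, the following is a minimal Python sketch, not the authors' adaptation model. The first function is plain per-utterance cepstral mean subtraction. The second, `causal_mean_subtraction`, is a hypothetical stand-in that subtracts an exponentially weighted running mean built from past frames only. The function names, the `Frames` type alias, and the smoothing constant `alpha` are assumptions made for illustration and do not come from the paper.

```python
from typing import List

# One list of feature coefficients (e.g. cepstra) per time frame.
Frames = List[List[float]]


def cepstral_mean_subtraction(frames: Frames) -> Frames:
    """Subtract the per-utterance mean of each coefficient.

    Non-causal: the mean is computed over the whole utterance.
    """
    if not frames:
        return []
    n_coeffs = len(frames[0])
    means = [sum(f[k] for f in frames) / len(frames) for k in range(n_coeffs)]
    return [[f[k] - means[k] for k in range(n_coeffs)] for f in frames]


def causal_mean_subtraction(frames: Frames, alpha: float = 0.9) -> Frames:
    """Subtract an exponentially weighted running mean of past frames.

    Hypothetical illustration of a causal normalization; `alpha` is an
    assumed smoothing constant, not a value taken from the paper.
    """
    if not frames:
        return []
    state = list(frames[0])  # initialise the running mean with the first frame
    out = []
    for f in frames:
        # The output depends only on the current frame and earlier frames.
        out.append([x - s for x, s in zip(f, state)])
        # Update the running mean after emitting the current frame.
        state = [alpha * s + (1.0 - alpha) * x for s, x in zip(state, f)]
    return out
```

Both functions remove a constant offset from each coefficient, but only the second can run frame by frame without waiting for the end of the utterance, which is the practical meaning of "causal" here.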

Original language: English
Pages (from-to): 43-49
Number of pages: 7
Journal: IEEE Transactions on Audio, Speech and Language Processing
Volume: 14
Issue number: 1
State: Published - Jan 2006
Externally published: Yes

Keywords

  • Neural adaptation
  • Noise robustness
  • Speech recognition
