Auditory-based Automatic Speech Recognition

Werner Hemmert, Marcus Holmberg, David Gelbart

Research output: Contribution to conference, Paper, peer-review


Abstract

In this paper we develop a physiologically motivated model of peripheral auditory processing and evaluate how its individual processing steps influence automatic speech recognition in noise. The model features large dynamic compression (>60 dB) and a realistic sensory cell model. The compression range was well matched to the limited dynamic range of the sensory cells, and the model yielded surprisingly high recognition scores. We also developed a computationally efficient, simplified model of auditory processing and found that a model of adaptation could improve recognition accuracy. Adaptation, a basic principle of neuronal processing, accentuates signal onsets. Applying this adaptation model to mel-frequency cepstral coefficient (MFCC) feature extraction raised recognition accuracy in noise (AURORA 2 task, averaged recognition scores, clean training condition) from 56.4% to 75.6%, a relative word error rate reduction of 44%. Adaptation outperformed RASTA processing by more than 10 percentage points, which corresponds to a relative word error rate reduction of 31%.
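The abstract reports its main result as accuracies that are then restated as a relative word error rate change. It also describes adaptation only qualitatively, as a process that accentuates onsets. The sketch below illustrates both points. It assumes WER = 100 - word accuracy (in percent), as with HTK-style scoring, where accuracy already penalizes insertions. It also shows one generic, hypothetical adaptation rule applied per channel to non-negative filterbank envelopes. This rule is not the authors' adaptation model, and the abstract does not say where in the MFCC chain their adaptation stage sits. The function names `relative_wer_reduction`, `adapt`, and `adapt_frames` and the parameters `alpha` and `strength` are illustrative assumptions.

```python
from typing import List, Sequence


def relative_wer_reduction(acc_before: float, acc_after: float) -> float:
    """Relative WER reduction, given word accuracies in percent.

    Assumes WER = 100 - accuracy (HTK-style scoring, where accuracy
    already subtracts insertions).
    """
    wer_before = 100.0 - acc_before
    wer_after = 100.0 - acc_after
    if wer_before <= 0.0:
        raise ValueError("baseline WER must be positive")
    return (wer_before - wer_after) / wer_before


def adapt(channel: Sequence[float], alpha: float = 0.1,
          strength: float = 0.8) -> List[float]:
    """Hypothetical subtractive adaptation for one non-negative envelope.

    A first-order low-pass state tracks the recent input level. Each
    output is the input minus a fraction of that state (clipped at 0).
    A sudden rise therefore passes almost unattenuated (an onset), while
    a sustained input settles to (1 - strength) times its level.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    state = 0.0
    out: List[float] = []
    for x in channel:
        # Use the state from *before* this sample, so onsets are not
        # immediately suppressed by their own energy.
        out.append(max(0.0, x - strength * state))
        state += alpha * (x - state)  # slow running mean of the input
    return out


def adapt_frames(frames: Sequence[Sequence[float]], alpha: float = 0.1,
                 strength: float = 0.8) -> List[List[float]]:
    """Apply adapt() independently to each channel of a frames x channels matrix."""
    if not frames:
        return []
    n_ch = len(frames[0])
    per_channel = [adapt([f[c] for f in frames], alpha, strength)
                   for c in range(n_ch)]
    # Transpose back to frames x channels.
    return [[per_channel[c][t] for c in range(n_ch)]
            for t in range(len(frames))]
```

With the abstract's figures, `relative_wer_reduction(56.4, 75.6)` evaluates to about 0.44 (WER falls from 43.6% to 24.4%). For a step input, `adapt` returns the full step height at the onset and decays toward 20% of it with the default parameters. A subtractive rule was chosen here only because it stays bounded when the running mean is still zero, whereas a purely divisive rule would not.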

Original language: English
State: Published, 2004
Externally published: Yes
Event: 2004 ISCA Tutorial and Research Workshop on Statistical and Perceptual Audio Processing, SAPA 2004 - Jeju, Korea, Republic of
Duration: 3 Oct 2004 → …


