TY - JOUR
T1 - Automatic speech recognition with an adaptation model motivated by auditory processing
AU - Holmberg, Marcus
AU - Gelbart, David
AU - Hemmert, Werner
N1 - Funding Information:
Manuscript received January 31, 2005; revised August 25, 2005. This work was supported by the German Federal Ministry for Education and Research within the Munich Bernstein Center for Computational Neuroscience under reference number 01GQ0443. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Bhiksha Raj.
PY - 2006/1
Y1 - 2006/1
N2 - The mel-frequency cepstral coefficient (MFCC) or perceptual linear prediction (PLP) feature extraction typically used for automatic speech recognition (ASR) employs several principles which have known counterparts in the cochlea and auditory nerve: frequency decomposition, mel- or bark-warping of the frequency axis, and compression of amplitudes. It seems natural to ask whether one can profitably employ a counterpart of the next physiological processing step, synaptic adaptation. We therefore incorporated a simplified model of short-term adaptation into MFCC feature extraction. We evaluated the resulting ASR performance on the AURORA 2 and AURORA 3 tasks, in comparison to ordinary MFCCs, MFCCs processed by RASTA, and MFCCs processed by cepstral mean subtraction (CMS), and both in comparison to and in combination with Wiener filtering. The results suggest that our approach offers a simple, causal robustness strategy which is competitive with RASTA, CMS, and Wiener filtering and performs well in combination with Wiener filtering. Compared to the structurally related RASTA, our adaptation model provides superior performance on AURORA 2 and, if Wiener filtering is used prior to both approaches, on AURORA 3 as well.
KW - Neural adaptation
KW - Noise robustness
KW - Speech recognition
UR - http://www.scopus.com/inward/record.url?scp=33744994972&partnerID=8YFLogxK
U2 - 10.1109/TSA.2005.860349
DO - 10.1109/TSA.2005.860349
M3 - Article
AN - SCOPUS:33744994972
SN - 1558-7916
VL - 14
SP - 43
EP - 49
JO - IEEE Transactions on Audio, Speech and Language Processing
JF - IEEE Transactions on Audio, Speech and Language Processing
IS - 1
ER -