A Global Discriminant Joint Training Framework for Robust Speech Recognition

Lujun Li, Ludwig Kurzinger, Tobias Watzel, Gerhard Rigoll

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

Robustness in adverse acoustic conditions is critical for practical human-machine interaction. A common solution is to add an independent speech enhancement front-end. However, because it is trained separately from the automatic speech recognition (ASR) module, such a front-end easily ends up sub-optimal for recognition. In addition, the handcrafted loss function of the enhancement module tends to introduce unseen distortions, which can even degrade ASR performance. Joint training has therefore been attracting growing interest. Nevertheless, none of the previously proposed joint-training frameworks is built on the increasingly popular self-attention mechanism or on a generative adversarial architecture. This paper proposes a novel joint-training framework that concatenates a speech enhancement generative adversarial network as the front-end with a self-attention-based ASR module as the back-end and trains them jointly as a single extended network, in order to boost the noise robustness of the end-to-end ASR system. A Sinc convolution layer is integrated into the speech enhancement front-end to extract more representative features. Moreover, a discriminant component acts simultaneously as the local guide of the enhancement module and as the global guide of the joint training; it steers the enhancement front-end toward features that are more desirable for the subsequent ASR module and thereby offsets the limitations of separate training and handcrafted loss functions. Systematic experiments show that the proposed framework significantly outperforms other competitive solutions, especially in challenging environments.
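
For readers unfamiliar with the Sinc convolution mentioned in the abstract, the following is a minimal sketch of the band-pass kernel such a layer is commonly built from: the difference of two ideal low-pass (sinc) filters, tapered by a window. It is an illustration only, not the authors' implementation. The function names, kernel length, sample rate, Hamming window and fixed cutoff values are all assumptions; in a trainable layer the two cutoffs would be learned parameters.

```python
import numpy as np


def sinc_bandpass_kernel(f_low_hz, f_high_hz, kernel_size=101, sample_rate=16000):
    """Windowed band-pass FIR kernel defined only by its two cutoff frequencies.

    Hypothetical helper for illustration; all defaults are assumptions.
    """
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd so the kernel is symmetric")
    # Express the cutoffs in cycles per sample.
    f1 = f_low_hz / sample_rate
    f2 = f_high_hz / sample_rate
    # Sample positions centred on zero: -(K-1)/2 ... (K-1)/2.
    n = np.arange(kernel_size) - (kernel_size - 1) / 2
    # np.sinc(x) = sin(pi x) / (pi x), so 2 f sinc(2 f n) is an ideal
    # low-pass filter with cutoff f. Subtracting two of them leaves a band.
    band_pass = 2 * f2 * np.sinc(2 * f2 * n) - 2 * f1 * np.sinc(2 * f1 * n)
    # Taper the truncated response to reduce ripple.
    return band_pass * np.hamming(kernel_size)


def gain_at(kernel, freq_hz, sample_rate=16000):
    """Magnitude of the kernel's frequency response at one frequency."""
    n = np.arange(len(kernel))
    return abs(np.sum(kernel * np.exp(-2j * np.pi * freq_hz * n / sample_rate)))


if __name__ == "__main__":
    kernel = sinc_bandpass_kernel(300.0, 3400.0)
    print("gain inside the band (1 kHz): ", gain_at(kernel, 1000.0))
    print("gain outside the band (6 kHz):", gain_at(kernel, 6000.0))
```

Because each filter is described by two numbers rather than by every tap, a bank of such kernels has far fewer free parameters than an unconstrained first convolution layer of the same length.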

Original language: English
Title of host publication: Proceedings - 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence, ICTAI 2021
Publisher: IEEE Computer Society
Pages: 544-551
Number of pages: 8
ISBN (Electronic): 9781665408981
DOIs
State: Published - 2021
Event: 33rd IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2021 - Virtual, Online, United States
Duration: 1 Nov 2021 to 3 Nov 2021

Publication series

Name: Proceedings - International Conference on Tools with Artificial Intelligence, ICTAI
Volume: 2021-November
ISSN (Print): 1082-3409

Conference

Conference: 33rd IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2021
Country/Territory: United States
City: Virtual, Online
Period: 1/11/21 to 3/11/21

Keywords

  • Sinc convolution
  • generative adversarial networks
  • joint training framework
  • robust speech recognition
  • self-attention mechanism
  • speech enhancement
