A Global Discriminant Joint Training Framework for Robust Speech Recognition

Lujun Li, Ludwig Kurzinger, Tobias Watzel, Gerhard Rigoll

Publication: Contribution to book/report/conference proceedings, conference contribution, peer-reviewed

Abstract

Robustness in adverse acoustic conditions is critical for practical human-machine interaction. A common solution is to add an independent speech enhancement front-end. However, because it is trained separately from the automatic speech recognition (ASR) module, such a front-end easily ends up sub-optimal for recognition. In addition, the handcrafted loss function of the enhancement module tends to introduce unseen distortions, which can even degrade ASR performance. To address these problems, joint training of the two modules is attracting growing interest. Nevertheless, none of the previously proposed joint-training frameworks is built on the increasingly popular self-attention mechanism or on a generative adversarial architecture. This paper proposes a novel joint-training framework that concatenates a speech enhancement generative adversarial network as the front-end with a self-attention-based ASR module as the back-end and trains them jointly as a single extended network, in order to boost the noise robustness of the end-to-end ASR system. A Sinc convolution layer is merged into the speech enhancement front-end to extract more representative features. Moreover, a discriminant component acts simultaneously as the local guide of the enhancement module and as the global guide in joint training; it steers the enhancement front-end toward output features that are more suitable for the subsequent ASR module and thereby offsets the limitations of separate training and handcrafted loss functions. Systematic experiments show that the proposed framework significantly outperforms other competitive solutions, especially in challenging environments.
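
The Sinc convolution layer mentioned in the abstract can be illustrated with a small sketch. It assumes the usual SincNet-style formulation, in which each filter is a windowed band-pass kernel defined only by a low and a high cutoff frequency; the abstract does not confirm that the paper uses exactly this form. The function names, the kernel size, the Hamming window, and the 16 kHz sample rate are all assumptions made for illustration. In a trainable layer the two cutoffs would be learned parameters; here they are fixed numbers.

```python
import cmath
import math


def sinc(x):
    """Normalised sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def sinc_bandpass(f_low_hz, f_high_hz, kernel_size=101, sample_rate=16000):
    """Build one band-pass FIR kernel from its two cutoff frequencies.

    Hypothetical helper: the band-pass response is the difference of two
    ideal low-pass responses, truncated and smoothed by a Hamming window.
    """
    f1 = f_low_hz / sample_rate   # low cutoff in cycles per sample
    f2 = f_high_hz / sample_rate  # high cutoff in cycles per sample
    centre = (kernel_size - 1) / 2
    kernel = []
    for i in range(kernel_size):
        n = i - centre  # time index centred on zero
        ideal = 2 * f2 * sinc(2 * f2 * n) - 2 * f1 * sinc(2 * f1 * n)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (kernel_size - 1))
        kernel.append(ideal * window)
    return kernel


def gain_at(kernel, freq_hz, sample_rate=16000):
    """Magnitude of the kernel's frequency response at one frequency."""
    w = 2 * math.pi * freq_hz / sample_rate
    return abs(sum(h * cmath.exp(-1j * w * i) for i, h in enumerate(kernel)))


if __name__ == "__main__":
    k = sinc_bandpass(300.0, 3000.0)
    print("gain inside the band (1500 Hz):", gain_at(k, 1500.0))
    print("gain outside the band (6000 Hz):", gain_at(k, 6000.0))
```

Because only two numbers per filter are free, such a layer has far fewer parameters than an ordinary convolution of the same kernel size, which is the usual motivation for placing it at the raw-waveform input of a network.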

Original language: English
Title: Proceedings - 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence, ICTAI 2021
Publisher: IEEE Computer Society
Pages: 544-551
Number of pages: 8
ISBN (electronic): 9781665408981
Publication status: Published - 2021
Event: 33rd IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2021 - Virtual, Online, United States
Duration: 1 Nov. 2021 to 3 Nov. 2021

Publication series

Name: Proceedings - International Conference on Tools with Artificial Intelligence, ICTAI
Volume: 2021-November
ISSN (Print): 1082-3409

Conference

Conference: 33rd IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2021
Country/Territory: United States
City: Virtual, Online
Period: 1/11/21 to 3/11/21
