Audio Adversarial Examples for Robust Hybrid CTC/Attention Speech Recognition

Ludwig Kürzinger, Edgar Ricardo Chavez Rosas, Lujun Li, Tobias Watzel, Gerhard Rigoll

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

3 Scopus citations

Abstract

Recent advances in Automatic Speech Recognition (ASR) demonstrated how end-to-end systems are able to achieve state-of-the-art performance. There is a trend towards deeper neural networks, however those ASR models are also more complex and prone against specially crafted noisy data. Those Audio Adversarial Examples (AAE) were previously demonstrated on ASR systems that use Connectionist Temporal Classification (CTC), as well as attention-based encoder-decoder architectures. Following the idea of the hybrid CTC/attention ASR system, this work proposes algorithms to generate AAEs to combine both approaches into a joint CTC-attention gradient method. Evaluation is performed using a hybrid CTC/attention end-to-end ASR model on two reference sentences as case study, as well as the TEDlium v2 speech recognition task. We then demonstrate the application of this algorithm for adversarial training to obtain a more robust ASR model.

Original languageEnglish
Title of host publicationSpeech and Computer - 22nd International Conference, SPECOM 2020, Proceedings
EditorsAlexey Karpov, Rodmonga Potapova
PublisherSpringer Science and Business Media Deutschland GmbH
Pages255-266
Number of pages12
ISBN (Print)9783030602758
DOIs
StatePublished - 2020
Event22nd International Conference on Speech and Computer, SPECOM 2020 - St. Petersburg, Russian Federation
Duration: 7 Oct 20209 Oct 2020

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume12335 LNAI
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Conference

Conference22nd International Conference on Speech and Computer, SPECOM 2020
Country/TerritoryRussian Federation
CitySt. Petersburg
Period7/10/209/10/20

Keywords

  • Adversarial examples
  • Adversarial training
  • ESPnet
  • Hybrid CTC/Attention

Fingerprint

Dive into the research topics of 'Audio Adversarial Examples for Robust Hybrid CTC/Attention Speech Recognition'. Together they form a unique fingerprint.

Cite this