TY - GEN
T1 - Audio Adversarial Examples for Robust Hybrid CTC/Attention Speech Recognition
AU - Kürzinger, Ludwig
AU - Chavez Rosas, Edgar Ricardo
AU - Li, Lujun
AU - Watzel, Tobias
AU - Rigoll, Gerhard
N1 - Publisher Copyright: © 2020, Springer Nature Switzerland AG.
PY - 2020
Y1 - 2020
N2 - Recent advances in Automatic Speech Recognition (ASR) have demonstrated that end-to-end systems can achieve state-of-the-art performance. There is a trend towards deeper neural networks; however, those ASR models are also more complex and more vulnerable to specially crafted noisy data. Such Audio Adversarial Examples (AAEs) were previously demonstrated on ASR systems that use Connectionist Temporal Classification (CTC), as well as on attention-based encoder-decoder architectures. Following the idea of the hybrid CTC/attention ASR system, this work proposes algorithms for generating AAEs that combine both approaches into a joint CTC-attention gradient method. Evaluation is performed using a hybrid CTC/attention end-to-end ASR model on two reference sentences as a case study, as well as on the TEDlium v2 speech recognition task. We then demonstrate the application of this algorithm to adversarial training in order to obtain a more robust ASR model.
KW - Adversarial examples
KW - Adversarial training
KW - ESPnet
KW - Hybrid CTC/Attention
UR - http://www.scopus.com/inward/record.url?scp=85092903047&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-60276-5_26
DO - 10.1007/978-3-030-60276-5_26
M3 - Conference contribution
AN - SCOPUS:85092903047
SN - 9783030602758
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 255
EP - 266
BT - Speech and Computer - 22nd International Conference, SPECOM 2020, Proceedings
A2 - Karpov, Alexey
A2 - Potapova, Rodmonga
PB - Springer Science and Business Media Deutschland GmbH
T2 - 22nd International Conference on Speech and Computer, SPECOM 2020
Y2 - 7 October 2020 through 9 October 2020
ER -