MP3 Compression to Diminish Adversarial Noise in End-to-End Speech Recognition

Iustina Andronic, Ludwig Kürzinger, Edgar Ricardo Chavez Rosas, Gerhard Rigoll, Bernhard U. Seeber

Publication: Contribution to book/report/conference proceedings, Conference paper, Peer-reviewed

7 citations (Scopus)

Abstract

Audio Adversarial Examples (AAEs) are purposefully designed inputs meant to trick Automatic Speech Recognition (ASR) systems into misclassification. The present work proposes MP3 compression as a means to decrease the impact of Adversarial Noise (AN) in audio samples transcribed by ASR systems. To this end, we generated AAEs with a new variant of the Fast Gradient Sign Method for an end-to-end, hybrid CTC-attention ASR system. The effectiveness of MP3 compression against AN is then validated by two objective indicators: (1) Character Error Rates (CER), which measure the speech decoding performance of four ASR models trained on different audio formats (both uncompressed and MP3-compressed), and (2) the Signal-to-Noise Ratio (SNR), estimated for uncompressed and MP3-compressed AAEs that are reconstructed in the time domain by feature inversion. We found that applying MP3 compression to AAEs indeed reduces the CER compared to uncompressed AAEs. Moreover, feature-inverted (reconstructed) AAEs had significantly higher SNRs after MP3 compression, indicating that AN was reduced. In contrast, applying MP3 compression to utterances augmented with regular noise resulted in more transcription errors, giving further evidence that MP3 encoding is effective at diminishing AN specifically, not noise in general.
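The two indicators named in the abstract can be illustrated with a minimal sketch. It assumes the common definitions: CER as the character-level Levenshtein distance divided by the reference length, and SNR in dB computed against a clean time-domain reference, treating the difference between the two signals as noise. The function names (`levenshtein`, `character_error_rate`, `snr_db`) are hypothetical, and this is not the authors' evaluation code; in particular, the paper's exact procedure for estimating SNR on feature-inverted AAEs is not described here.

```python
import math
from typing import Sequence


def levenshtein(ref: str, hyp: str) -> int:
    """Edit distance (substitutions, insertions, deletions) via dynamic programming."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,              # deletion of r
                curr[j - 1] + 1,          # insertion of h
                prev[j - 1] + (r != h),   # substitution (free if characters match)
            )
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER = edit distance / number of reference characters (assumed definition)."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return levenshtein(ref, hyp) / len(ref)


def snr_db(clean: Sequence[float], noisy: Sequence[float]) -> float:
    """SNR in dB, treating (noisy - clean) as the noise signal (assumed definition)."""
    if len(clean) != len(noisy):
        raise ValueError("signals must have the same length")
    signal_power = sum(x * x for x in clean)
    noise_power = sum((y - x) ** 2 for x, y in zip(clean, noisy))
    if noise_power == 0:
        return math.inf  # identical signals: no noise
    return 10 * math.log10(signal_power / noise_power)
```

For example, `character_error_rate("hello world", "helo world")` gives 1/11 (one deletion over eleven reference characters). Under these assumptions, a higher SNR for an MP3-compressed AAE than for its uncompressed counterpart would correspond to less residual adversarial noise, which is the direction of the effect the abstract reports.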

Original language: English
Title: Speech and Computer - 22nd International Conference, SPECOM 2020, Proceedings
Editors: Alexey Karpov, Rodmonga Potapova
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 22-34
Number of pages: 13
ISBN (Print): 9783030602758
Publication status: Published - 2020
Event: 22nd International Conference on Speech and Computer, SPECOM 2020 - St. Petersburg, Russia
Duration: 7 Oct. 2020 to 9 Oct. 2020

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12335 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 22nd International Conference on Speech and Computer, SPECOM 2020
Country/Territory: Russia
City: St. Petersburg
Period: 7/10/20 to 9/10/20
