MP3 Compression to Diminish Adversarial Noise in End-to-End Speech Recognition

Iustina Andronic, Ludwig Kürzinger, Edgar Ricardo Chavez Rosas, Gerhard Rigoll, Bernhard U. Seeber

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

7 Scopus citations

Abstract

Audio Adversarial Examples (AAEs) are inputs purposefully designed to trick Automatic Speech Recognition (ASR) systems into misclassification. The present work proposes MP3 compression as a means to decrease the impact of Adversarial Noise (AN) in audio samples transcribed by ASR systems. To this end, we generated AAEs with a new variant of the Fast Gradient Sign Method for an end-to-end, hybrid CTC-attention ASR system. The effectiveness of MP3 compression against AN is then validated by two objective indicators: (1) Character Error Rates (CER) that measure the speech decoding performance of four ASR models trained on different audio formats (both uncompressed and MP3-compressed) and (2) Signal-to-Noise Ratio (SNR) estimated for uncompressed and MP3-compressed AAEs that are reconstructed in the time domain by feature inversion. We found that MP3 compression applied to AAEs indeed reduces the CER when compared to uncompressed AAEs. Moreover, feature-inverted (reconstructed) AAEs had significantly higher SNRs after MP3 compression, indicating that AN was reduced. In contrast, MP3 compression applied to utterances augmented with regular (non-adversarial) noise resulted in more transcription errors, giving further evidence that MP3 encoding specifically diminishes AN rather than noise in general.
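
The two indicators named in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names `edit_distance`, `cer`, and `snr_db` are hypothetical, the CER is computed here as character-level Levenshtein distance divided by reference length, and the SNR assumes that a clean signal and a time-aligned adversarial signal of equal length are both available as plain sample lists, whereas the paper estimates SNR on feature-inverted reconstructions.

```python
import math


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance between reference and hypothesis."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: edit distance normalised by reference length."""
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def snr_db(clean, adversarial) -> float:
    """SNR in dB, treating (adversarial - clean) as the noise (assumption)."""
    if len(clean) != len(adversarial):
        raise ValueError("signals must have the same length")
    signal_power = sum(c * c for c in clean)
    noise_power = sum((a - c) ** 2 for c, a in zip(clean, adversarial))
    if noise_power == 0:
        return math.inf
    return 10 * math.log10(signal_power / noise_power)
```

Under these definitions, a lower CER means fewer transcription errors, and a higher SNR means the adversarial perturbation carries less energy relative to the speech signal, which matches the direction of the improvements reported in the abstract.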

Original language: English
Title of host publication: Speech and Computer - 22nd International Conference, SPECOM 2020, Proceedings
Editors: Alexey Karpov, Rodmonga Potapova
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 22-34
Number of pages: 13
ISBN (Print): 9783030602758
DOIs
State: Published - 2020
Event: 22nd International Conference on Speech and Computer, SPECOM 2020 - St. Petersburg, Russian Federation
Duration: 7 Oct 2020 to 9 Oct 2020

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12335 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 22nd International Conference on Speech and Computer, SPECOM 2020
Country/Territory: Russian Federation
City: St. Petersburg
Period: 7/10/20 to 9/10/20

Keywords

  • Audio Adversarial Examples
  • Automatic Speech Recognition (ASR)
  • MP3 compression
