Detecting Word-Level Adversarial Text Attacks via SHapley Additive exPlanations

Lukas Huber, Marc Alexander Kühn, Edoardo Mosca, Georg Groh

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

14 Scopus citations

Abstract

State-of-the-art machine learning models are prone to adversarial attacks: maliciously crafted inputs that fool the model into making a wrong prediction, often with high confidence. While defense strategies have been extensively explored in the computer vision domain, research in natural language processing still lacks techniques to make models resilient to adversarial text inputs. We adapt a technique from computer vision to detect word-level attacks targeting text classifiers. The method trains an adversarial detector on Shapley additive explanations and outperforms the current state of the art on two benchmarks. Furthermore, we show that the detector requires only a small number of training samples and, in some cases, generalizes to different datasets without retraining.
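The core idea described in the abstract is to compute Shapley value attributions for each word in an input and feed those attribution vectors to a separately trained detector. The sketch below is a hypothetical illustration, not the authors' implementation: it computes exact Shapley values for a toy additive scoring function (feasible only for short texts, since it enumerates all word coalitions) and fits a simple logistic-regression detector on the resulting attribution vectors.

```python
from itertools import combinations
from math import factorial

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-in for a text classifier's confidence on a subset of words.
# In the paper's setting this would be the attacked classifier itself.
POSITIVE = {"good", "great"}

def model_score(words):
    return sum(1.0 if w in POSITIVE else -0.5 for w in words)

def shapley_values(words):
    """Exact Shapley values via coalition enumeration: for each word i, average
    its marginal contribution model_score(S + {i}) - model_score(S) over all
    subsets S of the remaining words, with the standard Shapley weights."""
    n = len(words)
    values = np.zeros(n)
    for i in range(n):
        others = [w for j, w in enumerate(words) if j != i]
        for k in range(n):
            for subset in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                marginal = model_score(subset + (words[i],)) - model_score(subset)
                values[i] += weight * marginal
    return values

# Hypothetical detector: attribution vectors of clean vs. adversarially
# perturbed texts become features for a binary classifier.
clean_texts = [("good", "great", "good"), ("great", "good", "great")]
attacked_texts = [("good", "grate", "good"), ("graet", "good", "great")]
X = np.array([shapley_values(t) for t in clean_texts + attacked_texts])
y = np.array([0, 0, 1, 1])  # 0 = clean, 1 = adversarial
detector = LogisticRegression().fit(X, y)
```

For an additive scoring function like the toy one above, each word's Shapley value reduces to its own contribution (for example, `shapley_values(("good", "bad"))` yields `[1.0, -0.5]`), which makes the enumeration easy to sanity-check; real classifiers require sampling-based approximations such as those in the `shap` library.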

Original language: English
Title of host publication: ACL 2022 - 7th Workshop on Representation Learning for NLP, RepL4NLP 2022 - Proceedings of the Workshop
Publisher: Association for Computational Linguistics (ACL)
Pages: 156-166
Number of pages: 11
ISBN (Electronic): 9781955917483
State: Published - 2022
Event: 7th Workshop on Representation Learning for NLP, RepL4NLP 2022 at ACL 2022 - Dublin, Ireland
Duration: 26 May 2022 → …

Publication series

Name: Proceedings of the Annual Meeting of the Association for Computational Linguistics
ISSN (Print): 0736-587X

Conference

Conference: 7th Workshop on Representation Learning for NLP, RepL4NLP 2022 at ACL 2022
Country/Territory: Ireland
City: Dublin
Period: 26/05/22 → …
