TY - GEN
T1 - Detecting Word-Level Adversarial Text Attacks via SHapley Additive exPlanations
AU - Huber, Lukas
AU - Kühn, Marc Alexander
AU - Mosca, Edoardo
AU - Groh, Georg
N1 - Publisher Copyright:
© 2022 Association for Computational Linguistics.
PY - 2022
Y1 - 2022
AB - State-of-the-art machine learning models are prone to adversarial attacks: maliciously crafted inputs designed to fool the model into making a wrong prediction, often with high confidence. While defense strategies have been extensively explored in the computer vision domain, research in natural language processing still lacks techniques to make models resilient to adversarial text inputs. We adapt a technique from computer vision to detect word-level attacks targeting text classifiers. This method relies on training an adversarial detector leveraging SHapley Additive exPlanations (SHAP) and outperforms the current state of the art on two benchmarks. Furthermore, we prove the detector requires only a small number of training samples and, in some cases, generalizes to different datasets without needing to be retrained.
UR - http://www.scopus.com/inward/record.url?scp=85149150357&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85149150357
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 156
EP - 166
BT - ACL 2022 - 7th Workshop on Representation Learning for NLP, RepL4NLP 2022 - Proceedings of the Workshop
PB - Association for Computational Linguistics (ACL)
T2 - 7th Workshop on Representation Learning for NLP, RepL4NLP 2022 at ACL 2022
Y2 - 26 May 2022
ER -