TY - GEN
T1 - Is feature selection secure against training data poisoning?
AU - Xiao, Huang
AU - Biggio, Battista
AU - Brown, Gavin
AU - Fumera, Giorgio
AU - Eckert, Claudia
AU - Roli, Fabio
N1 - Publisher Copyright:
Copyright © 2015 by the author(s).
PY - 2015
Y1 - 2015
AB - Learning in adversarial settings is becoming an important task for application domains where attackers may inject malicious data into the training set to subvert the normal operation of data-driven technologies. Feature selection has been widely used in machine learning for security applications to improve generalization and computational efficiency, although it is not clear whether its use may be beneficial or even counterproductive when training data are poisoned by intelligent attackers. In this work, we shed light on this issue by providing a framework to investigate the robustness of popular feature selection methods, including LASSO, ridge regression, and the elastic net. Our results on malware detection show that feature selection methods can be significantly compromised under attack (we can reduce LASSO to almost random choices of feature sets by carefully inserting less than 5% poisoned training samples), highlighting the need for specific countermeasures.
UR - http://www.scopus.com/inward/record.url?scp=84969900476&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84969900476
T3 - 32nd International Conference on Machine Learning, ICML 2015
SP - 1689
EP - 1698
BT - 32nd International Conference on Machine Learning, ICML 2015
A2 - Blei, David
A2 - Bach, Francis
PB - International Machine Learning Society (IMLS)
T2 - 32nd International Conference on Machine Learning, ICML 2015
Y2 - 6 July 2015 through 11 July 2015
ER -