TY - JOUR
T1 - Evaluating the transferability of adversarial robustness to target domains
AU - Kopetzki, Anna Kathrin
AU - Bojchevski, Aleksandar
AU - Günnemann, Stephan
N1 - Publisher Copyright:
© The Author(s) 2025.
PY - 2025
Y1 - 2025
N2 - Knowledge transfer is an effective method for learning, particularly when labeled data are limited or training a model from scratch is too expensive. Most research on transfer learning focuses on achieving accurate models, overlooking the crucial aspect of adversarial robustness. However, ensuring robustness is vital, especially when applying transfer learning in safety-critical domains. We compare the robustness of models obtained by 11 training procedures on source domains and 3 retraining schemes on target domains, including normal, adversarial, contrastive, and Lipschitz-constrained training variants. Robustness is analyzed via adversarial attacks with respect to two different transfer learning model outputs: (i) the latent representations and (ii) the predictions. Studying latent representations in correlation with predictions is crucial for the robustness of transfer learning models, since the representations are learned solely on the source domain. Besides adversarial attacks that aim to change the prediction, we also analyze the effect of directly attacking the representations. Our results show that adversarial robustness can transfer across domains, but effective robust transfer learning requires techniques that ensure robustness independently of the training data so that it is preserved during the transfer. Retraining on the target domain has only a minor impact on the robustness of the target model. Representations exhibit greater robustness than predictions on both the source and target domains.
AB - Knowledge transfer is an effective method for learning, particularly when labeled data are limited or training a model from scratch is too expensive. Most research on transfer learning focuses on achieving accurate models, overlooking the crucial aspect of adversarial robustness. However, ensuring robustness is vital, especially when applying transfer learning in safety-critical domains. We compare the robustness of models obtained by 11 training procedures on source domains and 3 retraining schemes on target domains, including normal, adversarial, contrastive, and Lipschitz-constrained training variants. Robustness is analyzed via adversarial attacks with respect to two different transfer learning model outputs: (i) the latent representations and (ii) the predictions. Studying latent representations in correlation with predictions is crucial for the robustness of transfer learning models, since the representations are learned solely on the source domain. Besides adversarial attacks that aim to change the prediction, we also analyze the effect of directly attacking the representations. Our results show that adversarial robustness can transfer across domains, but effective robust transfer learning requires techniques that ensure robustness independently of the training data so that it is preserved during the transfer. Retraining on the target domain has only a minor impact on the robustness of the target model. Representations exhibit greater robustness than predictions on both the source and target domains.
KW - Adversarial robustness
KW - Adversarial training
KW - Adversarially robust transfer
KW - Transfer learning
UR - http://www.scopus.com/inward/record.url?scp=85217403594&partnerID=8YFLogxK
U2 - 10.1007/s10115-024-02333-x
DO - 10.1007/s10115-024-02333-x
M3 - Article
AN - SCOPUS:85217403594
SN - 0219-1377
JO - Knowledge and Information Systems
JF - Knowledge and Information Systems
ER -