TY - JOUR
T1 - Vulnerabilities of Data Protection in Vertical Federated Learning Training and Countermeasures
AU - Zhu, Derui
AU - Chen, Jinfu
AU - Zhou, Xuebing
AU - Shang, Weiyi
AU - Hassan, Ahmed E.
AU - Grossklags, Jens
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Vertical federated learning (VFL) is an increasingly popular, yet understudied, collaborative learning technique. In VFL, features and labels are distributed among different participants, allowing for various innovative applications in business domains, e.g., online marketing. When deploying VFL, the training data (labels and features) of each participant ought to be protected; however, very few studies have investigated the vulnerability of data protection in the VFL training stage. In this paper, we propose a posterior-difference-based data reconstruction attack, VFLRecon, which reconstructs labels and features to examine this problem. Our experiments show that standard VFL is highly vulnerable to serious privacy threats, with reconstruction achieving up to 92% label accuracy and a feature MSE of 0.05, compared to a baseline of 55% label accuracy and a feature MSE of 0.19. Even worse, this privacy risk persists under standard operations (e.g., encrypted aggregation) that appear to be safe. We also systematically analyze data leakage risks in the VFL training stage across diverse data modalities (i.e., tabular data and images), different training frameworks (i.e., with or without encryption techniques), and a wide range of training hyperparameters. To mitigate this risk, we design a novel defense mechanism, VFLDefender, dedicated to obfuscating the correlation between bottom-model changes and labels (features) during training. The experimental results demonstrate that VFLDefender prevents reconstruction attacks and is around 17% more effective than standard encryption operations alone.
KW - Privacy-preserving machine learning
KW - data safety
KW - privacy
KW - privacy leakage
KW - vertical federated learning
UR - http://www.scopus.com/inward/record.url?scp=85187255740&partnerID=8YFLogxK
U2 - 10.1109/TIFS.2024.3361813
DO - 10.1109/TIFS.2024.3361813
M3 - Article
AN - SCOPUS:85187255740
SN - 1556-6013
VL - 19
SP - 3674
EP - 3689
JO - IEEE Transactions on Information Forensics and Security
JF - IEEE Transactions on Information Forensics and Security
ER -