TY - GEN
T1 - Incentivising the federation
T2 - 2024 European Interdisciplinary Cybersecurity Conference, EICC 2024
AU - Usynin, Dmitrii
AU - Rueckert, Daniel
AU - Kaissis, Georgios
N1 - Publisher Copyright:
© 2024 ACM.
PY - 2024/6/5
Y1 - 2024/6/5
N2 - Obtaining high-quality data for collaborative training of machine learning models can be a challenging task due to A) regulatory concerns and B) a lack of data owner incentives to participate. The first issue can be addressed through the combination of distributed machine learning techniques (e.g. federated learning) and privacy-enhancing technologies (PET), such as differentially private (DP) model training. The second challenge can be addressed by rewarding participants for giving access to data which is beneficial to the jointly trained model. This is of particular importance in federated settings, where the data is unevenly distributed. However, DP noise can adversely affect underrepresented and atypical (yet often informative) data samples, making it difficult to assess their usefulness. In this work, we investigate how to leverage gradient information to permit the participants of private training settings to select the data most beneficial for the jointly trained model. We assess two such methods, namely variance of gradients (VoG) and the privacy loss-input susceptibility score (PLIS). We show that these techniques can provide the federated clients with tools for principled data selection even in stricter privacy settings.
AB - Obtaining high-quality data for collaborative training of machine learning models can be a challenging task due to A) regulatory concerns and B) a lack of data owner incentives to participate. The first issue can be addressed through the combination of distributed machine learning techniques (e.g. federated learning) and privacy-enhancing technologies (PET), such as differentially private (DP) model training. The second challenge can be addressed by rewarding participants for giving access to data which is beneficial to the jointly trained model. This is of particular importance in federated settings, where the data is unevenly distributed. However, DP noise can adversely affect underrepresented and atypical (yet often informative) data samples, making it difficult to assess their usefulness. In this work, we investigate how to leverage gradient information to permit the participants of private training settings to select the data most beneficial for the jointly trained model. We assess two such methods, namely variance of gradients (VoG) and the privacy loss-input susceptibility score (PLIS). We show that these techniques can provide the federated clients with tools for principled data selection even in stricter privacy settings.
KW - data valuation
KW - differential privacy
KW - federated learning
UR - http://www.scopus.com/inward/record.url?scp=85196160222&partnerID=8YFLogxK
U2 - 10.1145/3655693.3660253
DO - 10.1145/3655693.3660253
M3 - Conference contribution
AN - SCOPUS:85196160222
T3 - ACM International Conference Proceeding Series
SP - 179
EP - 185
BT - Proceedings of the 2024 European Interdisciplinary Cybersecurity Conference, EICC 2024
A2 - Coopamootoo, Kovila
A2 - Sirivianos, Michael
PB - Association for Computing Machinery
Y2 - 5 June 2024 through 6 June 2024
ER -