TY - JOUR
T1 - Efficient dataset generation for machine learning halide perovskite alloys
AU - Homm, Henrietta
AU - Laakso, Jarno
AU - Rinke, Patrick
N1 - Publisher Copyright: © 2025 American Physical Society.
PY - 2025/5
Y1 - 2025/5
AB - Lead-based perovskite solar cells have reached high efficiencies, but toxicity and lack of stability hinder their wide-scale adoption. These issues have been partially addressed through compositional engineering of perovskite materials, but the vast complexity of the perovskite materials space poses a significant obstacle to exploration. We previously demonstrated how machine learning (ML) can accelerate property predictions for the CsPb(Cl/Br)3 perovskite alloy. However, the substantial computational demand of density functional theory (DFT) calculations required for model training prevents applications to more complex materials. Here, we introduce a data-efficient scheme to facilitate model training, validated initially on CsPb(Cl/Br)3 data and extended to the ternary alloy CsSn(Cl/Br/I)3. Our approach employs clustering to construct a compact yet diverse initial dataset of atomic structures. We then apply a two-stage active learning approach to first improve the reliability of the ML-based structure relaxations and then refine accuracy near equilibrium structures. Tests for CsPb(Cl/Br)3 demonstrate that our scheme reduces the number of DFT calculations required in the different parts of our proposed model training method by up to 20% and 50%, respectively. The fitted model for CsSn(Cl/Br/I)3 is robust and highly accurate, as evidenced by the convergence of all ML-based structure relaxations in our tests and an average relaxation error of only 0.5 meV/atom.
UR - http://www.scopus.com/inward/record.url?scp=105005157961&partnerID=8YFLogxK
U2 - 10.1103/PhysRevMaterials.9.053802
DO - 10.1103/PhysRevMaterials.9.053802
M3 - Article
AN - SCOPUS:105005157961
SN - 2475-9953
VL - 9
JO - Physical Review Materials
JF - Physical Review Materials
IS - 5
M1 - 053802
ER -