Efficient dataset generation for machine learning halide perovskite alloys

Henrietta Homm, Jarno Laakso, Patrick Rinke

Research output: Contribution to journal, Article, peer-review

Abstract

Lead-based perovskite solar cells have reached high efficiencies, but toxicity and lack of stability hinder their wide-scale adoption. These issues have been partially addressed through compositional engineering of perovskite materials, but the vast complexity of the perovskite materials space poses a significant obstacle to exploration. We previously demonstrated how machine learning (ML) can accelerate property predictions for the CsPb(Cl/Br)3 perovskite alloy. However, the substantial computational demand of the density functional theory (DFT) calculations required for model training prevents applications to more complex materials. Here, we introduce a data-efficient scheme to facilitate model training, validated initially on CsPb(Cl/Br)3 data and extended to the ternary alloy CsSn(Cl/Br/I)3. Our approach employs clustering to construct a compact yet diverse initial dataset of atomic structures. We then apply a two-stage active learning approach to first improve the reliability of the ML-based structure relaxations and then refine accuracy near equilibrium structures. Tests for CsPb(Cl/Br)3 demonstrate that our scheme reduces the number of DFT calculations required in different parts of our proposed model training method by up to 20% and 50%. The fitted model for CsSn(Cl/Br/I)3 is robust and highly accurate, as evidenced by the convergence of all ML-based structure relaxations in our tests and an average relaxation error of only 0.5 meV/atom.
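
The clustering step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state which clustering algorithm or structural descriptor is used, so the example assumes plain k-means and a hypothetical three-component descriptor (the Cl/Br/I fractions of a candidate structure). The function name `kmeans_select` and all parameters are likewise illustrative. The idea shown is only that one representative per cluster yields a small set of candidates that still spans the composition space.

```python
import math
import random


def kmeans_select(descriptors, k, n_iter=50, seed=0):
    """Run k-means on the descriptors and return the index of the
    descriptor closest to each cluster centroid (hypothetical helper)."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct, randomly chosen descriptors.
    start = rng.sample(range(len(descriptors)), k)
    centroids = [list(descriptors[i]) for i in start]
    clusters = [[] for _ in range(k)]

    for _ in range(n_iter):
        # Assignment step: attach every descriptor to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for idx, d in enumerate(descriptors):
            nearest = min(range(k), key=lambda c: math.dist(d, centroids[c]))
            clusters[nearest].append(idx)

        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c, members in enumerate(clusters):
            if not members:
                new_centroids.append(centroids[c])  # keep empty clusters in place
                continue
            dim = len(centroids[c])
            mean = [sum(descriptors[i][j] for i in members) / len(members)
                    for j in range(dim)]
            new_centroids.append(mean)

        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids

    # One representative per non-empty cluster: the member nearest the centroid.
    selected = []
    for c, members in enumerate(clusters):
        if members:
            selected.append(
                min(members, key=lambda i: math.dist(descriptors[i], centroids[c]))
            )
    return sorted(selected)


# Toy candidate pool: random (x_Cl, x_Br, x_I) fractions that sum to 1.
# These are assumed stand-ins for real structural descriptors.
pool_rng = random.Random(1)
candidates = []
for _ in range(200):
    a, b = sorted((pool_rng.random(), pool_rng.random()))
    candidates.append((a, b - a, 1.0 - b))

initial = kmeans_select(candidates, k=10)
print(len(initial), "of", len(candidates), "candidates selected for DFT labelling")
```

In a workflow of the kind the abstract describes, only the selected candidates would be passed to DFT to form the initial training set, with further structures added later by active learning.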

Original language: English
Article number: 053802
Journal: Physical Review Materials
Volume: 9
Issue number: 5
State: Published - May 2025
