
Language Models Enable Data-Augmented Synthesis Planning for Inorganic Materials

  • Thorben Prein
  • Elton Pan
  • Janik Jehkul
  • Steffen Weinmann
  • Elsa Olivetti
  • Jennifer L.M. Rupp
  • Technical University of Munich
  • Munich Data Science Institute (MDSI)
  • TUMint.Energy Research GmbH
  • Massachusetts Institute of Technology

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

Inorganic synthesis planning currently relies primarily on heuristic approaches or machine learning models trained on limited data sets, which constrains its generality. We demonstrate that language models (LMs) without task-specific fine-tuning can recall synthesis conditions reported in the scientific literature. Off-the-shelf models, such as GPT-4.1, Gemini 2.0 Flash, and Llama 4 Maverick, achieve a Top-1 precursor prediction accuracy of up to 53.8% and a Top-5 performance of 66.8% on a held-out set of 1000 reactions. They also predict calcination and sintering temperatures with mean absolute errors of <126 °C, matching or surpassing specialized regression models. Ensembling these LMs further enhances predictive accuracy and reduces inference cost per prediction by up to 70%. Given the broad, cross-domain knowledge of LMs, we evaluate whether they enable knowledge transfer by training a transformer, SyntMTE, on 28,548 LM-generated reaction recipes. Compared to a model trained on literature-reported data, we find that a model trained solely on LM-generated data exhibits competitive performance (only 6% worse). Conversely, a model trained on both the LM-generated and literature-reported data improves performance by up to 4%. In a case study on Li₇La₃Zr₂O₁₂ solid-state electrolytes, we demonstrate that SyntMTE reproduces the experimentally observed dopant-dependent sintering trends. Our hybrid workflow enables scalable and data-efficient inorganic synthesis planning.
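
The two metrics quoted in the abstract, Top-k precursor accuracy and mean absolute error (MAE) on temperature, can be illustrated with a minimal sketch. The abstract does not specify the evaluation protocol, so the code below assumes that a prediction counts as a hit when one of the first k ranked candidate precursor sets matches the reference set exactly, regardless of order. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[Sequence[str]]],
    references: Sequence[Sequence[str]],
    k: int,
) -> float:
    """Fraction of targets whose reference precursor set is among the first k candidates.

    Assumption: a hit requires an exact, order-independent set match.
    """
    hits = 0
    for candidates, reference in zip(ranked_predictions, references):
        reference_set = frozenset(reference)
        if any(frozenset(c) == reference_set for c in candidates[:k]):
            hits += 1
    return hits / len(references)


def mean_absolute_error(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Average absolute difference between predicted and observed values."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


# Toy data, invented purely for illustration (not results from the paper).
ranked = [
    [["Li2CO3", "La2O3", "ZrO2"], ["LiOH", "La2O3", "ZrO2"]],  # target 1
    [["BaCO3", "TiO2"], ["BaO", "TiO2"]],                      # target 2
]
refs = [["LiOH", "La2O3", "ZrO2"], ["BaCO3", "TiO2"]]

top1 = top_k_accuracy(ranked, refs, k=1)
top2 = top_k_accuracy(ranked, refs, k=2)
mae = mean_absolute_error([1200.0, 950.0], [1230.0, 900.0])  # temperatures in °C
```

With the toy data, only the second target is matched by its first candidate, while both targets are matched within the first two candidates, which is why Top-k accuracy can only stay level or rise as k grows.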

Original language: English
Pages (from-to): 69221-69233
Number of pages: 13
Journal: ACS Applied Materials and Interfaces
Volume: 17
Issue number: 51
State: Published - 24 Dec 2025

Keywords

  • large language models
  • precursor recommendation
  • solid-state synthesis
  • synthesis condition prediction
  • synthetic data augmentation
