Critical assessment of missense variant effect predictors on disease-relevant variant data

Ruchir Rastogi, Ryan Chung, Sindy Li, Chang Li, Kyoungyeul Lee, Junwoo Woo, Dong Wook Kim, Changwon Keum, Giulia Babbi, Pier Luigi Martelli, Castrense Savojardo, Rita Casadio, Kirsley Chennen, Thomas Weber, Olivier Poch, François Ancien, Gabriel Cia, Fabrizio Pucci, Daniele Raimondi, Wim Vranken, Marianne Rooman, Céline Marquet, Tobias Olenyi, Burkhard Rost, Gaia Andreoletti, Akash Kamandula, Yisu Peng, Constantina Bakolitsa, Matthew Mort, David N. Cooper, Timothy Bergquist, Vikas Pejaver, Xiaoming Liu, Predrag Radivojac, Steven E. Brenner, Nilah M. Ioannidis

Research output: Contribution to journal › Article › Peer-review

Abstract

Regular, systematic, and independent assessments of computational tools that are used to predict the pathogenicity of missense variants are necessary to evaluate their clinical and research utility and guide future improvements. The Critical Assessment of Genome Interpretation (CAGI) conducts the ongoing Annotate-All-Missense (Missense Marathon) challenge, in which missense variant effect predictors (also called variant impact predictors) are evaluated on missense variants added to disease-relevant databases following the prediction submission deadline. Here we assess predictors submitted to the CAGI 6 Annotate-All-Missense challenge, predictors commonly used in clinical genetics, and recently developed deep learning methods. We examine performance across a range of settings relevant for clinical and research applications, focusing on different subsets of the evaluation data as well as high-specificity and high-sensitivity regimes. Our evaluations reveal notable advances in current methods relative to older, well-cited tools in the field. While meta-predictors tend to outperform their constituent individual predictors, several newer individual predictors perform comparably to commonly used meta-predictors. Predictor performance varies between high-specificity and high-sensitivity regimes, highlighting that different methods may be optimal for different use cases. We also characterize two potential sources of bias. Predictors that incorporate allele frequency as a predictive feature tend to have reduced performance when distinguishing pathogenic variants from very rare benign variants, and predictors trained on pathogenicity labels from curated variant databases often inherit gene-level label imbalances. Our findings help illuminate the clinical and research utility of modern missense variant effect predictors and identify potential areas for future development.
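The abstract contrasts high-specificity and high-sensitivity regimes but does not define the metrics used for each. The following is a minimal Python sketch of one common way to summarize a predictor in each regime: the best sensitivity achievable while specificity stays at or above a floor, and the best specificity achievable while sensitivity stays at or above a floor. It assumes that higher scores mean "more likely pathogenic", that labels are 1 (pathogenic) or 0 (benign), and that a variant is called pathogenic when its score is at or above the threshold. The function names, the 0.9 operating points, and the toy data are hypothetical and are not taken from the paper's evaluation protocol.

```python
from typing import List, Sequence, Tuple

# Assumed conventions (not from the paper): higher score = more likely
# pathogenic; a variant is called pathogenic when score >= threshold;
# labels use 1 = pathogenic, 0 = benign.


def _split(scores: Sequence[float], labels: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Separate the scores of pathogenic (label 1) and benign (label 0) variants."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one pathogenic and one benign variant")
    return pos, neg


def sensitivity_at_specificity(
    scores: Sequence[float], labels: Sequence[int], min_specificity: float
) -> Tuple[float, float]:
    """High-specificity regime: best sensitivity over thresholds whose
    specificity is at least min_specificity. Returns (threshold, sensitivity)."""
    pos, neg = _split(scores, labels)
    best = (float("inf"), 0.0)  # an infinite threshold calls nothing pathogenic
    for t in sorted(set(scores), reverse=True):
        spec = sum(s < t for s in neg) / len(neg)
        if spec < min_specificity:
            break  # lowering the threshold further cannot raise specificity
        best = (t, sum(s >= t for s in pos) / len(pos))
    return best


def specificity_at_sensitivity(
    scores: Sequence[float], labels: Sequence[int], min_sensitivity: float
) -> Tuple[float, float]:
    """High-sensitivity regime: best specificity over thresholds whose
    sensitivity is at least min_sensitivity. Returns (threshold, specificity)."""
    pos, neg = _split(scores, labels)
    for t in sorted(set(scores), reverse=True):
        if sum(s >= t for s in pos) / len(pos) >= min_sensitivity:
            # The highest qualifying threshold leaves the most benign variants
            # below it, so its specificity is the best available.
            return t, sum(s < t for s in neg) / len(neg)
    raise ValueError("min_sensitivity must be at most 1.0")


# Hypothetical toy scores for ten variants (not data from the study).
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]

print(sensitivity_at_specificity(scores, labels, 0.9))  # (0.9, 0.4)
print(specificity_at_sensitivity(scores, labels, 0.9))  # (0.3, 0.6)
```

Because the predictor that is best at a strict specificity floor need not be the best at a strict sensitivity floor, reporting both numbers for each method is one way to surface the regime-dependent rankings the abstract describes.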

Original language: English
Article number: 101213
Pages (from-to): 281-293
Number of pages: 13
Journal: Human Genetics
Volume: 144
Issue number: 2
State: Published - Mar 2025