Accommodating population differences when validating risk prediction models

Ruth M. Pfeiffer, Yiyao Chen, Mitchell H. Gail, Donna P. Ankerst

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

Validating a risk prediction model in independent data provides a more rigorous assessment of model performance than internal validation, for example cross-validation in the data used for model development. However, differences between the populations that gave rise to the training and validation data can make a risk model appear to perform poorly. In this paper, we formalize the notions of “similarity” or “relatedness” of the training and validation data, and define reproducibility and transportability. We address how different distributions of model predictors and differences in verifying the disease status or outcome affect measures of calibration, accuracy, and discrimination of a model. When individual-level information from both the training and validation data sets is available, we propose and study weighted versions of the validation metrics that adjust for differences in the risk factor distributions and in outcome verification between the training and validation data, providing a more comprehensive assessment of model performance. We provide conditions on the risk model and the populations that gave rise to the training and validation data that ensure a model's reproducibility or transportability, and show how to check these conditions using weighted and unweighted performance measures. We illustrate the method by developing and validating a model that predicts the risk of developing prostate cancer using data from two large prostate cancer screening trials.
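
The idea of reweighting validation metrics can be illustrated with a minimal sketch. This is not the authors' estimator: it assumes a single categorical risk factor, weights each validation subject by the ratio of training to validation frequencies of that factor, and ignores outcome verification entirely. The function names `density_ratio_weights`, `weighted_eo_ratio`, and `weighted_auc` are hypothetical and chosen only for this illustration.

```python
from collections import Counter


def density_ratio_weights(train_x, val_x):
    """Weight each validation subject by p_train(x) / p_val(x).

    Assumption for this sketch: x is one categorical risk factor.
    A category absent from the training data receives weight 0.
    """
    train_counts = Counter(train_x)
    val_counts = Counter(val_x)
    n_train, n_val = len(train_x), len(val_x)
    return [
        (train_counts[x] / n_train) / (val_counts[x] / n_val)
        for x in val_x
    ]


def weighted_eo_ratio(risks, outcomes, weights):
    """Calibration: weighted expected events over weighted observed events.

    Requires at least one observed event with positive weight.
    """
    expected = sum(w * r for w, r in zip(weights, risks))
    observed = sum(w * y for w, y in zip(weights, outcomes))
    return expected / observed


def weighted_auc(risks, outcomes, weights):
    """Discrimination: weighted share of case-control pairs ranked correctly.

    Each pair is weighted by the product of its two subject weights;
    ties in predicted risk count as half a concordant pair.
    """
    cases = [(r, w) for r, y, w in zip(risks, outcomes, weights) if y == 1]
    controls = [(r, w) for r, y, w in zip(risks, outcomes, weights) if y == 0]
    concordant = 0.0
    total = 0.0
    for case_risk, case_w in cases:
        for control_risk, control_w in controls:
            pair_w = case_w * control_w
            total += pair_w
            if case_risk > control_risk:
                concordant += pair_w
            elif case_risk == control_risk:
                concordant += 0.5 * pair_w
    return concordant / total
```

With all weights equal to 1, both functions reduce to the usual unweighted expected-to-observed ratio and AUC, so comparing weighted and unweighted values gives a rough indication of how much a shift in the risk factor distribution contributes to an apparent change in performance.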

Original language: English
Pages (from-to): 4756-4780
Number of pages: 25
Journal: Statistics in Medicine
Volume: 41
Issue number: 24
State: Published - 30 Oct 2022

Keywords

  • population differences
  • risk factor heterogeneity
  • risk model performance
  • selection
  • verification
