Abstract
Validation of risk prediction models in independent data provides a more rigorous assessment of model performance than internal assessment, for example by cross-validation in the data used for model development. However, several differences between the populations that gave rise to the training and validation data can make a risk model appear to perform poorly. In this paper we formalize the notions of “similarity” or “relatedness” of the training and validation data, and define reproducibility and transportability. We address the impact of differences in the distributions of model predictors and in the verification of disease status or outcome on measures of calibration, accuracy, and discrimination of a model. When individual-level information from both the training and validation data sets is available, we propose and study weighted versions of the validation metrics that adjust for differences in the risk factor distributions and in outcome verification between the training and validation data, providing a more comprehensive assessment of model performance. We give conditions on the risk model and on the populations that gave rise to the training and validation data that ensure a model's reproducibility or transportability, and show how to check these conditions using weighted and unweighted performance measures. We illustrate the methods by developing and validating a model that predicts the risk of developing prostate cancer using data from two large prostate cancer screening trials.
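The weighted validation metrics described above can be sketched in code. The sketch below is illustrative only and not the paper's exact estimator: it uses simulated data, a hypothetical fixed risk model, and density-ratio weights estimated by a logistic model that discriminates training from validation covariates, so that the reweighted validation sample mimics the training covariate distribution before computing calibration (observed/expected) and Brier score.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: validation covariates are shifted relative to training.
X_train = rng.normal(0.0, 1.0, size=(2000, 2))
X_valid = rng.normal(0.6, 1.0, size=(1000, 2))

# A hypothetical fixed risk model; validation outcomes simulated from it.
beta = np.array([1.0, -0.5])
p_valid = 1.0 / (1.0 + np.exp(-(X_valid @ beta)))
y_valid = rng.binomial(1, p_valid)

def fit_logistic(X, y, iters=25):
    """Plain Newton-Raphson logistic regression (intercept included)."""
    Z = np.column_stack([np.ones(len(X)), X])
    b = np.zeros(Z.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(Z @ b)))
        W = p * (1.0 - p)
        b += np.linalg.solve(Z.T @ (W[:, None] * Z), Z.T @ (y - p))
    return b

# Density-ratio weights w(x) proportional to f_train(x) / f_valid(x),
# estimated by a logistic model for sample membership (train = 1, valid = 0).
member = np.concatenate([np.ones(len(X_train)), np.zeros(len(X_valid))])
b = fit_logistic(np.vstack([X_train, X_valid]), member)
Z_valid = np.column_stack([np.ones(len(X_valid)), X_valid])
pr = 1.0 / (1.0 + np.exp(-(Z_valid @ b)))
w = pr / (1.0 - pr)
w *= len(w) / w.sum()  # normalize weights to mean 1

# Unweighted vs weighted validation metrics on the validation data.
brier = np.mean((p_valid - y_valid) ** 2)
brier_w = np.average((p_valid - y_valid) ** 2, weights=w)
oe = y_valid.mean() / p_valid.mean()  # observed / expected events
oe_w = np.average(y_valid, weights=w) / np.average(p_valid, weights=w)
print(f"Brier: {brier:.3f} -> {brier_w:.3f};  O/E: {oe:.3f} -> {oe_w:.3f}")
```

A discrimination measure such as the AUC can be reweighted in the same way by applying the weights to each case-control pair; the key design choice in all cases is that the weights depend only on the covariates, not on the outcome.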
Field | Value
---|---
Original language | English
Pages (from-to) | 4756-4780
Number of pages | 25
Journal | Statistics in Medicine
Volume | 41
Issue number | 24
State | Published - 30 Oct 2022
Keywords
- population differences
- risk factor heterogeneity
- risk model performance
- selection
- verification