TY - JOUR
T1 - Unbiased Prediction and Feature Selection in High-Dimensional Survival Regression
AU - Laimighofer, Michael
AU - Krumsiek, Jan
AU - Buettner, Florian
AU - Theis, Fabian J.
N1 - Publisher Copyright: © 2016 Michael Laimighofer, et al.
PY - 2016/4/1
Y1 - 2016/4/1
AB - With widespread availability of omics profiling techniques, the analysis and interpretation of high-dimensional omics data, for example, for biomarkers, is becoming an increasingly important part of clinical medicine because such datasets constitute a promising resource for predicting survival outcomes. However, early experience has shown that biomarkers often generalize poorly. Thus, it is crucial that models are not overfitted and give accurate results with new data. In addition, reliable detection of multivariate biomarkers with high predictive power (feature selection) is of particular interest in clinical settings. We present an approach that addresses both aspects in high-dimensional survival models. Within a nested cross-validation (CV), we fit a survival model, evaluate a dataset in an unbiased fashion, and select features with the best predictive power by applying a weighted combination of CV runs. We evaluate our approach using simulated toy data, as well as three breast cancer datasets, to predict the survival of breast cancer patients after treatment. In all datasets, we achieve more reliable estimation of predictive power for unseen cases and better predictive performance compared to the standard CoxLasso model. Taken together, we present a comprehensive and flexible framework for survival models, including performance estimation, final feature selection, and final model construction. The proposed algorithm is implemented in an open source R package (SurvRank) available on CRAN.
KW - feature selection
KW - high-dimensional survival regression
KW - repeated nested cross validation
UR - http://www.scopus.com/inward/record.url?scp=84964252998&partnerID=8YFLogxK
U2 - 10.1089/cmb.2015.0192
DO - 10.1089/cmb.2015.0192
M3 - Article
C2 - 26894327
AN - SCOPUS:84964252998
SN - 1066-5277
VL - 23
SP - 279
EP - 290
JO - Journal of computational biology : a journal of computational molecular cell biology
JF - Journal of computational biology : a journal of computational molecular cell biology
IS - 4
ER -