TY - JOUR
T1 - Making complex prediction rules applicable for readers
T2 - Current practice in random forest literature and recommendations
AU - Boulesteix, Anne Laure
AU - Janitza, Silke
AU - Hornung, Roman
AU - Probst, Philipp
AU - Busen, Hannah
AU - Hapfelmeier, Alexander
N1 - Publisher Copyright: © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
PY - 2019/9/1
Y1 - 2019/9/1
AB - Ideally, prediction rules should be published in such a way that readers may apply them, for example, to make predictions for their own data. While this is straightforward for simple prediction rules, such as those based on the logistic regression model, it is much more difficult for complex prediction rules derived by machine learning tools. We conducted a survey of articles reporting prediction rules that were constructed using the random forest algorithm and published in PLOS ONE in 2014–2015 in the field “medical and health sciences”, with the aim of identifying issues related to their applicability. Making a prediction rule reproducible is a possible way to ensure that it is applicable; thus reproducibility is also examined in our survey. The presented prediction rules were applicable in only 2 of 30 identified papers, while for a further eight prediction rules it was possible to obtain the necessary information by contacting the authors. Various problems, such as nonresponse of the authors, hampered the applicability of prediction rules in the other cases. Based on our experiences from this illustrative survey, we formulate a set of recommendations for authors who aim to make complex prediction rules applicable for readers. All data, including the description of the considered studies and the analysis code, are available as supplementary materials.
KW - logistic regression
KW - machine learning
KW - prediction rule
KW - reproducibility
KW - reproducible research
UR - http://www.scopus.com/inward/record.url?scp=85052448458&partnerID=8YFLogxK
U2 - 10.1002/bimj.201700243
DO - 10.1002/bimj.201700243
M3 - Article
C2 - 30069934
AN - SCOPUS:85052448458
SN - 0323-3847
VL - 61
SP - 1314
EP - 1328
JO - Biometrical Journal
JF - Biometrical Journal
IS - 5
ER -