TY - GEN
T1 - The Pitfalls of Sample Selection
T2 - 4th International Workshop on Predictive Intelligence in Medicine, PRIME 2021, held in conjunction with 24th International Conference on Medical Image Computing and Computer Assisted Intervention, MICCAI 2021
AU - Baltatzis, Vasileios
AU - Bintsi, Kyriaki Margarita
AU - Le Folgoc, Loïc
AU - Martinez Manzanera, Octavio E.
AU - Ellis, Sam
AU - Nair, Arjun
AU - Desai, Sujal
AU - Glocker, Ben
AU - Schnabel, Julia A.
N1 - Publisher Copyright:
© 2021, Springer Nature Switzerland AG.
PY - 2021
Y1 - 2021
N2 - Using publicly available data to determine the performance of methodological contributions is important, as it facilitates reproducibility and allows scrutiny of the published results. In lung nodule classification, for example, many works report results on the publicly available LIDC dataset. In theory, this should allow a direct comparison of the performance of proposed methods and an assessment of the impact of individual contributions. When analyzing seven recent works, however, we find that each employs a different data selection process, leading to largely varying total numbers of samples and ratios between benign and malignant cases. As each subset has different characteristics and varying difficulty for classification, a direct comparison between the proposed methods is thus neither always possible nor fair. We study the particular effect of truthing when aggregating labels from multiple experts. We show that specific choices can have a severe impact on the data distribution, such that it may be possible to achieve superior performance on one sample distribution but not on another. While we show that we can further improve on the state of the art for one sample selection, we also find that on a more challenging sample selection from the same database, the more advanced models underperform with respect to very simple baseline methods, highlighting that the selected data distribution may play an even more important role than the model architecture. This raises concerns about the validity of claimed methodological contributions. We believe the community should be aware of these pitfalls and make recommendations on how these can be avoided in future work.
AB - Using publicly available data to determine the performance of methodological contributions is important, as it facilitates reproducibility and allows scrutiny of the published results. In lung nodule classification, for example, many works report results on the publicly available LIDC dataset. In theory, this should allow a direct comparison of the performance of proposed methods and an assessment of the impact of individual contributions. When analyzing seven recent works, however, we find that each employs a different data selection process, leading to largely varying total numbers of samples and ratios between benign and malignant cases. As each subset has different characteristics and varying difficulty for classification, a direct comparison between the proposed methods is thus neither always possible nor fair. We study the particular effect of truthing when aggregating labels from multiple experts. We show that specific choices can have a severe impact on the data distribution, such that it may be possible to achieve superior performance on one sample distribution but not on another. While we show that we can further improve on the state of the art for one sample selection, we also find that on a more challenging sample selection from the same database, the more advanced models underperform with respect to very simple baseline methods, highlighting that the selected data distribution may play an even more important role than the model architecture. This raises concerns about the validity of claimed methodological contributions. We believe the community should be aware of these pitfalls and make recommendations on how these can be avoided in future work.
UR - http://www.scopus.com/inward/record.url?scp=85116835876&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-87602-9_19
DO - 10.1007/978-3-030-87602-9_19
M3 - Conference contribution
AN - SCOPUS:85116835876
SN - 9783030876012
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 201
EP - 211
BT - Predictive Intelligence in Medicine - 4th International Workshop, PRIME 2021, Held in Conjunction with MICCAI 2021, Proceedings
A2 - Rekik, Islem
A2 - Adeli, Ehsan
A2 - Park, Sang Hyun
A2 - Schnabel, Julia
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 1 October 2021 through 1 October 2021
ER -