TY - JOUR
T1 - Reverse Classification Accuracy
T2 - Predicting Segmentation Performance in the Absence of Ground Truth
AU - Valindria, Vanya V.
AU - Lavdas, Ioannis
AU - Bai, Wenjia
AU - Kamnitsas, Konstantinos
AU - Aboagye, Eric O.
AU - Rockall, Andrea G.
AU - Rueckert, Daniel
AU - Glocker, Ben
N1 - Funding Information:
Manuscript received December 15, 2016; revised January 25, 2017; accepted January 28, 2017. Date of publication April 17, 2017; date of current version July 30, 2017. This work was supported in part by the National Institute for Health Research under EME Project 13/122/01 and in part by the EPSRC First Grant Scheme under Grant EP/N023668/1. The work of V. V. Valindria was supported by the Indonesia Endowment for Education (LPDP)–Indonesian Presidential Ph.D. Scholarship Programme. The work of K. Kamnitsas was supported by the Imperial College President’s Ph.D. Scholarship Programme.
PY - 2017/8
Y1 - 2017/8
AB - When integrating computational tools, such as automatic segmentation, into clinical practice, it is of utmost importance to be able to assess the level of accuracy on new data and, in particular, to detect when an automatic method fails. However, this is difficult to achieve due to the absence of ground truth. Segmentation accuracy on clinical data might be different from what is found through cross validation, because validation data are often used during incremental method development, which can lead to overfitting and unrealistic performance expectations. Before deployment, performance is quantified using different metrics, for which the predicted segmentation is compared with a reference segmentation, often obtained manually by an expert. But little is known about the real performance after deployment when a reference is unavailable. In this paper, we introduce the concept of reverse classification accuracy (RCA) as a framework for predicting the performance of a segmentation method on new data. In RCA, we take the predicted segmentation from a new image to train a reverse classifier, which is evaluated on a set of reference images with available ground truth. The hypothesis is that if the predicted segmentation is of good quality, then the reverse classifier will perform well on at least some of the reference images. We validate our approach on multi-organ segmentation with different classifiers and segmentation methods. Our results indicate that it is indeed possible to predict the quality of individual segmentations, in the absence of ground truth. Thus, RCA is ideal for integration into automatic processing pipelines in clinical routine and as a part of large-scale image analysis studies.
KW - Abdominal
KW - MRI
KW - classification
KW - image segmentation
KW - machine learning
KW - performance evaluation
UR - http://www.scopus.com/inward/record.url?scp=85029389435&partnerID=8YFLogxK
U2 - 10.1109/TMI.2017.2665165
DO - 10.1109/TMI.2017.2665165
M3 - Article
C2 - 28436849
AN - SCOPUS:85029389435
SN - 0278-0062
VL - 36
SP - 1597
EP - 1606
JO - IEEE Transactions on Medical Imaging
JF - IEEE Transactions on Medical Imaging
IS - 8
M1 - 7902121
ER -