TY - JOUR
T1 - A Multicenter, Scan-Rescan, Human and Machine Learning CMR Study to Test Generalizability and Precision in Imaging Biomarker Analysis
AU - Bhuva, Anish N.
AU - Bai, Wenjia
AU - Lau, Clement
AU - Davies, Rhodri H.
AU - Ye, Yang
AU - Bulluck, Heeraj
AU - McAlindon, Elisa
AU - Culotta, Veronica
AU - Swoboda, Peter P.
AU - Captur, Gabriella
AU - Treibel, Thomas A.
AU - Augusto, Joao B.
AU - Knott, Kristopher D.
AU - Seraphim, Andreas
AU - Cole, Graham D.
AU - Petersen, Steffen E.
AU - Edwards, Nicola C.
AU - Greenwood, John P.
AU - Bucciarelli-Ducci, Chiara
AU - Hughes, Alun D.
AU - Rueckert, Daniel
AU - Moon, James C.
AU - Manisty, Charlotte H.
N1 - Publisher Copyright: © 2019 American Heart Association, Inc.
PY - 2019/10/1
Y1 - 2019/10/1
N2 - Background: Automated analysis of cardiac structure and function using machine learning (ML) has great potential, but is currently hindered by poor generalizability. Comparison is traditionally against clinicians as a reference, ignoring inherent human inter- and intraobserver error, and ensuring that ML cannot demonstrate superiority. Measuring precision (scan:rescan reproducibility) addresses this. We compared precision of ML and humans using a multicenter, multi-disease, scan:rescan cardiovascular magnetic resonance data set. Methods: One hundred ten patients (5 disease categories, 5 institutions, 2 scanner manufacturers, and 2 field strengths) underwent scan:rescan cardiovascular magnetic resonance (96% within one week). After identification of the most precise human technique, left ventricular chamber volumes, mass, and ejection fraction were measured by an expert, a trained junior clinician, and a fully automated convolutional neural network trained on 599 independent multicenter disease cases. Scan:rescan coefficient of variation and 1000 bootstrapped 95% CIs were calculated and compared using linear mixed-effects models. Results: Clinicians can be confident in detecting a 9% change in left ventricular ejection fraction, with more than half of the coefficient of variation attributable to intraobserver variation. Expert, trained junior, and automated scan:rescan precision were similar (for left ventricular ejection fraction, coefficient of variation 6.1 [5.2%-7.1%], P=0.2581; 8.3 [5.6%-10.3%], P=0.3653; 8.8 [6.1%-11.1%], P=0.8620). Automated analysis was 186× faster than humans (0.07 versus 13 minutes). Conclusions: Automated ML analysis is faster with similar precision to the most precise human techniques, even when challenged with real-world scan:rescan data. Assessment of multicenter, multi-vendor, multi-field strength scan:rescan data (available at www.thevolumesresource.com) permits a generalizable assessment of ML precision and may facilitate direct translation of ML to clinical practice.
KW - artificial intelligence
KW - image processing
KW - left ventricular remodeling
KW - magnetic resonance imaging, cine
KW - ventricular function
UR - http://www.scopus.com/inward/record.url?scp=85072566465&partnerID=8YFLogxK
U2 - 10.1161/CIRCIMAGING.119.009214
DO - 10.1161/CIRCIMAGING.119.009214
M3 - Article
C2 - 31547689
AN - SCOPUS:85072566465
SN - 1941-9651
VL - 12
JO - Circulation: Cardiovascular Imaging
JF - Circulation: Cardiovascular Imaging
IS - 10
M1 - e009214
ER -