A Multicenter, Scan-Rescan, Human and Machine Learning CMR Study to Test Generalizability and Precision in Imaging Biomarker Analysis

Anish N. Bhuva, Wenjia Bai, Clement Lau, Rhodri H. Davies, Yang Ye, Heeraj Bulluck, Elisa McAlindon, Veronica Culotta, Peter P. Swoboda, Gabriella Captur, Thomas A. Treibel, Joao B. Augusto, Kristopher D. Knott, Andreas Seraphim, Graham D. Cole, Steffen E. Petersen, Nicola C. Edwards, John P. Greenwood, Chiara Bucciarelli-Ducci, Alun D. Hughes, Daniel Rueckert, James C. Moon, Charlotte H. Manisty

Research output: Contribution to journal › Article › peer-review

79 Scopus citations

Abstract

Background: Automated analysis of cardiac structure and function using machine learning (ML) has great potential, but is currently hindered by poor generalizability. Comparison is traditionally against clinicians as a reference, ignoring inherent human inter- and intraobserver error, and ensuring that ML cannot demonstrate superiority. Measuring precision (scan:rescan reproducibility) addresses this. We compared precision of ML and humans using a multicenter, multi-disease, scan:rescan cardiovascular magnetic resonance data set.

Methods: One hundred ten patients (5 disease categories, 5 institutions, 2 scanner manufacturers, and 2 field strengths) underwent scan:rescan cardiovascular magnetic resonance (96% within one week). After identification of the most precise human technique, left ventricular chamber volumes, mass, and ejection fraction were measured by an expert, a trained junior clinician, and a fully automated convolutional neural network trained on 599 independent multicenter disease cases. Scan:rescan coefficient of variation and 1000 bootstrapped 95% CIs were calculated and compared using mixed linear effects models.

Results: Clinicians can be confident in detecting a 9% change in left ventricular ejection fraction, with greater than half of coefficient of variation attributable to intraobserver variation. Expert, trained junior, and automated scan:rescan precision were similar (for left ventricular ejection fraction, coefficient of variation 6.1 [5.2%-7.1%], P=0.2581; 8.3 [5.6%-10.3%], P=0.3653; 8.8 [6.1%-11.1%], P=0.8620). Automated analysis was 186× faster than humans (0.07 versus 13 minutes).

Conclusions: Automated ML analysis is faster with similar precision to the most precise human techniques, even when challenged with real-world scan:rescan data. Assessment of multicenter, multi-vendor, multi-field strength scan:rescan data (available at www.thevolumesresource.com) permits a generalizable assessment of ML precision and may facilitate direct translation of ML to clinical practice.
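
The precision metric named in the Methods (scan:rescan coefficient of variation with 1000 bootstrapped 95% CIs) can be illustrated with a minimal Python sketch. The abstract does not give the exact formula, so this assumes the common within-subject definition (within-subject SD of paired measurements divided by the grand mean) and a percentile bootstrap that resamples patients. The function names and the example values are hypothetical and are not taken from the study.

```python
import math
import random


def scan_rescan_cov(scan, rescan):
    """Within-subject coefficient of variation (%) for paired measurements.

    Assumed definition (not confirmed by the abstract):
    within-subject SD = sqrt(sum(d^2) / (2n)), divided by the grand mean.
    """
    if len(scan) != len(rescan) or len(scan) < 2:
        raise ValueError("need two equal-length series with at least 2 pairs")
    n = len(scan)
    sum_sq_diff = sum((a - b) ** 2 for a, b in zip(scan, rescan))
    within_sd = math.sqrt(sum_sq_diff / (2 * n))
    grand_mean = (sum(scan) + sum(rescan)) / (2 * n)
    return 100.0 * within_sd / grand_mean


def bootstrap_ci(scan, rescan, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the CoV, resampling patients with replacement."""
    rng = random.Random(seed)
    n = len(scan)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(
            scan_rescan_cov([scan[i] for i in idx], [rescan[i] for i in idx])
        )
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


if __name__ == "__main__":
    # Made-up ejection fraction values (%) for illustration only.
    first = [60.0, 55.0, 35.0, 48.0, 62.0, 41.0]
    second = [58.0, 57.0, 38.0, 45.0, 63.0, 44.0]
    cov = scan_rescan_cov(first, second)
    low, high = bootstrap_ci(first, second)
    print(f"CoV = {cov:.1f}% (95% CI {low:.1f}% to {high:.1f}%)")
```

Resampling whole patients, rather than individual scans, keeps each scan:rescan pair together, which is what a paired precision estimate requires. The comparison between observers with mixed linear effects models is a separate step that this sketch does not cover.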

Original language: English
Article number: e009214
Journal: Circulation: Cardiovascular Imaging
Volume: 12
Issue number: 10
State: Published - 1 Oct 2019
Externally published: Yes

Keywords

  • artificial intelligence
  • image processing
  • left ventricular remodeling
  • magnetic resonance imaging, cine
  • ventricular function
