Development and evaluation of machine learning models based on X-ray radiomics for the classification and differentiation of malignant and benign bone tumors

Claudio E. von Schacky, Nikolas J. Wilhelm, Valerie S. Schäfer, Yannik Leonhardt, Matthias Jung, Pia M. Jungmann, Maximilian F. Russe, Sarah C. Foreman, Felix G. Gassert, Florian T. Gassert, Benedikt J. Schwaiger, Carolin Mogler, Carolin Knebel, Ruediger von Eisenhart-Rothe, Marcus R. Makowski, Klaus Woertler, Rainer Burgkart, Alexandra S. Gersing

Research output: Contribution to journal › Article › peer-review

10 Scopus citations

Abstract

Objectives: To develop and validate machine learning models to distinguish between benign and malignant bone lesions and to compare their performance with that of radiologists.

Methods: In 880 patients (age 33.1 ± 19.4 years, 395 women) diagnosed with malignant (n = 213, 24.2%) or benign (n = 667, 75.8%) primary bone tumors, preoperative radiographs were obtained, and the diagnosis was established using histopathology. The data were split 70%/15%/15% for training, validation, and internal testing. Additionally, data from 96 patients from another institution were obtained for external testing. Machine learning models were developed and validated using radiomic features and demographic information. The performance of each model was evaluated on the test sets for accuracy, area under the curve (AUC) from receiver operating characteristics, sensitivity, and specificity. For comparison, the external test set was evaluated by two radiology residents and two radiologists who specialized in musculoskeletal tumor imaging.

Results: The best machine learning model was based on an artificial neural network (ANN) combining both radiomic and demographic information, achieving 80% and 75% accuracy at 75% and 90% sensitivity with 0.79 and 0.90 AUC on the internal and external test set, respectively. In comparison, the radiology residents achieved 71% and 65% accuracy at 61% and 35% sensitivity, while the radiologists specialized in musculoskeletal tumor imaging achieved 84% and 83% accuracy at 90% and 81% sensitivity, respectively.

Conclusions: An ANN combining radiomic features and demographic information showed the best performance in distinguishing between benign and malignant bone lesions. The model showed lower accuracy compared to specialized radiologists, while accuracy was higher or similar compared to residents.

Key Points:

• The developed machine learning model could differentiate benign from malignant bone tumors using radiography with an AUC of 0.90 on the external test set.
• Machine learning models that used radiomic features or demographic information alone performed worse than those that used both radiomic features and demographic information as input, highlighting the importance of building comprehensive machine learning models.
• An artificial neural network that combined both radiomic and demographic information achieved the best performance, and its performance was compared to radiology readers on an external test set.
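As an illustration of the evaluation setup described in the Methods, the following is a minimal sketch of a 70%/15%/15% split and of the reported metrics (accuracy, sensitivity, specificity, AUC). It is not the authors' code: the abstract does not say whether the split was stratified by class or how it was seeded, so stratification, the seed, and the function names `stratified_split`, `binary_metrics`, and `auc` are assumptions made for this example. Only the class counts (213 malignant, 667 benign) come from the abstract.

```python
import random


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=0):
    """Return (train, val, test) index lists, keeping class proportions.

    Stratification and the seed are assumptions for this sketch.
    """
    rng = random.Random(seed)
    train, val, test = [], [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = round(fractions[0] * len(idx))
        n_val = round(fractions[1] * len(idx))
        train.extend(idx[:n_train])
        val.extend(idx[n_train:n_train + n_val])
        test.extend(idx[n_train + n_val:])  # remainder goes to the test set
    return train, val, test


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity; malignant (1) is the positive class.

    Assumes both classes occur in y_true, otherwise a division by zero occurs.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def auc(y_true, scores):
    """ROC AUC as the probability that a malignant case scores higher than
    a benign one (Mann-Whitney formulation); ties count as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Class counts taken from the abstract: 213 malignant (1), 667 benign (0).
labels = [1] * 213 + [0] * 667
train, val, test = stratified_split(labels)
print(len(train), len(val), len(test))
```

Sensitivity and specificity depend on the decision threshold applied to the model's scores, whereas AUC summarizes ranking quality across all thresholds. This is one reason a model can report, as here, a lower accuracy but a higher AUC on one test set than on another.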

Original language: English
Pages (from-to): 6247-6257
Number of pages: 11
Journal: European Radiology
Volume: 32
Issue number: 9
State: Published - Sep 2022
Externally published: Yes

Keywords

  • Bone neoplasms
  • Diagnostic imaging
  • Machine learning
  • Musculoskeletal system
  • Radiography
