Human-AI collaboration in large language model-assisted brain MRI differential diagnosis: a usability study

Su Hwan Kim, Jonas Wihl, Severin Schramm, Cornelius Berberich, Enrike Rosenkranz, Lena Schmitzer, Kerem Serguen, Christopher Klenk, Nicolas Lenhart, Claus Zimmer, Benedikt Wiestler, Dennis M. Hedderich

Research output: Contribution to journal › Article › peer-review

Abstract

Objectives: This study investigated the impact of human-large language model (LLM) collaboration on the accuracy and efficiency of brain MRI differential diagnosis.

Materials and methods: In this retrospective study, forty brain MRI cases with a challenging but definitive diagnosis were randomized into two groups of twenty cases each. Six radiology residents with an average experience of 6.3 months in reading brain MRI exams evaluated one set of cases supported by conventional internet search (Conventional) and the other set utilizing an LLM-based search engine and hybrid chatbot. A cross-over design ensured that each case was examined with both workflows in equal frequency. For each case, readers were instructed to determine the three most likely differential diagnoses. LLM responses were analyzed by a panel of radiologists. Benefits and challenges in human-LLM interaction were derived from observations and participant feedback.

Results: LLM-assisted brain MRI differential diagnosis yielded superior accuracy (70/114; 61.4% (LLM-assisted) vs 53/114; 46.5% (conventional) correct diagnoses, p = 0.033, chi-square test). No difference in interpretation time or level of confidence was observed. An analysis of LLM responses revealed that correct LLM suggestions translated into correct reader responses in 82.1% of cases (60/73). Inaccurate case descriptions by readers (9.2% of cases), LLM hallucinations (11.5% of cases), and insufficient contextualization of LLM responses were identified as challenges related to human-LLM interaction.

Conclusion: Human-LLM collaboration has the potential to improve brain MRI differential diagnosis. Yet, several challenges must be addressed to ensure effective adoption and user acceptance.

Key Points:

Question: While large language models (LLM) have the potential to support radiological differential diagnosis, the role of human-LLM collaboration in this context remains underexplored.

Findings: LLM-assisted brain MRI differential diagnosis yielded superior accuracy over conventional internet search. Inaccurate case descriptions, LLM hallucinations, and insufficient contextualization were identified as potential challenges.

Clinical relevance: Our results highlight the potential of an LLM-assisted workflow to increase diagnostic accuracy but underline the necessity to study collaborative efforts between humans and LLMs over LLMs in isolation.
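The accuracy comparison in the Results can be re-derived from the reported counts alone. The sketch below is not the authors' analysis code: it assumes a 2x2 Pearson chi-square test with one degree of freedom, and it assumes Yates' continuity correction, since the abstract does not say which variant was used. The function name `chi_square_2x2` and its parameters are hypothetical. Under these assumptions the corrected test gives a p-value close to the reported p = 0.033, while the uncorrected test gives a smaller one.

```python
import math


def chi_square_2x2(a, b, c, d, yates=True):
    """Pearson chi-square test for the 2x2 table [[a, b], [c, d]].

    Returns (chi2, p) for one degree of freedom. With yates=True the
    continuity correction is applied (an assumption, see the lead-in).
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Yates' correction shrinks |ad - bc| by n/2, floored at zero.
        diff = max(0.0, diff - n / 2)
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = n * diff ** 2 / denom
    # For 1 degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Counts taken from the abstract: correct diagnoses out of all readings.
llm_correct, llm_total = 70, 114
conv_correct, conv_total = 53, 114

table = (
    llm_correct, llm_total - llm_correct,      # LLM-assisted: correct, incorrect
    conv_correct, conv_total - conv_correct,   # Conventional: correct, incorrect
)

chi2_yates, p_yates = chi_square_2x2(*table, yates=True)
chi2_plain, p_plain = chi_square_2x2(*table, yates=False)

print(f"LLM-assisted accuracy: {llm_correct / llm_total:.1%}")
print(f"Conventional accuracy: {conv_correct / conv_total:.1%}")
print(f"With Yates correction:    chi2 = {chi2_yates:.3f}, p = {p_yates:.4f}")
print(f"Without Yates correction: chi2 = {chi2_plain:.3f}, p = {p_plain:.4f}")
```

Using `math.erfc` keeps the sketch free of third-party dependencies; a statistics library would normally be used for tables larger than 2x2 or for other degrees of freedom.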

Original language: English
Journal: European Radiology
State: Accepted/In press - 2025

Keywords

  • Artificial intelligence
  • Brain
  • Differential diagnosis
  • Large language models
  • Magnetic resonance imaging
