Detecting Unforeseen Data Properties with Diffusion Autoencoder Embeddings Using Spine MRI Data

Robert Graf, Florian Hunecke, Soeren Pohl, Matan Atad, Hendrik Moeller, Sophie Starck, Thomas Kroencke, Stefanie Bette, Fabian Bamberg, Tobias Pischon, Thoralf Niendorf, Carsten Schmidt, Johannes C. Paetzold, Daniel Rueckert, Jan S. Kirschke

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Deep learning has made significant strides in medical imaging, leveraging large datasets to improve diagnostics and prognostics. However, large datasets often come with inherent errors introduced through subject selection and acquisition. In this paper, we investigate the use of Diffusion Autoencoder (DAE) embeddings for uncovering and understanding data characteristics and biases, including biases for protected variables like sex and data abnormalities indicative of unwanted protocol variations. We use sagittal T2-weighted magnetic resonance (MR) images of the neck, chest, and lumbar regions from 11186 German National Cohort (NAKO) participants. We compare DAE embeddings with those of existing generative models such as StyleGAN and Variational Autoencoders. Evaluations on this large-scale dataset of three spine regions show that DAE embeddings effectively separate protected variables such as sex and age. Furthermore, we use t-SNE visualization to identify unwanted variations in imaging protocols, revealing differences in head positioning. Our embedding can identify samples for which a sex predictor will have difficulty learning the correct sex. Our findings highlight the potential of advanced embedding techniques like DAEs to detect data quality issues and biases in medical imaging datasets. Identifying such hidden relations can enhance the reliability and fairness of deep learning models in healthcare applications, ultimately improving patient care and outcomes.
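The abstract's claim that embeddings "separate" a protected variable can be illustrated with a simple probe: fit a classifier on the embeddings and measure held-out accuracy. The following is a minimal sketch, not the authors' method. It assumes a nearest-centroid probe and uses synthetic Gaussian vectors as a hypothetical stand-in for DAE embeddings; the helper names (`make_synthetic`, `nearest_centroid_accuracy`) and all parameter values are illustrative assumptions.

```python
import random


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid_accuracy(train, test):
    """Fit one centroid per label on train, return accuracy on test.

    train and test are lists of (embedding, label) pairs.
    """
    labels = sorted({lab for _, lab in train})
    cents = {lab: centroid([e for e, l in train if l == lab]) for lab in labels}
    correct = sum(
        min(labels, key=lambda lab: sq_dist(e, cents[lab])) == y
        for e, y in test
    )
    return correct / len(test)


def make_synthetic(n, dim, shift, rng):
    """Hypothetical stand-in for embeddings: label 1 is shifted along axis 0."""
    data = []
    for _ in range(n):
        label = rng.randrange(2)
        emb = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        emb[0] += shift * label
        data.append((emb, label))
    return data


rng = random.Random(0)
separable = make_synthetic(400, 16, shift=6.0, rng=rng)  # label encoded in embedding
unrelated = make_synthetic(400, 16, shift=0.0, rng=rng)  # label independent of embedding

# Hold out the last 100 samples of each set for evaluation.
acc_separable = nearest_centroid_accuracy(separable[:300], separable[300:])
acc_unrelated = nearest_centroid_accuracy(unrelated[:300], unrelated[300:])
print(f"shifted embeddings:   {acc_separable:.2f}")
print(f"unshifted embeddings: {acc_unrelated:.2f}")
```

Held-out accuracy well above chance suggests the variable is encoded in the embedding, while accuracy near chance suggests it is not recoverable by this probe. With real embeddings, the samples the probe misclassifies would be natural candidates for manual inspection.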

Original language: English
Title of host publication: Medical Image Computing and Computer Assisted Intervention – MICCAI 2024 Workshops - ISIC 2024, iMIMIC 2024, EARTH 2024, DeCaF 2024, Held in Conjunction with MICCAI 2024, Proceedings
Editors: M. Emre Celebi, Mauricio Reyes, Zhen Chen, Xiaoxiao Li
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 79-88
Number of pages: 10
ISBN (Print): 9783031776090
DOIs
State: Published - 2025
Event: 9th International Skin Imaging Collaboration Workshop, ISIC 2024, 7th International Workshop on Interpretability of Machine Intelligence in Medical Image Computing, iMIMIC 2024, Embodied AI and Robotics for HealTHcare Workshop, EARTH 2024 and 5th MICCAI Workshop on Distributed, Collaborative and Federated Learning, DeCaF 2024 held at 27th International conference on Medical Image Computing and Computer Assisted Intervention, MICCAI 2024 - Marrakesh, Morocco
Duration: 6 Oct 2024 to 10 Oct 2024

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 15274 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 9th International Skin Imaging Collaboration Workshop, ISIC 2024, 7th International Workshop on Interpretability of Machine Intelligence in Medical Image Computing, iMIMIC 2024, Embodied AI and Robotics for HealTHcare Workshop, EARTH 2024 and 5th MICCAI Workshop on Distributed, Collaborative and Federated Learning, DeCaF 2024 held at 27th International conference on Medical Image Computing and Computer Assisted Intervention, MICCAI 2024
Country/Territory: Morocco
City: Marrakesh
Period: 6/10/24 to 10/10/24

Keywords

  • Bias detection
  • Data Quality
  • Embeddings
  • Large Cohorts
