Vision Transformers Enable Fast and Robust Accelerated MRI

Kang Lin, Reinhard Heckel

Research output: Contribution to journal › Conference article › peer-review

29 Scopus citations

Abstract

The Vision Transformer, when trained or pre-trained on datasets consisting of millions of images, gives excellent accuracy for image classification tasks and offers computational savings relative to convolutional neural networks. Motivated by potential accuracy gains and computational savings, we study Vision Transformers for accelerated magnetic resonance image reconstruction. We show that, when trained on the fastMRI dataset, a popular dataset for accelerated MRI consisting of only thousands of images, a Vision Transformer tailored to image reconstruction achieves reconstruction accuracy on par with the U-net while enjoying higher throughput and lower memory consumption. Furthermore, because Transformers are known to perform best with large-scale pre-training but MRI data is costly to obtain, we propose a simple yet effective pre-training scheme that relies solely on large natural image datasets, such as ImageNet. We show that pre-training the Vision Transformer drastically improves training data efficiency for accelerated MRI and increases robustness to anatomy shifts. In the regime where only 100 MRI training images are available, the pre-trained Vision Transformer achieves significantly better image quality than pre-trained convolutional networks and the current state-of-the-art. Our code is available at https://github.com/MLI-lab/transformers_for_imaging.

Original language: English
Pages (from-to): 774-795
Number of pages: 22
Journal: Proceedings of Machine Learning Research
Volume: 172
State: Published - 2022
Event: 5th International Conference on Medical Imaging with Deep Learning, MIDL 2022 - Zurich, Switzerland
Duration: 6 Jul 2022 to 8 Jul 2022

Keywords

  • Accelerated MRI
  • Transformer
  • image reconstruction
  • pre-training
