H-ViT: A Hierarchical Vision Transformer for Deformable Image Registration

Morteza Ghahremani, Mohammad Khateri, Bailiang Jian, Benedikt Wiestler, Ehsan Adeli, Christian Wachinger

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

3 Scopus citations

Abstract

This paper introduces a novel top-down representation approach for deformable image registration, which estimates the deformation field by capturing various short-and long-range flow features at different scale levels. As a Hierarchical Vision Transformer (H- ViT), we propose a dual self-attention and cross-attention mechanism that uses high-level features in the deformation field to represent low-level ones, enabling information streams in the deformation field across all voxel patch embeddings irrespective of their spatial proximity. Since high-level features contain abstract flow patterns, such patterns are expected to effectively contribute to the representation of the deformation field in lower scales. When the self-attention module utilizes within-scale short-range patterns for representation, the cross-attention modules dynamically look for the key tokens across different scales to further interact with the local query voxel patches. Our method shows superior accuracy and visual quality over the state-of-the-art registration methods in five publicly available datasets, highlighting a substantial enhancement in the performance of medical imaging registration. The project link is available at https://mogvision.github.io/hvit.

Original languageEnglish
Title of host publicationProceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024
PublisherIEEE Computer Society
Pages11513-11523
Number of pages11
ISBN (Electronic)9798350353006
DOIs
StatePublished - 2024
Event2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024 - Seattle, United States
Duration: 16 Jun 202422 Jun 2024

Publication series

NameProceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
ISSN (Print)1063-6919

Conference

Conference2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024
Country/TerritoryUnited States
CitySeattle
Period16/06/2422/06/24

Keywords

  • Deformable Image Registration
  • Hierarchical cross-attention
  • Medical Imaging
  • Vision Transformer

Fingerprint

Dive into the research topics of 'H-ViT: A Hierarchical Vision Transformer for Deformable Image Registration'. Together they form a unique fingerprint.

Cite this