VXP: Voxel-Cross-Pixel Large-Scale Camera-LiDAR Place Recognition

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

1 Scopus citation

Abstract

Cross-modal place recognition methods are flexible GPS alternatives under varying environmental conditions and sensor setups. However, the task is non-trivial because extracting consistent and robust global descriptors from different modalities is challenging. To tackle this issue, we propose Voxel-Cross-Pixel (VXP), a novel camera-to-LiDAR place recognition framework that enforces local similarities in a self-supervised manner and effectively brings global context from images and LiDAR scans into a shared feature space. Specifically, VXP is trained in three stages. First, we deploy a visual transformer to compactly represent input images. Second, we establish local correspondences between image-based and point cloud-based feature spaces using our novel geometric alignment module. Third, we aggregate local similarities into an expressive shared latent space. Extensive experiments on three benchmarks (Oxford RobotCar, ViViD++ and KITTI) demonstrate that our method surpasses state-of-the-art cross-modal retrieval by a large margin. Our evaluations show that the proposed method is accurate, efficient and lightweight. Our project page is available at: https://yunjinli.github.io/projects-vxp/.
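
The retrieval step that a shared feature space enables can be illustrated with a minimal sketch. This is not the VXP implementation: the descriptors below are synthetic NumPy arrays standing in for image and LiDAR global descriptors, and the function names (`retrieve_top_k`, `recall_at_k`), the cosine-similarity ranking, and the 25 m success radius are all assumptions chosen for illustration.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products equal cosine similarity."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def retrieve_top_k(query_desc, database_desc, k=1):
    """Return, per query, the indices of the k most similar database descriptors."""
    q = l2_normalize(np.asarray(query_desc, dtype=np.float64))
    d = l2_normalize(np.asarray(database_desc, dtype=np.float64))
    similarity = q @ d.T  # shape: (num_queries, num_database)
    # Sorting the negated similarities ascending ranks matches best-first.
    return np.argsort(-similarity, axis=1)[:, :k]

def recall_at_k(top_k, query_pos, database_pos, radius_m=25.0):
    """Fraction of queries with at least one retrieved place within radius_m.

    radius_m is an assumed threshold for this sketch, not a value from the source.
    """
    hits = 0
    for i, idxs in enumerate(top_k):
        dists = np.linalg.norm(database_pos[idxs] - query_pos[i], axis=1)
        hits += bool(np.any(dists <= radius_m))
    return hits / len(top_k)

# Synthetic stand-ins: 50 places, 64-dimensional descriptors (hypothetical sizes).
rng = np.random.default_rng(0)
database_desc = rng.normal(size=(50, 64))                       # e.g. LiDAR side
query_desc = database_desc + 0.05 * rng.normal(size=(50, 64))   # e.g. camera side

# Places spaced 100 m apart along a line; queries taken about 1.4 m away.
database_pos = np.stack([np.arange(50) * 100.0, np.zeros(50)], axis=1)
query_pos = database_pos + 1.0

top1 = retrieve_top_k(query_desc, database_desc, k=1)
print("Recall@1:", recall_at_k(top1, query_pos, database_pos))
```

Once both modalities map into the same space, localization reduces to a nearest-neighbour search over precomputed database descriptors, which is why the quality of the shared descriptors dominates retrieval accuracy.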

Original language: English
Title of host publication: Proceedings - 2025 International Conference on 3D Vision, 3DV 2025
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1233-1242
Number of pages: 10
ISBN (Electronic): 9798331538514
DOIs
State: Published - 2025
Event: 12th International Conference on 3D Vision, 3DV 2025 - Singapore, Singapore
Duration: 25 Mar 2025 to 28 Mar 2025

Publication series

Name: Proceedings - 2025 International Conference on 3D Vision, 3DV 2025

Conference

Conference: 12th International Conference on 3D Vision, 3DV 2025
Country/Territory: Singapore
City: Singapore
Period: 25/03/25 to 28/03/25

Keywords

  • autonomous driving
  • cross-modal retrieval
  • foundation models
  • place recognition
