Text2Loc: 3D Point Cloud Localization from Natural Language

Yan Xia, Letian Shi, Zifeng Ding, João F. Henriques, Daniel Cremers

Research output: Contribution to journal › Conference article › peer-review

17 Scopus citations

Abstract

We tackle the problem of 3D point cloud localization based on a few natural language descriptions and introduce a novel neural network, Text2Loc, that fully interprets the semantic relationship between points and text. Text2Loc follows a coarse-to-fine localization pipeline: text-submap global place recognition, followed by fine localization. In global place recognition, relational dynamics among each textual hint are captured in a hierarchical transformer with max-pooling (HTM), whereas a balance between positive and negative pairs is maintained using text-submap contrastive learning. Moreover, we propose a novel matching-free fine localization method to further refine the location predictions, which completely removes the need for complicated text-instance matching and is lighter, faster, and more accurate than previous methods. Extensive experiments show that Text2Loc improves the localization accuracy by up to 2× over the state-of-the-art on the KITTI360Pose dataset. Our project page is publicly available at https://yan-xia.github.io/projects/text2loc/.
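The text-submap contrastive learning mentioned in the abstract pairs each textual description with its matching submap while pushing apart mismatched pairs. The paper's exact loss is not given here; as an illustration, a generic symmetric InfoNCE-style objective over a batch of matched text/submap embeddings (function name, temperature value, and NumPy formulation are all assumptions, not the authors' implementation) could look like:

```python
import numpy as np

def text_submap_contrastive_loss(text_emb, submap_emb, temperature=0.1):
    """Symmetric InfoNCE-style sketch: row i of text_emb and submap_emb
    form a positive pair; all other rows in the batch are negatives.
    This is a generic stand-in, not Text2Loc's exact formulation."""
    # L2-normalize so dot products are cosine similarities
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    s = submap_emb / np.linalg.norm(submap_emb, axis=1, keepdims=True)
    logits = t @ s.T / temperature          # (B, B) similarity matrix
    labels = np.arange(len(t))              # positives lie on the diagonal

    def xent(lg):
        # numerically stable cross-entropy with diagonal targets
        lg = lg - lg.max(axis=1, keepdims=True)
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # average the text->submap and submap->text directions
    return 0.5 * (xent(logits) + xent(logits.T))
```

With perfectly aligned embeddings the diagonal dominates and the loss approaches zero; shuffling the submap rows so pairs no longer match drives it up, which is the balance between positive and negative pairs the abstract refers to.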

Original language: English
Pages (from-to): 14958-14967
Number of pages: 10
Journal: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
State: Published - 2024
Event: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024 - Seattle, United States
Duration: 16 Jun 2024 – 22 Jun 2024

Keywords

  • 3D localization
  • autonomous driving
  • point cloud
  • text
