Improving Self-Supervised Learning of Transparent Category Poses with Language Guidance and Implicit Physical Constraints

Pengyuan Wang, Lorenzo Garattoni, Sven Meier, Nassir Navab, Benjamin Busam

Research output: Contribution to journal › Article › peer-review

Abstract

Accurate object pose estimation is crucial for robotic applications, and recent work on category-level pose estimation shows great potential for settings with a large variety of similar objects, such as home environments. Although common in such environments, photometrically challenging transparent objects such as glasses are poorly handled by current methods. In particular, using self-supervision to bridge the sim2real domain gap is difficult for transparent objects because of strong background changes and depth artifacts. To address this, we propose a novel pipeline that uses language guidance and implicit physical constraints for 2D and 3D self-supervision. Specifically, we use language guidance to obtain accurate 2D object segmentation that is robust to background changes. 3D self-supervision is further achieved through a contact constraint and a normal constraint from polarization inputs, with a differentiable renderer. Instead of explicitly leveraging depth measurements, we reason about implicit physical constraints for self-supervision. Extensive experiments demonstrate the superior performance of our self-supervision approach over baselines on both a self-collected dataset and public benchmarks, addressing photometric challenges. Project page: https://seasandwpy.github.io/trans/
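
The abstract does not specify how the normal constraint is formulated. As an illustration only, the sketch below shows one common way such a term could be written: a masked cosine-distance loss between surface normals rendered from the estimated pose and normals derived from polarization inputs. The function names, the flattened per-pixel list layout, and the choice of cosine distance are all assumptions made for this example and are not taken from the paper. A real pipeline would compute this inside an automatic-differentiation framework so that gradients can flow through the differentiable renderer.

```python
import math


def normalize(v):
    """Scale a 3-vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v) if norm > 0 else tuple(v)


def masked_normal_loss(rendered, observed, mask):
    """Mean (1 - cosine similarity) over pixels where mask is True.

    Hypothetical helper, not the authors' loss.
    rendered: per-pixel normals rendered from the estimated pose (assumed input).
    observed: per-pixel normals derived from polarization (assumed input).
    mask:     per-pixel booleans, e.g. from a 2D object segmentation.
    """
    total, count = 0.0, 0
    for r, o, inside in zip(rendered, observed, mask):
        if not inside:
            continue  # ignore background pixels
        r, o = normalize(r), normalize(o)
        total += 1.0 - sum(a * b for a, b in zip(r, o))
        count += 1
    return total / count if count else 0.0


# Two pixels: the first pair of normals agrees, the second is orthogonal.
rendered = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]
observed = [(0.0, 0.0, 2.0), (0.0, 1.0, 0.0)]
loss_all = masked_normal_loss(rendered, observed, [True, True])
loss_first_only = masked_normal_loss(rendered, observed, [True, False])
```

Restricting the loss to the segmentation mask keeps background pixels from contributing, which reflects the role the abstract gives to robust 2D segmentation.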

Original language: English
Pages (from-to): 1-8
Number of pages: 8
Journal: IEEE Robotics and Automation Letters
State: Accepted/In press - 2024

Keywords

  • Accuracy
  • Encoding
  • Object Detection
  • Perception for Grasping and Manipulation
  • Pipelines
  • Pose estimation
  • RGB-D Perception
  • Segmentation and Categorization
  • Shape
  • Three-dimensional displays
  • Training
