Abstract
Accurate object pose estimation is crucial for robotic applications, and recent trends in category-level pose estimation show great potential for applications that encounter a large variety of similar objects, as is common in home environments. Although frequent in such environments, photometrically challenging objects with transparency, such as glasses, are poorly handled by current methods. In particular, using self-supervision to bridge the sim2real domain gap is difficult for transparent objects because of strong background changes and depth artifacts. To address this, we propose a novel pipeline that uses language guidance and implicit physical constraints for 2D and 3D self-supervision. Specifically, we use language guidance to obtain accurate 2D object segmentation that is robust to background changes. 3D self-supervision is further achieved through a contact constraint and a normal constraint from polarization inputs, using a differentiable renderer. Instead of explicitly leveraging depth measurements, we reason about implicit physical constraints for self-supervision. Extensive experiments demonstrate the superior performance of our self-supervision approach over baselines on both a self-collected dataset and public benchmarks, addressing photometric challenges. Project page: <uri>https://seasandwpy.github.io/trans/</uri>
Original language | English |
---|---|
Pages (from-to) | 1-8 |
Number of pages | 8 |
Journal | IEEE Robotics and Automation Letters |
DOIs | |
State | Accepted/In press - 2024 |
Keywords
- Accuracy
- Encoding
- Object Detection
- Perception for Grasping and Manipulation
- Pipelines
- Pose estimation
- RGB-D Perception
- Segmentation and Categorization
- Shape
- Three-dimensional displays
- Training