TY - JOUR
T1 - DiffCAD: Weakly-Supervised Probabilistic CAD Model Retrieval and Alignment from an RGB Image
AU - Gao, Daoyi
AU - Rozenberszki, David
AU - Leutenegger, Stefan
AU - Dai, Angela
N1 - Publisher Copyright: Copyright © 2024 is held by the owner/author(s). Publication rights licensed to ACM.
PY - 2024/7/19
Y1 - 2024/7/19
AB - Perceiving 3D structures from RGB images based on CAD model primitives can enable an effective, efficient 3D object-based representation of scenes. However, current approaches rely on supervision from expensive yet imperfect annotations of CAD models associated with real images, and encounter challenges due to the inherent ambiguities in the task - both in depth-scale ambiguity in monocular perception, as well as inexact matches of CAD database models to real observations. We thus propose DiffCAD, the first weakly-supervised probabilistic approach to CAD retrieval and alignment from an RGB image. We learn a probabilistic model through diffusion, modeling likely distributions of shape, pose, and scale of CAD objects in an image. This enables multi-hypothesis generation of different plausible CAD reconstructions, requiring only a few hypotheses to characterize ambiguities in depth/scale and inexact shape matches. Our approach is trained only on synthetic data, leveraging monocular depth and mask estimates to enable robust zero-shot adaptation to various real target domains. Despite being trained solely on synthetic data, our multi-hypothesis approach can even surpass the supervised state-of-the-art on the Scan2CAD dataset by 5.9% with 8 hypotheses.
KW - 3D reconstruction from a single image
KW - CAD model retrieval and alignment
KW - weak supervision
UR - http://www.scopus.com/inward/record.url?scp=85199309724&partnerID=8YFLogxK
U2 - 10.1145/3658236
DO - 10.1145/3658236
M3 - Article
AN - SCOPUS:85199309724
SN - 0730-0301
VL - 43
JO - ACM Transactions on Graphics
JF - ACM Transactions on Graphics
IS - 4
M1 - 106
ER -