TY - GEN
T1 - Scan2Cap
T2 - 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021
AU - Chen, Dave Zhenyu
AU - Gholami, Ali
AU - Nießner, Matthias
AU - Chang, Angel X.
N1 - Publisher Copyright:
© 2021 IEEE
PY - 2021
Y1 - 2021
N2 - We introduce the task of dense captioning in 3D scans from commodity RGB-D sensors. As input, we assume a point cloud of a 3D scene; the expected output is the bounding boxes along with the descriptions for the underlying objects. To address the 3D object detection and description problems, we propose Scan2Cap, an end-to-end trained method, to detect objects in the input scene and describe them in natural language. We use an attention mechanism that generates descriptive tokens while referring to the related components in the local context. To reflect object relations (i.e. relative spatial relations) in the generated captions, we use a message passing graph module to facilitate learning object relation features. Our method can effectively localize and describe 3D objects in scenes from the ScanRefer dataset, outperforming 2D baseline methods by a significant margin (27.61% [email protected] improvement).
AB - We introduce the task of dense captioning in 3D scans from commodity RGB-D sensors. As input, we assume a point cloud of a 3D scene; the expected output is the bounding boxes along with the descriptions for the underlying objects. To address the 3D object detection and description problems, we propose Scan2Cap, an end-to-end trained method, to detect objects in the input scene and describe them in natural language. We use an attention mechanism that generates descriptive tokens while referring to the related components in the local context. To reflect object relations (i.e. relative spatial relations) in the generated captions, we use a message passing graph module to facilitate learning object relation features. Our method can effectively localize and describe 3D objects in scenes from the ScanRefer dataset, outperforming 2D baseline methods by a significant margin (27.61% [email protected] improvement).
UR - http://www.scopus.com/inward/record.url?scp=85106607566&partnerID=8YFLogxK
U2 - 10.1109/CVPR46437.2021.00321
DO - 10.1109/CVPR46437.2021.00321
M3 - Conference contribution
AN - SCOPUS:85106607566
T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
SP - 3192
EP - 3202
BT - Proceedings - 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021
PB - IEEE Computer Society
Y2 - 19 June 2021 through 25 June 2021
ER -