TY - GEN
T1 - 4DContrast
T2 - 17th European Conference on Computer Vision, ECCV 2022
AU - Chen, Yujin
AU - Nießner, Matthias
AU - Dai, Angela
N1 - Publisher Copyright:
© 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2022
Y1 - 2022
N2 - We present a new approach to instill 4D dynamic object priors into learned 3D representations by unsupervised pre-training. We observe that dynamic movement of an object through an environment provides important cues about its objectness, and thus propose to imbue learned 3D representations with such dynamic understanding, that can then be effectively transferred to improved performance in downstream 3D semantic scene understanding tasks. We propose a new data augmentation scheme leveraging synthetic 3D shapes moving in static 3D environments, and employ contrastive learning under 3D-4D constraints that encode 4D invariances into the learned 3D representations. Experiments demonstrate that our unsupervised representation learning results in improvement in downstream 3D semantic segmentation, object detection, and instance segmentation tasks, and moreover, notably improves performance in data-scarce scenarios. Our results show that our 4D pre-training method improves downstream tasks such as object detection mAP@0.5 by 5.5%/6.5% over training from scratch on ScanNet/SUN RGB-D while involving no additional run-time overhead at test time.
AB - We present a new approach to instill 4D dynamic object priors into learned 3D representations by unsupervised pre-training. We observe that dynamic movement of an object through an environment provides important cues about its objectness, and thus propose to imbue learned 3D representations with such dynamic understanding, that can then be effectively transferred to improved performance in downstream 3D semantic scene understanding tasks. We propose a new data augmentation scheme leveraging synthetic 3D shapes moving in static 3D environments, and employ contrastive learning under 3D-4D constraints that encode 4D invariances into the learned 3D representations. Experiments demonstrate that our unsupervised representation learning results in improvement in downstream 3D semantic segmentation, object detection, and instance segmentation tasks, and moreover, notably improves performance in data-scarce scenarios. Our results show that our 4D pre-training method improves downstream tasks such as object detection mAP@0.5 by 5.5%/6.5% over training from scratch on ScanNet/SUN RGB-D while involving no additional run-time overhead at test time.
KW - 3D instance segmentation
KW - 3D object detection
KW - 3D scene understanding
KW - 3D semantic segmentation
KW - Point cloud recognition
UR - http://www.scopus.com/inward/record.url?scp=85144550536&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-19824-3_32
DO - 10.1007/978-3-031-19824-3_32
M3 - Conference contribution
AN - SCOPUS:85144550536
SN - 9783031198236
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 543
EP - 560
BT - Computer Vision – ECCV 2022 - 17th European Conference, Proceedings
A2 - Avidan, Shai
A2 - Brostow, Gabriel
A2 - Cissé, Moustapha
A2 - Farinella, Giovanni Maria
A2 - Hassner, Tal
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 23 October 2022 through 27 October 2022
ER -