TY - JOUR
T1 - Learning 3D Semantic Scene Graphs with Instance Embeddings
AU - Wald, Johanna
AU - Navab, Nassir
AU - Tombari, Federico
N1 - Publisher Copyright: © 2022, The Author(s).
PY - 2022/3
Y1 - 2022/3
AB - A 3D scene is more than the geometry and classes of the objects it comprises. An essential aspect beyond object-level perception is the scene context, described as a dense semantic network of interconnected nodes. Scene graphs have become a common representation to encode the semantic richness of images, where nodes in the graph are object entities connected by edges, so-called relationships. Such graphs have been shown to be useful in achieving state-of-the-art performance in image captioning, visual question answering and image generation or editing. While scene graph prediction methods so far focused on images, we propose instead a novel neural network architecture for 3D data, where the aim is to learn to regress semantic graphs from a given 3D scene. With this work, we go beyond object-level perception, by exploring relations between object entities. Our method learns instance embeddings alongside a scene segmentation and is able to predict semantics for object nodes and edges. We leverage 3DSSG, a large scale dataset based on 3RScan that features scene graphs of changing 3D scenes. Finally, we show the effectiveness of graphs as an intermediate representation on a retrieval task.
KW - 3D scene understanding
KW - Scene graphs
KW - Semantic segmentation
UR - http://www.scopus.com/inward/record.url?scp=85123473145&partnerID=8YFLogxK
U2 - 10.1007/s11263-021-01546-9
DO - 10.1007/s11263-021-01546-9
M3 - Article
AN - SCOPUS:85123473145
SN - 0920-5691
VL - 130
SP - 630
EP - 651
JO - International Journal of Computer Vision
JF - International Journal of Computer Vision
IS - 3
ER -