TY - GEN
T1 - L3DG: Latent 3D Gaussian Diffusion
T2 - SIGGRAPH Asia 2024 Conference Papers, SA 2024
AU - Roessle, Barbara
AU - Müller, Norman
AU - Porzi, Lorenzo
AU - Bulò, Samuel Rota
AU - Kontschieder, Peter
AU - Dai, Angela
AU - Niessner, Matthias
N1 - Publisher Copyright:
© 2024 Copyright held by the owner/author(s).
PY - 2024/12/3
Y1 - 2024/12/3
AB - We propose L3DG, the first approach for generative 3D modeling of 3D Gaussians through a latent 3D Gaussian diffusion formulation. This enables effective generative 3D modeling, scaling to generation of entire room-scale scenes which can be very efficiently rendered. To enable effective synthesis of 3D Gaussians, we propose a latent diffusion formulation, operating in a compressed latent space of 3D Gaussians. This compressed latent space is learned by a vector-quantized variational autoencoder (VQ-VAE), for which we employ a sparse convolutional architecture to efficiently operate on room-scale scenes. This way, the complexity of the costly generation process via diffusion is substantially reduced, allowing higher detail on object-level generation, as well as scalability to large scenes. By leveraging the 3D Gaussian representation, the generated scenes can be rendered from arbitrary viewpoints in real-time. We demonstrate that our approach significantly improves visual quality over prior work on unconditional object-level radiance field synthesis and showcase its applicability to room-scale scene generation.
KW - 3D Gaussian splatting
KW - Generative 3D scene modeling
KW - latent diffusion
UR - http://www.scopus.com/inward/record.url?scp=85216132880&partnerID=8YFLogxK
U2 - 10.1145/3680528.3687699
DO - 10.1145/3680528.3687699
M3 - Conference contribution
AN - SCOPUS:85216132880
T3 - Proceedings - SIGGRAPH Asia 2024 Conference Papers, SA 2024
BT - Proceedings - SIGGRAPH Asia 2024 Conference Papers, SA 2024
A2 - Spencer, Stephen N.
PB - Association for Computing Machinery, Inc
Y2 - 3 December 2024 through 6 December 2024
ER -