ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models

  • Lukas Höllein
  • Aljaž Božić
  • Norman Müller
  • David Novotny
  • Hung Yu Tseng
  • Christian Richardt
  • Michael Zollhöfer
  • Matthias Nießner
  • Technische Universität München
  • Facebook

Publication: Contribution to book/report/conference proceedings › Conference contribution › Peer-reviewed

37 citations (Scopus)

Abstract

3D asset generation is getting massive amounts of attention, inspired by the recent success of text-guided 2D content creation. Existing text-to-3D methods use pretrained text-to-image diffusion models in an optimization problem or fine-tune them on synthetic data, which often results in non-photorealistic 3D objects without backgrounds. In this paper, we present a method that leverages pretrained text-to-image models as a prior, and learn to generate multi-view images in a single denoising process from real-world data. Concretely, we propose to integrate 3D volume-rendering and cross-frame-attention layers into each block of the existing U-Net network of the text-to-image model. Moreover, we design an autoregressive generation that renders more 3D-consistent images at any viewpoint. We train our model on real-world datasets of objects and showcase its capabilities to generate instances with a variety of high-quality shapes and textures in authentic surroundings. Compared to the existing methods, the results generated by our method are consistent, and have favorable visual quality (-30% FID, -37% KID).

Original language: English
Title: Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024
Publisher: IEEE Computer Society
Pages: 5043-5052
Number of pages: 10
ISBN (electronic): 9798350353006
DOIs
Publication status: Published - 2024
Event: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024 - Seattle, United States
Duration: 16 June 2024 to 22 June 2024

Publication series

Name: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
ISSN (Print): 1063-6919

Conference

Conference: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024
Country/Territory: United States
City: Seattle
Period: 16/06/24 to 22/06/24
