SceneGenie: Scene Graph Guided Diffusion Models for Image Synthesis

Azade Farshad, Yousef Yeganeh, Yu Chi, Chengzhi Shen, Björn Ommer, Nassir Navab

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Text-conditioned image generation has made significant progress in recent years with generative adversarial networks and, more recently, diffusion models. While diffusion models conditioned on text prompts have produced impressive and high-quality images, accurately representing complex text prompts such as the number of instances of a specific object remains challenging. To address this limitation, we propose a novel guidance approach for the sampling process in the diffusion model that leverages bounding box and segmentation map information at inference time without additional training data. Through a novel loss in the sampling process, our approach guides the model with semantic features from CLIP embeddings and enforces geometric constraints, leading to high-resolution images that accurately represent the scene. To obtain bounding box and segmentation map information, we structure the text prompt as a scene graph and enrich the nodes with CLIP embeddings. Our proposed model achieves state-of-the-art performance on two public benchmarks for image generation from scene graphs, surpassing both scene graph to image and text-based diffusion models in various metrics. Our results demonstrate the effectiveness of incorporating bounding box and segmentation map guidance in the diffusion model sampling process for more accurate text-to-image generation. Project Page: scenegenie.github.io/SceneGenie/
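
The idea of steering sampling with a loss evaluated at inference time can be illustrated with a toy sketch. This is not the authors' implementation: the abstract gives neither the loss nor the sampler, so everything below is an assumption made for illustration. The "denoiser" (`toy_denoise`) is a hypothetical stand-in that simply shrinks values, the "image" is a small grid of numbers, and the guidance loss (`layout_loss`) is a plain squared error against a bounding-box mask rather than a CLIP-based or segmentation-based term.

```python
def box_mask(height, width, box):
    """Binary layout mask: 1.0 inside box=(top, left, bottom, right), else 0.0.

    The bottom and right bounds are exclusive.
    """
    top, left, bottom, right = box
    return [
        [1.0 if top <= r < bottom and left <= c < right else 0.0
         for c in range(width)]
        for r in range(height)
    ]


def layout_loss(sample, mask):
    """Assumed guidance loss: squared error between the sample and the mask."""
    return sum(
        (s - m) ** 2
        for s_row, m_row in zip(sample, mask)
        for s, m in zip(s_row, m_row)
    )


def layout_loss_grad(sample, mask):
    """Analytic gradient of layout_loss with respect to each sample value."""
    return [
        [2.0 * (s - m) for s, m in zip(s_row, m_row)]
        for s_row, m_row in zip(sample, mask)
    ]


def toy_denoise(sample):
    """Hypothetical stand-in for a learned denoiser: shrinks every value."""
    return [[0.9 * s for s in row] for row in sample]


def guided_step(sample, mask, scale):
    """One sampling step: denoise, then move against the loss gradient."""
    proposal = toy_denoise(sample)
    grad = layout_loss_grad(proposal, mask)
    return [
        [p - scale * g for p, g in zip(p_row, g_row)]
        for p_row, g_row in zip(proposal, grad)
    ]


def run_sampling(mask, scale, steps=20):
    """Run a fixed number of steps from a constant starting grid."""
    sample = [[0.5 for _ in row] for row in mask]
    for _ in range(steps):
        sample = guided_step(sample, mask, scale)
    return sample


mask = box_mask(4, 4, (1, 1, 3, 3))
unguided = run_sampling(mask, scale=0.0)  # scale 0 disables guidance
guided = run_sampling(mask, scale=0.1)
print(layout_loss(unguided, mask), layout_loss(guided, mask))
```

With `scale=0.0` the sampler ignores the layout entirely, while a positive `scale` pulls each step toward the box, so the guided run ends with a lower layout loss. No model weights change in either case, which matches the abstract's point that the guidance acts at inference time.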

Original language: English
Title of host publication: Proceedings - 2023 IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 88-98
Number of pages: 11
ISBN (Electronic): 9798350307443
State: Published - 2023
Event: 2023 IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2023 - Paris, France
Duration: 2 Oct 2023 → 6 Oct 2023

Publication series

Name: Proceedings - 2023 IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2023

Conference

Conference: 2023 IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2023
Country/Territory: France
City: Paris
Period: 2/10/23 → 6/10/23

Keywords

  • Diffusion Models
  • Guidance
  • Image Generation
  • Scene Graphs
