TY - GEN
T1 - Urban-StyleGAN: Learning to Generate and Manipulate Images of Urban Scenes
T2 - 34th IEEE Intelligent Vehicles Symposium, IV 2023
AU - Eskandar, George
AU - Farag, Youssef
AU - Yenamandra, Tarun
AU - Cremers, Daniel
AU - Guirguis, Karim
AU - Yang, Bin
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
AB - A promise of Generative Adversarial Networks (GANs) is to provide cheap photorealistic data for training and validating AI models in autonomous driving. Despite their huge success, their performance on complex images featuring multiple objects is understudied. While some frameworks produce high-quality street scenes with little to no control over the image content, others offer more control at the expense of generation quality. A common limitation of both approaches is the use of global latent codes for the whole image, which hinders the learning of independent object distributions. Motivated by SemanticStyleGAN (SSG), a recent work on latent space disentanglement in human face generation, we propose a novel framework, Urban-StyleGAN, for urban scene generation and manipulation. We find that a straightforward application of SSG leads to poor results because urban scenes are more complex than human faces. To provide a more compact yet disentangled latent representation, we develop a class grouping strategy wherein individual classes are grouped into super-classes. Moreover, we employ an unsupervised latent exploration algorithm in the S-space of the generator and show that it is more efficient than the conventional W+-space in controlling the image content. Results on the Cityscapes and Mapillary datasets show that the proposed approach achieves significantly more controllability and improved image quality than previous approaches on urban scenes and is on par with general-purpose non-controllable generative models (such as StyleGAN2) in terms of quality.
KW - Disentanglement
KW - GANs
KW - Image Manipulation and Editing
KW - Latent Space Exploration
KW - Semantic Prior
UR - http://www.scopus.com/inward/record.url?scp=85164741836&partnerID=8YFLogxK
DO - 10.1109/IV55152.2023.10186698
M3 - Conference contribution
AN - SCOPUS:85164741836
T3 - IEEE Intelligent Vehicles Symposium, Proceedings
BT - IV 2023 - IEEE Intelligent Vehicles Symposium, Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 4 June 2023 through 7 June 2023
ER -