Enhancing Person Synthesis in Complex Scenes via Intrinsic and Contextual Structure Modeling

Research output: Contribution to conference › Paper › peer-review



The Generative Adversarial Network (GAN) and its variants have enabled high-quality image generation. However, generating plausible persons in complex scenes (such as MS-COCO images) remains challenging. We propose a novel structure-based and context-aware approach to enhance person synthesis in complex scenes. The method can successfully predict person pose and face structures while respecting the weak layout-based context, and then leverages these structures to refine the person's appearance. Our method involves three parts. First, a memory-based model encodes person intrinsic structures, including pose and face keypoints. Second, a context-aware model infers the conditional person structures from the layout context. Third, structure-guided person appearance refiners further enhance the final image generation. Our experiments present convincing person generation results in layout-to-image tasks on a challenging dataset. Person-related evaluations demonstrate that our method achieves state-of-the-art performance, especially on person accuracy and face detection metrics.
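The abstract names the components without giving their internals, so the following is only a toy sketch of how a memory of person-structure priors might be queried by layout context (the first two parts above); it is not the authors' model. Everything in it is a hypothetical assumption: the "memory" is a small hand-written bank of five-point pose templates normalised to the unit box, each keyed by a typical box aspect ratio; the layout context is reduced to a single person bounding box; retrieval is a softmax over the distance between log aspect ratios; and the retrieved templates are blended and mapped into the box. The names `MEMORY`, `Box`, `attention_weights` and `predict_keypoints` are invented for illustration, and the structure-guided appearance refiners are not modelled.

```python
import math
from dataclasses import dataclass

# HYPOTHETICAL memory bank (illustrative values, not from the paper).
# Each entry: (key = typical box height/width ratio, pose template).
# Templates hold 5 keypoints in unit-box coordinates (u, v) in [0, 1]:
# head, left hand, right hand, left foot, right foot.
MEMORY = [
    (2.5, [(0.5, 0.1), (0.3, 0.5), (0.7, 0.5), (0.4, 0.95), (0.6, 0.95)]),  # standing
    (1.2, [(0.5, 0.15), (0.2, 0.45), (0.8, 0.45), (0.3, 0.9), (0.7, 0.9)]),  # crouching
    (0.6, [(0.1, 0.4), (0.3, 0.3), (0.4, 0.7), (0.9, 0.5), (0.95, 0.6)]),   # lying
]


@dataclass
class Box:
    """A person's layout box: top-left corner (x, y), width w, height h."""
    x: float
    y: float
    w: float
    h: float


def attention_weights(box: Box, temperature: float = 0.5) -> list[float]:
    """Softmax over the negative distance between the box's log aspect
    ratio (the 'context' query) and each memory key."""
    query = math.log(box.h / box.w)
    scores = [-abs(query - math.log(key)) / temperature for key, _ in MEMORY]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_keypoints(box: Box, temperature: float = 0.5) -> list[tuple[float, float]]:
    """Blend the memory templates by attention weight, then map the
    blended unit-box keypoints into image coordinates inside the box."""
    weights = attention_weights(box, temperature)
    num_points = len(MEMORY[0][1])
    keypoints = []
    for k in range(num_points):
        u = sum(w * tpl[k][0] for w, (_, tpl) in zip(weights, MEMORY))
        v = sum(w * tpl[k][1] for w, (_, tpl) in zip(weights, MEMORY))
        keypoints.append((box.x + u * box.w, box.y + v * box.h))
    return keypoints
```

For a tall box such as `Box(10, 20, 40, 100)`, most of the weight falls on the standing template, while a wide box such as `Box(0, 0, 100, 60)` favours the lying template. Because the blend is a convex combination of unit-box templates, the predicted keypoints always stay inside the layout box. A learned system would presumably replace the hand-written keys and templates with trained embeddings and use richer layout context (other objects, their classes and positions).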
Original language: English
Publication status: Accepted (acceptance date: 1 Oct 2022)
Event: British Machine Vision Conference 2022
Duration: 21 Nov 2022 to 24 Nov 2022



Bibliographical note

This work is supported by RCUK grant CAMERA (EP/M023281/1, EP/T022523/1), the Centre for Augmented Reasoning (CAR) at the Australian Institute for Machine Learning, and a gift from Adobe.

