Enhancing Person Synthesis in Complex Scenes via Intrinsic and Contextual Structure Modeling

Research output: Contribution to conference › Paper › peer-review



Generative Adversarial Networks (GANs) and their variants have enabled high-quality image generation. However, generating plausible persons in complex scenes (such as MS-COCO images) remains challenging. We propose a novel structure-based, context-aware approach to enhance person synthesis in complex scenes. The method can successfully predict person pose and face structures while respecting the weak layout-based context, and then leverages these structures to refine the person's appearance. Our method involves three parts. First, a memory-based model encodes person intrinsic structures, including pose and face keypoints. Second, a context-aware model infers the conditional person structures from the layout context. Third, structure-guided person appearance refiners further enhance the final image generation. Our experiments present convincing person generation results in layout-to-image tasks on a challenging dataset. Person-related evaluations demonstrate that our method achieves state-of-the-art performance, especially on person accuracy and face detection metrics.
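The abstract does not give implementation details, so the following is only a toy sketch of how the three stages could pass data to one another, under loudly stated assumptions. It is not the authors' model. The memory bank `MEMORY`, its two hand-written pose prototypes, the use of box aspect ratio as the only "context" signal, the softmax-attention memory read, and the Gaussian keypoint heatmaps are all hypothetical stand-ins chosen for illustration. Stage 1 is approximated by a fixed memory of normalized pose structures, stage 2 by `infer_structure`, which reads that memory conditioned on a layout box and maps the keypoints into the box, and stage 3 is represented only by `keypoint_heatmaps`, which produces the kind of structure map a refiner could be conditioned on.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)

# Hypothetical memory bank (assumption, not from the paper): each entry pairs
# a key (a box aspect ratio, width / height) with a pose prototype whose
# keypoints are normalized to the unit box [0, 1] x [0, 1].
MEMORY: List[Tuple[float, List[Point]]] = [
    (0.4, [(0.5, 0.1), (0.5, 0.4), (0.4, 0.9), (0.6, 0.9)]),  # "standing"
    (1.0, [(0.5, 0.3), (0.5, 0.6), (0.3, 0.9), (0.7, 0.9)]),  # "sitting"
]


def read_memory(query: float, temperature: float = 0.1) -> List[Point]:
    """Soft memory read: softmax attention over keys, weighted sum of values."""
    scores = [-abs(query - key) / temperature for key, _ in MEMORY]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    num_points = len(MEMORY[0][1])
    return [
        (
            sum(w * pose[i][0] for w, (_, pose) in zip(weights, MEMORY)),
            sum(w * pose[i][1] for w, (_, pose) in zip(weights, MEMORY)),
        )
        for i in range(num_points)
    ]


def infer_structure(box: Box) -> List[Point]:
    """Condition on a layout box, then map normalized keypoints into it."""
    x0, y0, x1, y1 = box
    w, h = x1 - x0, y1 - y0
    pose = read_memory(w / h)  # aspect ratio is the only context used here
    return [(x0 + u * w, y0 + v * h) for u, v in pose]


def keypoint_heatmaps(points: List[Point], height: int, width: int,
                      sigma: float = 1.5) -> List[List[List[float]]]:
    """Render one Gaussian heatmap per keypoint as nested lists [y][x]."""
    return [
        [
            [math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2 * sigma ** 2))
             for x in range(width)]
            for y in range(height)
        ]
        for px, py in points
    ]
```

For example, `infer_structure((10, 4, 18, 24))` describes a tall, narrow box (aspect ratio 0.4), so the read is dominated by the "standing" prototype and the first keypoint lands near (14, 6). Passing the result to `keypoint_heatmaps(kps, 32, 32)` yields four 32×32 maps that peak at the keypoints. A soft attention read is used rather than a hard nearest-neighbour lookup so that boxes between the stored keys blend prototypes smoothly. This is a design choice of this sketch only.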
Original language: English
Publication status: Accepted (acceptance date: 1 Oct 2022)
Event: British Machine Vision Conference 2022
Duration: 21 Nov 2022 to 24 Nov 2022


Conference: British Machine Vision Conference 2022


