IntrinsicDiffusion: Joint Intrinsic Layers from Latent Diffusion Models

Jundan Luo, Duygu Ceylan, Jae Shin Yoon, Nanxuan Zhao, Julien Philip, Anna Frühstück, Wenbin Li, Christian Richardt, Tuanfeng Wang

Research output: Chapter or section in a book/report/conference proceeding › Chapter in a published conference proceeding

Abstract

Reasoning about the intrinsic properties of an image, such as albedo, illumination, and surface geometry, is a long-standing problem with many applications in image editing and compositing. Existing solutions to this ill-posed problem either heavily rely on manually designed priors or learn priors from limited datasets that lack diversity. Hence, they fall short in generalizing to in-the-wild test scenarios. In this paper, we show that a large-scale text-to-image generation model trained on a massive amount of visual data can implicitly learn intrinsic image priors. In particular, we introduce a novel conditioning mechanism built on top of a pre-trained foundational image generation model to jointly predict multiple intrinsic modalities from an input image. We demonstrate that predicting different modalities in a collaborative manner improves the overall quality. This design also enables mixing datasets with annotations of only a subset of the modalities during training, contributing to the generalizability of our approach. Our method achieves state-of-the-art performance in intrinsic image decomposition, both qualitatively and quantitatively. We also demonstrate downstream image editing applications, such as relighting and retexturing.
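
The mixed-dataset training mentioned in the abstract relies on supervising only the modalities each sample actually has labels for. The sketch below is not the authors' implementation; it only illustrates, in PyTorch, how a per-sample availability mask could gate a joint reconstruction loss so that datasets annotated with different subsets of intrinsic layers can be mixed in one batch. The modality names, tensor shapes, and the simple L2 objective are assumptions made for illustration.

    import torch

    MODALITIES = ["albedo", "shading", "normal"]  # illustrative choice of intrinsic layers

    def masked_joint_loss(pred, target, available):
        # pred, target: dicts mapping modality name -> (B, C, H, W) tensors
        # available: (B, len(MODALITIES)) bool tensor; True where the sample's
        # source dataset provides ground truth for that modality
        total = 0.0
        for i, name in enumerate(MODALITIES):
            mask = available[:, i].float().view(-1, 1, 1, 1)       # gate unsupervised samples
            per_pixel = (pred[name] - target[name]) ** 2 * mask    # L2 only where labels exist
            denom = mask.sum().clamp(min=1.0) * pred[name][0].numel()
            total = total + per_pixel.sum() / denom                # average over supervised pixels
        return total / len(MODALITIES)

    # Toy usage: two samples, the first lacking normals, the second lacking shading.
    B, C, H, W = 2, 3, 8, 8
    pred = {m: torch.randn(B, C, H, W, requires_grad=True) for m in MODALITIES}
    target = {m: torch.randn(B, C, H, W) for m in MODALITIES}
    available = torch.tensor([[True, True, False],
                              [True, False, True]])
    loss = masked_joint_loss(pred, target, available)
    loss.backward()

In this sketch, a sample contributes gradient only to the modalities its dataset annotates, so partially labeled datasets can be combined without penalizing missing ground truth.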

Original language: English
Title of host publication: Proceedings - SIGGRAPH 2024 Conference Papers
Editors: Stephen N. Spencer
Place of publication: New York, U.S.A.
Publisher: Association for Computing Machinery
Pages: 1-11
Number of pages: 11
ISBN (Electronic): 9798400705250
DOIs:
Publication status: Published - 13 Jul 2024
Event: SIGGRAPH 2024 Conference Papers - Denver, United States
Duration: 28 Jul 2024 - 1 Aug 2024

Publication series

Name: Proceedings - SIGGRAPH 2024 Conference Papers

Conference

Conference: SIGGRAPH 2024 Conference Papers
Country/Territory: United States
City: Denver
Period: 28/07/24 - 01/08/24

Funding

We would like to thank Li and Snavely [2018a], Li et al. [2020], Zhu et al. [2022], Das et al. [2022], and Careaga and Aksoy [2023] for publishing their source code. This work was supported by EPSRC CAMERA 2.0 (EP/T022523/1) and UKRI MyWorld Strength in Places Programme (SIPF00006/1).

Funders                                       Funder number
EPSRC CAMERA 2.0                              EP/T022523/1
UKRI MyWorld Strength in Places Programme     SIPF00006/1

Keywords

• diffusion model
• intrinsic image decomposition
• multi-task learning
• surface normal estimation

ASJC Scopus subject areas

• Computer Vision and Pattern Recognition
• Visual Arts and Performing Arts
• Computer Graphics and Computer-Aided Design
