UniColor: A Unified Framework for Multi-Modal Colorization with Transformer

Zhitong Huang, Nanxuan Zhao, Jing Liao

Research output: Contribution to journal › Article › peer-review



We propose UniColor, the first unified framework to support colorization in multiple modalities, including both unconditional and conditional ones such as stroke, exemplar, text, and even a mix of them. Rather than learning a separate model for each type of condition, we introduce a two-stage colorization framework that incorporates various conditions into a single model. In the first stage, multi-modal conditions are converted into a common representation of hint points; in particular, we propose a novel CLIP-based method to convert text into hint points. In the second stage, a Transformer-based network composed of Chroma-VQGAN and Hybrid-Transformer generates diverse and high-quality colorization results conditioned on the hint points. Both qualitative and quantitative comparisons demonstrate that our method outperforms state-of-the-art methods in every control modality and further enables multi-modal colorization that was not feasible before. Moreover, we design an interactive interface that demonstrates the effectiveness of our unified framework in practical usage, including automatic colorization, hybrid-control colorization, local recolorization, and iterative color editing. Our code and models are available at https://luckyhzt.github.io/unicolor.
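The key idea of the first stage is that every condition type is reduced to one shared representation: hint points, i.e. image locations paired with colors, which the second-stage network then consumes. The sketch below illustrates that unification idea only; all names, data shapes, and the merging rule are our own assumptions for illustration, not the authors' implementation (e.g. the exemplar and text converters are stubbed out as precomputed correspondences).

```python
from dataclasses import dataclass
from typing import List, Tuple

RGB = Tuple[int, int, int]

@dataclass(frozen=True)
class HintPoint:
    """A pixel location paired with a target color — the common
    representation every modality is converted into."""
    row: int
    col: int
    rgb: RGB

def hints_from_strokes(strokes: List[Tuple[int, int, RGB]]) -> List[HintPoint]:
    # A user stroke already carries a location and a color,
    # so it maps to hint points directly.
    return [HintPoint(r, c, rgb) for r, c, rgb in strokes]

def hints_from_exemplar(matches: List[Tuple[int, int, RGB]]) -> List[HintPoint]:
    # For an exemplar image, assume an upstream correspondence step has
    # matched grayscale regions to reference regions and sampled colors.
    return [HintPoint(r, c, rgb) for r, c, rgb in matches]

def merge_hints(*hint_lists: List[HintPoint]) -> List[HintPoint]:
    # Mixing modalities then amounts to concatenating their hint points;
    # here, a later hint overrides an earlier one at the same location.
    merged = {}
    for hints in hint_lists:
        for h in hints:
            merged[(h.row, h.col)] = h
    return list(merged.values())

# Example: a red stroke at (0, 0) mixed with two exemplar-derived hints;
# the exemplar's green hint at (0, 0) overrides the stroke.
hints = merge_hints(
    hints_from_strokes([(0, 0, (255, 0, 0))]),
    hints_from_exemplar([(0, 0, (0, 255, 0)), (1, 1, (0, 0, 255))]),
)
```

Because all modalities collapse into this one representation, a single second-stage model suffices; text would feed in the same way, with a CLIP-based step (not shown) choosing locations and colors.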

Original language: English
Article number: 205
Journal: ACM Transactions on Graphics
Issue number: 6
Early online date: 30 Nov 2022
Publication status: Published - 31 Dec 2022


Keywords

  • color editing
  • colorization
  • multi-modal controls
  • transformer

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design


