TY - JOUR
T1 - Denoising Reuse: Exploiting Inter-frame Motion Consistency for Efficient Video Generation
AU - Wang, Chenyu
AU - Yan, Shuo
AU - Chen, Yixuan
AU - Wang, Xianwei
AU - Wang, Yujiang
AU - Dong, Mingzhi
AU - Yang, Xiaochen
AU - Li, Dongsheng
AU - Zhu, Rui
AU - Clifton, David A.
AU - Dick, Robert P.
AU - Lv, Qin
AU - Yang, Fan
AU - Lu, Tun
AU - Gu, Ning
AU - Shang, Li
PY - 2025/3/6
Y1 - 2025/3/6
N2 - Denoising-based diffusion models have achieved impressive image synthesis quality; however, their application to videos can incur prohibitive computational costs due to per-frame denoising operations. In pursuit of efficient video generation, we present a Diffusion Reuse MOtion (Dr. Mo) network to accelerate the video denoising process. Our key observation is that, in the early denoising steps, the latent representations of adjacent video frames exhibit high consistency that follows inter-frame motion cues. Inspired by this observation, we propose to accelerate video denoising by incorporating lightweight, learnable motion features. Specifically, Dr. Mo computes all denoising steps only for base frames. For a non-base frame, Dr. Mo propagates the pre-computed base-frame latents at a selected step using inter-frame motions to quickly estimate the frame's coarse-grained latent representation, from which denoising continues to recover finer-grained representations. On top of this, Dr. Mo employs a meta-network, the Denoising Step Selector (DSS), to dynamically determine for each frame the step at which motion-based propagation is performed, ensuring the correct transfer of multi-granularity visual features. Extensive evaluations on video generation and editing tasks indicate that Dr. Mo delivers broadly applicable acceleration for diffusion-based video generation while effectively preserving visual quality and style. Video generation and visualization results can be found at https://drmo-denoising-reuse.github.io.
AB - Denoising-based diffusion models have achieved impressive image synthesis quality; however, their application to videos can incur prohibitive computational costs due to per-frame denoising operations. In pursuit of efficient video generation, we present a Diffusion Reuse MOtion (Dr. Mo) network to accelerate the video denoising process. Our key observation is that, in the early denoising steps, the latent representations of adjacent video frames exhibit high consistency that follows inter-frame motion cues. Inspired by this observation, we propose to accelerate video denoising by incorporating lightweight, learnable motion features. Specifically, Dr. Mo computes all denoising steps only for base frames. For a non-base frame, Dr. Mo propagates the pre-computed base-frame latents at a selected step using inter-frame motions to quickly estimate the frame's coarse-grained latent representation, from which denoising continues to recover finer-grained representations. On top of this, Dr. Mo employs a meta-network, the Denoising Step Selector (DSS), to dynamically determine for each frame the step at which motion-based propagation is performed, ensuring the correct transfer of multi-granularity visual features. Extensive evaluations on video generation and editing tasks indicate that Dr. Mo delivers broadly applicable acceleration for diffusion-based video generation while effectively preserving visual quality and style. Video generation and visualization results can be found at https://drmo-denoising-reuse.github.io.
KW - Computational Efficiency
KW - Diffusion Models
KW - Video Generation
UR - http://www.scopus.com/inward/record.url?scp=86000736316&partnerID=8YFLogxK
U2 - 10.1109/TCSVT.2025.3548728
DO - 10.1109/TCSVT.2025.3548728
M3 - Article
AN - SCOPUS:86000736316
SN - 1051-8215
JO - IEEE Transactions on Circuits and Systems for Video Technology
JF - IEEE Transactions on Circuits and Systems for Video Technology
ER -