TY - GEN
T1 - Jointly trained image and video generation using residual vectors
AU - Dandi, Yatin
AU - Das, Aniket
AU - Singhal, Soumye
AU - Namboodiri, Vinay P.
AU - Rai, Piyush
PY - 2020/5/14
Y1 - 2020/5/14
N2 - In this work, we propose a modeling technique for jointly training image and video generation models by simultaneously learning to map latent variables with a fixed prior onto real images and interpolate over images to generate videos. The proposed approach models the variations in representations using residual vectors encoding the change at each time step over a summary vector for the entire video. We utilize the technique to jointly train an image generation model with a fixed prior along with a video generation model lacking constraints such as disentanglement. The joint training enables the image generator to exploit temporal information while the video generation model learns to flexibly share information across frames. Moreover, experimental results verify our approach's compatibility with pre-training on videos or images and training on datasets containing a mixture of both. A comprehensive set of quantitative and qualitative evaluations reveal the improvements in sample quality and diversity over both video generation and image generation baselines. We further demonstrate the technique's capabilities of exploiting similarity in features across frames by applying it to a model based on decomposing the video into motion and content. The proposed model allows minor variations in content across frames while maintaining the temporal dependence through latent vectors encoding the pose or motion features.
AB - In this work, we propose a modeling technique for jointly training image and video generation models by simultaneously learning to map latent variables with a fixed prior onto real images and interpolate over images to generate videos. The proposed approach models the variations in representations using residual vectors encoding the change at each time step over a summary vector for the entire video. We utilize the technique to jointly train an image generation model with a fixed prior along with a video generation model lacking constraints such as disentanglement. The joint training enables the image generator to exploit temporal information while the video generation model learns to flexibly share information across frames. Moreover, experimental results verify our approach's compatibility with pre-training on videos or images and training on datasets containing a mixture of both. A comprehensive set of quantitative and qualitative evaluations reveal the improvements in sample quality and diversity over both video generation and image generation baselines. We further demonstrate the technique's capabilities of exploiting similarity in features across frames by applying it to a model based on decomposing the video into motion and content. The proposed model allows minor variations in content across frames while maintaining the temporal dependence through latent vectors encoding the pose or motion features.
UR - http://www.scopus.com/inward/record.url?scp=85085521813&partnerID=8YFLogxK
U2 - 10.1109/WACV45572.2020.9093308
DO - 10.1109/WACV45572.2020.9093308
M3 - Chapter in a published conference proceeding
AN - SCOPUS:85085521813
T3 - Proceedings - 2020 IEEE Winter Conference on Applications of Computer Vision, WACV 2020
SP - 3017
EP - 3031
BT - Proceedings - 2020 IEEE Winter Conference on Applications of Computer Vision, WACV 2020
PB - IEEE
T2 - 2020 IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2020
Y2 - 1 March 2020 through 5 March 2020
ER -