TY - GEN
T1 - Bridged variational autoencoders for joint modeling of images and attributes
AU - Yadav, Ravindra
AU - Sardana, Ashish
AU - Namboodiri, Vinay P.
AU - Hegde, Rajesh M.
PY - 2020/5/14
Y1 - 2020/5/14
AB - Generative models have recently shown the ability to generate realistic data and model the underlying distribution accurately. However, jointly modeling an image with the attributes it is labeled with requires learning a cross-modal correspondence between image and attribute data. Although the information in a set of images and in its attributes has very different statistical properties, there exists an inherent correspondence that is challenging to capture. Various models have aimed to capture this correspondence either through joint modeling with a variational autoencoder or through separate encoder networks that are then concatenated. We present an alternative by proposing a bridged variational autoencoder that learns cross-modal correspondence by incorporating cross-modal hallucination losses in the latent space. Compared with existing methods, we find that using a bridge connection in the latent space not only yields better generation results but also gives a highly parameter-efficient model, providing a 40% reduction in training parameters for a bimodal dataset and nearly a 70% reduction for a trimodal dataset. We validate the proposed method through comparison with state-of-the-art methods and benchmarking on standard datasets.
UR - http://www.scopus.com/inward/record.url?scp=85085505375&partnerID=8YFLogxK
DO - 10.1109/WACV45572.2020.9093565
M3 - Conference contribution
AN - SCOPUS:85085505375
T3 - Proceedings - 2020 IEEE Winter Conference on Applications of Computer Vision, WACV 2020
SP - 1468
EP - 1476
BT - Proceedings - 2020 IEEE Winter Conference on Applications of Computer Vision, WACV 2020
PB - IEEE
T2 - 2020 IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2020
Y2 - 1 March 2020 through 5 March 2020
ER -