InverseFaceNet: Deep Monocular Inverse Face Rendering

Hyeongwoo Kim, Michael Zollhöfer, Ayush Tewari, Justus Thies, Christian Richardt, Christian Theobalt

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We introduce InverseFaceNet, a deep convolutional inverse rendering framework for faces that jointly estimates facial pose, shape, expression, reflectance and illumination from a single input image. By estimating all parameters from just a single image, advanced editing possibilities on a single face image, such as appearance editing and relighting, become feasible in real time. Most previous learning-based face reconstruction approaches do not jointly recover all dimensions, or are severely limited in terms of visual quality. In contrast, we propose to recover high-quality facial pose, shape, expression, reflectance and illumination using a deep neural network that is trained using a large, synthetically created training corpus. Our approach builds on a novel loss function that measures model-space similarity directly in parameter space and significantly improves reconstruction accuracy. We further propose a self-supervised bootstrapping process in the network training loop, which iteratively updates the synthetic training corpus to better reflect the distribution of real-world imagery. We demonstrate that this strategy outperforms completely synthetically trained networks. Finally, we show high-quality reconstructions and compare our approach to several state-of-the-art approaches.
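
The abstract mentions two technical ingredients: a loss measured in model space rather than on raw parameter vectors, and a self-supervised bootstrapping loop that refreshes the synthetic training corpus. The following minimal NumPy sketch illustrates one plausible reading of the model-space loss, in which predicted and ground-truth coefficients are pushed through the linear face-model bases and compared on the resulting vertex positions. All names, dimensions and bases below are illustrative assumptions, not taken from the paper or its code.

import numpy as np

rng = np.random.default_rng(0)

N_VERTICES = 1000          # hypothetical mesh resolution
N_SHAPE, N_EXPR = 80, 64   # hypothetical basis sizes

# Hypothetical linear face model: mean shape plus shape/expression bases.
mean_shape = rng.standard_normal((3 * N_VERTICES,))
shape_basis = rng.standard_normal((3 * N_VERTICES, N_SHAPE))
expr_basis = rng.standard_normal((3 * N_VERTICES, N_EXPR))

def reconstruct_vertices(alpha, delta):
    """Map shape (alpha) and expression (delta) coefficients to vertex positions."""
    return mean_shape + shape_basis @ alpha + expr_basis @ delta

def model_space_loss(alpha_pred, delta_pred, alpha_gt, delta_gt):
    """Mean squared vertex error between predicted and ground-truth parameters.

    Unlike a naive L2 loss on the stacked parameter vector, this weights each
    coefficient by the geometric effect of its basis vector, which is one way
    to read the paper's claim of measuring model-space similarity.
    """
    v_pred = reconstruct_vertices(alpha_pred, delta_pred)
    v_gt = reconstruct_vertices(alpha_gt, delta_gt)
    return float(np.mean((v_pred - v_gt) ** 2))

# Toy usage: perturb ground-truth coefficients and measure the loss.
alpha_gt, delta_gt = rng.standard_normal(N_SHAPE), rng.standard_normal(N_EXPR)
alpha_pred = alpha_gt + 0.1 * rng.standard_normal(N_SHAPE)
delta_pred = delta_gt + 0.1 * rng.standard_normal(N_EXPR)
print(model_space_loss(alpha_pred, delta_pred, alpha_gt, delta_gt))

The bootstrapping idea can be sketched at the same level of abstraction: the current network labels real photographs, and the estimated parameters are re-rendered to pull the synthetic corpus toward the real-image distribution. The train, render and network callables here are placeholders, not APIs from the paper.

def bootstrap(network, real_images, synthetic_corpus, train, render, rounds=3):
    for _ in range(rounds):
        network = train(network, synthetic_corpus)
        # Estimate parameters for real photographs with the current network...
        estimated = [network(img) for img in real_images]
        # ...and re-render them so the training corpus better reflects the
        # parameter distribution of real-world imagery.
        synthetic_corpus = [(render(p), p) for p in estimated]
    return network
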
Language: English
Title of host publication: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
Publisher: IEEE
Pages: 4625-4634
Number of pages: 10
ISBN (Electronic): 978-1-5386-6421-6
DOIs: 10.1109/CVPR.2018.00486
Status: Published - 18 Jun 2018
Event: International Conference on Computer Vision and Pattern Recognition - Salt Lake City, United States
Duration: 18 Jun 2018 – 22 Jun 2018
http://cvpr2018.thecvf.com/

Publication series

Name: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
Publisher: IEEE
ISSN (Print): 2575-7075

Conference

Conference: International Conference on Computer Vision and Pattern Recognition
Abbreviated title: CVPR
Country: United States
City: Salt Lake City
Period: 18/06/18 – 22/06/18
Internet address: http://cvpr2018.thecvf.com/

Cite this

Kim, H., Zollhöfer, M., Tewari, A., Thies, J., Richardt, C., & Theobalt, C. (2018). InverseFaceNet: Deep Monocular Inverse Face Rendering. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 4625-4634). (2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition). IEEE. https://doi.org/10.1109/CVPR.2018.00486

InverseFaceNet: Deep Monocular Inverse Face Rendering. / Kim, Hyeongwoo; Zollhöfer, Michael; Tewari, Ayush; Thies, Justus; Richardt, Christian; Theobalt, Christian.

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE, 2018. p. 4625-4634 (2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Kim, H, Zollhöfer, M, Tewari, A, Thies, J, Richardt, C & Theobalt, C 2018, InverseFaceNet: Deep Monocular Inverse Face Rendering. in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, pp. 4625-4634, International Conference on Computer Vision and Pattern Recognition, Salt Lake City, United States, 18/06/18. https://doi.org/10.1109/CVPR.2018.00486
Kim H, Zollhöfer M, Tewari A, Thies J, Richardt C, Theobalt C. InverseFaceNet: Deep Monocular Inverse Face Rendering. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE. 2018. p. 4625-4634. (2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition). https://doi.org/10.1109/CVPR.2018.00486
Kim, Hyeongwoo ; Zollhöfer, Michael ; Tewari, Ayush ; Thies, Justus ; Richardt, Christian ; Theobalt, Christian. / InverseFaceNet: Deep Monocular Inverse Face Rendering. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE, 2018. pp. 4625-4634 (2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition).
@inproceedings{dcb52b58c66949fd8e57543acb812cdc,
title = "InverseFaceNet: Deep Monocular Inverse Face Rendering",
abstract = "We introduce InverseFaceNet, a deep convolutional inverse rendering framework for faces that jointly estimates facial pose, shape, expression, reflectance and illumination from a single input image. By estimating all parameters from just a single image, advanced editing possibilities on a single face image, such as appearance editing and relighting, become feasible in real time. Most previous learning-based face reconstruction approaches do not jointly recover all dimensions, or are severely limited in terms of visual quality. In contrast, we propose to recover high-quality facial pose, shape, expression, reflectance and illumination using a deep neural network that is trained using a large, synthetically created training corpus. Our approach builds on a novel loss function that measures model-space similarity directly in parameter space and significantly improves reconstruction accuracy. We further propose a self-supervised bootstrapping process in the network training loop, which iteratively updates the synthetic training corpus to better reflect the distribution of real-world imagery. We demonstrate that this strategy outperforms completely synthetically trained networks. Finally, we show high-quality reconstructions and compare our approach to several state-of-the-art approaches.",
author = "Hyeongwoo Kim and Michael Zollh{\"o}fer and Ayush Tewari and Justus Thies and Christian Richardt and Christian Theobalt",
year = "2018",
month = "6",
day = "18",
doi = "10.1109/CVPR.2018.00486",
language = "English",
series = "2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition",
publisher = "IEEE",
pages = "4625--4634",
booktitle = "2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition",
address = "United States",
}

TY - GEN

T1 - InverseFaceNet: Deep Monocular Inverse Face Rendering

AU - Kim, Hyeongwoo

AU - Zollhöfer, Michael

AU - Tewari, Ayush

AU - Thies, Justus

AU - Richardt, Christian

AU - Theobalt, Christian

PY - 2018/6/18

Y1 - 2018/6/18

N2 - We introduce InverseFaceNet, a deep convolutional inverse rendering framework for faces that jointly estimates facial pose, shape, expression, reflectance and illumination from a single input image. By estimating all parameters from just a single image, advanced editing possibilities on a single face image, such as appearance editing and relighting, become feasible in real time. Most previous learning-based face reconstruction approaches do not jointly recover all dimensions, or are severely limited in terms of visual quality. In contrast, we propose to recover high-quality facial pose, shape, expression, reflectance and illumination using a deep neural network that is trained using a large, synthetically created training corpus. Our approach builds on a novel loss function that measures model-space similarity directly in parameter space and significantly improves reconstruction accuracy. We further propose a self-supervised bootstrapping process in the network training loop, which iteratively updates the synthetic training corpus to better reflect the distribution of real-world imagery. We demonstrate that this strategy outperforms completely synthetically trained networks. Finally, we show high-quality reconstructions and compare our approach to several state-of-the-art approaches.

AB - We introduce InverseFaceNet, a deep convolutional inverse rendering framework for faces that jointly estimates facial pose, shape, expression, reflectance and illumination from a single input image. By estimating all parameters from just a single image, advanced editing possibilities on a single face image, such as appearance editing and relighting, become feasible in real time. Most previous learning-based face reconstruction approaches do not jointly recover all dimensions, or are severely limited in terms of visual quality. In contrast, we propose to recover high-quality facial pose, shape, expression, reflectance and illumination using a deep neural network that is trained using a large, synthetically created training corpus. Our approach builds on a novel loss function that measures model-space similarity directly in parameter space and significantly improves reconstruction accuracy. We further propose a self-supervised bootstrapping process in the network training loop, which iteratively updates the synthetic training corpus to better reflect the distribution of real-world imagery. We demonstrate that this strategy outperforms completely synthetically trained networks. Finally, we show high-quality reconstructions and compare our approach to several state-of-the-art approaches.

UR - http://richardt.name/publications/inversefacenet/

U2 - 10.1109/CVPR.2018.00486

DO - 10.1109/CVPR.2018.00486

M3 - Conference contribution

T3 - 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition

SP - 4625

EP - 4634

BT - 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition

PB - IEEE

ER -