Deep Video Portraits

Hyeongwoo Kim, Pablo Garrido, Ayush Tewari, Weipeng Xu, Justus Thies, Matthias Nießner, Patrick Perez, Christian Richardt, Michael Zollhöfer, Christian Theobalt

Research output: Contribution to journal › Article

4 Citations (Scopus)

Abstract

We present a novel approach that enables photo-realistic re-animation of portrait videos using only an input video. In contrast to existing approaches that are restricted to manipulations of facial expressions only, we are the first to transfer the full 3D head position, head rotation, face expression, eye gaze, and eye blinking from a source actor to a portrait video of a target actor. The core of our approach is a generative neural network with a novel space-time architecture. The network takes as input synthetic renderings of a parametric face model, based on which it predicts photo-realistic video frames for a given target actor. The realism in this rendering-to-video transfer is achieved by careful adversarial training, and as a result, we can create modified target videos that mimic the behavior of the synthetically-created input. In order to enable source-to-target video re-animation, we render a synthetic target video with the reconstructed head animation parameters from a source video, and feed it into the trained network – thus taking full control of the target. With the ability to freely recombine source and target parameters, we are able to demonstrate a large variety of video rewrite applications without explicitly modeling hair, body or background. For instance, we can reenact the full head using interactive user-controlled editing, and realize high-fidelity visual dubbing. To demonstrate the high quality of our output, we conduct an extensive series of experiments and evaluations, where for instance a user study shows that our video edits are hard to detect.
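The abstract describes a per-frame pipeline: face-model parameters are reconstructed for source and target actors, the target keeps its identity while head position, rotation, expression, eye gaze and blinking are taken from the source, the recombined parameters are rendered synthetically, and the trained network turns that rendering into a photo-realistic frame. The following Python sketch only illustrates this parameter recombination under those assumptions; it is not the authors' code, and all names (FaceParams, recombine, reenact, render_synthetic, generator) are hypothetical placeholders.

# Illustrative sketch of source-to-target parameter recombination (hypothetical
# names; the renderer and the trained rendering-to-video network are passed in
# as opaque callables).
from dataclasses import dataclass, replace
from typing import List

@dataclass
class FaceParams:
    identity: list        # target-specific shape/appearance coefficients
    expression: list      # facial expression coefficients
    rotation: list        # 3D head rotation
    translation: list     # 3D head position
    gaze: list            # eye gaze direction
    blink: float          # eye-blink state

def recombine(source: FaceParams, target: FaceParams) -> FaceParams:
    """Keep the target's identity; drive everything else from the source."""
    return replace(
        target,
        expression=source.expression,
        rotation=source.rotation,
        translation=source.translation,
        gaze=source.gaze,
        blink=source.blink,
    )

def reenact(source_track: List[FaceParams],
            target_identity: FaceParams,
            render_synthetic,    # renders the parametric face model to an image
            generator):          # trained rendering-to-video network
    frames = []
    for src in source_track:
        params = recombine(src, target_identity)
        conditioning = render_synthetic(params)   # synthetic conditioning input
        frames.append(generator(conditioning))    # photo-realistic output frame
    return frames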
Language: English
Article number: 163
Pages: 1-14
Number of pages: 14
Journal: ACM Transactions on Graphics
Volume: 37
Issue number: 4
DOIs: https://doi.org/10.1145/3197517.3201283
Status: Published - 1 Aug 2018
Event: SIGGRAPH 2018 - Vancouver, Canada
Duration: 12 Aug 2018 - 16 Aug 2018
https://s2018.siggraph.org/

Fingerprint

Animation
Neural networks
Experiments

Cite this

Kim, H., Garrido, P., Tewari, A., Xu, W., Thies, J., Nießner, M., ... Theobalt, C. (2018). Deep Video Portraits. ACM Transactions on Graphics, 37(4), 1-14. [163]. https://doi.org/10.1145/3197517.3201283

Deep Video Portraits. / Kim, Hyeongwoo; Garrido, Pablo; Tewari, Ayush; Xu, Weipeng; Thies, Justus; Nießner, Matthias; Perez, Patrick; Richardt, Christian; Zollhöfer, Michael; Theobalt, Christian.

In: ACM Transactions on Graphics, Vol. 37, No. 4, 163, 01.08.2018, p. 1-14.

Research output: Contribution to journal › Article

Kim, H, Garrido, P, Tewari, A, Xu, W, Thies, J, Nießner, M, Perez, P, Richardt, C, Zollhöfer, M & Theobalt, C 2018, 'Deep Video Portraits', ACM Transactions on Graphics, vol. 37, no. 4, 163, pp. 1-14. https://doi.org/10.1145/3197517.3201283
Kim H, Garrido P, Tewari A, Xu W, Thies J, Nießner M et al. Deep Video Portraits. ACM Transactions on Graphics. 2018 Aug 1;37(4):1-14. 163. https://doi.org/10.1145/3197517.3201283
Kim, Hyeongwoo ; Garrido, Pablo ; Tewari, Ayush ; Xu, Weipeng ; Thies, Justus ; Nießner, Matthias ; Perez, Patrick ; Richardt, Christian ; Zollhöfer, Michael ; Theobalt, Christian. / Deep Video Portraits. In: ACM Transactions on Graphics. 2018 ; Vol. 37, No. 4. pp. 1-14.
@article{6e0e2e98d2ae410c9e708b0e3524026e,
title = "Deep Video Portraits",
abstract = "We present a novel approach that enables photo-realistic re-animation of portrait videos using only an input video. In contrast to existing approaches that are restricted to manipulations of facial expressions only, we are the first to transfer the full 3D head position, head rotation, face expression, eye gaze, and eye blinking from a source actor to a portrait video of a target actor. The core of our approach is a generative neural network with a novel space-time architecture. The network takes as input synthetic renderings of a parametric face model, based on which it predicts photo-realistic video frames for a given target actor. The realism in this rendering-to-video transfer is achieved by careful adversarial training, and as a result, we can create modified target videos that mimic the behavior of the synthetically-created input. In order to enable source-to-target video re-animation, we render a synthetic target video with the reconstructed head animation parameters from a source video, and feed it into the trained network – thus taking full control of the target. With the ability to freely recombine source and target parameters, we are able to demonstrate a large variety of video rewrite applications without explicitly modeling hair, body or background. For instance, we can reenact the full head using interactive user-controlled editing, and realize high-fidelity visual dubbing. To demonstrate the high quality of our output, we conduct an extensive series of experiments and evaluations, where for instance a user study shows that our video edits are hard to detect.",
author = "Hyeongwoo Kim and Pablo Garrido and Ayush Tewari and Weipeng Xu and Justus Thies and Matthias Nie{\ss}ner and Patrick Perez and Christian Richardt and Michael Zollh{\"o}fer and Christian Theobalt",
year = "2018",
month = "8",
day = "1",
doi = "10.1145/3197517.3201283",
language = "English",
volume = "37",
pages = "1--14",
journal = "ACM Transactions on Graphics",
issn = "0730-0301",
publisher = "Association for Computing Machinery",
number = "4",

}

TY - JOUR

T1 - Deep Video Portraits

AU - Kim, Hyeongwoo

AU - Garrido, Pablo

AU - Tewari, Ayush

AU - Xu, Weipeng

AU - Thies, Justus

AU - Nießner, Matthias

AU - Perez, Patrick

AU - Richardt, Christian

AU - Zollhöfer, Michael

AU - Theobalt, Christian

PY - 2018/8/1

Y1 - 2018/8/1

N2 - We present a novel approach that enables photo-realistic re-animation of portrait videos using only an input video. In contrast to existing approaches that are restricted to manipulations of facial expressions only, we are the first to transfer the full 3D head position, head rotation, face expression, eye gaze, and eye blinking from a source actor to a portrait video of a target actor. The core of our approach is a generative neural network with a novel space-time architecture. The network takes as input synthetic renderings of a parametric face model, based on which it predicts photo-realistic video frames for a given target actor. The realism in this rendering-to-video transfer is achieved by careful adversarial training, and as a result, we can create modified target videos that mimic the behavior of the synthetically-created input. In order to enable source-to-target video re-animation, we render a synthetic target video with the reconstructed head animation parameters from a source video, and feed it into the trained network – thus taking full control of the target. With the ability to freely recombine source and target parameters, we are able to demonstrate a large variety of video rewrite applications without explicitly modeling hair, body or background. For instance, we can reenact the full head using interactive user-controlled editing, and realize high-fidelity visual dubbing. To demonstrate the high quality of our output, we conduct an extensive series of experiments and evaluations, where for instance a user study shows that our video edits are hard to detect.

AB - We present a novel approach that enables photo-realistic re-animation of portrait videos using only an input video. In contrast to existing approaches that are restricted to manipulations of facial expressions only, we are the first to transfer the full 3D head position, head rotation, face expression, eye gaze, and eye blinking from a source actor to a portrait video of a target actor. The core of our approach is a generative neural network with a novel space-time architecture. The network takes as input synthetic renderings of a parametric face model, based on which it predicts photo-realistic video frames for a given target actor. The realism in this rendering-to-video transfer is achieved by careful adversarial training, and as a result, we can create modified target videos that mimic the behavior of the synthetically-created input. In order to enable source-to-target video re-animation, we render a synthetic target video with the reconstructed head animation parameters from a source video, and feed it into the trained network – thus taking full control of the target. With the ability to freely recombine source and target parameters, we are able to demonstrate a large variety of video rewrite applications without explicitly modeling hair, body or background. For instance, we can reenact the full head using interactive user-controlled editing, and realize high-fidelity visual dubbing. To demonstrate the high quality of our output, we conduct an extensive series of experiments and evaluations, where for instance a user study shows that our video edits are hard to detect.

UR - http://richardt.name/publications/deep-video-portraits/

U2 - 10.1145/3197517.3201283

DO - 10.1145/3197517.3201283

M3 - Article

VL - 37

SP - 1

EP - 14

JO - ACM Transactions on Graphics

T2 - ACM Transactions on Graphics

JF - ACM Transactions on Graphics

SN - 0730-0301

IS - 4

M1 - 163

ER -