Compressing Video Calls using Synthetic Talking Heads

Madhav Agarwal, Anchit Gupta, Rudrabha Mukhopadhyay, Vinay P. Namboodiri, C. V. Jawahar

Research output: Contribution to conference › Paper › Peer-review

1 Citation (SciVal)

Abstract

We leverage recent advances in talking head generation to propose an end-to-end system for talking head video compression. Our algorithm transmits pivot frames intermittently, while the rest of the talking head video is generated by animating them. We use a state-of-the-art face reenactment network to detect key points in the non-pivot frames and transmit them to the receiver. A dense flow is then calculated to warp a pivot frame and reconstruct the non-pivot ones. Transmitting key points instead of full frames leads to significant compression. We propose a novel algorithm to adaptively select the best-suited pivot frames at regular intervals to provide a smooth experience. We also propose a frame interpolator at the receiver's end to improve the compression levels further. Finally, a face enhancement network significantly improves reconstruction quality in several aspects, such as the sharpness of the generated frames. We evaluate our method both qualitatively and quantitatively on benchmark datasets and compare it with multiple compression techniques. We release a demo video and additional information at https://cvit.iiit.ac.in/research/projects/cvit-projects/talking-video-compression.
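The transmission scheme in the abstract can be illustrated with a small Python sketch of the sender side. It is not the authors' implementation: the keypoint count (10), the float32 (x, y) encoding, the 256×256 RGB frame size, the fixed `pivot_interval`, the hypothetical `keypoint_stride` used to skip frames that the receiver would interpolate, and the medoid rule in the hypothetical `select_pivot` helper are all illustrative assumptions. The medoid rule is only a stand-in for the paper's adaptive pivot selection, whose details are not given on this page.

```python
"""Hypothetical sketch of a pivot-frame / keypoint transmission schedule.

All sizes, names and the pivot-selection rule are illustrative
assumptions, not the method described in the paper.
"""
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Assumed parameters (not taken from the paper).
NUM_KEYPOINTS = 10             # keypoints detected per frame
BYTES_PER_FLOAT = 4            # float32 coordinates
FRAME_SHAPE = (256, 256, 3)    # uncompressed 8-bit RGB pivot frame

Keypoints = Sequence[Tuple[float, float]]


@dataclass
class Packet:
    frame_index: int
    kind: str        # "pivot" (full frame) or "keypoints"
    num_bytes: int


def keypoint_payload_bytes(num_keypoints: int = NUM_KEYPOINTS) -> int:
    # Only (x, y) per keypoint; a real system may send extra motion terms.
    return num_keypoints * 2 * BYTES_PER_FLOAT


def frame_payload_bytes(shape: Tuple[int, ...] = FRAME_SHAPE) -> int:
    # One byte per channel value of an uncompressed frame.
    return math.prod(shape)


def schedule(num_frames: int, pivot_interval: int, keypoint_stride: int) -> List[Packet]:
    """Send a full pivot frame every `pivot_interval` frames and keypoints
    every `keypoint_stride` frames; other frames are left for the receiver
    to interpolate. The last frame always gets keypoints so that every
    skipped frame has an anchor on both sides."""
    if pivot_interval < 1 or keypoint_stride < 1:
        raise ValueError("intervals must be positive")
    packets: List[Packet] = []
    for i in range(num_frames):
        if i % pivot_interval == 0:
            packets.append(Packet(i, "pivot", frame_payload_bytes()))
        elif i % keypoint_stride == 0 or i == num_frames - 1:
            packets.append(Packet(i, "keypoints", keypoint_payload_bytes()))
        # Otherwise nothing is sent for frame i.
    return packets


def select_pivot(window: Sequence[Keypoints]) -> int:
    """Hypothetical pivot choice: return the index of the frame whose
    keypoints have the smallest total distance to all other frames in the
    window (the medoid), so the pivot is 'central' in pose space."""
    best_index, best_cost = 0, math.inf
    for i, ki in enumerate(window):
        cost = 0.0
        for j, kj in enumerate(window):
            if i != j:
                cost += sum(math.dist(p, q) for p, q in zip(ki, kj))
        if cost < best_cost:
            best_index, best_cost = i, cost
    return best_index


if __name__ == "__main__":
    packets = schedule(num_frames=100, pivot_interval=50, keypoint_stride=2)
    sent = sum(p.num_bytes for p in packets)
    raw = 100 * frame_payload_bytes()
    print(f"sent {sent} bytes vs {raw} raw ({raw / sent:.1f}x smaller)")
    # prints: sent 397136 bytes vs 19660800 raw (49.5x smaller)
```

Comparing against raw frames overstates the gain: a realistic baseline would be a conventional codec such as H.264, and the pivot frames themselves would normally be compressed before transmission.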

Original language: English
Publication status: Published - 24 Nov 2022
Event: 33rd British Machine Vision Conference, BMVC 2022 - London, United Kingdom
Duration: 21 Nov 2022 to 24 Nov 2022

Conference

Conference: 33rd British Machine Vision Conference Proceedings, BMVC 2022
Country/Territory: United Kingdom
City: London
Period: 21/11/22 to 24/11/22

Bibliographical note

Publisher Copyright:
© 2022. The copyright of this document resides with its authors. It may be distributed unchanged freely in print or electronic forms.

Funding

In this work, we propose to use the high-level semantics of a talking head video to create extreme compression schemes that have the potential to revolutionize video calling. Our work uses compact key points to transmit information about the talking head in each video frame. We also propose a frame interpolation network followed by super-resolution to arbitrary resolutions. Finally, a pivot frame selection algorithm is used for long video calls, helping our compression technique continue to generate high-quality videos. In future work, we believe that other aspects, such as enabling the system to run on edge devices, will be worth addressing.

Acknowledgement: This work is partly supported by Huawei, India.

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
