TY - CONF
T1 - Intelligent video editing
T2 - 12th Indian Conference on Computer Vision, Graphics and Image Processing, ICVGIP 2021
AU - Gupta, Anchit
AU - Khan, Faizan Farooq
AU - Mukhopadhyay, Rudrabha
AU - Namboodiri, Vinay P.
AU - Jawahar, C. V.
PY - 2021/12/19
Y1 - 2021/12/19
N2 - This paper proposes a video editor based on OpenShot with several state-of-the-art facial video editing algorithms as added functionalities. Our editor provides an easy-to-use interface for applying modern lip-syncing algorithms interactively. Apart from lip-syncing, the editor also uses audio and facial re-enactment to generate expressive talking faces. The manual control improves the overall video editing experience without losing the benefits of modern synthetic video generation algorithms. This control enables us to lip-sync complex dubbed movie scenes, interviews, television shows, and other visual content. Furthermore, our editor provides features that automatically translate lectures, handling the spoken content, the lip-sync of the professor, and background content such as slides. In doing so, we also tackle the critical aspect of synchronizing the background content with the translated speech. We qualitatively evaluate the usefulness of the proposed editor through human evaluations. Our evaluations show a clear improvement in the efficiency of human editors and in the quality of the generated videos. We attach demo videos with the supplementary material that explain the tool and showcase multiple results.
KW - Human in the loop
KW - Lip-sync
KW - Speech-to-speech translation
KW - Talking head generation
KW - Video editing
UR - http://www.scopus.com/inward/record.url?scp=85122022200&partnerID=8YFLogxK
U2 - 10.1145/3490035.3490284
DO - 10.1145/3490035.3490284
M3 - Chapter in a published conference proceeding
AN - SCOPUS:85122022200
T3 - ACM International Conference Proceeding Series
SP - 1
EP - 9
BT - Proceedings of ICVGIP 2021 - 12th Indian Conference on Computer Vision, Graphics and Image Processing
PB - Association for Computing Machinery
CY - New York, NY, USA
Y2 - 20 December 2021 through 22 December 2021
ER -