Abstract
We tackle the problem of highly accurate, holistic performance capture of the face, body and hands simultaneously. Motion-capture technologies used in film and game production typically focus on the face, body or hands independently, involve complex and expensive hardware, and require a high degree of manual intervention from skilled operators. While machine-learning-based approaches exist to overcome these problems, they usually support only a single camera, often operate on a single part of the body, do not produce precise world-space results, and rarely generalize outside specific contexts. In this work, we introduce the first technique for marker-free, high-quality reconstruction of the complete human body, including eyes and tongue, without requiring any calibration, manual intervention or custom hardware. Our approach produces stable world-space results from arbitrary camera rigs while supporting varied capture environments and clothing. We achieve this through a hybrid approach that leverages machine-learning models trained exclusively on synthetic data together with powerful parametric models of human shape and motion. We evaluate our method on a number of body, face and hand reconstruction benchmarks and demonstrate state-of-the-art results that generalize across diverse datasets.
Original language | English |
---|---|
Article number | 235 |
Pages (from-to) | 1-12 |
Journal | ACM Transactions on Graphics |
Volume | 43 |
Issue number | 6 |
Early online date | 19 Nov 2024 |
DOIs | |
Publication status | Published - 31 Dec 2024 |
Acknowledgements
The authors would like to thank Rodney Brunet, Kendall Robertson and Jon Hanzelka for their work on the clothing asset library; Steve Hoogendyk for his work on the tongue blend shapes; and Ben Lundell and Erroll Wood for their comments and suggestions.
Keywords
- 3D reconstruction
- body pose
ASJC Scopus subject areas
- Computer Graphics and Computer-Aided Design