General Automatic Human Shape and Motion Capture Using Volumetric Contour Cues

Helge Rhodin, Nadia Robertini, Dan Casas, Christian Richardt, H.-P. Seidel, Christian Theobalt

Research output: Chapter in Book/Report/Conference proceeding > Chapter

Abstract

Markerless motion capture algorithms require a 3D body with properly personalized skeleton dimensions and/or body shape and appearance to successfully track a person. Unfortunately, many tracking methods consider model personalization a different problem and use manual or semi-automatic model initialization, which greatly reduces applicability. In this paper, we propose a fully automatic algorithm that jointly creates a rigged actor model commonly used for animation - skeleton, volumetric shape, appearance, and optionally a body surface - and estimates the actor's motion from multi-view video input only. The approach is rigorously designed to work on footage of general outdoor scenes recorded with very few cameras and without background subtraction. Our method uses a new image formation model with analytic visibility and an analytically differentiable alignment energy. For reconstruction, 3D body shape is approximated as a Gaussian density field. For pose and shape estimation, we minimize a new edge-based alignment energy inspired by volume raycasting in an absorbing medium. We further propose a new statistical human body model that represents the body surface, volumetric Gaussian density, as well as variability in skeleton shape. Given any multi-view sequence, our method jointly optimizes the pose and shape parameters of this model fully automatically in a spatiotemporal way.
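The abstract's idea of raycasting through a Gaussian density field in an absorbing medium can be illustrated with a small sketch. This is not the authors' implementation: the function names, the isotropic-Gaussian parameterization (mean, sigma, density) and the choice to integrate along the full line through the camera ray are assumptions made for illustration only. It shows why visibility can be analytic: the line integral of a Gaussian has a closed form, and the Beer-Lambert law turns the summed optical depth into a smooth transmittance.

```python
import math


def gaussian_line_integral(origin, direction, mean, sigma, density):
    """Closed-form integral of density * exp(-|x - mean|^2 / (2 sigma^2))
    along the whole line x(t) = origin + t * direction (hypothetical helper)."""
    norm = math.sqrt(sum(d * d for d in direction))
    n = [d / norm for d in direction]  # unit ray direction
    diff = [m - o for m, o in zip(mean, origin)]
    t0 = sum(a * b for a, b in zip(diff, n))  # ray parameter closest to the mean
    # Squared perpendicular distance from the Gaussian centre to the line.
    perp_sq = max(sum(a * a for a in diff) - t0 * t0, 0.0)
    return density * sigma * math.sqrt(2.0 * math.pi) * math.exp(
        -perp_sq / (2.0 * sigma * sigma)
    )


def transmittance(origin, direction, blobs):
    """Fraction of light that passes the density field (Beer-Lambert law).
    blobs is a list of (mean, sigma, density) tuples; an assumed layout."""
    optical_depth = sum(
        gaussian_line_integral(origin, direction, mean, sigma, density)
        for mean, sigma, density in blobs
    )
    return math.exp(-optical_depth)


if __name__ == "__main__":
    body = [((0.0, 0.0, 0.0), 1.0, 1.0)]  # one blob standing in for a body part
    through = transmittance((0.0, 0.0, -10.0), (0.0, 0.0, 1.0), body)
    beside = transmittance((3.0, 0.0, -10.0), (0.0, 0.0, 1.0), body)
    print(f"through centre: {through:.4f}, three sigma off-centre: {beside:.4f}")
```

Because the transmittance falls off smoothly from rays that miss the blob to rays that pass through it, its image-space gradient is defined everywhere, which is the property an edge-based, differentiable alignment energy would rely on.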
Original language: English
Title of host publication: Computer Vision - ECCV 2016
Subtitle of host publication: Proceedings of the 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016
Editors: Bastian Leibe, Jiri Matas, Nicu Sebe, Max Welling
Place of Publication: Cham, Switzerland
Publisher: Springer
Pages: 509-526
Number of pages: 18
Volume: Part V
ISBN (Electronic): 978-3-319-46454-1
ISBN (Print): 978-3-319-46453-4
DOIs: https://doi.org/10.1007/978-3-319-46454-1_31
Publication status: Published - 16 Sep 2016
Event: European Conference on Computer Vision 2016 - Amsterdam, Netherlands
Duration: 8 Oct 2016 to 16 Oct 2016
http://www.eccv2016.org/

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer
Volume: 9909
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: European Conference on Computer Vision 2016
Abbreviated title: ECCV
Country: Netherlands
City: Amsterdam
Period: 8/10/16 to 16/10/16
Internet address: http://www.eccv2016.org/

Fingerprint

Animation
Visibility
Image processing
Cameras

Cite this

Rhodin, H., Robertini, N., Casas, D., Richardt, C., Seidel, H-P., & Theobalt, C. (2016). General Automatic Human Shape and Motion Capture Using Volumetric Contour Cues. In B. Leibe, J. Matas, N. Sebe, & M. Welling (Eds.), Computer Vision - ECCV 2016: Proceedings of the 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016 (Vol. Part V, pp. 509-526). (Lecture Notes in Computer Science; Vol. 9909). Cham, Switzerland: Springer. https://doi.org/10.1007/978-3-319-46454-1_31

@inbook{fa7c85bd636e4d0a9a1b1e6c74b8a58b,
title = "General Automatic Human Shape and Motion Capture Using Volumetric Contour Cues",
abstract = "Markerless motion capture algorithms require a 3D body with properly personalized skeleton dimension and/or body shape and appearance to successfully track a person. Unfortunately, many tracking methods consider model personalization a different problem and use manual or semi-automatic model initialization, which greatly reduces applicability. In this paper, we propose a fully automatic algorithm that jointly creates a rigged actor model commonly used for animation - skeleton, volumetric shape, appearance, and optionally a body surface - and estimates the actor's motion from multi-view video input only. The approach is rigorously designed to work on footage of general outdoor scenes recorded with very few cameras and without background subtraction. Our method uses a new image formation model with analytic visibility and analytically differentiable alignment energy. For reconstruction, 3D body shape is approximated as Gaussian density field. For pose and shape estimation, we minimize a new edge-based alignment energy inspired by volume raycasting in an absorbing medium. We further propose a new statistical human body model that represents the body surface, volumetric Gaussian density, as well as variability in skeleton shape. Given any multi-view sequence, our method jointly optimizes the pose and shape parameters of this model fully automatically in a spatiotemporal way.",
author = "Helge Rhodin and Nadia Robertini and Dan Casas and Christian Richardt and H.-P. Seidel and Christian Theobalt",
year = "2016",
month = "9",
day = "16",
doi = "10.1007/978-3-319-46454-1_31",
language = "English",
isbn = "978-3-319-46453-4",
volume = "Part V",
series = "Lecture Notes in Computer Science",
publisher = "Springer",
pages = "509--526",
editor = "Bastian Leibe and Jiri Matas and Nicu Sebe and Max Welling",
booktitle = "Computer Vision - ECCV 2016",

}

TY - CHAP
T1 - General Automatic Human Shape and Motion Capture Using Volumetric Contour Cues
AU - Rhodin, Helge
AU - Robertini, Nadia
AU - Casas, Dan
AU - Richardt, Christian
AU - Seidel, H.-P.
AU - Theobalt, Christian
PY - 2016/9/16
Y1 - 2016/9/16

AB - Markerless motion capture algorithms require a 3D body with properly personalized skeleton dimension and/or body shape and appearance to successfully track a person. Unfortunately, many tracking methods consider model personalization a different problem and use manual or semi-automatic model initialization, which greatly reduces applicability. In this paper, we propose a fully automatic algorithm that jointly creates a rigged actor model commonly used for animation - skeleton, volumetric shape, appearance, and optionally a body surface - and estimates the actor's motion from multi-view video input only. The approach is rigorously designed to work on footage of general outdoor scenes recorded with very few cameras and without background subtraction. Our method uses a new image formation model with analytic visibility and analytically differentiable alignment energy. For reconstruction, 3D body shape is approximated as Gaussian density field. For pose and shape estimation, we minimize a new edge-based alignment energy inspired by volume raycasting in an absorbing medium. We further propose a new statistical human body model that represents the body surface, volumetric Gaussian density, as well as variability in skeleton shape. Given any multi-view sequence, our method jointly optimizes the pose and shape parameters of this model fully automatically in a spatiotemporal way.

UR - http://arxiv.org/abs/1607.08659
U2 - 10.1007/978-3-319-46454-1_31
DO - 10.1007/978-3-319-46454-1_31
M3 - Chapter
SN - 978-3-319-46453-4
VL - Part V
T3 - Lecture Notes in Computer Science
SP - 509
EP - 526
BT - Computer Vision - ECCV 2016
A2 - Leibe, Bastian
A2 - Matas, Jiri
A2 - Sebe, Nicu
A2 - Welling, Max
PB - Springer
CY - Cham, Switzerland
ER -