EgoCap: egocentric marker-less motion capture with two fisheye cameras

Helge Rhodin, Christian Richardt, Dan Casas, Eldar Insafutdinov, Mohammad Shafiei, H.-P. Seidel, Bernt Schiele, Christian Theobalt

Research output: Contribution to journal › Article

Abstract

Marker-based and marker-less optical skeletal motion-capture methods use an outside-in arrangement of cameras placed around a scene, with viewpoints converging on the center. They often create discomfort due to the marker suits they may require, and their recording volume is severely restricted and often constrained to indoor scenes with controlled backgrounds. Alternative suit-based systems use several inertial measurement units or an exoskeleton to capture motion. This makes capturing independent of a confined volume, but requires substantial, often constraining, and hard-to-set-up body instrumentation. We therefore propose a new method for real-time, marker-less and egocentric motion capture that estimates the full-body skeleton pose from a lightweight stereo pair of fisheye cameras attached to a helmet or virtual reality headset. It combines the strengths of a new generative pose-estimation framework for fisheye views with a ConvNet-based body-part detector trained on a large new dataset. Our inside-in method captures full-body motion in general indoor and outdoor scenes, and also in crowded scenes with many people in close vicinity. Setup time and effort are minimal, and the captured user can move around freely, which enables reconstruction of larger-scale activities and is particularly useful in virtual reality, where the user can freely roam and interact while seeing their fully motion-captured virtual body.
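The abstract describes fitting a generative skeleton model to fisheye views guided by ConvNet body-part detections. As a purely illustrative sketch of that idea — not the paper's actual formulation — the following Python snippet scores a candidate 3D pose by reprojecting its joints through a toy equidistant fisheye model and comparing against 2D detections (the projection model, function names, and weight are assumptions):

```python
import numpy as np

def fisheye_project(point_3d, focal=1.0):
    """Toy equidistant fisheye projection: image radius grows with view angle."""
    x, y, z = point_3d
    r = np.hypot(x, y)
    if r < 1e-9:
        return np.zeros(2)          # point on the optical axis maps to the center
    theta = np.arctan2(r, z)        # angle between the ray and the optical axis
    return focal * theta * np.array([x, y]) / r

def pose_energy(joints_3d, detections_2d, weight=1.0):
    """Sum of squared residuals between projected joints and 2D detections."""
    total = 0.0
    for p, d in zip(joints_3d, detections_2d):
        residual = fisheye_project(np.asarray(p, float)) - np.asarray(d, float)
        total += weight * float(residual @ residual)
    return total
```

A pose optimizer would minimize such an energy over skeleton parameters; a detection confidence per joint could weight each residual term.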

Keywords

  • motion capture
  • first-person vision
  • markerless
  • optical
  • inside-in
  • crowded scenes
  • large-scale

Cite this

Rhodin, H., Richardt, C., Casas, D., Insafutdinov, E., Shafiei, M., Seidel, H-P., ... Theobalt, C. (2016). EgoCap: egocentric marker-less motion capture with two fisheye cameras. ACM Transactions on Graphics, 35(6), [162]. https://doi.org/10.1145/2980179.2980235

@article{91fb6996bb234330a8c9c336a0b2935c,
title = "EgoCap: egocentric marker-less motion capture with two fisheye cameras",
abstract = "Marker-based and marker-less optical skeletal motion-capture methods use an outside-in arrangement of cameras placed around a scene, with viewpoints converging on the center. They often create discomfort due to the marker suits they may require, and their recording volume is severely restricted and often constrained to indoor scenes with controlled backgrounds. Alternative suit-based systems use several inertial measurement units or an exoskeleton to capture motion. This makes capturing independent of a confined volume, but requires substantial, often constraining, and hard-to-set-up body instrumentation. We therefore propose a new method for real-time, marker-less and egocentric motion capture that estimates the full-body skeleton pose from a lightweight stereo pair of fisheye cameras attached to a helmet or virtual reality headset. It combines the strengths of a new generative pose-estimation framework for fisheye views with a ConvNet-based body-part detector trained on a large new dataset. Our inside-in method captures full-body motion in general indoor and outdoor scenes, and also in crowded scenes with many people in close vicinity. Setup time and effort are minimal, and the captured user can move around freely, which enables reconstruction of larger-scale activities and is particularly useful in virtual reality, where the user can freely roam and interact while seeing their fully motion-captured virtual body.",
keywords = "motion capture, first-person vision, markerless, optical, inside-in, crowded scenes, large-scale",
author = "Helge Rhodin and Christian Richardt and Dan Casas and Eldar Insafutdinov and Mohammad Shafiei and H.-P. Seidel and Bernt Schiele and Christian Theobalt",
year = "2016",
month = nov,
doi = "10.1145/2980179.2980235",
language = "English",
volume = "35",
journal = "ACM Transactions on Graphics",
issn = "0730-0301",
publisher = "Association for Computing Machinery",
number = "6",
}

TY - JOUR

T1 - EgoCap: egocentric marker-less motion capture with two fisheye cameras

T2 - ACM Transactions on Graphics

AU - Rhodin, Helge

AU - Richardt, Christian

AU - Casas, Dan

AU - Insafutdinov, Eldar

AU - Shafiei, Mohammad

AU - Seidel, H.-P.

AU - Schiele, Bernt

AU - Theobalt, Christian

PY - 2016/11

Y1 - 2016/11

N2 - Marker-based and marker-less optical skeletal motion-capture methods use an outside-in arrangement of cameras placed around a scene, with viewpoints converging on the center. They often create discomfort due to the marker suits they may require, and their recording volume is severely restricted and often constrained to indoor scenes with controlled backgrounds. Alternative suit-based systems use several inertial measurement units or an exoskeleton to capture motion. This makes capturing independent of a confined volume, but requires substantial, often constraining, and hard-to-set-up body instrumentation. We therefore propose a new method for real-time, marker-less and egocentric motion capture that estimates the full-body skeleton pose from a lightweight stereo pair of fisheye cameras attached to a helmet or virtual reality headset. It combines the strengths of a new generative pose-estimation framework for fisheye views with a ConvNet-based body-part detector trained on a large new dataset. Our inside-in method captures full-body motion in general indoor and outdoor scenes, and also in crowded scenes with many people in close vicinity. Setup time and effort are minimal, and the captured user can move around freely, which enables reconstruction of larger-scale activities and is particularly useful in virtual reality, where the user can freely roam and interact while seeing their fully motion-captured virtual body.

AB - Marker-based and marker-less optical skeletal motion-capture methods use an outside-in arrangement of cameras placed around a scene, with viewpoints converging on the center. They often create discomfort due to the marker suits they may require, and their recording volume is severely restricted and often constrained to indoor scenes with controlled backgrounds. Alternative suit-based systems use several inertial measurement units or an exoskeleton to capture motion. This makes capturing independent of a confined volume, but requires substantial, often constraining, and hard-to-set-up body instrumentation. We therefore propose a new method for real-time, marker-less and egocentric motion capture that estimates the full-body skeleton pose from a lightweight stereo pair of fisheye cameras attached to a helmet or virtual reality headset. It combines the strengths of a new generative pose-estimation framework for fisheye views with a ConvNet-based body-part detector trained on a large new dataset. Our inside-in method captures full-body motion in general indoor and outdoor scenes, and also in crowded scenes with many people in close vicinity. Setup time and effort are minimal, and the captured user can move around freely, which enables reconstruction of larger-scale activities and is particularly useful in virtual reality, where the user can freely roam and interact while seeing their fully motion-captured virtual body.

KW - motion capture

KW - first-person vision

KW - markerless

KW - optical

KW - inside-in

KW - crowded scenes

KW - large-scale

UR - https://doi.org/10.1145/2980179.2980235

UR - http://gvv.mpi-inf.mpg.de/projects/EgoCap/

U2 - 10.1145/2980179.2980235

DO - 10.1145/2980179.2980235

M3 - Article

VL - 35

JO - ACM Transactions on Graphics

JF - ACM Transactions on Graphics

SN - 0730-0301

IS - 6

M1 - 162

ER -