Monocular Total Capture: Posing Face, Body and Hands in the Wild
(Code and Dataset)

Donglai Xiang, Hanbyul Joo, Yaser Sheikh

Carnegie Mellon University

Abstract

We present the first method to capture the 3D total motion of a target person from a monocular view input. Given an image or a monocular video, our method reconstructs the motion of the body, face, and fingers, represented by a 3D deformable mesh model. We use an efficient representation called 3D Part Orientation Fields (POFs) to encode the 3D orientations of all body parts in the common 2D image space. POFs are predicted by a Fully Convolutional Network (FCN), along with the joint confidence maps. To train our network, we collect a new 3D human motion dataset capturing diverse total body motion of 40 subjects in a multiview system. We leverage a 3D deformable human model to reconstruct total body pose from the CNN outputs by exploiting the pose and shape prior in the model. We also present a texture-based tracking method to obtain temporally coherent motion capture output. We perform thorough quantitative evaluations, including comparison with existing body-specific and hand-specific methods and performance analysis under camera viewpoint and human pose changes. Finally, we demonstrate the results of our total body motion capture on various challenging in-the-wild videos.
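To make the idea of encoding a 3D orientation in 2D image space concrete, here is a minimal sketch for a single body part. It rests on an assumption, not on the released code: that a POF stores the unit 3D vector from a part's parent joint to its child joint at every pixel near the part's projected 2D segment, and zeros elsewhere. The function name `part_orientation_field` and the `thickness` parameter are hypothetical and chosen only for illustration.

```python
import numpy as np


def part_orientation_field(parent_3d, child_3d, parent_2d, child_2d,
                           height, width, thickness=2.0):
    """Rasterize one part's 3D unit orientation onto an (H, W, 3) image grid.

    Illustrative assumption: pixels within `thickness` of the projected
    2D segment carry the unit 3D parent-to-child vector; others are zero.
    """
    direction = np.asarray(child_3d, dtype=float) - np.asarray(parent_3d, dtype=float)
    norm = np.linalg.norm(direction)
    if norm == 0:
        raise ValueError("parent and child joints coincide in 3D")
    unit = direction / norm

    p = np.asarray(parent_2d, dtype=float)  # (x, y) of the parent joint
    c = np.asarray(child_2d, dtype=float)   # (x, y) of the child joint
    seg = c - p
    seg_len_sq = float(seg @ seg)

    # Pixel coordinates as (x, y) pairs, shape (H, W, 2).
    ys, xs = np.mgrid[0:height, 0:width]
    pixels = np.stack([xs, ys], axis=-1).astype(float)

    # Distance from each pixel to the closest point on the 2D segment.
    rel = pixels - p
    if seg_len_sq > 0:
        t = np.clip((rel @ seg) / seg_len_sq, 0.0, 1.0)
    else:
        t = np.zeros((height, width))
    closest = p + t[..., None] * seg
    dist = np.linalg.norm(pixels - closest, axis=-1)

    pof = np.zeros((height, width, 3))
    pof[dist <= thickness] = unit
    return pof


# A part pointing along (0, 3, 4) in 3D, projected onto a horizontal 2D segment.
pof = part_orientation_field((0, 0, 0), (0, 3, 4), (2, 2), (6, 2),
                             height=8, width=8, thickness=1.0)
```

Under this assumption, a network that predicts one such three-channel map per body part, alongside joint confidence maps, yields per-part 3D directions that a later fitting stage can read off at the detected joint locations.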

Publication

Monocular Total Capture: Posing Face, Body and Hands in the Wild
Donglai Xiang, Hanbyul Joo, Yaser Sheikh
Computer Vision and Pattern Recognition (CVPR) 2019 (Oral)
[ arXiv ]

Dataset

[ download link ] (File size: 270 GB)

Note: This dataset is a subset of our Panoptic Studio Dataset and is released under the same license. It is shared for research purposes only and may not be used for any commercial purpose.

By using this dataset, you agree to cite the following papers:

[1] Donglai Xiang, Hanbyul Joo, Yaser Sheikh. "Monocular Total Capture: Posing Face, Body and Hands in the Wild". IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019
[2] Hanbyul Joo, Tomas Simon, Xulong Li, Hao Liu, Lei Tan, Lin Gui, Sean Banerjee, Timothy Godisart, Bart Nabbe, Iain Matthews, Takeo Kanade, Shohei Nobuhara, Yaser Sheikh. "Panoptic Studio: A Massively Multiview System for Social Interaction Capture". Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2017

Code

[ code ]
