PanopticStudio Toolbox
- Download the PanopticStudio Toolbox on GitHub (Matlab and Python usage examples included).
- With the PanopticStudio Toolbox, you can:
  - Download the data as compressed video files
  - Extract images from the downloaded videos
  - Load camera calibration parameters
  - Load 3D pose reconstruction results
  - Project 3D poses to 2D camera views
Downloading the Data
Camera Naming Rule
- Camera names are formatted as {sensorIdx}_{nodeIdx}.
- {sensorIdx} is a two-digit number and can be one of the following:
  - 00: HD cameras
  - 01-20: VGA cameras, where the number denotes the VGA module (panel) index
  - 50: Kinect RGB cameras
- {nodeIdx} is the camera index within each sensor type (or within each module, for VGA cameras).
  - Each VGA module has 24 cameras, so {nodeIdx} within a VGA module ranges from 1 to 24.
- In summary:
  - HD (31 cameras): 00_00 ~ 00_30
  - VGA (24 cameras/module, 20 modules, 480 cameras in total): 01_01 ~ 01_24 through 20_01 ~ 20_24
  - Kinect (10 cameras): 50_01 ~ 50_10
- HD {nodeIdx} is zero-based, while VGA and Kinect {nodeIdx} values are one-based.
- Note that camera indices do not reflect camera locations: VGA module 1 and VGA module 2 are not necessarily neighboring panels.
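The naming rule is regular enough to handle in code. Below is a minimal Python sketch that assumes only the rule stated above. `parse_camera_name` and `all_camera_names` are hypothetical helpers, not toolbox functions, and the "hd"/"kinect" labels are chosen for this sketch; the calibration file's own "type" strings for those sensors may differ (only "vga" is shown in the calibration example).

```python
# Hypothetical helpers for the {sensorIdx}_{nodeIdx} naming rule (not toolbox APIs).

def parse_camera_name(name):
    """Split a name like '01_05' into (sensor_type, sensor_idx, node_idx)."""
    sensor_str, node_str = name.split("_")
    sensor_idx, node_idx = int(sensor_str), int(node_str)
    if sensor_idx == 0:
        sensor_type = "hd"        # label chosen for this sketch
    elif 1 <= sensor_idx <= 20:
        sensor_type = "vga"       # sensor_idx is the VGA module (panel) index
    elif sensor_idx == 50:
        sensor_type = "kinect"    # label chosen for this sketch
    else:
        raise ValueError(f"unknown sensorIdx in camera name: {name!r}")
    return sensor_type, sensor_idx, node_idx


def all_camera_names():
    """List every camera name: HD nodeIdx is zero-based, VGA/Kinect are one-based."""
    names = [f"00_{n:02d}" for n in range(0, 31)]          # HD: 00_00 ~ 00_30
    names += [f"{p:02d}_{n:02d}"
              for p in range(1, 21)                        # VGA modules 01-20
              for n in range(1, 25)]                       # 24 cameras per module
    names += [f"50_{n:02d}" for n in range(1, 11)]         # Kinect: 50_01 ~ 50_10
    return names


if __name__ == "__main__":
    names = all_camera_names()
    print(len(names))                  # 31 + 480 + 10
    print(parse_camera_name("01_24"))
```

Enumerating the names this way yields 31 + 480 + 10 = 521 cameras, matching the counts in the summary; the zero-based HD range is the only irregularity the helper has to special-case.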
Calibration Data
- Calibration parameters for all cameras (VGAs, HDs, and Kinects) are provided as a JSON file.
- Each camera is an element in the "cameras" array, with the following information:
"cameras": [
  {
    "name": "01_01",
    "type": "vga",
    "resolution": [640,480],
    "panel": 1,
    "node": 1,
    "K": [
      [745.716,0,374.297],
      [0,746.048,226.517],
      [0,0,1]
    ],
    "distCoef": [-0.318745,0.0454429,-0.000811973,0.000847189,0.0799718],
    "R": [
      [0.969466296,0.02846943647,-0.2435664017],
      [-0.04833552526,0.9959371934,-0.07597883721],
      [0.2404137638,0.08543183185,0.9669036272]
    ],
    "t": [
      [-51.22735213],
      [142.8763812],
      [289.9330519]
    ]
  },
  ...
- The camera names follow the naming rule described above.
- K, R, and t are the camera intrinsic matrix (3x3), rotation matrix (3x3), and translation vector (3x1), respectively.
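- If X is a 3x1 point in world coordinates, the camera transform is x = K*(R*X + t), up to perspective division and lens distortion. In the full projection, R*X + t is divided by its third (depth) component and distorted with distCoef before K is applied.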
- distCoef holds the lens distortion parameters [k1,k2,p1,p2,k3], as in the OpenCV calibration format.
- One unit of length in world coordinates corresponds to 1 cm.
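The sketch below (Python, standard library only) loads the calibration file and projects a world point into one camera. It assumes the standard OpenCV pinhole-plus-distortion model implied by the [k1,k2,p1,p2,k3] layout of distCoef. `load_cameras` and `project_point` are hypothetical names, not toolbox functions, and the toolbox's own projection code may differ in detail.

```python
import json


def load_cameras(calib_path):
    """Read the calibration JSON and index cameras by their name (e.g. '01_01')."""
    with open(calib_path) as f:
        calib = json.load(f)
    return {cam["name"]: cam for cam in calib["cameras"]}


def project_point(X, cam):
    """Project a 3D world point X (in cm) to pixel coordinates (u, v) of `cam`.

    Assumes the OpenCV model: camera coords -> perspective division ->
    radial/tangential distortion with [k1, k2, p1, p2, k3] -> intrinsics K.
    """
    K, R, t = cam["K"], cam["R"], cam["t"]
    # Camera coordinates Xc = R*X + t (t is stored as a 3x1 nested list)
    Xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i][0] for i in range(3)]
    if Xc[2] <= 0:
        raise ValueError("point is behind the camera")
    # Perspective division to normalized image coordinates
    x, y = Xc[0] / Xc[2], Xc[1] / Xc[2]
    # Radial and tangential distortion (OpenCV formulas)
    k1, k2, p1, p2, k3 = cam["distCoef"]
    r2 = x * x + y * y
    radial = 1 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    xd = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    yd = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    # Apply intrinsics; K[0][1] is the skew term (0 in the example above)
    u = K[0][0] * xd + K[0][1] * yd + K[0][2]
    v = K[1][1] * yd + K[1][2]
    return u, v
```

For example, `project_point([82.8466, -144.961, 23.0948], load_cameras("calibration.json")["01_01"])` would project a world point given in cm into VGA camera 01_01; the file name here is hypothetical. Distortion is applied to normalized coordinates rather than to pixels, which is why the division by depth happens before K.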
Video Data
Skeleton Reconstruction Results
- We reconstruct the 3D motion of people using the method of [Joo et al. 2016] (under submission), which extends [Joo et al. 2015].
- The reconstruction results are generated from the 480 VGA camera views.
- The outputs are saved as JSON files. Each file contains the 3D skeletons at a single time instant, and each skeleton is composed of 15 joints.
- The "bodies" array holds the skeletons; each element has the following form:
"bodies": [
  {
    "id": 1,
    "joints15": [82.8466, -144.961, 23.0948, 0.495789, 77.4016, -169.599, 18.2888, 0.477661, ...]
  },
  ...
- id: a unique subject index within a sequence. Skeletons with the same id across time belong to the same individual, i.e., they are temporally associated.
- joints15: fifteen 3D joint locations, formatted as [x1,y1,z1,c1,x2,y2,z2,c2,...] (60 values in total), where each c is a per-joint confidence score.
The order of joints is as follows (see this example for an illustration):
Neck, HeadTop, BodyCenter, lShoulder, lElbow, lWrist, lHip, lKnee, lAnkle, rShoulder, rElbow, rWrist, rHip, rKnee, rAnkle
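A minimal Python sketch for reading skeleton files, assuming each file is a JSON object with a top-level "bodies" array as in the excerpt above. `parse_bodies`, `confident_joints`, `group_by_id`, and `load_frame` are hypothetical helper names, and the confidence threshold is a user choice rather than a value prescribed by the dataset.

```python
import json

# Joint order of "joints15", as documented above
JOINT_NAMES = [
    "Neck", "HeadTop", "BodyCenter",
    "lShoulder", "lElbow", "lWrist", "lHip", "lKnee", "lAnkle",
    "rShoulder", "rElbow", "rWrist", "rHip", "rKnee", "rAnkle",
]


def parse_bodies(frame):
    """Turn one parsed frame into {id: {joint_name: (x, y, z, c)}}."""
    skeletons = {}
    for body in frame["bodies"]:   # assumes a top-level "bodies" array
        vals = body["joints15"]
        if len(vals) != 4 * len(JOINT_NAMES):
            raise ValueError(f"body {body['id']}: expected 60 values, got {len(vals)}")
        # Each joint occupies 4 consecutive values: x, y, z (cm) and confidence c
        skeletons[body["id"]] = {
            name: tuple(vals[4 * i: 4 * i + 4])
            for i, name in enumerate(JOINT_NAMES)
        }
    return skeletons


def confident_joints(joints, min_conf):
    """Keep (x, y, z) of joints whose confidence c is at least min_conf."""
    return {name: (x, y, z)
            for name, (x, y, z, c) in joints.items() if c >= min_conf}


def group_by_id(frames):
    """Collect each individual's skeletons over time: {id: [(frame_index, joints)]}."""
    tracks = {}
    for t, skeletons in enumerate(frames):
        for pid, joints in skeletons.items():
            tracks.setdefault(pid, []).append((t, joints))
    return tracks


def load_frame(path):
    """Read one skeleton JSON file from disk and parse it."""
    with open(path) as f:
        return parse_bodies(json.load(f))
```

Keying skeletons by id makes the temporal association direct: feeding the parsed frames of a sequence, in time order, to `group_by_id` yields one trajectory per individual.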