Timo v. Marcard, Gerard Pons-Moll, and Bodo Rosenhahn
Video-based human motion capture has been an active research area for decades. The articulated structure of the human body, occlusions, partial observations, and image ambiguities make it very hard to accurately track the many degrees of freedom of the human pose. Recent approaches have shown that adding sparse orientation cues from Inertial Measurement Units (IMUs) helps to disambiguate and improve full-body human motion capture. As a complementary data source, inertial sensors allow for accurate estimation of limb orientations even under fast motions.
In the research landscape of marker-less motion capture, publicly available benchmarks for video-based trackers (e.g. HumanEva, Human3.6M) generally lack inertial data. One exception is the MPI08 dataset, which provides inertial data of 5 IMUs along with video data.
This new dataset, called TNT15, consists of synchronized data streams from 8 RGB-cameras and 10 IMUs. In contrast to MPI08, it was recorded in a normal office environment, and the larger number of 10 IMUs can be used for new tracking approaches or for improved evaluation.
Included data: 3D scans | 8 RGB-cameras | 10 IMUs
In order to download the dataset and obtain more information please visit our project page at TNT15 dataset.
Gerard Pons-Moll, Andreas Baak, Thomas Helten, Meinard Müller, Hans-Peter Seidel, and Bodo Rosenhahn
In this work, we present an approach to fuse video with orientation data obtained from inertial sensors to improve and stabilize full-body human motion capture. Even though video data is a strong cue for motion analysis, tracking artifacts occur frequently due to ambiguities in the images, rapid motions, occlusions or noise. As a complementary data source, inertial sensors allow for drift-free estimation of limb orientations even under fast motions. However, accurate position information cannot be obtained in continuous operation. Therefore, we propose a hybrid tracker that combines video with a small number of inertial units to compensate for the drawbacks of each sensor type: on the one hand, we obtain drift-free and accurate position information from video data and, on the other hand, we obtain accurate limb orientations and good performance under fast motions from inertial sensors. In several experiments we demonstrate the increased performance and stability of our human motion tracker.
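The complementary nature of the two sensor types can be illustrated with a toy 2D example: video provides a (noisy) position observation of a limb endpoint, while an IMU provides the limb's orientation. A minimal sketch, combining both cues in a weighted least-squares energy over the limb angle (the energy terms, weights, and grid-search solver here are illustrative assumptions, not the paper's actual formulation):

```python
import numpy as np

def fuse_limb_angle(parent, endpoint_obs, imu_angle, L,
                    w_video=1.0, w_imu=1.0):
    """Estimate the angle of a limb of length L hinged at `parent`
    by minimizing a weighted sum of a video position residual and an
    IMU orientation residual. Toy illustration only."""
    angles = np.linspace(-np.pi, np.pi, 3601)  # dense grid over angles
    # candidate endpoint positions for each angle
    tips = parent + L * np.stack([np.cos(angles), np.sin(angles)], axis=1)
    e_video = np.sum((tips - endpoint_obs) ** 2, axis=1)
    # angular residual wrapped into [-pi, pi]
    d = np.angle(np.exp(1j * (angles - imu_angle)))
    e_imu = d ** 2
    return angles[np.argmin(w_video * e_video + w_imu * e_imu)]
```

When the video observation is occluded or unreliable, `w_video` can be lowered and the IMU term keeps the limb orientation stable; conversely, the video term anchors the absolute position that inertial sensing alone cannot provide.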
Included data: 3D scans | Multiview sequences | 5 sensors
In order to download the dataset and obtain more information please visit our project page at MPI08 dataset.
Thomas Brox, Bodo Rosenhahn, Juergen Gall, and Daniel Cremers
N. Hasler, B. Rosenhahn, T. Thormählen, M. Wand, J. Gall, and H.-P. Seidel
In response to several requests and the interest of other research groups in the image sequences we use for model-driven tracking, we have made them freely available for your own tests and experiments. Each sequence (usually) contains the images, projection matrices, a 3D model, and a pose initialization (as a 4x4 matrix and, optionally, a list of joint angles).
In order to download the dataset and obtain more information please visit our project page at TPAMI09 benchmark.
MOOF (MOvements Of the Feet) is a video dataset
designed to support the evaluation of 3D foot motion reconstruction from monocular video.
Existing benchmarks for human motion capture either lack diversity in foot movements or are
limited to a single controlled indoor environment, making it difficult to assess foot pose
reconstruction in isolation. MOOF addresses this gap by focusing specifically on complex
foot articulations.
The dataset comprises 41 videos of 15 subjects (9 female, 6 male),
captured at 30 fps. Video durations range from 4 to 37 seconds, totaling 14,589 frames.
Recordings include subjects performing movements with complex foot motions — such as ankle
circles, ankle stretches, and heel–toe walking — and are augmented with in-the-wild dance
and ballet videos collected online.
Each video is annotated with 2D ground truth keypoints for three foot landmarks per foot —
big toe, small toe, and heel — obtained via a semi-automatic annotation pipeline.
Evaluation on MOOF uses foot-specific metrics: PCKF (Percentage of Correct foot Keypoints)
and N-FKE2d (Normalized 2D Foot Keypoint Error).
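Both metrics follow standard keypoint-evaluation conventions. A minimal sketch of how such metrics are typically computed (the normalization scale and the PCKF threshold are assumptions for illustration, not the dataset's official definitions):

```python
import numpy as np

def pckf(pred, gt, scale, thresh=0.1):
    """Percentage of Correct foot Keypoints: fraction of predicted 2D
    foot keypoints whose pixel error is below thresh * scale.
    pred, gt: arrays of shape (N, K, 2); scale: normalization length
    (assumed here, e.g. a per-frame foot size)."""
    err = np.linalg.norm(pred - gt, axis=-1)  # (N, K) pixel errors
    return float(np.mean(err < thresh * scale))

def n_fke2d(pred, gt, scale):
    """Normalized 2D Foot Keypoint Error: mean pixel error divided by
    the normalization scale."""
    err = np.linalg.norm(pred - gt, axis=-1)
    return float(np.mean(err / scale))
```

With six annotated landmarks per frame (big toe, small toe, and heel for each foot), `K = 6` in the arrays above.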
MOOF is available for non-commercial research purposes only. To request access,
please download and complete the Data Use Agreement (DUA) below, then send the signed document to
moof-dataset@tnt.uni-hannover.de.
You will receive a download link once your request has been reviewed.
If you use the MOOF dataset in your research, please cite the following paper.
For more details on the method, visit the
FootMR project page.
Citation
@InProceedings{wehrbein26footmr,
author = {Wehrbein, Tom and Rosenhahn, Bodo},
title = {Improving 3D Foot Motion Reconstruction in Markerless Monocular Human Motion Capture},
booktitle = {International Conference on 3D Vision (3DV)},
year = {2026},
}