TNT15 Dataset

TNT members involved in this project:
Prof. Dr.-Ing. Bodo Rosenhahn
Timo von Marcard, M.Sc.

Multimodal Motion Capture Dataset (TNT15)

Timo v. Marcard, Gerard Pons-Moll, and Bodo Rosenhahn

Video-based human motion capture has been a very active research area for decades. The articulated structure of the human body, occlusions, partial observations and image ambiguities make it very hard to accurately track the many degrees of freedom of the human pose. Recent approaches have shown that adding sparse orientation cues from Inertial Measurement Units (IMUs) helps to disambiguate poses and improves full-body human motion capture. As a complementary data source, inertial sensors allow for accurate estimation of limb orientations even under fast motions.
In the research landscape of marker-less motion capture, publicly available benchmarks for video-based trackers (e.g. HumanEva, Human3.6M) generally lack inertial data. One exception is the MPI08 dataset, which provides inertial data from 5 IMUs along with video data.
This new dataset, called TNT15, consists of synchronized data streams from 8 RGB-cameras and 10 IMUs. In contrast to MPI08, it was recorded in a normal office room environment, and the larger set of 10 IMUs can be used to develop new tracking approaches or to improve evaluation.

The TNT15 dataset consists of
  • video data: multi-view sequences obtained from 8 calibrated RGB-cameras.
  • silhouettes: binary segmented images obtained by background subtraction.
  • IMU data: orientation and acceleration data of 10 IMUs.
  • projection matrices: camera parameters of all 8 cameras.
  • meshes: 3D laser scans and registered meshes of each actor.
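
As an illustration of how the projection matrices relate 3D points (for example mesh vertices) to the video frames, the following is a minimal sketch of the standard pinhole projection with a 3x4 matrix. The function name `project_point` and the matrix `P_example` are hypothetical and chosen for illustration only; the values are not taken from the TNT15 calibration, and the actual file format and coordinate conventions are described in the dataset documentation.

```python
# Minimal sketch: project a 3D point into an image with a 3x4 projection matrix.
# The matrix below is made up for illustration; it is NOT a TNT15 camera.

def project_point(P, X):
    """Project the 3D point X = (x, y, z) with the 3x4 matrix P.

    Returns the pixel coordinates (u, v) after the perspective division.
    """
    Xh = (X[0], X[1], X[2], 1.0)  # homogeneous coordinates
    u, v, w = (sum(p * x for p, x in zip(row, Xh)) for row in P)
    if w == 0:
        raise ValueError("point projects to infinity (w == 0)")
    return u / w, v / w


# Hypothetical camera: focal length 1000 px, principal point (320, 240),
# identity rotation and zero translation (camera at the world origin).
P_example = [
    [1000.0, 0.0, 320.0, 0.0],
    [0.0, 1000.0, 240.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]

u, v = project_point(P_example, (0.5, -0.25, 2.0))
print(u, v)  # 570.0 115.0
```

With real calibration data, the same function could be applied once per camera to overlay projected mesh vertices on the corresponding frames or silhouettes, assuming the matrices map world coordinates directly to pixel coordinates.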

Figures: 3D scans, 8 RGB-cameras, 10 IMUs.

The data recordings comprise four actors performing five activities:
  • walking,
  • running on the spot,
  • rotating arms,
  • jumping and skiing exercises,
  • dynamic punching.
In total, the dataset contains more than 4:30 minutes of video data, which amounts to almost 13 thousand frames at a frame rate of 50 Hz. Note that the videos are compressed at high quality; if you would like to obtain the sequences as lossless PNG images, please contact

The TNT15 dataset used in the TPAMI article is freely available for your own tests and experiments. However, it is restricted to research purposes only. If you use this data, please acknowledge the effort that went into data collection by citing the corresponding article, "Human Pose Estimation from Video and IMUs" (BibTeX).

Detailed dataset description: TNT15_documentation.pdf.
The full dataset can be downloaded by saving this zip file:

  • Journals
    • Timo von Marcard, Gerard Pons-Moll, Bodo Rosenhahn
      Human Pose Estimation from Video and IMUs
      Transactions on Pattern Analysis and Machine Intelligence, IEEE, Vol. 38, No. 8, pp. 1533-1547, January 2016