Compression of DNA Sequencing Data

TNT members involved in this project:
Yeremia G. Adhisantoso, M.Sc.
Dr.-Ing. Marco Munderloh
Prof. Dr.-Ing. Jörn Ostermann
Dipl.-Ing. Jan Voges

Over the past years, technological advances in DNA sequencing have led to faster and more cost-efficient approaches to sequencing individual genomes and other genomic samples. Because of the enormous amount of sequencing data generated, the processing, storage and analysis of these data pose novel challenges for the scientific community. New processes and tools have to be developed to overcome current limitations in storage space, processing speed and other resources. Our goal is to develop novel algorithms that enhance data processing "from the tissue to the hard drive".

Within the scope of this project, we actively contribute to the MPEG-G series of standards (ISO/IEC 23092). More information is available on the MPEG-G website.

Selected publications:
  • Jan Voges, Mikel Hernaez, Marco Mattavelli, Jörn Ostermann
    An Introduction to MPEG-G: The First Open ISO/IEC Standard for the Compression and Exchange of Genomic Sequencing Data
    Proceedings of the IEEE, Vol. 109, No. 9, pp. 1607-1622, 2021
  • Jan Voges, Tom Paridaens, Fabian Müntefering, Liudmila S. Mainzer, Brian Bliss, Mingyu Yang, Idoia Ochoa, Jan Fostier, Jörn Ostermann, Mikel Hernaez
    GABAC: an arithmetic coding solution for genomic data
    Bioinformatics, Vol. 36, No. 7, pp. 2275-2277, 2020
  • Idoia Ochoa, Hongyi Li, Florian Baumgarte, Charles Hergenrother, Jan Voges, Mikel Hernaez
    AliCo: A New Efficient Representation for SAM Files
    2019 Data Compression Conference (DCC), pp. 93-102, 2019