Omics data processing and storage

TNT members involved in this project:
Dr.-Ing. Marco Munderloh
Prof. Dr.-Ing. Jörn Ostermann
Dipl.-Ing. Jan Voges

In recent years, technological advances in sequencing, the process of reading out genomic information from biological samples, have made it faster and more cost-efficient to sequence individual genomes and other genomic samples. Because of the enormous amount of sequencing data generated, processing, storing, and analyzing these data pose novel challenges for the scientific community. New processes and tools have to be developed to overcome current limitations in storage space, processing speed, and other areas.

Sequencing data passes through a great number of different analysis steps. Our goal is to develop novel algorithms to enhance data processing "from the tissue to the hard drive".

In the scope of this project we actively contribute to the development of the MPEG-G standard (ISO/IEC 23092). More information is available on the MPEG-G website.

If you are interested in writing your thesis and thereby contributing to this project, please contact Jan Voges.

  • Journals
    • Claudio Alberti, Tom Paridaens, Jan Voges, Daniel Naro, Junaid J. Ahmad, Massimo Ravasi, Daniele Renzi, Giorgio Zoia, Idoia Ochoa, Marco Mattavelli, Jaime Delgado, Mikel Hernaez
      An introduction to MPEG-G, the new ISO standard for genomic information representation
      bioRxiv, Cold Spring Harbor Laboratory, September 2018
    • Jan Voges, Jörn Ostermann, Mikel Hernaez
      CALQ: compression of quality values of aligned sequencing data
      Bioinformatics, Oxford University Press, Vol. 34, No. 10, pp. 1650-1658, May 2018, edited by Bonnie Berger
    • Jan Voges, Ali Fotouhi, Jörn Ostermann, M. Oguzhan Külekci
      A Two-level Scheme for Quality Score Compression
      Journal of Computational Biology, Mary Ann Liebert, Inc., Vol. 25, 2018
    • Ibrahim Numanagic, James K Bonfield, Faraz Hach, Jan Voges, Jörn Ostermann, Claudio Alberti, Marco Mattavelli, S Cenk Sahinalp
      Comparison of high-throughput sequencing data compression tools
      Nature Methods, Nature Publishing Group, Vol. 13, No. 12, pp. 1005-1008, October 2016
  • Conference Contributions
    • Jan Voges
      MPEG-G: The Standard for Genomic Information Representation
      Proceedings of the 4th Summer School on Video Compression and Processing (SVCP) 2018, Leibniz Universität Hannover, Institut für Informationsverarbeitung, pp. 7-8, Hannover (DE), July 2018, edited by Jan Voges
    • Jan Voges, Ali Fotouhi, Jörn Ostermann, M. Oguzhan Külekci
      A Two-Level Scheme for Quality Score Compression
      Proceedings of the 10th International Conference on Bioinformatics and Computational Biology (BICOB 2018), International Society for Computers and their Applications (ISCA), pp. 161-167, Las Vegas, NV (US), March 2018, edited by Hisham Al-Mubaid, Qin Ding, Oliver Eulenstein
    • Ana A Hernandez-Lopez, Jan Voges, Claudio Alberti, Marco Mattavelli, Jörn Ostermann
      Lossy Compression of Quality Scores in Differential Gene Expression: A First Assessment and Impact Analysis
      2018 Data Compression Conference (DCC), IEEE Computer Society Conference Publishing Services (CPS), pp. 167-176, Snowbird, UT (US), March 2018
    • Jan Voges, Jörn Ostermann
      MPEG-G: The Emerging Standard for Genomic Data
      Poster abstracts of the 25th German Conference on Bioinformatics (PeerJ Preprints), PeerJ, Vol. 5, p. 2, Tübingen (DE), September 2017
    • Jan Voges, Jörn Ostermann, Mikel Hernaez
      CALQ: compression of quality values of aligned sequencing data
      F1000Research (Presented at: Joint 25th Annual International Conference on Intelligent Systems for Molecular Biology (ISMB) and 16th European Conference on Computational Biology (ECCB) 2017), International Society for Computational Biology (ISCB), Vol. 6, p. 1382 (poster), Prague (CZ), August 2017
    • Ana A Hernandez-Lopez, Jan Voges, Claudio Alberti, Marco Mattavelli, Jörn Ostermann
      Differential Gene Expression with Lossy Compression of Quality Scores in RNA-Seq Data
      2017 Data Compression Conference (DCC), IEEE Computer Society Conference Publishing Services (CPS), p. 444, Snowbird, UT (US), April 2017
    • Claudio Alberti, Noah Daniels, Mikel Hernaez, Jan Voges, Rachel L Goldfeder, Ana A Hernandez-Lopez, Marco Mattavelli, Bonnie Berger
      An Evaluation Framework for Lossy Compression of Genome Sequencing Quality Values
      2016 Data Compression Conference (DCC), IEEE Computer Society Conference Publishing Services (CPS), pp. 221-230, Snowbird, UT (US), April 2016
    • Jan Voges, Marco Munderloh, Jörn Ostermann
      Predictive Coding of Aligned Next-Generation Sequencing Data
      2016 Data Compression Conference (DCC), IEEE Computer Society Conference Publishing Services (CPS), pp. 241-250, Snowbird, UT (US), April 2016
  • Standardisation Contributions
    • Jan Voges, Idoia Ochoa, Mikel Hernaez
      Proposed Updates to the MPEG-G Genomic Information Database
      ISO/IEC JTC 1/SC 29/WG 11, Document Number M44035, Macao (MO), October 2018
    • Jan Voges, Shubham Chandak, Mikel Hernaez
      Study on ISO/IEC DIS 23092-2
      ISO/IEC JTC 1/SC 29/WG 11, Document Number M44049, Macao (MO), October 2018
    • Jan Voges
      Study of ISO/IEC CD 23092-3
      ISO/IEC JTC 1/SC 29/WG 11, Document Number M42919, Ljubljana (SI), July 2018
    • Jan Voges, Idoia Ochoa, Mikel Hernaez
      Update to the genomic information database
      ISO/IEC JTC 1/SC 29/WG 11, Document Number M42920, Ljubljana (SI), July 2018
    • Jan Voges
      Study of ISO/IEC 23092-2
      ISO/IEC JTC 1/SC 29/WG 11, Document Number M42305, San Diego, CA (US), April 2018
    • Jan Voges
      Quality value coding and functional equivalence of genomic analysis pipelines
      ISO/IEC JTC 1/SC 29/WG 11, Document Number M42306, San Diego, CA (US), April 2018
    • Jan Voges
      Study on White paper on the objectives and benefits of the MPEG-G standard (Draft 1)
      ISO/IEC JTC 1/SC 29/WG 11, Document Number M41956, Gwangju (KR), January 2018
    • Massimo Ravasi, Daniel Naro, Junaid Ahmad, Jan Voges
      Current status of MPEG-G reference software implementation
      ISO/IEC JTC 1/SC 29/WG 11, Document Number M42083, Gwangju (KR), January 2018
    • Claudio Alberti, Jan Voges
      Study on ISO/IEC 23092-2 WD 3
      ISO/IEC JTC 1/SC 29/WG 11, Document Number M41393, Macao (MO), October 2017
    • Jan Voges
      Core Experiment 5 on Genomic Information Representation results LUH
      ISO/IEC JTC 1/SC 29/WG 11, Document Number M40804, Turin (IT), July 2017
    • Jan Voges
      Proposed changes for ISO/IEC 23092-2 WD 2
      ISO/IEC JTC 1/SC 29/WG 11, Document Number M40860, Turin (IT), July 2017
    • Jan Voges
      A Rate-Distortion Analysis of Sequencing Quality Value Compression
      ISO/IEC JTC 1/SC 29/WG 11, Document Number M40861, Turin (IT), July 2017
    • Jan Voges, Mikel Hernaez
      Core Experiment 2 on Genomic Information Representation results LUH/Stanford/UIUC
      ISO/IEC JTC 1/SC 29/WG 11, Document Number M40224, Hobart (AU), April 2017
    • Jan Voges, Claudio Alberti, Mikel Hernaez, Tom Paridaens, James Bonfield, Paolo Ribeca, Jaime Delgado
      Unified representation of sequencing quality values
      ISO/IEC JTC 1/SC 29/WG 11, Document Number M40222, Hobart (AU), April 2017
    • Jan Voges
      Summary of Core Experiment 2 on Genomic Information Representation
      ISO/IEC JTC 1/SC 29/WG 11, Document Number M40223, Hobart (AU), April 2017
    • Claudio Alberti, Jan Voges, Giorgio Zoia
      Unified representation of genomic reads
      ISO/IEC JTC 1/SC 29/WG 11, Document Number M40277, Hobart (AU), April 2017
    • Jan Voges
      Core Experiment 2 summary
      ISO/IEC JTC 1/SC 29/WG 11, Document Number M39663, Geneva (CH), January 2017
    • Jan Voges, Mikel Hernaez
      Core Experiment 2 results LUH-Stanford/UIUC
      ISO/IEC JTC 1/SC 29/WG 11, Document Number M39664, Geneva (CH), January 2017
    • Jan Voges
      Core Experiment 1 results LUH
      ISO/IEC JTC 1/SC 29/WG 11, Document Number M39665, Geneva (CH), January 2017
    • Jan Voges
      Core Experiment 3 results LUH
      ISO/IEC JTC 1/SC 29/WG 11, Document Number M39666, Geneva (CH), January 2017
    • Jan Voges
      Core Experiment 1 cross-check SFU
      ISO/IEC JTC 1/SC 29/WG 11, Document Number M39748, Geneva (CH), January 2017
    • Jan Voges
      Core Experiment 2 cross-check SFU
      ISO/IEC JTC 1/SC 29/WG 11, Document Number M39749, Geneva (CH), January 2017
    • Jan Voges
      Core Experiment 3 cross-check SFU
      ISO/IEC JTC 1/SC 29/WG 11, Document Number M39750, Geneva (CH), January 2017
    • Jan Voges, Mikel Hernaez, Idoia Ochoa, Ana Angelica Hernandez-Lopez, Claudio Alberti, Marco Mattavelli, Al Wegener, Dan Greenfield, Noah Daniels, S Cenk Sahinalp, James Bonfield, Bonnie Berger, Jörn Ostermann
      Benchmark framework for lossy compression of genome sequencing quality values
      ISO/IEC JTC 1/SC 29/WG 11, Document Number M38916, Chengdu (CN), October 2016
    • Jan Voges, Mikel Hernaez, Jörn Ostermann
      Adaptive lossy compression of high-throughput sequencing quality values
      ISO/IEC JTC 1/SC 29/WG 11, Document Number M38917, Chengdu (CN), October 2016
    • Jan Voges, Marco Munderloh, Jörn Ostermann
      Reference-free compression of aligned high-throughput sequencing data
      ISO/IEC JTC 1/SC 29/WG 11, Document Number M38918, Chengdu (CN), October 2016
    • Jan Voges, Mikel Hernaez, Ana Angelica Hernandez-Lopez, Claudio Alberti, Marco Mattavelli, Al Wegener, Dan Greenfield, Noah Daniels, S Cenk Sahinalp, James Bonfield, Bonnie Berger
      Extensions and notes to the evaluation framework for lossy compression of genome sequencing quality values
      ISO/IEC JTC 1/SC 29/WG 11, Document Number M38376, Geneva (CH), May 2016
    • Jan Voges
      Reference-free Compression of Aligned Next-Generation Sequencing Data
      ISO/IEC JTC 1/SC 29/WG 11, Document Number M37678, San Diego, CA (US), February 2016
    • Ibrahim Numanagic, Faraz Hach, James Bonfield, Claudio Alberti, Jan Voges
      Review of genomic information compression tools
      ISO/IEC JTC 1/SC 29/WG 11, Document Number M37766, San Diego, CA (US), February 2016
    • Claudio Alberti, Marco Mattavelli, Ana A. Hernandez-Lopez, Mikel Hernaez, Rachel L. Goldfeder, Noah Daniels, Jan Voges
      Proposal for the update to the database of genomic test data
      ISO/IEC JTC 1/SC 29/WG 11, Document Number M37767, San Diego, CA (US), February 2016
    • Claudio Alberti, Marco Mattavelli, Ana A. Hernandez-Lopez, Noah Daniels, Mikel Hernaez, Idoia Ochoa, Jan Voges, Rachel Goldfeder, Daniel Greenfield
      Evaluation framework for lossy compression of genomic Quality Values
      ISO/IEC JTC 1/SC 29/WG 11, Document Number M37768, San Diego, CA (US), February 2016
    • Jan Voges
      A Framework for the Evaluation of Genomic Information Compression
      ISO/IEC JTC 1/SC 29/WG 11, Document Number M37061, Geneva (CH), October 2015
    • Claudio Alberti, Marco Mattavelli, Ioannis Xenarios, Nicolas Guex, Heinz Stockinger, Thierry Schuepbach, Christian Iseli, Daniel Zerzion, Ivan Topolsky, Yann Thoma, Enrico Petraglio, Mikel Hernaez, Jan Voges
      A framework for the comparison of genomic information compression tools
      ISO/IEC JTC 1/SC 29/WG 11, Document Number M37151, Geneva (CH), October 2015
    • Jan Voges, Marco Munderloh
      Approaches to SAM File Compression
      ISO/IEC JTC 1/SC 29/WG 11, Document Number M36282, Warsaw (PL), June 2015