Over the last decades, the cost of genome sequencing has decreased significantly, resulting in an explosion in data generation; the volume of genomic data generated annually is expected to soon surpass that of other big data domains such as video and astronomy. To reduce the storage space required for genomic data, state-of-the-art methods combine transformations with entropy coding. In sequencing experiments, one important step is alignment: arranging DNA, RNA, or protein sequences to identify regions of similarity that may result from functional, structural, or evolutionary relationships between the sequences. A probability is assigned to each alignment (arrangement). The task is to develop a transformation / compression method based on directed acyclic graphs (DAGs) that reduces the size of these data.
Basic knowledge of source coding, programming skills in C++ or Python, and interest in the fields of bioinformatics and data compression.
Contact person: Yeremia G. Adhisantoso