Bioinformatics

Head of Department: Dr. Boyke Bunk

Bioinformatics, originally regarded as a key discipline for the analysis of nucleotide sequence data, now covers a much broader range of databases and tools for analysing and understanding biological data in general. As a result of massive advances in the omics sector, especially in next-generation sequencing, (semi-)automated tools and pipelines are needed to cope with the volume of data generated.

The DSMZ maintains its own well-established server infrastructure, which is continuously expanded in line with the state of the art. It includes an HPC SMP system (UV2000) with 8 nodes, 128 CPU cores and 4 TB of RAM. This part of the system is used predominantly for massively parallel, memory-intensive diversity estimations and for taxonomy-dependent and taxonomy-independent analyses in the context of the Biodiversity Exploratories. In-house short-read sequencers such as the Illumina NextSeq 500 and MiSeq are used for this work.

The HPC SMP system is accompanied by two dedicated server clusters of 16 nodes each, with 6 TB of RAM and 640 CPU cores, giving a total computing power of 18 TFLOPs. The clusters are used, for example, for high-throughput pairwise genome comparisons and secondary analysis of Pacific Biosciences sequencing data. A Pacific Biosciences RSII sequencing platform has been running successfully at the DSMZ since January 2012 and is supported by a Pacific Biosciences Sequel II platform with 100 times higher throughput. Currently, the server infrastructure encompasses a central HPC storage of 4.5 PB. This additional high-performance storage allows sequencing data to be processed directly, without the need for manual data transfer to the server clusters.

The DSMZ bioinformatics department is involved in the Leibniz Omics Network LiON and is closely connected to in-house research topics. Its work comprises four main topics: