Exploring microdiversity with PacBio Sequel 16S-ITS sequencing

The PacBio Sequel can generate up to a million 2 kb reads with >99% accuracy in circular consensus sequencing mode. These reads can cover the entire bacterial 16S, the ITS and even part of the 23S. Compared to the partial 16S coverage provided by the Illumina sequencing platform, the PacBio Sequel therefore makes it possible to investigate microdiversity without any sequence assembly, e.g. subspecies diversity that is hidden from the 16S phylogeny alone. However, this possibility has not yet been thoroughly evaluated. Within this project, both mock communities and environmental samples are sequenced with different protocols and analyzed with different bioinformatics workflows. The goal is to give a comprehensive overview of the applicability of this sequencing strategy and of the optimal workflows for it.

Functional characterization of bacterial genomes

Complete bacterial genomes and metagenomic bins are generated frequently at the DSMZ. One of the main focuses is the genomes of Acidobacteria. Currently, several alphaproteobacterial and gammaproteobacterial genomes are under investigation. In addition, bacterial metagenomic bins have been extracted from algal communities and their metabolic genes examined. The goal is to identify the microbes and genes that are essential for symbiotic lifestyles.

Machine learning in historic biological data

Biological data have been gathered for decades in public databases such as BacDive and those within the International Nucleotide Sequence Database Collaboration (INSDC). These historic data, when carefully analyzed, can reveal trends and other useful information about biological research. Currently, on the one hand, sequences from the INSDC have been collected and their metadata analyzed to evaluate the impact of the Nagoya Protocol on omics-based research. On the other hand, machine learning methods have been applied to integrate sequence data from the INSDC with metabolic data from BacDive.