Braunschweig, January 14, 2015: The colon bacillus Escherichia coli is one of the best-studied model organisms in the life sciences. However, the reference organism for this species, its so-called type strain, has been overlooked in microbial genomics until now. In the “Genomic Encyclopedia of Bacteria and Archaea” (GEBA) project, the DNA of type strain DSM 30083T has now been sequenced and compared to that of close relatives of the strain. This study not only allows an entirely new view of the numerous E. coli strains that play relevant roles in medicine and biotechnology, including the EHEC pathogen and Shigella, but also yielded a generally applicable method for delineating subspecies within any bacterial species. The research was conducted at the Leibniz Institute DSMZ – German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany, and at the Joint Genome Institute, Walnut Creek, CA, USA.
To microbiologists and biotechnologists, the colon bacterium Escherichia (E.) coli is something of a “pet bacterium”, and it looks back on an exciting history. Bacteriologist Theodor Escherich first described it as “Bacterium coli commune” in 1886, but the original isolate was lost at the beginning of the 1920s. It was not until 1941 that it was isolated again, this time by Fritz Kaufmann at the State Serum Institute in Copenhagen, Denmark, who also deposited it in several collections of microbial strains and provided a scientific description. Today, E. coli is likely the best understood microorganism in the world and serves as an important indicator for the quality of drinking and recreational waters.
"It seems strange that the number one, the type strain of a bacterium that has entire scientific conferences dedicated to it as a model organism, had not been fully sequenced until now", said Christine Rohde, Head of the E. coli strain collection at DSMZ, Braunschweig, Germany. "Initially, scientists primarily sequenced the genomes of pathogenic strains of E. coli, or of genetically modified strains of biotechnological relevance. In addition, physicians and hygienists in their daily practice use serotypes that are quickly determined by antibody tests in order to differentiate between different strains of E. coli.” As Markus Göker, a bioinformatics scientist at DSMZ added:“Complete bacterial genomes are of fundamental importance for diagnostics in humans, for biotechnology, and for the search for antimicrobial agents. Today, this is truer than ever, as some strains of E. coli have developed into dangerous pathogens such as EHEC or EAHEC. The E. coli type strain was sequenced as part of the GEBA project that focuses on type strains exhibiting an unusual physiology or occupying a key place in the phylogenetic tree. This is the only microorganism in the project that was included based on its importance as a model organism.”
A genome with pathogenic potential
There are major physiological and genomic differences between the E. coli type strain and the harmless laboratory strain K-12. “Due to its serotype, the type strain had been grouped into biological containment level 2, and its genome sequence has now confirmed its pathogenic potential,” said Jörn Petersen, an expert in plasmid biology at the DSMZ. “Unlike laboratory strain K-12, the E. coli type strain harbors an additional circular plasmid of 131,289 base pairs in its genome of 5,038,133 base pairs; this plasmid exhibits a sequence identity of 99% with plasmids from pathogenic E. coli isolates. These strains cause, e.g., colibacillosis in poultry and meningitis in newborns, with the horizontally transferable plasmid being responsible for their virulence,” explained Petersen.
Sophisticated computer-aided phylogenetic analysis
Thanks to the complete genome sequence of the E. coli type strain, the Braunschweig scientists were able to use modern taxonomic techniques to examine whether the huge number of previously sequenced isolates of E. coli actually belong to the same species. “To this end, we analyzed more than 250 strains of E. coli and also verified their published taxonomic classification in subgroups, the ‘phylotypes’. This bioinformatics-based analysis was performed with the state-of-the-art GGDC method. This technique is analogous to classical DNA-DNA hybridization in the laboratory, but yields significantly more exact results,” Markus Göker explained.
The analysis confirmed that all sequenced strains of E. coli belong to the same species. What is new, however, is the realization that E. coli should be divided into several subspecies. One of these subspecies includes all strains of the genus Shigella, known to cause shigellosis. “However, the name Shigella has historically been established in medicine, so we were not striving for taxonomic changes in this case,” Markus Göker added. “What is much more important is that the techniques tested in E. coli can now be used to classify bacterial species into subspecies in general.”
Meier-Kolthoff JP et al. (2014). Complete genome sequence of DSM 30083T, the type strain (U5/41T) of Escherichia coli, and a proposal for delineating subspecies in microbial taxonomy. Stand Genomic Sci 9: 2 (http://dx.doi.org/10.1186/1944-3277-9-2)
Peigne C et al. (2009). The plasmid of Escherichia coli strain S88 (O45:K1:H7) that causes neonatal meningitis is closely related to avian pathogenic E. coli plasmids and is associated with high-level bacteremia in a neonatal rat meningitis model. Infect Immun 77: 2272-2284 (http://dx.doi.org/10.1128/IAI.01333-08).
Meier-Kolthoff JP et al. (2013). Genome sequence-based species delimitation with confidence intervals and improved distance functions. BMC Bioinformatics 14: 60 (http://dx.doi.org/10.1186/1471-2105-14-60).
The Genome to Genome Distance Calculator provides a bioinformatics-based approach for calculating the distances and similarities of genome sequences. These can be used to create phylogenetic trees, but they also allow a mathematical transformation that replaces traditional DNA-DNA hybridization techniques. Using this method, bacteria can be classified into species (and, thanks to the E. coli study, into subspecies as well). GGDC is available as a web-based service at http://ggdc.dsmz.de.
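As a rough illustration of how such a transformed similarity value can be turned into a taxonomic call, the following minimal Python sketch sorts a pair of genomes by its digital DNA-DNA hybridization (dDDH) estimate. It is not the GGDC implementation. The two cut-offs (70% for species, 79% for subspecies) are not stated in this release; they are assumed here from the cited publications, and the function name and the example values are purely hypothetical.

```python
# Illustrative sketch only; not part of GGDC.
# Assumed thresholds (not given in this press release):
SPECIES_THRESHOLD = 70.0      # assumed dDDH cut-off (%) for "same species"
SUBSPECIES_THRESHOLD = 79.0   # assumed dDDH cut-off (%) for "same subspecies"


def classify_pair(dddh_percent: float) -> str:
    """Classify a genome pair from its dDDH estimate, given in percent."""
    if not 0.0 <= dddh_percent <= 100.0:
        raise ValueError("dDDH must be a percentage between 0 and 100")
    if dddh_percent >= SUBSPECIES_THRESHOLD:
        return "same subspecies"
    if dddh_percent >= SPECIES_THRESHOLD:
        return "same species, different subspecies"
    return "different species"


if __name__ == "__main__":
    # Hypothetical dDDH values, chosen only to exercise each branch.
    for value in (85.0, 74.5, 40.0):
        print(f"dDDH {value:5.1f}% -> {classify_pair(value)}")
```

In practice the dDDH estimate for each pair would come from the GGDC web service itself; the sketch only shows the final thresholding step.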
The GEBA (Genomic Encyclopedia of Bacteria and Archaea) project and its successor projects aim at using genome sequencing to systematically close the gaps that still exist in the microbial branches of the phylogenetic tree of life. DSMZ is working on this project in close collaboration with the Joint Genome Institute in California, USA. At DSMZ, Markus Göker heads these projects. Annotated genomes are deposited via the “GenBank” portal (http://www.ncbi.nlm.nih.gov/genbank/) and can be interactively accessed at https://img.jgi.doe.gov/cgi-bin/w/main.cgi.
Image 1: Scanning electron microscopic image of cells of E. coli type strain DSM 30083T (courtesy of Manfred Rohde, Helmholtz Centre for Infection Research; Christine Rohde, Leibniz Institute DSMZ)
Image 2: Scan of the original deposit record of E. coli type strain U5/41T (= DSM 30083T) from the Lausanne Culture Collection