A publicly accessible database for Clostridioides difficile genome sequences supports tracing of transmission chains and epidemics

Clostridioides difficile is the primary infectious cause of antibiotic-associated diarrhea. Local transmissions and international outbreaks of this pathogen have been previously elucidated by bacterial whole-genome sequencing, but comparative genomic analyses at the global scale were hampered by the lack of specific bioinformatic tools. Here we introduce a publicly accessible database within EnteroBase (http://enterobase.warwick.ac.uk) that automatically retrieves and assembles C. difficile short-reads from the public domain, and calls alleles for core-genome multilocus sequence typing (cgMLST). We demonstrate that comparable levels of resolution and precision are attained by EnteroBase cgMLST and single-nucleotide polymorphism analysis. EnteroBase currently contains 18254 quality-controlled C. difficile genomes, which have been assigned to hierarchical sets of single-linkage clusters by cgMLST distances. This hierarchical clustering is used to identify and name populations of C. difficile at all epidemiological levels, from recent transmission chains through to epidemic and endemic strains. Moreover, it puts newly collected isolates into phylogenetic and epidemiological context by identifying related strains among all previously published genome data. For example, HC2 clusters (i.e. chains of genomes with pairwise distances of up to two cgMLST alleles) were statistically associated with specific hospitals (P<10⁻⁴) or single wards (P=0.01) within hospitals, indicating they represented local transmission clusters. We also detected several HC2 clusters spanning more than one hospital that by retrospective epidemiological analysis were confirmed to be associated with inter-hospital patient transfers. In contrast, clustering at level HC150 correlated with k-mer-based classification and was largely compatible with PCR ribotyping, thus enabling comparisons to earlier surveillance data. EnteroBase enables contextual interpretation of a growing collection of assembled, quality-controlled C. difficile genome sequences and their associated metadata. Hierarchical clustering rapidly identifies database entries that are related at multiple levels of genetic distance, facilitating communication among researchers, clinicians and public-health officials who are combatting disease caused by C. difficile.
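The HC2 definition above (chains of genomes linked by pairwise distances of up to two cgMLST alleles) can be illustrated with a minimal Python sketch of single-linkage clustering at a fixed allelic threshold. This is not EnteroBase's implementation: the function names, the use of 0 to mark a missing allele call, and the four six-locus toy profiles are all assumptions made for illustration only.

```python
from itertools import combinations

MISSING = 0  # assumption: 0 marks a locus without an allele call


def allelic_distance(a, b):
    """Count loci where both profiles have a call and the alleles differ."""
    return sum(
        1 for x, y in zip(a, b)
        if x != MISSING and y != MISSING and x != y
    )


def single_linkage_clusters(profiles, max_distance):
    """Group genomes into chains whose links differ by <= max_distance alleles."""
    names = list(profiles)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent pointers to the cluster root, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    # Merge every pair of genomes that lies within the threshold.
    for a, b in combinations(names, 2):
        if allelic_distance(profiles[a], profiles[b]) <= max_distance:
            parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical toy profiles (allele numbers at six loci), not real data.
profiles = {
    "g1": [1, 1, 1, 1, 1, 1],
    "g2": [1, 1, 1, 1, 2, 2],  # 2 alleles from g1
    "g3": [1, 1, 2, 2, 2, 2],  # 2 alleles from g2, 4 from g1
    "g4": [3, 3, 3, 3, 3, 3],  # 6 alleles from every other genome
}

hc2 = single_linkage_clusters(profiles, max_distance=2)
print(hc2)
```

In this toy example g1 and g3 differ at four loci, yet both fall into the same cluster because each is within two alleles of g2. That chaining is what single-linkage means, and repeating the call with larger thresholds (for instance 150) would give the coarser levels of a hierarchy.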

Metadata
Author:Martinique Frentrup, Zhemin Zhou, Matthias Steglich, Jan P. Meier-Kolthoff, Markus Göker, Thomas Riedel, Boyke Bunk, Cathrin Spröer, Jörg Overmann, Marion Blaschitz, Alexander Indra, Lutz von Müller, Thomas A. Kohl, Stefan Niemann, Christian Seyboldt, Frank Klawonn, Nitin Kumar, Trevor D. Lawley, Sergio García-Fernández, Rafael Cantón, Rosa del Campo, Ortrud Zimmermann, Uwe Groß, Mark Achtman, Ulrich Nübel
URN:urn:nbn:de:bvb:384-opus4-1066902
Frontdoor URL:https://opus.bibliothek.uni-augsburg.de/opus4/106690
ISSN:2057-5858
Parent Title (English):Microbial Genomics
Publisher:Microbiology Society
Place of publication:London
Type:Article
Language:English
Year of first Publication:2020
Publishing Institution:Universität Augsburg
Release Date:2023/08/07
Tag:General Medicine
Volume:6
Issue:8
DOI:https://doi.org/10.1099/mgen.0.000410
Institutes:Fakultät für Angewandte Informatik
Fakultät für Angewandte Informatik / Institut für Informatik
Fakultät für Angewandte Informatik / Institut für Informatik / Lehrstuhl für Biomedizinische Informatik, Data Mining und Data Analytics
Dewey Decimal Classification:0 Computer science, information, general works / 00 Computer science, knowledge, systems / 004 Data processing; computer science
Licence:CC-BY 4.0: Creative Commons: Attribution (with Print on Demand)