Award details

EnteroBase: A Powerful, User-Friendly Online Resource for Analyzing and Visualizing Genomic Variation within Escherichia coli and Salmonella enterica

Reference BB/L020319/1
Principal Investigator / Supervisor Professor Mark Achtman
Co-Investigators / Co-Supervisors Dr Andrew Millard, Professor Mark Pallen, Dr Martin Sergeant
Institution University of Warwick
Department Warwick Medical School
Funding type Research
Value (£) 1,005,462
Status Completed
Type Research Grant
Start date 01/08/2014
End date 15/10/2019
Duration 62 months

Abstract

EnteroBase will present a scalable, structured, curated database containing data from 100,000s of genomes and their temporal and geographic metadata from ourselves, our users and public databases. It will support analyses ranging from 7-gene multi-locus sequence typing (MLST) to whole genomes. EnteroBase databases will only include high-quality sequences from E. coli and S. enterica, but EnteroTools will also support analyses of genomic data from other bacterial groups. The public interface to EnteroBase will be a customised instance of Galaxy, a powerful but flexible web-based sequence analysis and workflow management system. Initially, we will adopt Galaxy's existing graphical user interface and existing tools in order to port basic components from our xBASE and MLST facilities. Subsequently, we will enhance EnteroBase's capabilities with EnteroTools, a set of open-source, user-friendly Galaxy tools compatible with both current and future data formats. We will incorporate other resources, such as MEGA and BIGSdb, include links to specialised external databases for identifying repetitive and mobile elements, and encourage crowd-sourcing of novel solutions by letting users publish their workflows. EnteroTools will allow users to:
-> upload and analyse sequence reads, assemble and annotate genomes and align whole genomes or genes.
-> visualise relationships between bacterial genotypes; drill down to genotype clusters; perform population genetics and real-time epidemiological analyses.
-> evaluate and visualise the contributions of SNPs, indels, transpositions, recombination and selection, as well as details of changes in the core and accessory genomes.
-> access processed data easily in the context of associated metadata, including bidirectional links between metadata in the genomic and MLST databases, thus providing a facility for scanning the metadata from genetically related isolates that share MLST or rMLST alleles.

Summary

It is hard to think of two organisms that are more important to scientists, policy makers and the public than E. coli and S. enterica. Both have been studied extensively in the laboratory as models of how bacterial cells function, behave and evolve. However, both are also important causes of human and animal INFECTION and are seldom out of the news, particularly given their propensity to cause outbreaks. The E. coli outbreak that hit Germany in 2011, with >4,000 cases and >50 deaths, amply illustrates the power of these organisms to devastate even a wealthy advanced society. In 2013, Salmonella gained media coverage in England when >200 people fell ill after a spice festival in Newcastle. It is important to recognise that no single strain can capture the essence of either species. Instead, what we see in nature is a riotous profusion of diversity. For example, some strains of E. coli live harmlessly in our bowels, while others cause diarrhoea, urinary tract infection or even bloodstream infection. Two E. coli strains may differ by 1/3 of their genetic make-up (genome). Both Salmonella and E. coli undergo relentless evolution, including spread of ANTIBIOTIC RESISTANCE. The huge diversity already present, twinned with ongoing evolution and spread of new lineages, creates tremendous problems for microbiologists and other scientists, as well as policy makers, in recognising and classifying strain types. Yet such classification into well-defined, scientifically robust populations is essential before scientific, clinical or even political conclusions can be generalised across sub-types or species. Fortunately, we have been presented with an exciting new opportunity to capture and analyse within-species diversity in bacteria in the form of HIGH-THROUGHPUT SEQUENCING, a set of innovative technologies that make bacterial genome sequencing (a process of capturing all the DNA sequences within the cell) easier, cheaper and quicker than ever before.
However, this sudden availability of new data creates a fresh challenge, the DRINKING-FROM-A-FIRE-HOSE problem: how to store, visualise and analyse all the new data on genomic diversity generated by this exciting new technology. In addition, while expert bioinformaticians can use command-line tools to analyse genomes, lab-based bacteriologists are dependent on the creation of new user-friendly web-based resources if they are not to miss out on this exciting new opportunity. To address this problem we will create a new, powerful but user-friendly online database called ENTEROBASE, which will act as a one-stop shop for anyone interested in analysing and visualising genetic diversity in E. coli and Salmonella. EnteroBase will incorporate ENTEROTOOLS, a set of modular, open-source, web-based tools compatible with data formats and standards from both current and future sequencing technologies. Together, these two resources will allow bacteriologists who work in the laboratory and lack high-level computer skills to perform incisive and sophisticated computer-based analyses of bacterial DNA sequence data. Users will be able to upload and analyse their own data, as well as exploit the cumulative knowledge of the microbiology community, not just to look at global patterns of diversity within these species but also to perform speedy, near-real-time analyses of ongoing or recent outbreaks. Principal investigator Achtman has spearheaded efforts to replace outdated 19th- and 20th-century approaches to the typing and classification of these bacteria with more modern approaches; co-investigator Pallen has applied innovative approaches to analyse the German E. coli outbreak. Both will bring to this project 1000s of users of previous similar, well-established but less powerful databases. This project will also help maintain and enhance the UK skills base and make our country the destination of choice for the brightest and best scientists.

Impact Summary

The proposed project will benefit anyone in the UK or overseas academic sector with an interest in E. COLI OR SALMONELLA AS PATHOGENS OR MODEL ORGANISMS (including those interested in systems biology or synthetic biology), or with interests in bacterial genome evolution or population genetics or epidemiology. More generally, the resource we create here will be of interest to ANYONE INTERESTED IN EXPLOITING COMPARATIVE SEQUENCE DATA from any bacterial species. We anticipate bringing across 1000s of users of our existing MLST and xBASE facilities to this new resource. The proposed project will benefit anyone within the commercial private sector who is interested in developing NEW DRUGS, VACCINES OR DIAGNOSTIC TESTS for E. coli or Salmonella. Industrial users could benefit from using EnteroBase to explore genotypic (and by implication phenotypic) diversity within these species when evaluating novel vaccine or drug targets. EnteroBase will allow users to explore how ANTIMICROBIAL RESISTANCE EVOLVES AND SPREADS within these species. Similarly, companies that sell sequencing technologies stand to benefit from exploitation of and demand for high-throughput sequence data (both Solexa and Oxford Nanopore sequencing were developed within the UK, with benefits to our economy). The delineation of epidemic or highly pathogenic lineages is of KEY INTEREST TO POLICY MAKERS, whether addressing FOOD SECURITY, FOOD SAFETY, HUMAN HEALTHCARE, HEALTH AND SAFETY AT WORK OR BIOTERRORISM (note that certain E. coli and Salmonella lineages are even defined within the UK's Anti-terrorism, Crime and Security Act 2001). EnteroBase will also assist in increasing the effectiveness of public services and policy by facilitating analyses that will GROUND POLICY DECISIONS IN A SOLID UNDERSTANDING of bacterial evolution, epidemiology, population genetics and taxonomy.
The UK food industry needs detailed knowledge about the diversity and sources of Salmonella infection, such as the Agona outbreak that spread to the UK via products from an Irish food producer. Achtman has been at the forefront of efforts to replace classification of these bacteria by serovar with a MORE RATIONAL AND DISCRIMINATORY SYSTEM OF CLASSIFICATION; these efforts are likely to lead to changes in international regulations governing Salmonella and E. coli infections in animals impacting on the human food chain. Our analyses already influence the policies of organizations such as the ECDC (Stockholm), which coordinates European efforts to stop outbreaks of salmonellosis and listeriosis. EnteroBase will help microbiologists, bioinformaticians, epidemiologists and population geneticists to integrate bacterial genomics with epidemiological disease patterns and to elucidate genetic relationships between S. enterica and E. coli from domestic animals and human patients, with IMPACTS ON DISEASE PREVENTION, MANAGEMENT OF INFECTION AND QUALITY OF LIFE. Obvious beneficiaries within the public sector include those employed in the HEALTH SERVICES, including the NHS and Public Health England, who will gain an improved understanding of the links between population biology, taxonomy and diagnosis/prognosis for these species. The proposed resource will also enhance the UK'S REPUTATION AS A CENTRE OF EXCELLENCE, attracting highly skilled students, academics and collaborators from foreign countries. The research and professional skills in bioinformatics gained by staff working on the project will help ADDRESS THE NATIONAL SKILLS SHORTAGE in this area; similarly, the training provided more widely as part of the project will help improve bioinformatics and genomics skills among UK bacteriologists.
Committee Research Committee A (Animal disease, health and welfare)
Research Topics Microbiology, Technology and Methods Development
Research Priority X – Research Priority information not available
Research Initiative Bioinformatics and Biological Resources Fund (BBR) [2007-2015]
Funding Scheme X – not Funded via a specific Funding Scheme