Award details

Enriching MGnify Genomes to capture the full spectrum of the microbiota and bolster taxonomic classifications

Reference BB/V01868X/1
Principal Investigator / Supervisor Dr Robert Finn
Co-Investigators / Co-Supervisors Mr Anthony Burdett, Dr Guy Cochrane
Institution EMBL - European Bioinformatics Institute
Department Genome Assembly and Annotation
Funding type Research
Value (£) 929,225
Status Current
Type Research Grant
Start date 03/03/2022
End date 02/03/2025
Duration 36 months

Abstract

Three major new areas of activity are proposed to enrich MGnify and meet the evolving demands of microbiome research: (i) improve the MGnify bacterial genomes and enable their incorporation into the Genome Taxonomy Database (GTDB); (ii) develop pipelines to facilitate the recovery of eukaryotic genomes; (iii) identify and annotate viruses found in MGnify assemblies to enrich MGnify genomes. This proposal also describes significant updates to the MGnify analysis pipelines and the infrastructure underpinning the resource. To achieve this we will undertake the following key developments:
1. Incorporate the latest biological information by updating the reference databases used in the MGnify analysis pipelines and the associated FAIR workflow descriptions.
2. Develop and apply an improved profile HMM library for the detection of CAZymes, utilising metagenomic sequences to improve its sensitivity. These profile HMMs will be integrated into an annotation system that will also help to detect polysaccharide utilisation loci.
3. Extend client-side validation tools and interfaces to enable easier submission of metagenomics datasets, including MAGs, and enrich internal access and control mechanisms between ENA and MGnify.
4. Assemble a pipeline that extends beyond the standard single-copy marker genes to facilitate the systematic detection of contaminating contigs within MAGs, to produce a refined set of prokaryotic MAGs.
5. Co-develop a cloud-based framework to generate the non-redundant set of MGnify MAGs and the GTDB taxonomy, and extend GTDB to incorporate MAGs, thus accurately reflecting the taxonomic diversity of prokaryotes.
6. Initiate a collection of eukaryotic MAGs by developing a novel binning and refinement workflow.
7. Systematically detect and cluster viral sequences, enriching them with taxonomy, functional annotations and environmental metadata to produce a viral catalogue. Use computational methods to link phages to bacterial hosts, thereby connecting catalogues.

Summary

Microbes (viruses, bacteria and single-celled eukaryotes) are ubiquitous in nature and perform key roles essential to sustaining life, e.g. oxygenation of the planet by marine microbes, soil nutrient cycling to support plant growth, and facilitating animal digestion, especially in humans. Increasing knowledge about microbial ecosystems has accompanied a broadening scope of environments analysed, such as anaerobic digesters, food production systems and the built environment (extending as far as the International Space Station). Metagenomics is a culture-independent method that applies modern DNA sequencing technologies to study the genomes of the organisms present in a microbiome. The latest approaches combine advanced sequencing technologies, throughput, and bioinformatics techniques to enable the assembly of short DNA fragments (produced by sequencing machines) into larger chromosomal fragments. Subsequently, these fragments are classified into sets, each belonging to an individual species, i.e. metagenome-assembled genomes (MAGs). While the first MAG was reported in 2004, the first large-scale study applying these techniques was published only in 2015. Since then, there has been an explosion in the number of MAGs reported, which not only provides novel insights into the ~99% of organisms yet to be experimentally cultured but also dramatically expands the Tree of Life. In addition to capturing the biodiversity of microbes, these MAGs facilitate a genome-centric understanding of their functional role within the community, and of how they interact with each other and their surroundings. A substantial body of applied research leverages these findings to restore perturbed microbiomes to a healthy state or to harness the enzymes they encode.
This proposal focuses on MGnify, a resource that already performs four major roles in microbial community research: (i) it facilitates the capture of the petabytes of sequence data currently being generated; (ii) it provides users with access to the computational resources needed to conduct metagenomic assembly; (iii) it generates new knowledge by analysing microbiome-derived sequence data and presenting this via a website and API to the user community; (iv) it has initiated the capture of prokaryotic MAGs. In this proposal, we will extend MGnify to recover eukaryotic MAGs using innovative new methodologies and to capture the viruses in the MGnify assemblies. These non-redundant catalogues of eukaryotic and viral genomes will be used to supplement the existing MGnify genomes. To improve the MAG generation process, we propose to develop additional pipelines that will identify and remove the contaminants found in the prokaryotic MAGs. In addition to generating high-quality MAGs that cover the entire range of microbial taxa, we will harmonise efforts with the Genome Taxonomy Database (GTDB), one of the most widely used resources for taxonomic classification, to ensure that this newly discovered bacterial diversity is properly represented therein. Underpinning this, we will enhance the metagenomic sequence submission systems to better cater for all data types and improve the internal mechanisms for data exchange, so that MGnify can submit data on behalf of users and gain access to all data types, whether the data are public or private (pre-publication), given the appropriate user consent. Finally, in addition to updating the reference databases in our analysis pipelines, we will also improve the annotation of carbohydrate metabolism enzymes, which are currently poorly represented in databases. Collectively, these developments will reinforce MGnify's crucial importance to the microbiome research community. MGnify will serve as the foundational knowledgebase that propels integrative microbiome research and its translation to real-world applications.
Committee Research Committee B (Plants, microbes, food & sustainability)
Research Topics Microbiology, Technology and Methods Development
Research Priority X – Research Priority information not available
Research Initiative Bioinformatics and Biological Resources Fund (BBR) [2007-2015]
Funding Scheme X – not Funded via a specific Funding Scheme