Award details

Building a global metagenomics portal ('MGportal') to handle next generation sequencing data and associated metadata

Reference BB/I02612X/1
Principal Investigator / Supervisor Dr Guy Cochrane
Co-Investigators / Co-Supervisors Dr Robert Finn, Ms Sarah Hunter
Institution EMBL - European Bioinformatics Institute
Department Sequence Database Group
Funding type Research
Value (£) 523,451
Status Completed
Type Research Grant
Start date 01/01/2012
End date 31/12/2014
Duration 36 months

Abstract

While genomes represent the full genetic (DNA) complement of a single organism, metagenomes represent the DNA of an entire community of organisms. Interest in improved sampling of diverse environments (e.g. hosts/gut, plants, soil) combined with advances in the development and application of ultra-high-throughput sequencing methodologies is set to vastly accelerate the pace at which new metagenomes are generated. Combined with other types of 'omic data, metagenomes hold the promise of unparalleled insights into fundamental questions across a range of fields including evolution, ecology, environmental biology, health and medicine. To fully exploit the promise of these data we need both scientific innovation and community agreement on how to provide appropriate stewardship of these resources for the benefit of all. In this three-year collaborative project we aim to build an international data resource and portal for metagenomic data at the European Bioinformatics Institute. This portal will manage the submission, storage, dissemination and mining of metagenomic data from data providers across the world. The portal will focus on the capture of rich contextual information (metadata), working in close collaboration with the Genomic Standards Consortium (GSC), an international working body creating and implementing standards to describe genomes, metagenomes and marker gene sequences. Further, the collaborative use of the ISA Infrastructure software suite for metadata capture will enable capture and sharing of standards-compliant data and integration with a range of other data types. The resulting MGportal will be a major new resource at the EBI. The combined MGportal team will engage in a range of community-building activities, including hosting workshops and training activities that both educate data submitters and users and ensure the portal develops in line with community needs.

Summary

While genomes represent the full genetic (DNA) complement of a single organism, metagenomes represent the DNA of an entire community of organisms. These organisms might be free-living in the environment, or be found on the skin or in the gut of a human being or other species. Microbial organisms play a major role in our everyday health and well-being, which is not surprising when you consider that the number of microbial cells in or on an average human body actually exceeds the number of human cells! Microbes play a similarly important role in the environment; different types of organisms live under different conditions (including extreme habitats, such as the run-off from acid mines or the depths of the oceans). Understanding how these organisms have adapted to their various living conditions will lead to a better understanding of how changes in the environment will affect biodiversity in the future. It may also lead to the discovery of entirely new species or novel proteins which could have utility as antibiotics or other drugs. Combined with other types of 'omic data, metagenomes hold the promise of unparalleled insights into fundamental questions across a range of fields including evolution, ecology, environmental biology, health and medicine. To fully exploit the promise of these data we need both scientific innovation and community agreement on how to provide appropriate stewardship of these resources for the benefit of all. Significant numbers of metagenomics projects have been awarded grants by international funding bodies. Whilst all of these projects have specific, scientifically interesting aims, they mostly exist in isolation, with little or no cross-referencing to other metagenomic or genomic datasets. Our intention is to leverage existing infrastructure to deliver a world-class metagenomics resource with unique utility for UK-based metagenomics researchers.
This resource, MGportal, will utilise user-friendly interfaces, state-of-the-art algorithms and the EBI's unique position as a hub of biological information to measurably enhance the value of these researchers' data. It will be built in close collaboration with the Genomic Standards Consortium (GSC). MGportal will consist of software tools to enable metagenomics researchers to upload their data to the raw nucleotide sequence archives, data analysis pipelines to predict which potential genes are present in the data and what their functions are, plus a web interface which will display these data and results in a way that is easy to browse and query. We will hold training courses and a workshop to gain input from the scientific community about the portal. It is hoped that MGportal will eventually allow researchers to understand the results of their metagenomics experiments and to see how those results compare with the outcomes of other studies.

Impact Summary

The immediate beneficiaries of this project will be the British, European and global researchers who are producing metagenomic data through the use of Next Generation Sequencing technologies. Provision of this resource will allow these users to interpret the results of their studies more fully, as they will be able to do so using the context of public datasets that are already in existence (and which are typically better annotated). The integration and adaptation of analysis software already widely used by the genomics community will provide a new repertoire of tools that the metagenomics community will be able to use with confidence. The submission of data to the public archives (which is frequently a requirement of journals if researchers wish to publish dataset-related papers) will be made simpler by improving submission tools and processes. This will be the first comprehensive portal for metagenomics data in Europe, and will allow the UK/Europe to work further in collaboration with other key resources at the international level. Additionally, this project will contribute to the further training of the biological community in bioinformatics, both in the areas of metadata capture and data analysis, and in the data mining and integration of metagenomic data through use of the portal itself. This project will help to foster community-driven standards and grass-roots activities to promote data sharing. New knowledge will arise from a deeper understanding of the environmental context of the genes that are already known from previous studies of isolated organisms. Whilst in many cases it has been possible to elucidate the function of known genes, in other instances, information relating to the conditions under which these genes have evolved and are present remains incomplete, particularly in the absence of true contextual information. Geneticists and environmental scientists will therefore benefit.
A further impact of the work described will be derived from the public availability of information from previously uncharacterised organisms and genes (metagenomics does not depend on an organism being isolated or cultured before sequencing can occur). These novel organisms and genes will be available for data mining by commercial companies wishing to identify new natural products, such as antibiotics or drugs. Environmental scientists will be able to discover biomarkers for monitoring environmental events. Personal care and pharmaceutical companies can use information from collated metagenomics studies of human samples to identify targets for personal hygiene research or for diseases with a high environmental aetiology, such as coeliac disease. This naturally leads to further benefits for the general public, with society benefitting in multiple ways from a new wave of medical and environmental research brought about by the metagenomic revolution. It is our hope that such benefits will be possible within 5 years. The research assistants working on this project will be exposed to a wide breadth of technologies, data types and users. The software developers will work with modern software development methodologies, ensuring that the most appropriate technology is used for implementation. They will attain a deeper understanding of the importance of user-driven design in creating software and interfaces. They will also gain an appreciation of the complexity of how datasets are related to one another. They are required to have good communication skills so that they can readily liaise with end-users to discuss requirements and present progress in the project at international conferences and workshops. If any of their knowledge is lacking, they will be given the opportunity to attend training courses to improve their skills.
Committee Research Committee D (Molecules, cells and industrial biotechnology)
Research Topics X – not assigned to a current Research Topic
Research Priority X – Research Priority information not available
Research Initiative Bioinformatics and Biological Resources Fund (BBR) [2007-2015]
Funding Scheme X – not Funded via a specific Funding Scheme