Award details

Building a global metagenomics portal ('MGportal') to handle next-generation sequencing data and associated metadata

Reference BB/I025840/1
Principal Investigator / Supervisor Professor Susanna Sansone
Co-Investigators / Co-Supervisors Dr Dawn Field
Institution University of Oxford
Department Oxford e-Research Centre
Funding type Research
Value (£) 557,248
Status Completed
Type Research Grant
Start date 05/12/2011
End date 03/06/2015
Duration 42 months

Abstract

While genomes represent the full genetic (DNA) complement of a single organism, metagenomes represent the DNA of an entire community of organisms. Interest in improved sampling of diverse environments (e.g. hosts/gut, plants, soil, etc.), combined with advances in the development and application of ultra-high-throughput sequencing methodologies, is set to vastly accelerate the pace at which new metagenomes are generated. Combined with other types of 'omic data, metagenomes hold the promise of unparalleled insights into fundamental questions across a range of fields including evolution, ecology, environmental biology, health and medicine. To fully exploit the promise of these data we need both scientific innovation and community agreement on how to provide appropriate stewardship of these resources for the benefit of all. In this three-year collaborative project we aim to build an international data resource and portal for metagenomic data at the European Bioinformatics Institute (EBI). This portal will manage the submission, storage, dissemination and mining of metagenomic data from data providers across the world. The portal will focus on the capture of rich contextual information (metadata), working in close collaboration with the Genomic Standards Consortium (GSC), an international working body creating and implementing standards to describe genomes, metagenomes and marker gene sequences. Further, the collaborative use of the ISA Infrastructure software suite for metadata capture will enable the capture and sharing of standards-compliant data and integration with a range of other data types. The resulting MGportal will be a major new resource at the EBI. The combined MGportal Team will engage in a range of community-building activities, including hosting workshops and training activities that both educate data submitters and users and ensure that the portal develops in line with community needs.

Summary

While genomes represent the full genetic (DNA) complement of a single organism, metagenomes represent the DNA of an entire community of organisms. These organisms might be free-living in the environment, or be found on the skin or in the gut of a human being or other species. Microbial organisms play a major role in our everyday health and well-being, which is not surprising when you consider that the number of microbial cells in or on an average human body actually exceeds the number of human cells! Microbes play a similarly important role in the environment; different types of organisms live under different conditions (including extreme habitats, such as the run-off from acid mines or the depths of the oceans). Understanding how these organisms have adapted to their various living conditions will lead to a better understanding of how changes in the environment will affect biodiversity in the future. It may also lead to the discovery of entirely new species or novel proteins which could have utility as antibiotics or other drugs. Combined with other types of 'omic data, metagenomes hold the promise of unparalleled insights into fundamental questions across a range of fields including evolution, ecology, environmental biology, health and medicine. To fully exploit the promise of these data we need both scientific innovation and community agreement on how to provide appropriate stewardship of these resources for the benefit of all. Significant numbers of metagenomics projects have been awarded grants by international funding bodies. Whilst all of these projects have specific, scientifically interesting aims, they mostly exist in isolation, with little or no cross-referencing to other metagenomic or genomic datasets. Our intention is to leverage existing infrastructure to deliver a world-class metagenomics resource with unique utility for UK-based metagenomics researchers.
This resource, MGportal, will utilise user-friendly interfaces, state-of-the-art algorithms and the EBI's unique position as a hub of biological information to measurably enhance the value of these researchers' data. It will be built in close collaboration with the Genomic Standards Consortium (GSC). MGportal will consist of software tools to enable metagenomics researchers to upload their data to the raw nucleotide sequence archives, data analysis pipelines to predict which potential genes are present in the data and what their functions are, and a web interface which will display these data and results in a way that is easy to browse and query. We will hold training courses and a workshop to gain input from the scientific community about the portal. It is hoped that MGportal will eventually allow researchers to understand the results of their metagenomics experiments and to see how those results compare with the outcomes of other studies.

Impact Summary

The full impact of this work is described in the impact statement of the lead institute, the EBI. Here we elaborate on the specific impact of the work to be completed in this project under the auspices of the Genomic Standards Consortium and the ISA Infrastructure project. The primary impact of the proposed tight collaboration between these groups and the EBI is the increased level of community involvement in the creation of resources that serve community needs. This is a pioneering aspect of this proposal.

Community-level consensus. This project will help to continue funding these key grass-roots activities, thus strengthening them and their ability to give a voice to the wider scientific community on issues of data stewardship, standardization and sharing. Specifically, this project will directly fund core activities with the GSC (i.e. through Peter Sterk's role as Secretary of the GSC) and, most importantly, provide funds to implement GSC-recommended standards at the international level. This is a key step on the path towards international adoption of standards that will underpin future data sharing. It will also ensure the use of a premier example of standards-compliant tools in the creation of this portal. The ISA Infrastructure, already funded by the BBSRC in the past BBR round, is a complete suite of tools for capturing and disseminating standards-compliant metadata. Its use in this project paves the way for universal sharing of metadata about samples and data types, as this work will increase the chances that other projects will adopt this shared approach.

Data sharing. The adoption of these community-defined approaches is also in direct support of the strong BBSRC data sharing policy. Putting this standards-compliant infrastructure into place will ensure compliance with the policy of making data freely available in reusable form.

Policy makers. The production of more richly annotated bioinvestigations will improve the evidence base for policy makers by providing greater interpretability of experimental context, simplifying the job of data integration and study comparison. More detail for those forming policy on biological and biomedical issues should produce better decisions.

Journals. Like funders, journals increasingly require, firstly, that researchers make more of their data public, for example by submitting them to public repositories, and secondly, that they begin to comply with community-defined standards. However, 'non-compliance' may be difficult to overcome: experimental metadata are still normally sparse in publications and the supplementary data that sometimes accompany them, limiting data accessibility and utility. This is because of the lack of (i) reviewer time and expertise (reviewers are not trained to check compliance), (ii) awareness of the existence of appropriate reporting standards, (iii) access to freely available tools implementing standards, and (iv) adequate data management resources at the local and community levels. Greater automation of the reporting processes is required. The only feasible solution is better annotation and education at source (i.e., by providing data producers with a straightforward way in which to use community annotation standards), assisted by some form of automated content validation. Through this collaboration we will disseminate this best practice by building compliance with standards into the MGportal.

Outreach. The high-profile nature of this project (a major new database/portal at the EBI) will help to spread the word about the importance of standards in the community. Finally, the planned workshops and interactions with the existing GSC and ISA communities will succeed in engaging a larger proportion of bench scientists in efforts to provide the best possible stewardship of our collective data assets.
Committee Research Committee D (Molecules, cells and industrial biotechnology)
Research Topics X – not assigned to a current Research Topic
Research Priority X – Research Priority information not available
Research Initiative Bioinformatics and Biological Resources Fund (BBR) [2007-2015]
Funding Scheme X – not Funded via a specific Funding Scheme