Award details

FunPDBe - Community driven enrichment of PDB data with structural and functional annotations

Reference BB/P023959/1
Principal Investigator / Supervisor Professor Michael Sternberg
Co-Investigators / Co-Supervisors
Institution Imperial College London
Department Life Sciences
Funding type Research
Value (£) 123,443
Status Completed
Type Research Grant
Start date 23/09/2019
End date 20/08/2022
Duration 35 months

Abstract

Macromolecular structure data provide valuable information for the wider biomedical user community, as demonstrated by the Nobel prizes awarded to 22 scientists between 1946 and 2016 for studies related to structural biology. To achieve even greater impact, the coordinate information available in the Protein Data Bank (PDB) has to be supplemented by information providing biological context and enriched by value-added annotations. The challenges in deriving biological context from the limited annotations available in the PDB have led to the development of many specialist data resources and structure analysis tools that enrich annotations. When combined with the coordinate data from the PDB, these provide mechanistic information on biological processes. The structural bioinformatics community in the UK has been at the forefront of implementing tools and developing data resources to enrich structural data. The FunPDBe project will establish an integrated and easily accessible resource of structural and functional annotations for data available in the PDB. The collaboration between the Protein Data Bank in Europe (PDBe) and world-leading structural bioinformatics data resources will promote interoperability, comparative analysis and exchange of structural and functional annotations by implementing common data standards and infrastructure, and by bringing currently fragmented enhanced annotations together in a central repository. The project will implement a uniform data access mechanism and reusable web components for distributing and displaying these functional and structural annotations. Easy access to structural data and enhanced annotations will help researchers gain insights into the effects of genetic variation, support the development of new tools for synthetic biology, enrich the annotations available for agriculturally important macromolecules, and contribute to human health by aiding the interpretation of nsSNPs (non-synonymous single-nucleotide polymorphisms).

Summary

Rapid technological and scientific advances in the life sciences have resulted in an exponential increase in the amount and diversity of biological data. This has revolutionised life-science research and transformed it into a data-driven field. The transformation also affects our understanding of the three-dimensional structures of the molecules of life, such as proteins. Structures of macromolecules can provide great insight into the functional mechanisms of biological processes. These structural data are archived in the Protein Data Bank (PDB), one of the oldest data archives in the biomedical field. The PDB was established in 1971 and now contains more than 120,000 macromolecular structures. It is managed by a worldwide collaboration, the wwPDB, of which the Protein Data Bank in Europe (PDBe) is a founding member. The wwPDB partners accept new macromolecular structures determined by scientists across the world and standardise the way these are distributed by annotating the newly deposited entries. This annotation, however, is limited as far as the biological context of the macromolecule is concerned. Integrating these data with other biological information and predicted annotations can improve our understanding of life and disease processes, help design new drug molecules, and clarify the effects of genetic variation on health and disease. Combining the macromolecular structure data in the PDB with value-added annotations that provide biological context can accelerate the use of this information in improving industrial biotechnology, agricultural products and human health. The UK has a world-leading structural bioinformatics community that has, over the years, developed many data analysis tools and data resources that add biological context and value-added annotations to the macromolecular structure data available in the PDB.
Although these resources are well used, their usage could be increased further if the fragmentation of information and the lack of standards for describing annotations were addressed. FunPDBe is designed to address these issues by standardising the way functional annotations are shared and then implementing a central data resource that brings together the data from the PDB with the annotations from the leading UK-based structural bioinformatics data resources. The infrastructure developed during the project will allow the integration of annotations from other data resources around the globe that are not initially involved in the project. The project will also provide uniform access to these enriched data. FunPDBe will improve sustainability by ensuring that the annotations are archived safely and remain accessible for the foreseeable future, and by removing duplication of effort, thus protecting the work and investment that have gone into developing the specialised participating data resources. FunPDBe will therefore become a unique, open global resource and help secure the UK's leading role in structural computational biology into the future. Our goal will be achieved through the following specific activities:
1. Hold a series of workshops to establish an open forum for the UK structural bioinformatics community.
2. Identify structural and functional annotations and import them into the FunPDBe resource, using standards defined for the different data types.
3. Analyse and validate annotations provided by different prediction algorithms.
4. Define and implement protocols and mechanisms for data collection and integration.
5. Provide uniform data access mechanisms using the data standards, and identify and create useful representative datasets to support the research community.
6. Develop training materials and deliver training workshops.

Impact Summary

FunPDBe is likely to have an impact across a very wide range of applications in the bioscience and biomedical areas. The key aspect of FunPDBe is adding value to the data already in the PDB, in the form of functional annotations and descriptions of the probable structural effects of sequence variants. Currently there are over 500 million downloads of the PDB and over 500K distinct users of PDBe, so a large user base that will benefit from FunPDBe already exists. This impact will be realised through three routes. The first is direct use of the resource by the non-academic sector. The pharmaceutical sector makes extensive use of PDB data in structure-based drug discovery, diagnostics and similar work. These companies usually have in-house pipelines for target identification and for the analysis of large or small molecules that can potentially bind these targets. Rich functional annotations (e.g., identification of binding sites, effects of mutations) and predictions, together with a uniform data access mechanism, are expected to make data discovery easier and can lead to more efficient analysis pipelines. The structural and functional information will also facilitate the design of modified proteins with specific properties, such as altered substrate specificity and enhanced enzyme efficiency, in the emerging area of synthetic biology. With the rapid decrease in the cost of genome sequencing, vast amounts of information about genetic variation in humans and many other species are being obtained. FunPDBe will provide annotations that assist in interpreting the effects of these variants, for example by identifying mutations that are likely to disrupt tertiary or quaternary structure or protein function and hence be associated with human or animal disease.
In particular, Genomics England is sequencing the genomes of 100K individuals to identify disease-associated variants, and data from FunPDBe will be of enormous value in analysing these data. More than 20 consortia of biomedical researchers, established as part of the Genomics England activity, are researching a range of cancers and rare diseases and will therefore benefit from the integrated data in FunPDBe. The second route to impact is the integration of this information into other bioinformatics resources used by the sectors described above. We will work with resources such as UniProt, InterPro and Ensembl to facilitate the integration of enriched annotations into those resources. The third major route is via the increasing number of academic groups that use PDB information and will have access to the enhanced annotations in FunPDBe; their research has impact across all areas of commercial and societal advancement. Thus, via the academic and industrial pathways, the FunPDBe project will contribute to advances in human health, food security, animal health and related areas. Making data on functional and structural impacts from more than 10 UK groups available from a single site, FunPDBe, will ensure that these data are easily accessed and compared, which in turn will give them much greater impact. Members of the wider society often find structural biology too specialised a field. A further key aspect of FunPDBe is placing the individual results of structural biology studies in a wider biological context, helping interested individuals to appreciate the importance of the field more readily. For example, FunPDBe will hold information on the effects of mutations, some of which may lead to disease.
Being able to create a coherent story more easily, linking health and disease to the effects that mutations have on structure, will be a useful tool in public outreach, for instance at science festivals targeting school-aged children, their teachers and parents.
Committee Research Committee D (Molecules, cells and industrial biotechnology)
Research Topics Structural Biology
Research Priority X – Research Priority information not available
Research Initiative Bioinformatics and Biological Resources Fund (BBR) [2007-2015]
Funding Scheme X – not funded via a specific funding scheme