Award details

Supporting archival and dissemination of small-angle scattering data for atomistic structures in the PDB

Reference: BB/M020347/1
Principal Investigator / Supervisor: Professor Gerard Kleywegt
Co-Investigators / Co-Supervisors: Dr Aleksandras Gutmanas
Institution: EMBL - European Bioinformatics Institute
Department: Protein Data Bank in Europe
Funding type: Research
Value (£): 147,997
Status: Completed
Type: Research Grant
Start date: 01/10/2015
End date: 31/03/2017
Duration: 18 months

Abstract

The Protein Data Bank (PDB) is the single global repository of three-dimensional (3D) structures of proteins, nucleic acids and their complexes. The PDB is managed by the Worldwide Protein Data Bank (wwPDB), an international consortium of four organisations, including the Protein Data Bank in Europe (PDBe) at EMBL-EBI in Cambridge. wwPDB is implementing a new deposition and annotation (D&A) system to facilitate the deposition, curation and distribution of structures and experimental data from X-ray crystallography, Nuclear Magnetic Resonance (NMR) and electron cryo-microscopy. wwPDB has convened various task forces (TFs) of community experts to advise it on archival policy, validation and related matters. The TF for small-angle scattering (SAS) strongly recommended that wwPDB collect and disseminate experimental SAS data whenever it is used in conjunction with, or in support of, other methods such as X-ray crystallography or NMR. We aim to implement this community recommendation through three specific objectives:
(1) Extend the D&A system to allow deposition of experimental SAS data to the PDB. We will consult and work with SAS community experts, our wwPDB partners and other stakeholders, and will update the sasCIF and mmCIF dictionaries to fully accommodate SAS data. We will design, code, test and release the extended deposition software.
(2) Make the deposited SAS data publicly available. With our wwPDB partners, we will implement the format and mechanism through which this data will be publicly released in the PDB archive.
(3) Extend the PDBe website to present SAS data to both specialist and non-specialist users. We will load the SAS data into the PDBe Oracle database and make it available via our API. We will design, code and test web pages and tools to present the SAS data and derived parameters describing the overall shape and size of the studied system, as well as the goodness of fit between the deposited model and the data.

Summary

This proposal is concerned with preserving research results and experimental data in the field of structural biology, which aims to determine the three-dimensional (3D) structures of important biological molecules, such as proteins and nucleic acids. Knowledge of these structures can aid areas such as the discovery of new drugs, the development of disease diagnostics, understanding the biology of health, disease and ageing, and the optimisation of industrial processes through the engineering of better enzymes. Structural data is deposited into a single global archive (the Protein Data Bank, or PDB) by researchers from academic, government and industrial laboratories on every continent except Antarctica. This data is then annotated and made publicly and freely available. After starting with only 7 protein structures in 1971, the PDB grew almost exponentially and in May 2014 passed the 100,000-structure milestone. Since 2003, the PDB has been managed by the Worldwide Protein Data Bank (wwPDB) consortium, with partners in the UK, USA and Japan. X-ray crystallography and Nuclear Magnetic Resonance (NMR) are the two main experimental methods in structural biology, and together they have contributed 99% of the structures in the PDB. There is wide consensus in the structural biology community that archived structures of biological molecules should be accompanied by the experimental data that supports them. Not only does this enable other scientists to reproduce, verify or reinterpret the original findings, it also stimulates and facilitates the development of new methods for structure determination and validation. Since 2008, deposition of experimental data has been mandatory for all structures determined by X-ray and NMR methods. However, other techniques, usually in combination with X-ray or NMR, are increasingly used to investigate the structures of large, complex and therefore challenging biological systems.
One popular technique is electron microscopy (EM), and experimental EM data can be deposited in a separate archive, the Electron Microscopy Data Bank (EMDB). Another increasingly popular technique is small-angle X-ray (or neutron) scattering (SAXS/SANS), which can provide information on the overall size and shape of the studied molecules. In addition, SANS experiments can provide information on the relative positions of the various molecules when they interact to form large complexes. However, the wwPDB does not currently have a mechanism to collect the underlying experimental SAXS/SANS data. The wwPDB partners have therefore convened a task force of experts in small-angle scattering techniques (SAS TF) to advise on the archiving of SAXS/SANS data. The SAS TF strongly recommended that this data be collected and disseminated by wwPDB whenever it is used in conjunction with, or in support of, another technique. This proposal aims to implement this recommendation by extending the wwPDB archival software and then disseminating the collected SAXS/SANS data to the scientific community. This project will improve our description and understanding of the structures of important biological molecules and complexes.
The Protein Data Bank in Europe (PDBe) is a founding member of wwPDB. PDBe is part of the European Bioinformatics Institute (EMBL-EBI), the UK-based outstation of the European Molecular Biology Laboratory (EMBL). PDBe has strong expertise in X-ray crystallography, NMR, cryo-electron microscopy and software development. PDBe will pursue the proposed project in close collaboration with its international wwPDB partners and in consultation with world-leading experts in SAXS/SANS techniques.

Impact Summary

The overarching goal motivating this funding application is to enable the Protein Data Bank (PDB), the single global, freely and publicly accessible archive of macromolecular structure data, to collect and disseminate experimental data and associated metadata (e.g., experimental setup and sample information) from small-angle X-ray (or neutron) scattering (SAXS/SANS) techniques, thus making the overall PDB archive more accurate and more complete. For the first time, this will enable the SAXS/SANS data supporting PDB structures to be collected, annotated to a high standard and archived in a consistent fashion. This goal will be achieved by including the curated SAXS/SANS data in the public PDB archive, an objective shared by all the wwPDB partners. To facilitate programmatic access, the data will be incorporated into the PDBe API, which was developed as part of the TRDF-funded CRESTANO project (BB/K016970/1). SAXS/SANS data and derived parameters will also be disseminated through the PDBe website. SAXS data can provide information on the overall shape and size of macromolecules as well as on their state (e.g., scattering curves obtained for intrinsically disordered proteins have distinct features that immediately signal the presence of disorder). SANS data can provide information about the relative positions of macromolecules in larger complexes. Users of the PDB, both academic and from other sectors, will be the immediate beneficiaries of this project. The PDB archive now contains more than 100,000 structures, and users worldwide download 30 million PDB entries every month via the wwPDB partner websites and FTP distributions. A significant fraction of this user base works in industrial laboratories with an interest in structural biology, in the pharmaceutical, diagnostic, agricultural and other sectors.
The entire PDB user base will therefore, for the first time, have access to the SAXS/SANS data and will be able to critically evaluate any conclusions drawn from the associated structures. Secondary school and university teachers in the life sciences will also benefit from more accurate and richer data in the PDB. If experience with other techniques (e.g., X-ray crystallography) is any guide, then in the longer term, beyond the end of the project, the availability of experimental SAXS/SANS data is expected to lead to new developments in the field, consensus on appropriate validation criteria and, ultimately, higher confidence in the correctness and reliability of the structures, together with a realistic appraisal of their limitations.

Committee: Research Committee D (Molecules, cells and industrial biotechnology)
Research Topics: Structural Biology, Technology and Methods Development
Research Priority: information not available
Research Initiative: Tools and Resources Development Fund (TRDF) [2006-2015]
Funding Scheme: not funded via a specific funding scheme