Award details

CIBR 19-BBSRC-NSF/BIO: Next generation PDB - FACT infrastructure with value added FAIR data supporting diverse research and education user communities

Reference BB/V004247/1
Principal Investigator / Supervisor Dr Sameer Velankar
Co-Investigators / Co-Supervisors
Institution EMBL - European Bioinformatics Institute
Department MSCB Macromolec, structural and chem bio
Funding type Research
Value (£) 378,905
Status Current
Type Research Grant
Start date 15/01/2021
End date 14/01/2024
Duration 36 months

Abstract

This project aims to improve data deposition, delivery, and management of three-dimensional (3D) macromolecular structure information stored in the single global public data resource known as the Protein Data Bank (PDB). The PDB currently houses ~160,000 experimentally determined 3D structures of proteins and nucleic acids. It is managed according to the FAIR Principles on an open access basis by the Worldwide Protein Data Bank (wwPDB; wwpdb.org) partnership. The project addresses significant software engineering challenges resulting from (i) the relentless growth in the number and size/complexity of newly deposited structures, and (ii) the need to manage incoming data as groups of related structures (or investigations). The project will improve the fidelity and completeness of 3D structure data deposited into the PDB by harvesting data automatically from structure determination software packages, and by streamlining the wwPDB data deposition, validation, and biocuration system known as OneDep. The project will improve the "FAIR"ness of PDB data for researchers, educators, and students by extending chemical metadata for small-molecule ligands (e.g. bound cofactors and inhibitors), incorporating enhanced descriptions of macromolecular assemblies, grouping related PDB structures into investigations for more efficient, parallel data delivery, and creating a "Next Generation" PDB data repository with up-to-date metadata. Finally, the project will modernise wwPDB information technology infrastructure to future-proof PDB data management and the weekly PDB archive release to the public domain by developing new application programming interfaces (APIs) and microservices infrastructure, and by updating existing mechanisms for synchronisation of data and software across wwPDB data centres in the US, Europe, and Asia. This work will directly benefit researchers, educators, and their students across the natural, physical, and engineering sciences.

Summary

The vision of this US RCSB Protein Data Bank/Protein Data Bank in Europe collaborative project is to improve data deposition, delivery, and management of three-dimensional (3D) macromolecular structure information stored in the Protein Data Bank. This work will benefit researchers, educators, and their students across the natural, physical, and engineering sciences. The principle that "form (meaning shape/3D structure) dictates function in biology" was first revealed in the Watson and Crick publication of the DNA double helix structure. Since their landmark discovery, interdisciplinary collaborative teams of biologists, physicists, chemists, and engineers have generated ~160,000 experimentally determined 3D structures of proteins and nucleic acids, which are centrally stored in a public data resource known as the Protein Data Bank (PDB). Founded in 1971 as the first open-access digital data resource in biology, the PDB has grown more than 20,000-fold to become the single global archive housing richly annotated 3D structures of proteins, DNA, and RNA. This public-domain 3D structure data resource has had an enormous impact on fundamental biology, biomedicine, biotechnology, and bioenergy by enabling atomic-level understanding of naturally occurring and engineered biomolecules, and by facilitating discovery of nearly 90% of the new drugs approved by the United States (US) Food and Drug Administration between 2010 and 2016. Today, new PDB structures are coming from macromolecular crystallography (MX), nuclear magnetic resonance spectroscopy (NMR), single-particle cryo-electron microscopy (3DEM), and micro-crystal electron diffraction (microED). X-ray free electron lasers and new integrative methods for structure determination are accelerating biomedical research with insights into ever more complex biological systems at the atomic level. Cryo-electron tomography even allows studies of macromolecular machines "caught in the act" inside frozen cells.
Since 2003, the Worldwide Protein Data Bank (wwPDB, wwpdb.org) partnership has managed the PDB Core Archive (hereafter PDB archive) as a global Public Good according to the FACT principles of Fairness-Accuracy-Confidentiality-Transparency and the FAIR principles of Findability-Accessibility-Interoperability-Reusability. The wwPDB includes locally funded partners in the US (Research Collaboratory for Structural Bioinformatics Protein Data Bank, RCSB PDB), Europe (Protein Data Bank in Europe, PDBe), and Asia (Protein Data Bank Japan, PDBj), plus a specialist NMR resource (BioMagResBank, BMRB). The wwPDB also enables equitable sharing of PDB data archiving and management costs between the US, Europe, and Asia. In 2019, RCSB PDB, PDBe, and PDBj jointly processed 13,377 new structures coming into the PDB archive using the web-based, global wwPDB OneDep software system for deposition, validation, and biocuration. Also in 2019, RCSB PDB, PDBe, and PDBj jointly enabled download of ~800 million PDB structure data files by millions of users from around the world. Today, wwPDB partners are confronting significant software engineering challenges resulting from (i) the relentless growth in the number and size/complexity of newly deposited MX and 3DEM structures, and (ii) the need to manage incoming data as groups of related structures (or investigations) coming from serial femtosecond X-ray crystallography (SFX) using X-ray Free Electron Lasers (XFEL) and 3DEM.
Committee Not funded via Committee
Research Topics Structural Biology
Research Priority X – Research Priority information not available
Research Initiative UK BBSRC-US NSF/BIO (NSFBIO) [2014]
Funding Scheme X – not Funded via a specific Funding Scheme