Award details

CRESTANO - Common REst api for Structural ANnotation

Reference: BB/K016970/1
Principal Investigator / Supervisor: Professor Gerard Kleywegt
Co-Investigators / Co-Supervisors: Dr Sameer Velankar
Institution: EMBL - European Bioinformatics Institute
Department: Protein Data Bank in Europe
Funding type: Research
Value (£): 117,674
Status: Completed
Type: Research Grant
Start date: 01/09/2013
End date: 31/08/2014
Duration: 12 months

Abstract

PDBe (Protein Data Bank in Europe) has developed many unique and advanced tools and services such as PDBePISA, for prediction of biomacromolecular assemblies and analysis of interfaces, PDBeMotif, for access to structural ligand-binding information and 3D structural motifs, and SIFTS, for up-to-date cross-references to UniProt, CATH, SCOP, Pfam, InterPro, PubMed, NCBI taxonomy, GO, and IntEnz for all PDB entries. PDBe has also established resources specific for X-ray crystallography, Nuclear Magnetic Resonance (NMR) spectroscopy and cryo-Electron Microscopy (EM). PDBe is currently implementing the wwPDB validation pipelines for X-ray, NMR and EM data deposited to the PDB and EMDB archives. The proposed PDBe e-infrastructure will create a unique and unified web service API for accessing data and annotations from PDBe databases and its advanced services and tools.

In this project, we propose to develop a REpresentational State Transfer (REST) web-service API to provide access to the following classes of PDBe and CCDC data and information in an integrated framework:
1. PDB data from the PDBe database infrastructure
2. Advanced analysis and annotations on biomacromolecular assemblies from PDBePISA
3. Ligand-environment data from PDBeMotif
4. 3D structural motifs from PDBeMotif
5. Up-to-date cross-references to UniProt, CATH, SCOP, Pfam, InterPro, PubMed, NCBI taxonomy, GO and IntEnz for all PDB entries, taken from the SIFTS resource developed by PDBe and UniProt
6. Data-quality indicators for all PDB entries and representative structures using the wwPDB validation-pipeline data available at PDBe
7. Access to CSD data for molecules that are also in the PDB
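As an illustration only, the idea of a unified REST API over these data classes can be sketched as a single base URL with one resource path per class, each addressed by PDB entry code. The abstract does not specify any endpoints, so the base URL, the resource paths, and the function name `build_url` below are all hypothetical, as is the assumption that entries are addressed by a four-character PDB code.

```python
from urllib.parse import quote, urljoin

# Hypothetical base URL: the abstract does not name any endpoint.
BASE_URL = "https://example.org/crestano/api/"

# Hypothetical resource paths, one per data class listed in the abstract.
RESOURCES = {
    "entry": "pdb/entry",                        # PDB data
    "assemblies": "pisa/assemblies",             # PDBePISA assemblies
    "ligand_environment": "motif/environment",   # PDBeMotif ligand environment
    "structural_motifs": "motif/motifs",         # PDBeMotif 3D motifs
    "cross_references": "sifts/mappings",        # SIFTS cross-references
    "validation": "validation/summary",          # data-quality indicators
    "csd": "csd/molecule",                       # CSD data for PDB ligands
}


def build_url(resource: str, pdb_id: str) -> str:
    """Return the (hypothetical) URL for one data class of one PDB entry."""
    if resource not in RESOURCES:
        raise KeyError(f"unknown resource: {resource!r}")
    # Assumption: entries are identified by a four-character alphanumeric code.
    code = pdb_id.strip().lower()
    if len(code) != 4 or not code.isalnum():
        raise ValueError(f"not a four-character PDB code: {pdb_id!r}")
    return urljoin(BASE_URL, f"{RESOURCES[resource]}/{quote(code)}")


if __name__ == "__main__":
    # List every data class for one entry under the same URL scheme.
    for name in RESOURCES:
        print(name, build_url(name, "1cbs"))
```

Keeping every data class under one URL scheme is what would let a workflow system request, for example, cross-references and validation data for the same entry without separate client code for each underlying tool.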

Summary

The Protein Data Bank in Europe (PDBe; pdbe.org) is one of the core resources at the European Bioinformatics Institute (EMBL-EBI). PDBe is a founding member of the Worldwide Protein Data Bank (wwPDB), which manages the PDB, the single global archive of biomacromolecular structure data. The other wwPDB partners are RCSB, PDBj and BMRB. PDBe has operated a deposition and annotation facility for PDB data since 1998. Over the years, PDBe has developed advanced tools and services for analysis of biomacromolecules (including unique tools such as PDBeFold, PDBePISA and PDBeMotif) and for delivery of PDB data to the user community. In addition, PDBe develops and maintains critical resources such as SIFTS (Structure Integration with Function, Taxonomy and Sequences), a vital source of up-to-date cross-reference information to other biological data resources.

The Cambridge Crystallographic Data Centre (CCDC; www.ccdc.cam.ac.uk) manages the Cambridge Structural Database (CSD), the main archive for small-molecule crystal structure data. The CSD contains structural data for organic and organometallic compounds obtained using single-crystal X-ray and neutron diffraction methods or based on powder diffraction data. The archive was established in 1965 and now contains more than 600,000 structures of small molecules. In addition to archiving the small-molecule structural data, CCDC has developed many tools for the analysis of these data.

The information available in the PDB archive is used by structural biologists and the wider biomedical community to understand the structures archived in the PDB, while CSD data can be used, amongst many other applications, by chemists and biochemists for automatic screening of natural molecules suitable as drug candidates. Recently, to improve the annotation and validation of small-molecule information in the PDB, wwPDB has entered into a collaboration with CCDC. As part of this, CCDC has made a number of tools available to the wwPDB partners, including Mogul, which will be used for validation of small-molecule geometry during deposition and annotation of PDB data. This will constitute a major improvement, as analysis of ligand structures in the PDB has shown that the majority of ligand models can be improved. The structure validation pipeline, which includes Mogul and which will become a critical part of the new wwPDB Deposition and Annotation system (D&A), is being developed at PDBe.

The goal of the present project is to implement a web-services API that will provide access to biomacromolecular structure data and advanced analyses and annotations of those structures available from PDBe. Additionally, CCDC will develop infrastructure to allow access to small-molecule data in the Cambridge Structural Database (CSD) for those compounds that are present in both the CSD and the PDB. This will facilitate real-time programmatic access to up-to-date information from PDBe databases and advanced tools and services, which will become available to any bioinformatics and structural-biology workflow systems as well as individual programs. In addition, access to experimentally determined structures from the CSD will provide better-quality starting models for ligands during the macromolecular structure determination process. This, in turn, will improve the quality of deposited ligand data in the PDB, benefitting chemoinformatics research and informing the structure-based design of new drugs.

In this project, we propose to develop a method to provide access to the following types of PDBe and CCDC data and information in an integrated framework:
1. PDB data from the PDBe database infrastructure
2. Advanced analysis and annotations on biomacromolecular assemblies
3. Ligand environment and 3D structural motifs data from PDBeMotif
4. Up-to-date cross-references for all PDB entries, taken from the SIFTS resource
5. Data-quality indicators for all PDB entries
6. Access to CSD data for molecules that are also in the PDB

Impact Summary

PDBe (Protein Data Bank in Europe) has developed many unique and advanced tools and services such as PDBePISA, for prediction of biomacromolecular assemblies and analysis of interfaces, PDBeMotif, for access to structural ligand-binding information and 3D structural motifs, and SIFTS, for up-to-date cross-references to UniProt, CATH, SCOP, Pfam, InterPro, PubMed, NCBI taxonomy, GO, and IntEnz for all PDB entries. PDBe has also established resources specific for X-ray crystallography, Nuclear Magnetic Resonance (NMR) spectroscopy and cryo-Electron Microscopy (EM). PDBe is currently implementing the wwPDB validation pipelines for X-ray, NMR and EM data deposited to the PDB and EMDB archives. The proposed PDBe e-infrastructure will create a unique and unified web service API for accessing data and annotations from PDBe databases and its advanced services and tools.

The new infrastructure will allow integration into the PDB of small-molecule data from the Cambridge Structural Database (CSD) for all small molecules that are found in the PDB in complex with biomacromolecules. Comparing the structures of small molecules in isolation (CSD data) and bound to their biomacromolecular targets (PDB data) will improve our understanding of whether binding results in geometric strain, which in turn may help elucidate the mode of substrates and signaling molecules. Alternatively, it may aid design of improved inhibitors or antagonists with reduced strain and possibly tighter binding. Finally, such comparisons can help users to assess whether any unusual geometry or conformation is likely to be of biological significance or more likely to be an artifact of the structure-determination protocol.

Structures of small molecules in the CSD are almost always of high quality and represent strain-free conformations due to the higher resolution and better observation-to-parameter ratios obtained with small-molecule crystals. Thus, the CSD is potentially a very valuable source of high-quality starting models for structural biologists, and more frequent use of these structures would result in better-quality ligand data in the PDB. Currently, it is possible to freely request individual CSD structures if the user knows the CSD identification code, but there is no mechanism that allows external structure-based queries of CSD. The proposed e-infrastructure at CSD will allow wwPDB annotators to query the CSD for structures that are identical or very similar to newly deposited ligands in the PDB. In this way, representative coordinates from CSD can be incorporated into the wwPDB chemical component dictionary and distributed publicly.

The web services will enable identification of high-quality starting models for use in structure building and refinement. The availability of structure-quality information will benefit developers of model-building and refinement software by identifying the most suitable starting models for ligands and biomacromolecules to use in the structure-determination process, be it by X-ray, NMR or EM. Programmatic access to annotations could also inform, for example, the interpretation of unexplained electron-density features in the active site of a protein, by providing information about all possible ligands found in a given protein environment. The e-infrastructure will allow PDBe to bring together relevant data (e.g. data related to a particular ligand or protein molecule) from distinct PDBe advanced tools and resources and integrate it to provide users with a single user interface showing all the available information.

Committee: Research Committee D (Molecules, cells and industrial biotechnology)
Research Topics: Structural Biology, Technology and Methods Development
Research Priority: X – Research Priority information not available
Research Initiative: Tools and Resources Development Fund (TRDF) [2006-2015]
Funding Scheme: X – not Funded via a specific Funding Scheme