Award details

FUNCLAN - FUNctional annotations through Conformational Landscape Analysis

Reference BB/V016113/1
Principal Investigator / Supervisor Dr Sameer Velankar
Co-Investigators / Co-Supervisors
Institution EMBL - European Bioinformatics Institute
Department MSCB Macromolec, structural and chem bio
Funding type Research
Value (£) 402,804
Status Current
Type Research Grant
Start date 01/02/2022
End date 31/01/2025
Duration 36 months

Abstract

We will develop FUNCLAN, a framework for comparative analysis of conformations and their associated annotations. This will be achieved through major improvements to the superposition software GESAMT and will deliver a robust process for superposing and clustering macromolecular assemblies, protein chains and ligand-binding sites across the Protein Data Bank archive. We will perform a comprehensive analysis of the clustered molecular entities, link them to experimental validation information and map annotations of biological and biophysical context to them. We will use this enriched and integrated data to refine the superposition and clustering processes, provide a representative structure for each cluster and design metrics that can be used to evaluate clustered assemblies, protein chains or ligand-binding sites. The FUNCLAN framework will support superposing macromolecular assemblies, where the challenge is partly due to possible changes in topology accompanied by changes in the conformation of individual components. The project will also tackle challenges unique to the superposition of ligand-binding sites, such as superposing amino acid residues interacting with the same small molecule versus superposing small molecules bound in different binding sites. The project will provide:
1. A software suite and web server for analysing assemblies, protein chains and ligand-binding sites;
2. High-quality, manually curated benchmarking datasets of conformational clusters and their biological and biophysical annotations;
3. A robust and iteratively improved pipeline for superposing macromolecular assemblies, protein chains and ligand-binding sites;
4. Data standards and evaluation metrics for superposed and clustered molecular entities;
5. Clustered molecular entities linked to their validation information and biological annotations, which will be made available programmatically via an API and displayed on the PDBe-KB entry pages.

Summary

The dynamic nature of proteins, which leads to multiple conformational states, is critical in many biological processes, from forming macromolecular complexes with other proteins, small molecules (ligands) or nucleic acids to switching between active and inactive forms for enzymatic activity. To gain improved mechanistic insights into the function of proteins, characterisation of their three-dimensional (3D) structures and their conformational states is critical. Knowledge of the transitions between different energetically favoured conformational states is fundamental to understanding the principles of protein structure and evolution, and can help in explaining the effects of genetic variants, in designing new drug molecules and in elucidating drug resistance at the molecular level. Although the PDB has archived more than 165,000 individual structures, the number of unique proteins, counted by UniProt accession cross-references, grows at a slower pace and totals only ~50,000, with considerable variation in the redundancy rate amongst different sequences. This is because each protein may have multiple representatives in the PDB: ligand-bound and unbound forms; structures in multiple space groups or sample conditions; structures in complex with other macromolecules (proteins or nucleic acids); or structures determined for smaller domains or sequence variants. Thus, the structures in the PDB provide a valuable resource for understanding the conformational flexibility of ligand-binding sites, individual protein molecules and large macromolecular machines. Understanding the similarities and differences in ligand-binding sites, individual protein molecules and large macromolecular complexes using the ensemble of available structures can assist in deciphering the molecular-level details of macromolecular function.
The availability of data on distinct conformational states will also assist in characterising the particles in whole-cell tomograms, thus allowing molecular phenotyping of whole cells in different disease or developmental states. In this project we will enhance GESAMT, the structure comparison algorithm, to derive the conformational flexibility of ligand-binding sites, individual proteins or domains and macromolecular assemblies. The new framework, FUNCLAN, will include the metrics needed for meaningful clustering and a scheme for describing the structural similarities and differences between members of different clusters. Each cluster will have a representative structure and, using the structural and functional annotations from PDBe-KB, we will characterise each cluster and provide biological context. The new functionality will be validated against a dataset of known examples from the literature of macromolecules and complexes exhibiting specific conformational states. A pipeline for PDB archive-wide clustering of ligand-binding sites, individual macromolecules and macromolecular complexes will be implemented. The resulting data will be made available programmatically via a REST API, via an FTP site and via a novel web-based application.
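
As an illustration only, the idea of grouping superposed structures into conformational clusters and choosing a representative for each can be sketched in a few lines of Python. This is not the GESAMT or FUNCLAN method, whose metrics and clustering scheme are not specified in this record. The sketch assumes that pairwise RMSD values (in Å) after superposition are already available, uses a simple single-linkage cutoff as a stand-in clustering rule, and picks the medoid as the representative. The function names, the 1.0 Å cutoff and the RMSD values are all hypothetical.

```python
from typing import Dict, List, Sequence

# Hypothetical pairwise RMSD values (in Angstrom) for five structures of one
# protein; illustrative numbers only, not taken from any real PDB entries.
RMSD = [
    [0.0, 0.6, 0.9, 4.1, 4.3],
    [0.6, 0.0, 0.5, 3.9, 4.0],
    [0.9, 0.5, 0.0, 4.2, 4.4],
    [4.1, 3.9, 4.2, 0.0, 0.7],
    [4.3, 4.0, 4.4, 0.7, 0.0],
]


def cluster_by_cutoff(dist: Sequence[Sequence[float]], cutoff: float) -> List[List[int]]:
    """Single-linkage clustering: two structures share a cluster if a chain of
    pairwise distances <= cutoff connects them (connected components)."""
    n = len(dist)
    parent = list(range(n))  # union-find forest, one tree per cluster

    def find(i: int) -> int:
        # Walk up to the root, shortening the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= cutoff:
                ri, rj = find(i), find(j)
                if ri != rj:
                    # Keep the smaller index as root so results are stable.
                    parent[max(ri, rj)] = min(ri, rj)

    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def medoid(dist: Sequence[Sequence[float]], members: Sequence[int]) -> int:
    """Return the member with the smallest summed distance to the other
    members; ties are broken by the lower index."""
    return min(members, key=lambda i: (sum(dist[i][j] for j in members), i))


clusters = cluster_by_cutoff(RMSD, cutoff=1.0)  # assumed cutoff of 1.0 Angstrom
representatives = [medoid(RMSD, members) for members in clusters]

if __name__ == "__main__":
    for members, rep in zip(clusters, representatives):
        print(f"cluster {members}: representative structure {rep}")
```

A medoid is used here because it is always one of the observed structures rather than an averaged model, which matches the idea of a representative structure per cluster. A real pipeline would need a similarity metric and clustering rule suited to assemblies, chains or binding sites.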
Committee Research Committee D (Molecules, cells and industrial biotechnology)
Research Topics Structural Biology
Research Priority X – Research Priority information not available
Research Initiative X - not in an Initiative
Funding Scheme X – not Funded via a specific Funding Scheme