BBSRC Portfolio Analyser
Award details
UKRI/BBSRC-NSF/BIO: Unifying Pfam protein sequence and ECOD structural classifications with structure models
Reference
BB/X012492/1
Principal Investigator / Supervisor
Dr Alex Bateman
Co-Investigators / Co-Supervisors
Institution
EMBL - European Bioinformatics Institute
Department
MSCB Macromolec, structural and chem bio
Funding type
Research
Value (£)
723,162
Status
Current
Type
Research Grant
Start date
23/01/2023
End date
22/01/2026
Duration
36 months
Abstract
Evolutionary classification of proteins is essential for all aspects of protein science. Inferring the functional properties of an uncharacterized protein from a better-studied homolog is a powerful way to generate hypotheses. Protein classifications have been separated into two categories: the first relies on sequence similarity (Pfam); the second uses 3D structures (ECOD). Sequence classifications are more comprehensive and relevant to protein function, while structure classifications reveal distant evolutionary relationships between protein families. The disparity between them arises from the lack of 3D structures for most proteins with known sequences. However, AlphaFold (AF) removes the barrier between sequence and structure classifications. We will develop cyberinfrastructure to integrate Pfam and ECOD and to classify millions of AF models. We will (i) refactor the ECOD infrastructure to meet the needs of classifying millions of AF models. The revised pipeline will classify domains by sequence, remove disordered or poorly predicted segments, and classify the remaining domains by structure comparison augmented with sequence and function evidence and expert curation. In close collaboration with Pfam, and using the newly developed infrastructure, we will 1) incorporate all currently released AF models into ECOD, and 2) adapt Pfam families in ECOD. To improve Pfam, we will also (ii) develop tools to compare the two classifications and introduce a number of changes to Pfam: we will 1) add new families detected by ECOD, 2) refine domain boundaries using protein structures, and 3) group families into clans by homology identified in ECOD. Finally, we will (iii) harmonise Pfam and ECOD: we will resolve inconsistencies between the two classifications, converge on a common nomenclature for domains, exchange information, and cross-reference the two resources.
Summary
N/A
Committee
Not funded via Committee
Research Topics
X – not assigned to a current Research Topic
Research Priority
X – Research Priority information not available
Research Initiative
X – not in an Initiative
Funding Scheme
X – not Funded via a specific Funding Scheme