Award details

Next generation computational tools for the analysis and prediction of protein disorder and related gene function

Reference BB/J002925/1
Principal Investigator / Supervisor Professor David Jones
Co-Investigators / Co-Supervisors Dr Domenico Cozzetto
Institution University College London
Department Computer Science
Funding type Research
Value (£) 324,618
Status Completed
Type Research Grant
Start date 04/09/2011
End date 03/09/2014
Duration 36 months

Abstract

The main aim of this project is to make use of machine learning and simplified molecular simulations of protein disorder-order transitions in the presence of certain ligands (particularly metals, small peptides and DNA) to produce new software tools which can better predict functionally relevant disordered regions and proteins in eukaryotic genomes. The first part of this project will entail the development of an improved predictor for generic binding within disordered regions. It is clear that the largest class of function associated with protein disorder is related to the binding of ligands, peptides and proteins. It is also clear that improving our ability to distinguish between non-functional regions of disorder and such functional regions will be vital in improving the quality and usefulness of disorder prediction to the general biological community. The main novel aspect here will be the use of disordered domain linkers as control data, along with sequence analysis of evolutionarily conserved disordered regions, thought to be functional modules. The largest part of the project will entail running simulations of disordered protein segments both with and without likely binding ligands present. The ligands we intend to focus on will be metal ions, DNA and small peptides. From statistical analysis of these simulated structural ensembles, we plan to derive statistical models that can predict the likely functional class of the region under study. By integrating all of these results into a single computational tool (available via a Web server and as standalone software), we hope to produce a new generation of protein disorder prediction tool which is able not only to predict disordered regions more accurately, but also to assign functional significance to these regions and thus provide key functional insight for proteins of which a large fraction are functionally uncharacterised.

Summary

With many genomes now completely sequenced, life scientists face the challenge of characterizing the biological role of the encoded proteins so as to advance our understanding of cell physiology. Over the past decades, several experimental studies reinforced the view that the three-dimensional structure of a protein is a prerequisite to its function. Evolution optimized the relative positions of specific protein atoms so that they can perform different tasks, including ligand binding and catalysis of reactions. Recently this view has been revised in light of additional data from eukaryotic species. Indeed, several observations indicate that a large number of their proteins include highly flexible segments, which assume a fixed conformation only when they recognize their biological partners. These fragments - or whole proteins - are usually called natively unfolded, intrinsically unstructured or disordered and are predominantly found in multi-cellular organisms, where they play key roles in signalling and regulatory processes through binding to proteins, nucleotides, nucleic acids and metal ions. The identification and functional characterization of disordered proteins has drawn increasing attention. Different assays can produce systematic information on the location of disordered regions. However, these techniques suffer from intrinsic limitations and cannot reasonably be applied to all the proteins that a typical eukaryotic organism expresses. On the other hand, thorough analyses showed that the amino acid sequences of these proteins are characterized by clear patterns, so computer programs can identify them fairly accurately and quickly. The classification of a protein as natively unfolded and the location of its disordered regions are valuable information. Yet this is not enough to describe in detail what molecular actions the protein performs and what biological processes it relates to.
Although we know that some functional categories are particularly enriched in disordered regions, we cannot afford to experimentally test all possible alternatives. A reasonable solution is to use computers to further analyze these proteins and then to perform far fewer lab assays to validate the results of the computational analyses. This project aims at the development of a web server that will be accessible to everyone through the Internet and that will output functional predictions - i.e. hints - for disordered proteins. The program will first locate disordered regions within the input sequence using a method we previously developed. It will then exploit the chemical and physical features of some ligands - such as DNA and metal ions - to assess the likelihood that the input protein sequence interacts with them through its potential disordered regions. The results are expected to improve our knowledge of this important class of proteins and to help prioritize experiments aimed at characterizing their functions.

Impact Summary

The immediate beneficiaries of this research are the broad community of bench biologists interested in analysing their proteins of interest for regions of disorder, and from this deriving new insights into the possible function of the proteins. Both academic and industry scientists will benefit in a similar way, as the Web services developed as a result of this research will be freely available to all users. Commercial scientists with sensitive data will be able to license the software through UCL Business so that they can exploit the tools without revealing their research interests to other users. Being able to determine even some clue as to the function of the 40% of functionally uncharacterised proteins in model organism genomes can have significant impact in a broad variety of areas, e.g. drug, antibody and vaccine design, biochemical engineering, protein design and even nanotechnology. Beyond industrial applications of this research, filling in the major gaps in our knowledge of what the full complement of genes and their products do, and how the proteins interact, can have wider implications for understanding the workings of healthy cells and how they age. Ultimately this work can make a contribution to our overall understanding of how life processes arise from interactions between a relatively small number of genes in our genomes and the genomes of other organisms.
Committee Research Committee C (Genes, development and STEM approaches to biology)
Research Topics Structural Biology, Technology and Methods Development
Research Priority Technology Development for the Biosciences
Research Initiative X - not in an Initiative
Funding Scheme X – not Funded via a specific Funding Scheme