Award details

Developing new methods to enable amino acid co-evolution algorithms to be applied to protein-protein interaction prediction

Reference BB/L018330/1
Principal Investigator / Supervisor Professor David Jones
Co-Investigators / Co-Supervisors
Institution University College London
Department Computer Science
Funding type Research
Value (£) 138,594
Status Completed
Type Research Grant
Start date 31/05/2014
End date 30/11/2015
Duration 18 months

Abstract

We propose to develop new algorithms to extend our recent highly successful work (PSICOV) on identifying co-evolving sites in large multiple sequence alignments to the problem of predicting interacting sites in separate proteins. This will allow us to produce a Web-based tool which can predict novel protein-protein interactions, and also generate 3-D models of any putative complexes where templates can be found for the individual subunits or domains. The main challenge addressed in the proposal is the problem of how to compute amino acid covariation between two separate alignments, which is the main bottleneck in successfully applying covariance methods to predicting protein-protein interactions. Although in theory there is no obstacle to extending covariance methods to separate alignments, the practical obstacles are substantial. The only way to get sufficiently accurate covariance data is to ensure that there is accurate species and orthology equivalence between each row in the two alignments. In other words, the phylogenetic origin of sequence N in Alignment 1 must be identical to that of sequence N in Alignment 2, and so on for all sequences. Unfortunately, there are always differing numbers of homologues in the alignments due to incomplete genomes, difficulty in assigning orthologs and so forth. To solve this we propose a two-stage process: first, labelling pairs of sequences between the two alignments where equivalence can be decided from data bank annotations; then, extending this core alignment by maximising the overall mutual information score between the two alignments, so as to maximise the likelihood of observing covarying sites between the two proteins. Once the core algorithms have been implemented, a Web-based tool will be released to allow users to construct large accurate paired alignments, and to use these alignments to predict interaction maps between proteins and to carry out contact-constrained rigid-body protein-protein docking.
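
The two-stage pairing idea in the abstract can be illustrated with a small Python sketch. This is not the project's implementation: the function names, the (species, sequence) input format, the rule that a species label must occur exactly once in each alignment, and the greedy extension strategy are all assumptions made for illustration. Plain mutual information summed over all inter-protein column pairs stands in for whatever score the final method uses.

```python
import math
from collections import Counter
from itertools import product

def column_mi(col_a, col_b):
    """Mutual information (in nats) between two equal-length residue columns."""
    n = len(col_a)
    pa, pb, pab = Counter(col_a), Counter(col_b), Counter(zip(col_a, col_b))
    return sum((c / n) * math.log(c * n / (pa[x] * pb[y]))
               for (x, y), c in pab.items())

def total_inter_mi(rows_a, rows_b):
    """Sum MI over every (column of A, column of B) pair; rows are matched by index."""
    cols_a = list(zip(*rows_a))
    cols_b = list(zip(*rows_b))
    return sum(column_mi(ca, cb) for ca, cb in product(cols_a, cols_b))

def core_pairs(aln_a, aln_b):
    """Stage 1 (assumed rule): pair rows whose species label is unique in both alignments."""
    count_a = Counter(sp for sp, _ in aln_a)
    count_b = Counter(sp for sp, _ in aln_b)
    seq_b = {sp: seq for sp, seq in aln_b if count_b[sp] == 1}
    return [(seq, seq_b[sp]) for sp, seq in aln_a
            if count_a[sp] == 1 and sp in seq_b]

def extend_pairs(pairs, candidates_a, candidates_b):
    """Stage 2 (greedy, hypothetical): give each unpaired A sequence the
    remaining B sequence that maximises the total inter-alignment MI."""
    pairs = list(pairs)
    remaining = list(candidates_b)
    for seq_a in candidates_a:
        if not remaining:
            break

        def score(seq_b):
            rows = pairs + [(seq_a, seq_b)]
            return total_inter_mi([a for a, _ in rows], [b for _, b in rows])

        best = max(remaining, key=score)
        pairs.append((seq_a, best))
        remaining.remove(best)
    return pairs

# Invented toy data: (species, aligned sequence) rows for two protein families.
aln_a = [("human", "AK"), ("mouse", "VK"), ("yeast", "AR"), ("yeast", "VR")]
aln_b = [("human", "LE"), ("mouse", "IE"), ("yeast", "LD")]
core = core_pairs(aln_a, aln_b)          # yeast is ambiguous in A, so it is skipped
extended = extend_pairs(core, ["AR"], ["ID", "LD"])
```

A greedy pass like this always adds a pair and can get stuck in a poor local optimum; it is shown only to make the "maximise the overall mutual information" step concrete.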

Summary

Proteins are molecules present in every cell that carry out essential biological processes. These molecules are essentially strings of simpler chemicals, called amino acids, and these strings are able to self-assemble into a unique 3-D structure as soon as the protein is made by the cell's protein-making machinery (called ribosomes). It is this unique structure that determines the function of the protein (i.e. what it does in the cell and how it does it). By shining X-rays on crystallised proteins, scientists can determine their structure by looking at how the rays reflect off the layers of atoms that make up the crystal. However, this process can take many months or even years of effort. With hundreds of thousands of proteins for which the native structure is unknown, it is not surprising that scientists are keen to find clever shortcuts to working out the structure of proteins. We, like many other scientists, have been trying to decipher the so-called protein folding "code", i.e. trying to work out the rules which govern how the protein finds its unique structure, and then trying to program a computer with these rules to allow scientists to quickly "predict" what the structure of their protein of interest might be. Although the shape or "fold" of a single protein is an important piece of information, it is arguably even more useful to determine which proteins interact with a given protein of interest, and the geometry of these so-called protein complexes, i.e. groups of proteins which have evolved to stick together in a very specific way. Good examples of such complexes are found in many areas of biology and medicine. For example, a number of different protein complexes play a crucial role in controlling how blood clots. In general, protein-protein complexes underlie our whole understanding of how cells and organisms operate as "systems", a field known as "systems biology".
Unfortunately, experimentally studying the structure of a protein complex is even more difficult than studying the structure of a single protein, and so scientists have an urgent need for better computational tools to allow them to predict which proteins could interact and the likely overall shape of the complex that they form. In this project, we propose to exploit some recent breakthroughs in understanding how protein sequences evolve to allow us to deduce which pairs of proteins might interact and the structures of the complexes that they form. In a nutshell, we look for pairs of residues that appear to change in synchrony when we look at the different versions of the proteins found in different organisms, i.e. we look for cases where a change in one amino acid always seems to occur when we see another amino acid changing. These linked changes are called "correlated mutations" and when we find them, we can be reasonably sure that the two amino acids have evolved to be close together in 3-D space in the final folded form of the protein. If we find enough correlated mutations, we can even go as far as predicting the complete structure of the protein and, we hope, as far as predicting the structure of a protein-protein complex in a similar way.
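
As a rough illustration of what looking for "correlated mutations" means in practice, the Python sketch below ranks every pair of columns in a tiny, invented alignment by mutual information. This is a deliberately simple stand-in and not the PSICOV method itself; the function names and the toy sequences are assumptions made for the example.

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(col_i, col_j):
    """Mutual information (in nats) between two alignment columns."""
    n = len(col_i)
    fi, fj, fij = Counter(col_i), Counter(col_j), Counter(zip(col_i, col_j))
    return sum((c / n) * math.log(c * n / (fi[a] * fj[b]))
               for (a, b), c in fij.items())

def rank_column_pairs(alignment):
    """Return (score, i, j) for every pair of columns, highest score first."""
    columns = list(zip(*alignment))
    scored = [(mutual_information(columns[i], columns[j]), i, j)
              for i, j in combinations(range(len(columns)), 2)]
    return sorted(scored, reverse=True)

# Invented toy alignment, one row per organism: columns 0 and 3 always change
# together, column 1 never changes, and column 2 varies independently.
toy = ["AGKL", "AGRL", "VGKI", "VGRI"]
best_score, i, j = rank_column_pairs(toy)[0]
print(f"strongest covarying pair: columns {i} and {j}")
```

Real alignments contain thousands of sequences, and raw scores of this kind are confounded by shared ancestry and indirect couplings, which is why more sophisticated statistical methods are needed.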

Impact Summary

SUMMARY OF PROJECT This proposal is to build a web-based tool to allow bioscientists to merge multiple sequence alignments for different families and from this data predict novel protein-protein interactions, and to dock proteins together.
COMMUNITY IMPACT Predicting protein-protein interactions is a key component in understanding how biological systems work at a molecular level. Every biological network or reaction pathway involves interactions between proteins, and being able to determine which proteins in a system interact, and to be able to intervene in these interactions could have wide implications in a variety of BBSRC areas involving systems biology in the broadest sense. A few examples are as follows:
Food security - Increasingly the sequences of plants, agricultural pests and agents of disease are the focus of genome sequencing and structural studies. Interactions between plant proteins and pathogen proteins are key to many aspects of this research area. As our methodology only requires sequence data, this should allow novel leveraging of high throughput sequence data in the food security theme area.
Bio-energy and bio-industry - The manipulation of individual molecules and pathways will yield new sources of energy and materials. Synthetic pathways can be engineered to make molecules, such as fuels, more efficiently. In addition, novel molecules can be designed and synthesised. Advance knowledge of structural information relating to protein-protein complexes can be used to suggest the critical changes needed to alter function.
Health - The central role of protein structure in the design of novel and improved pharmaceuticals is well established. Almost every conceivable drug-protein interaction involves protein complexes, rather than individual protein chains. The focus of this project on building novel tools in this area will thus be especially beneficial.
POLICY MAKERS AND THE LAY PUBLIC This project can serve as an excellent example to policy makers and the lay public of the high impact that computational biology projects can achieve relative to their low costs. For example, by looking at citation data it is easy to show how many different experimental projects, in a wide variety of areas, critically depend on the availability of computational tools similar to the ones outlined in this proposal. This project could also help underline the importance of the internet and "Big Data" in future government policy making.
Committee Research Committee C (Genes, development and STEM approaches to biology)
Research Topics Structural Biology, Technology and Methods Development
Research Priority X – Research Priority information not available
Research Initiative Tools and Resources Development Fund (TRDF) [2006-2015]
Funding Scheme X – not Funded via a specific Funding Scheme