Award details

Prediction of protein-protein interaction hot spots using a combination of physics and machine learning

Reference BB/E017452/1
Principal Investigator / Supervisor Professor David Jones
Co-Investigators / Co-Supervisors Dr Stefano Lise, Professor Massimiliano Pontil
Institution University College London
Department Computer Science
Funding type Research
Value (£) 310,931
Status Completed
Type Research Grant
Start date 01/02/2007
End date 31/07/2010
Duration 42 months

Abstract

Protein-protein interactions are central to most biological processes, from signal transduction to immune response. Understanding these functional associations requires knowledge of the three-dimensional (3D) structure of the complex, because this reveals the underlying molecular mechanism. However, determining the 3D structure of a protein complex experimentally presents considerable difficulties. There is therefore a need for accurate and reliable computational methods. Several experiments have shown that protein interactions are critically dependent on just a few residues, or hot spots, at the binding interface. Hot spots make a dominant contribution to the free energy of binding, and mutating them can disrupt the interaction. In this project we aim to develop a computational method that can identify hot-spot residues (and the contacts they form across the interface) in unbound proteins (i.e. without prior knowledge of the complex). This would significantly improve our ability to predict the overall structure of the complex (the so-called docking problem). We plan to combine and integrate the basic energetic terms that contribute to the stability of protein complexes (e.g. van der Waals potential, hydrogen bonds, etc.) using state-of-the-art machine learning techniques. In the first part of the project, we will develop a method to predict hot-spot residues at protein-protein interfaces when the structure of the complex is available. In the second part, we plan to systematically dock structural fragments of the two unbound proteins and test them for the presence of potential hot spots (using the classifier developed in the first part). Finally, we will combine different sources of information (energetic, evolutionary and structural) to predict a few important contacts across the interface of two proteins.
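
As a purely illustrative sketch of the idea of feeding energetic terms into a learned classifier, the Python below trains a small logistic regression on two per-residue features. Everything in it is an assumption made for illustration: the feature choice (a van der Waals energy and a hydrogen-bond count), the toy values and labels, the function names, and the use of logistic regression itself; the project names neural networks and support vector machines as candidate techniques and does not specify an implementation.

```python
import math

# Hypothetical per-residue features: (van der Waals energy, hydrogen-bond count).
# Values and labels are made-up toy data, not experimental measurements.
# Label 1 = hot spot, label 0 = not a hot spot.
TOY_RESIDUES = [
    ((-3.0, 2.0), 1),
    ((-2.5, 1.0), 1),
    ((-2.8, 3.0), 1),
    ((-0.5, 0.0), 0),
    ((-0.2, 1.0), 0),
    ((-0.8, 0.0), 0),
]


def sigmoid(z):
    """Logistic function mapping a real score to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features):
    """Estimated probability that a residue with these features is a hot spot."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(score)


def train_logistic(data, lr=0.1, epochs=2000):
    """Fit weights and bias by stochastic gradient descent on the log-loss."""
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in data:
            error = predict_proba(weights, bias, features) - label
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias


if __name__ == "__main__":
    weights, bias = train_logistic(TOY_RESIDUES)
    for features, label in TOY_RESIDUES:
        print(features, label, round(predict_proba(weights, bias, features), 3))
```

In a realistic setting the features would come from energy calculations on a known complex, and the labels from mutagenesis experiments; the toy data here only shows the mechanics of the combination.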

Summary

Over the last few years, genome sequencing projects have provided a nearly complete list of the genes and proteins present in a cell. The challenge now is to understand how these molecular components interact to give rise to complex and highly interrelated biological processes and phenomena. The long-term goal is to reach a quantitative and predictive description of a biological system as a whole (e.g. a cell) grounded in molecular-level knowledge. This would offer an opportunity to study how the phenotype is generated from the genotype, for example in relation to genetic diseases. In this project we plan to investigate protein-protein interactions. Protein-protein interactions are fundamental to all biological processes, from signal transduction to gene regulation, from catalytic reactions to immune response, and more. To bridge the molecular and system levels, detailed knowledge of which proteins interact and how they interact is therefore essential. A full understanding of the functional relationship between proteins comes only from the three-dimensional (3D) structure of the complex, because this reveals the underlying molecular mechanism. However, determining the 3D structure of a protein complex experimentally presents considerable difficulties. There is therefore a need for accurate and reliable computational approaches that can tackle the so-called docking problem, i.e. the prediction of the conformation of the complex starting from the structures of its component proteins. Most docking procedures consider the full 3D structure of the complex and try to orient the individual proteins so as to optimize their shape and chemical complementarity. We propose instead to develop a computational method to predict which amino acids are in contact at a protein-protein interface. Several experiments have shown that protein interactions are critically dependent on just a few amino acids, or hot spots, at the binding interface.
If potential hot spots could be identified in isolated proteins, our ability to solve the docking problem would be significantly enhanced. We plan to combine and integrate the basic energetic determinants of hot-spot interactions (e.g. van der Waals potentials, hydrogen bonds, etc.) using state-of-the-art machine learning techniques (e.g. neural networks and support vector machines). Such a hybrid scheme is necessary because the problem is too complex to be solved purely from first principles: simplifications and approximations need to be introduced. Machine learning algorithms are extremely powerful at learning from known examples and generating empirical rules. They can therefore be used to complement and guide physical methods and extend the limits of their applicability.
Committee Closed Committee - Engineering & Biological Systems (EBS)
Research Topics Structural Biology, Technology and Methods Development
Research Priority X – Research Priority information not available
Research Initiative X – not in an Initiative
Funding Scheme X – not funded via a specific Funding Scheme