Award details

Protein Function Prediction using Machine Learning by an Enhanced Novel Support Vector Logic-based Approach

Reference BB/E000940/1
Principal Investigator / Supervisor Professor Michael Sternberg
Co-Investigators / Co-Supervisors Professor Stephen Muggleton
Institution Imperial College London
Department Life Sciences
Funding type Research
Value (£) 683,503
Status Completed
Type Research Grant
Start date 01/11/2006
End date 31/10/2009
Duration 36 months

Abstract

This proposal has two inter-related aims: 1) to develop a method to predict the function of a protein from its experimental or predicted structure using a novel machine learning method, support vector inductive logic programming (SVILP); 2) to enhance the prototype version of SVILP into a robust tool for use in protein function prediction and in a broad range of other application areas. To develop function prediction, we will use the Catalytic Site Atlas and eFsite (electrostatic-surface of Functional site). The first step is to predict which residues are functional. A pool of methods will be used: our in-house program PHUNCTIONER, which identifies residues specific for function, together with graph theoretic measures, electrostatics, evolutionary and statistical propensities, spatial clustering and cleft geometry. We will use SVILP to learn rules to predict functional residues using the above as background knowledge. The second step is to use SVILP to learn rules describing 3D motifs specific to each function. We will develop a web server for dissemination of the methodology. We will interact with structural genomics projects to employ and test our method. To improve SVILP, we will consider four topics. (1) Feature Selection will select a small number of rules that are highly effective and will be implemented using both filter and embedded methods. (2) Estimation of probabilistic parameters on ILP rules will use maximum a posteriori estimation to give different weights to rules. (3) Novel Kernel Functions will be designed that are efficient and effective for protein function modelling. We will prove the properties of symmetry and positive semi-definiteness that will establish the validity of the developed functions as kernel functions. (4) A Multi-class prediction method will be implemented that allows SVILP-based techniques to make robust and accurate multi-class predictions, based on schemes which weight the predictive contributions of individual rules and class predictors.
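The two kernel properties named in topic (3), symmetry and positive semi-definiteness, can be illustrated with a minimal sketch. It assumes a simple kernel that sums the weights of the rules firing on both examples; this is an illustrative choice, not the project's actual kernel. The protein names, the rule-coverage table and the rule weights are all hypothetical. The random test of the quadratic form is only a numerical spot check, not the proof the abstract refers to.

```python
import random

# Hypothetical coverage table: example name -> ids of the rules that fire on it.
coverage = {
    "protA": {0, 1, 3},
    "protB": {1, 3},
    "protC": {2},
    "protD": {0, 2, 3},
}

# Hypothetical non-negative rule weights (assumed, not taken from the project).
weights = [0.5, 1.0, 0.25, 2.0]


def rule_kernel(a, b):
    """Sum the weights of the rules that cover both examples."""
    return sum(weights[r] for r in coverage[a] & coverage[b])


names = sorted(coverage)
gram = [[rule_kernel(a, b) for b in names] for a in names]


def is_symmetric(m):
    """Check that m[i][j] == m[j][i] for every pair of indices."""
    n = len(m)
    return all(m[i][j] == m[j][i] for i in range(n) for j in range(n))


def quadratic_form(m, z):
    """Compute z^T m z, which must be non-negative for a PSD matrix."""
    n = len(m)
    return sum(z[i] * m[i][j] * z[j] for i in range(n) for j in range(n))


# Spot check: evaluate the quadratic form on random vectors.
rng = random.Random(0)
min_q = min(
    quadratic_form(gram, [rng.uniform(-1.0, 1.0) for _ in names])
    for _ in range(1000)
)

print("symmetric:", is_symmetric(gram))
print("smallest quadratic form seen:", min_q)
```

With non-negative weights this kernel is a weighted inner product of binary rule-firing vectors, which is why the quadratic form cannot go negative; a kernel built differently would need its own argument.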

Summary

Proteins are biological molecules that are the machinery of life, involved in numerous biological processes such as the breakdown of food to provide energy and the defence of a cell against disease. Proteins adopt complex three-dimensional (3D) structures and the location of the atoms can be revealed experimentally. Knowledge of the 3D structure of a protein and its function often provides major insight into biological processes. In addition, this knowledge is of substantial benefit to the design of novel drugs. As a result of advances in biological research, particularly the sequencing of the genomes of humans, other animals and many bacteria, the scientific community is now determining or predicting the 3D structures for many proteins whose functions are not yet known. In addition, computational methods can predict the possible structure of a protein from its chemical formula (its sequence). This project is to develop a computer-based approach to take a protein of experimentally-determined or predicted structure and suggest its function. Protein function is determined by the spatial position of critical residues and the environment of these residues. We will use a computer algorithm to learn the rules from known examples of protein structures and their functions. In particular, the machine learning approach will be a combination of logic reasoning and quantitative predictions from a support vector machine, using a novel method known as Support Vector Inductive Logic Programming (SVILP). SVILP has the benefit that logic rules are powerful in describing spatial relationships and can be readily understood. However, logic rules give only yes-or-no answers, so for quantitative prediction (e.g. confidence or rank) we feed the logic rules into a support vector machine. In this grant we will enhance this novel SVILP methodology. There will be two major results from the grant. First, we will have developed an enhanced method to assign function to protein structure, together with a web server for use by the community. Second, we will have developed an enhanced, robust version of SVILP, with its power benchmarked on a challenging application and in a form suitable for uptake by the community to apply our method to a wide range of problems.
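The step from yes-or-no logic rules to a quantitative rank can be sketched as follows. This is a minimal illustration, not the project's implementation: the residue descriptions, the three rules, and the weights and bias are all hypothetical. In particular, the hand-set weights stand in for what a trained support vector machine would supply; no training is performed here.

```python
# Hypothetical residue descriptions (names and properties are illustrative only).
residues = [
    {"id": "HIS57", "conserved": True, "in_cleft": True, "charged": True},
    {"id": "GLY12", "conserved": True, "in_cleft": False, "charged": False},
    {"id": "LYS88", "conserved": False, "in_cleft": False, "charged": True},
]

# Hypothetical logic rules: each one answers yes or no for a residue.
rules = [
    lambda r: r["conserved"] and r["in_cleft"],
    lambda r: r["charged"] and r["in_cleft"],
    lambda r: r["conserved"],
]

# Assumed weights and bias; a trained SVM would learn these from data.
weights = [1.5, 1.0, 0.5]
bias = -1.0


def features(residue):
    """Turn the yes/no rule outcomes into a binary feature vector."""
    return [1 if rule(residue) else 0 for rule in rules]


def score(residue):
    """Weighted sum of rule outcomes, giving a value that can be ranked."""
    return sum(w * f for w, f in zip(weights, features(residue))) + bias


# Rank residues from most to least likely functional under these assumptions.
ranked = sorted(residues, key=score, reverse=True)
for r in ranked:
    print(r["id"], features(r), score(r))
```

Each rule on its own only says whether a residue matches; the weighted sum is what lets two residues that both match some rule be ordered by confidence.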
Committee Closed Committee - Engineering & Biological Systems (EBS)
Research Topics Structural Biology, Technology and Methods Development
Research Priority X – Research Priority information not available
Research Initiative Technology Development Initiative (TDI) [2006]
Funding Scheme X – not funded via a specific Funding Scheme