Award details

Rigorous Information-theoretic tools for Comparative Interactomics.

Reference BB/H018409/1
Principal Investigator / Supervisor Professor Franca Fraternali
Co-Investigators / Co-Supervisors Professor Anthony C.C. (Ton) Coolen
Institution King's College London
Department Randall Div of Cell and Molecular Biophy
Funding type Research
Value (£) 283,356
Status Completed
Type Research Grant
Start date 07/07/2010
End date 24/12/2013
Duration 42 months

Abstract

An important question in systems biology is how to quantify optimally and systematically the macroscopic structure of protein-protein interaction networks (PPIN). Its answer will allow us to study more systematically the relation between topology and biological functionality in such signalling networks. We have previously developed a novel information-theoretic framework for the generation of such measures, and proven their applicability to PPIN. The results showed for the first time that PPI datasets from the same species that exhibited only a small fraction of common interactions (and were therefore viewed with suspicion by the experimental community) in fact share remarkably similar macroscopic topological properties. This study, however, is limited in that the topological information on which it is based does not include network loops, in spite of their known relevance in proteomic modules. In this project we seek to develop more informative information-theoretic distance measures for comparing PPIN, to serve as powerful practical tools for characterizing and comparing networks within and between species and for mapping methodological biases. These new measures will additionally take into account the statistics of short loops in PPIN. At a mathematical level this changes the problem from calculating the Shannon entropy for ensembles of effectively tree-like graphs to calculating the Shannon entropy for ensembles of graphs with constrained short loops (either exactly or in controlled approximation), which is highly nontrivial and requires qualitatively different techniques. Once the theoretical framework has been developed, it will be applied to the continuously growing PPIN available in the public domain for different species; the analysis should highlight more subtle differences between the graphs with respect to conserved functional modules. Our results should inform and guide new experiments aimed at reducing the existing biases.

Summary

Molecular signals in the living cell can, to a first approximation, mostly be attributed to Protein-Protein Interactions (PPI) and their complex cross-talk. Since the recent completion of the Human Genome project, it has become possible to identify and map a large part of the proteins encoded in our genes. However, more details about the molecular interactions involved in signal transduction pathways need to be uncovered before we can truly understand the complex biology of our cellular system. This represents one of the major challenges for the next years of research in biology and medicine. Molecules signal information through interaction with specific binding partners. The binding induces a conformational change in at least one of the partner molecules, which triggers the next biomolecular step in the signalling cascade. The mechanisms of interaction between proteins are therefore crucial to all biological functions, and the effectiveness of this cross-talk during signal transduction plays a fundamental role in many 'healthy' biological processes and in many diseases (e.g. cancers). Several large-scale experimental studies detecting PPIs for diverse species have been published in recent years, and their data have been deposited in publicly available databases. Current experimental techniques, such as yeast two-hybrid (Y2H) and co-affinity purification combined with mass spectrometry (AP-MS), have, however, been shown to sample subsets of the interaction data space with only very limited overlap. We have recently developed a theoretically sound and accurate mathematical framework for comparing interactome data (PPI networks, PPIN) and for evaluating, in an unbiased way, their distance in terms of macroscopic topological properties. Preliminary analysis revealed that networks of the same species and sampled by the same method are similar, and more similar than networks sampled by the same method but from different species.
Therefore, networks generated from similar experimental conditions have similar topological features, despite the small overlap of their individual PPIs. To our knowledge this has not yet been shown so clearly and in such an unbiased way. Moreover, we could see very clearly, upon comparing networks sampled with different methods, that the data bias induced by the sampling method presently overshadows species-related structural properties. Again, although methodological biases have been acknowledged in the literature, our ability to quantify their impact by using objective distance measures opens a powerful new window on proteome data and their quality control. In this project we seek to add a further essential ingredient to the theory: to include in our macroscopic characterizations of networks the statistics of short loops (beyond quantifying structure only via degree statistics and degree correlations, on which the earlier work was based). The rationale is that functional modules involving a small number of nodes (typically 3-6) appear to play an important role in the overall transduction mechanism. To derive formulae that improve upon those we have used in the previous PPIN comparison, we now need to calculate analytically the Shannon entropy for random graphs with constrained loops. This step is theoretically very difficult, and will take up half of the project duration. If fully exact evaluation is too demanding, we will resort to well-defined and sensible approximations of the loop statistics instead. Numerical simulations will be performed on suitably constructed families of synthetic networks, generated to match, or closely resemble, realistic PPIN. This step will serve as a control experiment and/or validation of the developed theory. Finally, the theory will be applied to a large collection of PPIN from different species. The methodological approach developed here should aid experimentalists in the design and interpretation of future studies.
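
The motivation for including short loops can be illustrated with a minimal Python sketch. It is not the information-theoretic framework of the project: it only computes two of the observables mentioned above (the degree distribution and the number of loops of length 3) for two toy networks. The function names and the toy edge lists are hypothetical and chosen purely for illustration.

```python
from collections import Counter
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map from (protein, protein) pairs."""
    adj = {}
    for a, b in edges:
        if a == b:
            continue  # self-interactions are ignored in this sketch
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def degree_distribution(adj):
    """Return p(k): the fraction of nodes that have degree k."""
    counts = Counter(len(neighbours) for neighbours in adj.values())
    n = len(adj)
    return {k: c / n for k, c in sorted(counts.items())}


def count_triangles(adj):
    """Count loops of length 3, each triangle counted once."""
    seen = 0
    for neighbours in adj.values():
        # a connected pair of neighbours closes a triangle through this node
        for u, v in combinations(neighbours, 2):
            if v in adj[u]:
                seen += 1
    return seen // 3  # every triangle is seen once from each of its 3 nodes


# Hypothetical toy networks: six nodes and six links each.
two_triangles = [("A", "B"), ("B", "C"), ("C", "A"),
                 ("D", "E"), ("E", "F"), ("F", "D")]
hexagon = [("A", "B"), ("B", "C"), ("C", "D"),
           ("D", "E"), ("E", "F"), ("F", "A")]

adj_t = build_adjacency(two_triangles)
adj_h = build_adjacency(hexagon)

print(degree_distribution(adj_t), count_triangles(adj_t))
print(degree_distribution(adj_h), count_triangles(adj_h))
```

In both toy networks every node has degree 2, so their degree distributions coincide, yet one contains two triangles and the other none. Any measure built only on degree statistics would fail to separate them, which is the kind of gap that loop statistics are intended to close.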

Impact Summary

Impact on research - mathematical methodology

Available proteome data are still far from complete and of limited reproducibility. In order to progress in this domain, new mathematical tools based on rigorous formulas are needed for the accurate comparison, evaluation and analysis of these data. Our first direct scientific impact is to prepare the way towards comparative proteomics, by generating these required new and advanced mathematical tools, proving the usefulness of their application to real data, and making them available to the scientific community.

Impact on the promotion of systems biology

Our scientific approach is intrinsically integrative, relying completely on effectively combining cross-disciplinary expertise in mathematics, bioinformatics, and biology. Our project can impact positively on the awareness in the scientific community of the feasibility, potential and effectiveness of systems biology research consortia. We will actively reinforce this message by presenting our work at conferences explicitly as the result of a successful systems biology team effort, and we will encourage and assist others in forming efficient multi-disciplinary systems biology teams.

Impact on health and well-being

The development of computational tools with a rigorous mathematical foundation, designed to compare unambiguously interactome data from different sources, will increase the quality of our analysis of patient data and the precision with which novel drug targets can be predicted, and thereby accelerate the personalized medicine agenda.

Impact on people - teaching and transfer of knowledge

King's College is one of the UK's leading academic institutions, in research and in teaching. The post-doctoral researcher of the project will have to analyze and understand protein-protein interaction data, and at the same time master the developed mathematical tools and code the algorithms in user-friendly programs. Our project thus increases the number of biomedical scientists that can work across discipline boundaries. In fact, the two applicants are the main driving forces behind most multi-disciplinary systems biology initiatives at King's College.

Impact on the UK's competitiveness - profile and networking

The combined dissemination of our novel methodology, via publication in internationally recognized journals, presentation at international conferences, and the creation of accompanying e-tools, together with our activities in promoting their application in medicine, will contribute to raising the UK's profile as a leader in the fields of bioinformatics and translational research.

Outreach activities

Dissemination via publication in journals, presentation at conferences, and the generation of e-tools would be carried out mainly by the postdoc (guided by the applicants). The translational dissemination towards medicine would be done by the applicants, via their existing commitments in other projects; instead of demanding resources, this increases the effectiveness of other biomedical research. The impact via training (the systems biology teaching activities) and the formation of international networks would also be done by the applicants. They are already committed to stimulating systems biology research and the injection of advanced mathematical methods into the interface between applied mathematics and biomedicine; the present application would make it easier for them to continue doing so.
Committee Research Committee C (Genes, development and STEM approaches to biology)
Research Topics Systems Biology, Technology and Methods Development
Research Priority Technology Development for the Biosciences
Research Initiative X - not in an Initiative
Funding Scheme X - not Funded via a specific Funding Scheme