Award details

Characterization and correction of ascertainment bias in protein interaction network analysis

Reference BB/I002421/1
Principal Investigator / Supervisor Dr John Pinney
Co-Investigators / Co-Supervisors
Institution Imperial College London
Department Life Sciences
Funding type Research
Value (£) 261,965
Status Completed
Type Research Grant
Start date 03/05/2011
End date 02/08/2014
Duration 39 months

Abstract

Increasingly, research in the biological sciences is driven by the need to understand complex cellular systems in their entirety. As a result, the production and analysis of genome-scale data play a central role in modern biology. One area of particular importance is the study of protein-protein interactions, as interrogated by yeast two-hybrid, tandem affinity purification coupled with mass spectrometry, protein fragment complementation or other direct or indirect methods. The networks formed by these interactions constitute an essential framework of cellular processes, within which more detailed models are being constructed. The problem of biased, noisy and incomplete protein interaction data is well known and has substantial impacts on the conclusions drawn from the analysis of these networks, rendering much of the published research on network biology questionable. In this project we aim to develop a statistical modeling framework for quantifying the bias and error characteristics of genome-scale network data. This methodology will be applied to construct appropriate null samples for hypothesis testing of network properties, in order to reassess the validity of existing claims of biological significance in the literature on biological networks. Using the statistical models developed, we will integrate the available protein interaction data to produce a probabilistic view of each organism's interactome and of the extent to which it has been sampled, to be provided as a resource for the research community. This work will make a crucial contribution to the ongoing development of systems biology by enabling a more 'cordial' meeting between top-down and bottom-up approaches.

Summary

Increasingly, research in the biological sciences is driven by the need to understand complex cellular systems as a whole. The production and analysis of very large amounts of data therefore play a central role in modern biology. These data are frequently organised around the concept of the genome, the set of all genes in an organism, and are hence described as 'genome-scale'. One area of particular interest is the study of protein interactions, that is, the various ways in which proteins (which make up the molecular machinery of the cell) are able to combine and communicate with one another. By understanding more about protein interactions and the biological networks that they support, researchers hope to explain many of the processes that keep cells alive. However, we have only a partial view of the complete set of protein interactions, even for a relatively simple organism such as yeast. It is known that almost all of the available data are biased towards reporting certain types of interaction, and that these biases may vary depending on the type of experiment used. Interactions drawn from small-scale experiments are often biased towards well-studied proteins, leaving large parts of the genome untested. In addition, these experiments are not 100% accurate, so a certain amount of 'noise' enters the data in the form of incorrect or missing interactions. Together, these factors have a large impact on computational analyses of protein interaction networks: some features of a network might appear to differ significantly from what would be expected, when in reality they do not. Detecting, measuring and correcting for these errors and biases is therefore essential before we can produce reliable assessments of the properties of protein interaction networks and their implications for the function and evolution of biological systems.
In this project, we will use statistical modeling to address these issues and produce software that can correct for the biases present in the data. Using this software, we will re-test many of the currently held ideas about protein interaction networks to see whether they remain valid when bias is taken into account. To help other researchers make the best use of the available interaction data, we will also produce a web service that provides a confidence score for each possible interaction and the probability that it has been tested correctly.

Impact Summary

Research in the fundamentals of systems biology is expected to have substantial long-term impact for the general public in the context of drug development, genomic medicine, post-genomic agriculture and the biotechnological applications of synthetic biology, including biofuels. More immediate impacts will be concentrated in the pharmaceutical and biotech industries, where the effective exploitation of protein interaction data is of great interest. Engagement with the industrial beneficiaries of the research will chiefly be through presentations and demonstrations at national and international conferences. Our existing websites will be used to promote the resources developed, and a dedicated website and web service will provide unrestricted public access to the research outputs. We aim to publish the research in high-quality, high-impact venues such as the Nucleic Acids Research web server issue, and have requested funds so that all publications can be made open access, ensuring accessibility beyond the academic community. Further opportunities for engagement with the wider public will be sought through the Imperial College media office and the Royal Society, which funds JP as a University Research Fellow. We are very keen to maximize the exploitation of the research and to encourage its re-use in both the academic and non-academic communities. By the end of three years we therefore aim to have established a permanent, automatically updated website and DASMI service that will continue to provide access to the resources developed beyond the lifetime of the project. These resources will be freely available to the public, enabling third parties to develop their own applications of our research. We expect that this project will provide an excellent opportunity for JP to develop collaborative relationships with industrial partners in the form of future research proposals and joint studentships.
The resources developed are likely to be of immediate benefit to all pharmaceutical and biotech companies with an interest in network biology.
Committee Research Committee C (Genes, development and STEM approaches to biology)
Research Topics Microbiology, Technology and Methods Development
Research Priority X – Research Priority information not available
Research Initiative X – not in an Initiative
Funding Scheme X – not Funded via a specific Funding Scheme