Award details

Bayesian evidence analysis tools for systems biology

Reference BB/I023461/1
Principal Investigator / Supervisor Dr Stuart Aitken
Co-Investigators / Co-Supervisors Professor Andrew Millar
Institution University of Edinburgh
Department School of Informatics
Funding type Research
Value (£) 75,104
Status Completed
Type Research Grant
Start date 06/10/2011
End date 31/07/2013
Duration 22 months

Abstract

This project will address the problems of optimising and comparing stochastic systems biology models by applying the nested sampling algorithm (Skilling, 2006), which computes the Bayesian evidence. These functions will be delivered to users by incorporating them in a new version of the popular stochastic simulation tool Dizzy (Ramsey et al., 2005). The nested sampling algorithms will also be released as R and Matlab packages. By comparing the total evidence in favour of each alternative model of a biological system (measured in decibans or in bits), systems biologists will be able to evaluate alternative modelling decisions, and to compare alternative stochastic models, in the light of the experimental data. The Bayesian evidence will be estimated by integrating over all plausible parameter values. The tool will also provide the modeller with an analysis of the samples that nested sampling draws from the posterior distribution of parameter values. Multiple modes in the distribution of a parameter, and correlations between parameters, will be automatically identified by regression and clustering: these are of great interest to systems modellers and will generate novel insights into the biological models and data under investigation. Algorithms for intelligently managing the optimisation procedure for the user will be provided, including methods to terminate the run when the most informative samples have been located, and methods to detect when the user has selected inappropriate values for the optimiser. These features will assist the uptake of the new tools. The new tools will be immediately useful to Dizzy users, who will be able to optimise models to fit experimental data, and compare models, with minimal configuration of the optimisation algorithm. R and Matlab users will be able to run the optimisation in conjunction with the simulators and other modules provided by those environments.
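The evidence calculation described in the abstract can be illustrated with a minimal nested sampling sketch. This is not the project's implementation: the function names, the toy one-parameter model, the rejection-sampling replacement step and the stopping rule (stop once the live points can add at most a fraction `tol` to the evidence) are all assumptions chosen for brevity.

```python
import math
import random


def log_add(a, b):
    """Return log(exp(a) + exp(b)) without overflow."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi))


def nested_sampling(log_likelihood, sample_prior, n_live=100, tol=1e-2,
                    max_iter=5000, seed=0):
    """Estimate the log evidence, log Z, for a model (illustrative only)."""
    rng = random.Random(seed)
    live = [sample_prior(rng) for _ in range(n_live)]
    logl = [log_likelihood(theta) for theta in live]
    log_z = -math.inf
    log_x = 0.0  # log of the prior mass still enclosed by the live points
    # Each step shrinks the enclosed mass by a factor exp(-1/n_live) on average.
    log_shell = math.log(1.0 - math.exp(-1.0 / n_live))
    for _ in range(max_iter):
        worst = min(range(n_live), key=logl.__getitem__)
        threshold = logl[worst]
        # Add the shell contribution: L_worst * (X_old - X_new).
        log_z = log_add(log_z, log_x + log_shell + threshold)
        log_x -= 1.0 / n_live
        # Replace the worst point by a prior draw with higher likelihood.
        # Plain rejection sampling: acceptable for a toy, far too slow in general.
        while True:
            candidate = sample_prior(rng)
            candidate_logl = log_likelihood(candidate)
            if candidate_logl > threshold:
                break
        live[worst], logl[worst] = candidate, candidate_logl
        # Assumed stopping rule: remaining mass cannot change Z by more than tol.
        if max(logl) + log_x < log_z + math.log(tol):
            break
    # Share the remaining prior mass equally among the final live points.
    for value in logl:
        log_z = log_add(log_z, log_x - math.log(n_live) + value)
    return log_z


def toy_log_likelihood(theta):
    """Standard normal density in theta (hypothetical one-parameter model)."""
    return -0.5 * theta * theta - 0.5 * math.log(2.0 * math.pi)


def toy_sample_prior(rng):
    """Uniform prior on [-5, 5] (hypothetical)."""
    return rng.uniform(-5.0, 5.0)


if __name__ == "__main__":
    print(nested_sampling(toy_log_likelihood, toy_sample_prior))
```

For this toy model the evidence is known analytically to be very close to 0.1 (the prior density 1/10 times a Gaussian integral of almost exactly 1), so the estimate should scatter around ln 0.1, roughly -2.3. A practical tool would replace the rejection step with a constrained sampler, since the acceptance rate falls exponentially as the run proceeds.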

Summary

The study of biological systems, from cells to organisms and populations, is becoming increasingly quantitative. Even at the level of a single cell, molecular biologists and geneticists are able to measure amounts of molecules such as proteins and RNAs, and to begin to unravel the connections between molecules that make up the pathways and processes that keep the cell functioning. Our knowledge of the interaction of molecules and genes comes from many sources. These include studies of the three-dimensional structure of proteins, from which their function can be inferred, through to in vitro and in vivo studies that show how genes, and the molecules that switch them on and off, interact in the test tube, and in a key single-celled organism such as yeast, or a higher plant such as Arabidopsis thaliana. The way that molecular systems are described is changing from the traditional diagrammatic sketch of likely interactions to a set of mathematical equations linking the rates of change of one molecule with the amounts of others. When the number of molecules is small, a set of stochastic reactions becomes a more accurate representation than a set of ordinary differential equations. But in both cases, finding the best fit between a mathematical model and data from the laboratory becomes a major problem. A second important issue concerns the justification for decisions made in modelling a biological system. We might like to say that only one model describes the data, but this is not possible for any complex system. Instead, we can hope to show that one model fits the data better than another, and this is the aim of the research proposed here. We shall apply a probabilistic approach that can optimise the fit of models to data, and quantitatively compare the extent to which they fit the data. This will provide useful information to the bench biologists and the systems biologists with whom they collaborate to further our knowledge of the cell.
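The contrast drawn above between stochastic reactions and ordinary differential equations can be made concrete with a small sketch. It simulates a hypothetical one-species production and degradation system with the Gillespie direct method and gives the matching deterministic solution for comparison. The reaction scheme, rate names and function names are illustrative assumptions and are not taken from the project or from Dizzy.

```python
import math
import random


def gillespie_birth_death(k_prod, k_deg, n0, t_end, seed=0):
    """Simulate 0 -> X (rate k_prod) and X -> 0 (rate k_deg * n) exactly."""
    rng = random.Random(seed)
    t, n = 0.0, n0
    times, counts = [t], [n]
    while True:
        a_prod = k_prod        # propensity of the production reaction
        a_deg = k_deg * n      # propensity of the degradation reaction
        a_total = a_prod + a_deg
        if a_total == 0.0:
            break              # nothing can happen any more
        t += rng.expovariate(a_total)  # waiting time to the next reaction
        if t > t_end:
            break
        # Choose which reaction fires, in proportion to its propensity.
        if rng.random() * a_total < a_prod:
            n += 1
        else:
            n -= 1
        times.append(t)
        counts.append(n)
    return times, counts


def ode_level(k_prod, k_deg, n0, t):
    """Solution of dn/dt = k_prod - k_deg * n, the deterministic counterpart."""
    steady = k_prod / k_deg
    return steady + (n0 - steady) * math.exp(-k_deg * t)


if __name__ == "__main__":
    times, counts = gillespie_birth_death(10.0, 1.0, 0, 20.0)
    print(counts[-1], ode_level(10.0, 1.0, 0, 20.0))
```

With a steady state of only about ten molecules, the simulated counts fluctuate visibly around the smooth deterministic curve, which is the regime where the stochastic description is the more faithful one.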

Impact Summary

This project is in the strategic research priority area of systems approaches to biological research. A novel computational tool will be developed for systems biology modelling. Synthetic biology is also a research priority and the tool will be immediately applicable to models of synthetic systems.

Who will benefit? We have identified the immediate beneficiaries of the software to be produced. These are the large number of systems biologists and synthetic biologists studying and working in the UK. We identify two user groups within this community: those primarily interested in the model and the underlying biology, and those who (additionally) require the sophisticated mathematical and statistical packages available for R and Matlab to investigate properties of models. The work of the immediate beneficiaries will impact on several key targets identified by the research council. The improvement of food production (Crop science) can benefit from systems models, e.g. by increasing our understanding of the circadian clock, as will be studied here. Similarly, the development of genetically modified or synthetic organisms for tackling pollution or generating energy (Bioenergy) can also benefit from mathematical models of the system and its parts. Research in these areas is readily exploited in the biotechnology, agriculture and pharmaceutical sectors of the economy.

How will they benefit? Systems biologists who are more concerned with a particular model, and its fit to the available wet lab data, will benefit from the new version of the Dizzy simulation tool that this project will produce. This tool will provide an easy-to-use interface that allows models to be optimised and the Bayesian evidence computed. The optimal parameter values, and their standard deviation (estimated from the posterior distribution), along with the evidence value (in bits or decibans) will aid their research. Those systems and synthetic biologists who are concerned with the properties of models, or the modelling process, will have access to the algorithms at the programmatic level, and to the code itself. Those who have benefited indirectly, i.e. through the earlier use of systems modelling in the development of a modified organism or the identification of a drug target, will be able to rely on the mathematical analysis of the evidence that supports the use of the model. The tool will contribute to the base of evidence upon which decisions can be made.

What will be done to ensure they benefit? The software to be developed will be made available on an open source basis. We shall publish the scientific results as widely as possible, and in open access journals.
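The reported quantities mentioned above (an evidence value in bits or decibans, and parameter estimates with a standard deviation from the posterior) can be sketched as follows. The function names and the example numbers are hypothetical; the only facts relied on are the unit definitions: evidence in bits is the base-2 logarithm, and evidence in decibans is ten times the base-10 logarithm.

```python
import math


def log_evidence_to_bits(log_z):
    """Convert a natural-log evidence (or log Bayes factor) to bits."""
    return log_z / math.log(2.0)


def log_evidence_to_decibans(log_z):
    """Convert a natural-log evidence (or log Bayes factor) to decibans."""
    return 10.0 * log_z / math.log(10.0)


def compare_models(log_z_a, log_z_b):
    """Evidence in favour of model A over model B, in both units."""
    log_bayes_factor = log_z_a - log_z_b
    return {
        "bits": log_evidence_to_bits(log_bayes_factor),
        "decibans": log_evidence_to_decibans(log_bayes_factor),
    }


def weighted_mean_std(values, weights):
    """Posterior mean and standard deviation from weighted samples."""
    total = sum(weights)
    mean = sum(w * v for v, w in zip(values, weights)) / total
    variance = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, math.sqrt(variance)


if __name__ == "__main__":
    # Hypothetical log evidences for two competing models.
    print(compare_models(-120.0, -122.3))
    # Hypothetical weighted posterior samples of one rate parameter.
    print(weighted_mean_std([0.8, 1.0, 1.3], [0.2, 0.5, 0.3]))
```

A Bayes factor of 10 corresponds to 10 decibans, or about 3.3 bits, so either unit lets two models be compared on a simple additive scale.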
Committee Research Committee C (Genes, development and STEM approaches to biology)
Research Topics Technology and Methods Development
Research Priority Technology Development for the Biosciences
Research Initiative Tools and Resources Development Fund (TRDF) [2006-2015]
Funding Scheme X – not Funded via a specific Funding Scheme