Award details

Tools for the text mining-based visualisation of the provenance of biochemical networks

Reference BB/E004431/1
Principal Investigator / Supervisor Professor Sophia Ananiadou
Co-Investigators / Co-Supervisors Professor Pedro Mendes, Professor Steve Pettifer, Professor Junichi Tsujii
Institution The University of Manchester
Department Chemistry
Funding type Research
Value (£) 549,458
Status Completed
Type Research Grant
Start date 02/01/2007
End date 01/06/2010
Duration 41 months

Abstract

Systems biology is concerned with the modelling, visualisation and analysis of biochemical networks in which, for instance, metabolites are 'linked' by arrows representing the enzymes which turn one molecule into another or which are modified by particular substances. SBML provides a computer-readable 'standard' for describing such biochemical or signalling networks. However, these diagrams (and thus the SBML models) are divorced from the scientific evidence on which they are based, represented by the scientific literature (and increasingly by online databases). In order to overcome the problems of reading the burgeoning scientific literature, we shall deploy Text Mining (TM). TM involves named entity recognition (i.e. semantic annotation of enzymes, metabolites, etc.) and information extraction (i.e. relationship extraction between named entities). An important part of this proposal is to find solutions for the terminology problem in systems biology, by developing techniques for recognising synonymous terms. Based on our efficient parsing techniques, we shall extract relationships between entities that will form the basis by which we can discover, index, store and display the scientific evidence for such linkages. The selection of the most pertinent relationships will be performed using our preferred methods of advanced machine learning (Support Vector Machines and Genetic Programming). The overall aim of the project is thus to develop and deploy the necessary TM tools and to use them to display the different relationships to the user together with the literature from which they have been extracted. The different types (and strengths) of evidence for these interactions will then be visualised directly and linked to a dynamic website of the literature. This will give users a direct linkage between the systems biology diagrams encoded in (an advanced form of) SBML and the scientific evidence for them. Where available, linkages to kinetic data will also be made.

Summary

Systems biology is concerned with the modelling, visualisation and analysis of biochemical networks in which, for instance, metabolites are 'linked' by arrows representing the enzymes which turn one molecule into another or which are modified by particular substances. However, these diagrams are divorced from the scientific evidence on which they are based, which is represented by the scientific literature (and increasingly by online databases). The historical scientific literature is huge, and is increasing at an enormous rate (several thousand papers per week), so no one can possibly read it all. One solution is to use computers to 'read' these papers and present to the user only those which carry relevant information. Aspects of this subject are variously known as Natural Language Processing and Text Mining. Text Mining goes through papers, extracts the relevant pieces of information from each one, and presents them to the biological reader. A particular problem is the use by biologists of multiple names for the same thing. Text Mining can assist here since it is able to find all the variations of the same name and link them with the relevant text and databases. Text Mining can also find the types of relationship between these names, and this is the basis by which computers can discover and display scientific evidence. The Text Mining system will produce and index such evidence for specific problems, and this will be stored in an appropriately structured database. The aim of the project is therefore to develop and deploy the necessary Text Mining tools and to use them to display the different relationships to the user, together with the literature on which they are based. This will be done by encoding the interactions using arrows of various colours that link to a dynamic website of relevant literature, giving a direct linkage between the systems biology diagrams and the evidence for them.
Committee Closed Committee - Engineering & Biological Systems (EBS)
Research Topics Systems Biology, Technology and Methods Development
Research Priority X – Research Priority information not available
Research Initiative Technology Development Initiative (TDI) [2006]
Funding Scheme X – not Funded via a specific Funding Scheme