Award details

Continued development of ChEBI towards better usability for the systems biology and metabolic modelling community

Reference BB/K019783/1
Principal Investigator / Supervisor Professor Douglas Kell
Co-Investigators / Co-Supervisors Professor Pedro Mendes, Professor Christoph Steinbeck, Dr Neil Swainston
Institution The University of Manchester
Department Computer Science
Funding type Research
Value (£) 682,950
Status Completed
Type Research Grant
Start date 01/08/2013
End date 31/01/2017
Duration 42 months

Abstract

The development of genome-scale metabolic reconstructions -- all-encompassing interlinked maps of all known metabolic reaction pathways for a given organism -- requires integrated computable knowledge about the biochemical entities involved, such as macromolecules and metabolites. The use of unambiguous, semantically typed, publicly available and perennial identifiers for model components is increasingly recognised as essential if systems biology models are to be shared, reused and developed by communities, maximising their benefit. Such annotation allows a wealth of chem- and bioinformatics data, such as chemical structures and protein sequences, to be immediately extracted from these resources via web service interfaces. As the BioModels Database currently contains thousands of models that use ChEBI identifiers, improvements to and extensions of the ChEBI resource will have an immediate benefit to the systems biology modelling community. We propose here to undertake three such key improvements. Firstly, we will develop a comprehensive cross-platform API library for accessing ChEBI programmatically. The API, libChEBI, will be made publicly available as an open source library in Java and Python. It will include facilities such as extracting biologically relevant groups of compounds (such as tautomers), calculating additional physicochemical properties and semantic reasoning over model annotations. Secondly, we will extend the ChEBI database to enhance stability, increase community involvement and provide a new powerful visualisation for the biological context of molecular entities. Thirdly, we will curate into ChEBI all known metabolites across human, mouse, E. coli and S. cerevisiae.

Summary

After a century of studying nature in greater and greater detail, generating the "parts list" of the molecular components within the cell, the biological sciences have undergone a paradigm shift in the last decade, moving towards putting together these individual molecular pieces to understand their interactions in a holistic context. It is these interactions which give rise to overall cellular processes, and their study has been termed systems biology. Systems biology brings together a wide range of information about cells, genes and proteins, as well as the small molecules that act on and within these biological structures. In the service of its application areas, such as drug discovery and industrial biotechnology, it gives a holistic perspective aiming to track and eventually simulate the entire functioning of biological systems. In order to build up such holistic models from such a vast collection of diverse data, individual units of information from many diverse databases need to be integrated. Integration of such a high volume of data can only feasibly be performed computationally. To facilitate smooth integration, individual molecular components within the cellular system require stable and unique identifiers. These identifiers are assigned to entities such as genes, proteins or small molecules by standardization bodies and database providers, and effectively allow the molecular parts list to be catalogued. In addition to this, human-relevant information such as names, chemical and biological structures, relationships and properties is also associated with the various entities in the databases, providing resources that are useable by both software tools and researchers themselves. The database Chemical Entities of Biological Interest (ChEBI) acts as a resource for such information and stable identifiers in the area of small molecules of biological interest.
ChEBI provides the bioscientific community with semantic, biological and chemical information as well as stable identifiers for small chemical compounds relevant in biology, including the so-called metabolites. Metabolites are small molecules in organisms that are implicated in diverse processes including supplying the body with energy, serving as building blocks for tissue, and acting as a defence or as a signal within the organism or between organisms. For these purposes, ChEBI is widely used in the bioscience community, which sends formal requests for the assignment of identifiers for particular small molecule entities to the ChEBI team, who then perform the assignment, publish the information into the public domain and inform the requesting party that the request has been fulfilled. The aim of the current proposal is to further develop the ChEBI resource and create surrounding tools towards comprehensively addressing the chemical informatics (software and data) needs of the systems biology and metabolic modelling communities, so that they in turn can further their objective to create meaningful simulations and models that enable whole-systems research into pressing public health and energy challenges. In order to facilitate this use, we propose to: 1. Develop a comprehensive software library for accessing ChEBI programmatically which will work across all major available operating systems; 2. Extend the ChEBI database resource to enhance stability, increase community involvement, add additional biologically relevant relationships, and provide a new powerful visualisation for the biological context of molecular entities; 3. Curate into ChEBI all known metabolites across important organisms in systems biology studies: human, mouse, E. coli and yeast; 4. Create new training materials and deliver training courses to the community.

Impact Summary

The programme of work described in this proposal will continue developing and maintaining ChEBI, an extensive chemical data resource already widely used by the chemistry and biology communities. The work focuses on the extension of the scope of the resource, and on a significant improvement in the computational usability of the database through the proposed libChEBI API. While a number of proposed improvements are focused towards the systems biology modelling communities, generic improvements to the resource's coverage of the chemical space, expanded definition of the relationships between chemicals, and the introduction of a robust programming library to access this rich ontology will produce a wide range of benefits for researchers in any field with an interest in (bio)chemistry and the development of software that underpins research in the area. The models and simulations of biological systems being developed by the systems biology and modelling community that will directly benefit from the proposed work filter into all life-science related industry, such as biotechnology, pharmaceutical, consumer goods, nutrition and health technology. All of these industries are increasingly relying on systems approaches and computational modelling in their research strategy, as documented in our supporting letters from Syngenta, Novartis and Unilever. In agriculture, the issue of efficient food production is also likely to benefit from modelling and simulation, for example by using models to understand how crop yield can be maximised. Designing crop yield for food security relies on metabolic analysis and therefore this resource has a direct impact on that activity. ChEBI is becoming a more and more complete "textbook" for metabolism for the wider public. It has recently extended the usefulness of its pages by incorporating Wikipedia links together with descriptive textual extracts into the main entity pages. 
We provide an Entity of the Month with our monthly release as a method for engaging the wider public with the broader reach of high-quality annotated data. Annotated data and modelling are also becoming crucial for a personalised approach to healthcare, where models will be developed and calibrated for each person as a basis for rationalising therapies, nutrition and exercise regimes, etc. Improving the accuracy of modelling and simulation approaches, as will be enabled by the software library and semantic reasoning we propose here, will be a major factor in reducing the number of animal experiments carried out, which will be a generic benefit to a humane society. Therefore beneficiaries will be: a) individual companies in the life-science area and their employees; b) non-profit and commercial agricultural industry and their employees who use modelling as a research tool in their business; c) health-related industries and practicing clinics applying personalised medicine; and finally d) the public at large who will benefit from the products of all those organizations mentioned. Thus this software resource will have significant consequences for society and human well-being. To facilitate the delivery of these benefits to industries and to the public at large, we will publicise our research widely in open access publications, create online impact using Twitter and by writing about our developments in grants, and deliver training courses to the UK community as an integral part of the achievement of our objectives.
Committee Closed Committee - Engineering & Biological Systems (EBS)
Research Topics X – not assigned to a current Research Topic
Research Priority X – Research Priority information not available
Research Initiative Bioinformatics and Biological Resources Fund (BBR) [2007-2015]
Funding Scheme X – not Funded via a specific Funding Scheme