Award details

Standardised metabolite annotation workflows for enhancing biological interpretation in metabolomic data repositories

Reference BB/T007389/1
Principal Investigator / Supervisor Ms Claire O'Donovan
Co-Investigators / Co-Supervisors
Institution EMBL - European Bioinformatics Institute
Department OMICs
Funding type Research
Value (£) 205,651
Status Current
Type Research Grant
Start date 03/04/2020
End date 30/09/2023
Duration 42 months

Abstract

Metabolomics is a widely applied scientific tool used to study qualitative and quantitative changes in the metabolite composition of biological samples in relation to agriculture, biotechnology, human health and ageing. The chemical or structural identification of metabolites is a crucial step in untargeted metabolomic studies but is currently limited by our knowledge of the parts lists of metabolomes, by our understanding of how metabolites are chemically identified and by the availability of authentic chemical standards from which to derive data for metabolite identification. Even though our understanding of metabolome compositions and metabolite detection is increasing, metabolite identification remains the rate-limiting step in untargeted metabolomic studies. This is evident in the data held in metabolomic data repositories, the two largest of which are MetaboLights and the Metabolomics Workbench. Of the features detected in liquid chromatography-mass spectrometry (LC-MS) datasets deposited in MetaboLights, 89% were structurally unidentified, showing that large volumes of biological data accessible to all scientists globally cannot be translated into biological conclusions. The proposed research will develop a computational workflow to annotate metabolites present in untargeted LC-MS and nuclear magnetic resonance (NMR) datasets and will apply the integrated computational workflow to all untargeted LC-MS and NMR metabolomics datasets submitted to MetaboLights and the Metabolomics Workbench (approximately 2000 datasets). The research team will also disseminate the open access computational tool and will develop and release open access training courses on operating the computational workflows.

Summary

Metabolites are small biochemicals which have many important roles in biological systems, including metabolism. Metabolites are studied in many different biological systems, including microbes, plants and humans, to benefit the human population through increased crop yields, the manufacture of drugs and a better understanding of how humans age and how the process can be modified to improve our health. The study of metabolites in biological systems is called metabolomics, which aims to study thousands of metabolites and investigate the biological processes they are involved in. At the start of such a study it is not known which metabolites will be detected; instead, the raw analytical data are translated into more biologically meaningful data during the study by identifying the chemical structure of each metabolite, for example by establishing that a metabolite is glucose or a lipid. This conversion of data to a metabolite identity is required to derive biological conclusions: without metabolite identification, no biological information can be reported. Many metabolomic studies are made available to the whole scientific community in data repositories. One large data repository, MetaboLights, is located in the UK and the other, the Metabolomics Workbench, is located in the USA. These two data repositories contain information from nearly 2000 metabolomic studies performed on microbes, plants and mammals, including humans. Across all studies, up to 89% of detected metabolites are not identified and have no chemical structure assigned to them, so a significant amount of data is present from which biological knowledge cannot be derived. The project will construct a computational workflow to assign chemical structures to the majority of metabolites in datasets already present in both of these data repositories, and the workflow will also be applied to all future datasets deposited in the repositories. On completion, the biological information available in the two data repositories will be greatly expanded, allowing further biological knowledge to be derived.

Impact Summary

There will be a number of direct or indirect benefits for academic and industrial research groups, commercial companies and the research staff employed for the proposed research. Many national and international academic groups and businesses will benefit from publicly accessible datasets in which significantly more metabolites are identified. These include:
1. Academic researchers performing non-targeted metabolomics using LC-MS and NMR. The resource developed will benefit research into microbes, plants and animals in areas including synthetic biology, crop production and human ageing in two different ways: (i) an open access computational resource which will be available to all researchers globally to apply in their research and (ii) access to approximately 2000 currently deposited metabolomic datasets with enriched numbers of identified metabolites and therefore containing higher levels of metabolic and biological information.
2. Industry scientists performing metabolism research, who can benefit in the same ways as academic researchers. The computational workflow developed can be applied by these researchers and the biologically enriched datasets can be investigated to support greater understanding of the metabolism underlying, for example, the production of pharmaceuticals and chemicals and improved crop production.
3. Government agencies in the UK performing metabolism research, who can benefit in the same ways as academic researchers. The computational workflow developed can be applied by these researchers and the biologically enriched datasets can be investigated to support their research. One example is the Department for Environment, Food and Rural Affairs in the UK, which through the FERA facility applies non-targeted metabolomics for food safety and food authenticity testing and crop protection.
4. Commercial instrument suppliers, specifically those supplying mass spectrometers and nuclear magnetic resonance spectrometers, as the resource will be applicable to a range of different analytical platforms from different commercial instrument suppliers.
5. Post-doctoral research associates employed during the research, through training in different scientific disciplines and through personal and organisational development.
Committee Research Committee C (Genes, development and STEM approaches to biology)
Research Topics Technology and Methods Development
Research Priority X – Research Priority information not available
Research Initiative X - not in an Initiative
Funding Scheme X – not Funded via a specific Funding Scheme