Award details

Open source pipelines for integrated metabolomics analysis by NMR and mass spectrometry

Reference BB/M020282/1
Principal Investigator / Supervisor Professor Andrew Jones
Co-Investigators / Co-Supervisors Professor Francesco Falciani, Professor Lu-Yun Lian
Institution University of Liverpool
Department Institute of Integrative Biology
Funding type Research
Value (£) 96,649
Status Completed
Type Research Grant
Start date 01/11/2015
End date 31/12/2016
Duration 14 months

Abstract

Metabolomics comprises an important suite of techniques in modern Life Sciences research, typically performed by NMR spectroscopy or mass spectrometry (MS), and applied in a range of fields for biomarker discovery, as well as for understanding metabolic networks in complex and dynamic systems. One of the biggest challenges preventing more widespread adoption of these powerful techniques is that data analysis is difficult, especially when data sets are collected in high-throughput modes. Each technique presents its own challenges, requiring pipelines of (often poorly connected) tools for an end-to-end analysis, and a significant amount of manual analysis for steps where robust software is lacking. For individual steps within a workflow, commercial or free software exists at different stages of maturity; however, few solutions offer automated analysis from data collection through to statistical analysis. In the genomics and proteomics domains, the Galaxy framework has become a popular mechanism for building pipelines of modular tools (originally of command-line nature) through a web interface. Galaxy can be easily configured to run on single servers, compute clusters or cloud-based solutions. In this project our groups at the Universities of Liverpool and Birmingham, both of which have a track record in Galaxy development, will collaborate to build a set of metabolomics tools in Galaxy, enabling the construction of analysis pipelines for both NMR and MS analyses. Crucially, the pipelines will deliver data sets to a shared statistical analysis toolkit, enabling integrated analysis of data sets derived from both techniques. We will also contribute to the development of international data standards for metabolomics, and our new pipelines will facilitate the deposition of experimental metabolomics data into the MetaboLights database at the EBI.

Summary

Research in the Life Sciences is now commonly performed using high-tech instrumentation, producing very large amounts of data about a system of interest. These techniques are collectively called 'omics (including genomics, proteomics and metabolomics) and in different ways can measure how genes are switched on or off, how the proteins encoded by those genes behave in a cell or tissue of interest, or how the metabolites (biochemical molecules in cells) change in abundance, as the system behaves normally or is put under stress by disease, dysfunction or the introduction of toxic substances. The metabolites studied can include molecules that provide energy or structure to cells (e.g. fats, sugars etc.), the structural building blocks of DNA and proteins (e.g. nucleotides, amino acids) and essential co-factors to biological processes (e.g. vitamins). In fundamental research, and in clinical situations, the presence of a particular metabolite at an unusual abundance can be an indicator (a biomarker) of a particular state - such as a disease. Indeed, metabolomics research is applied in studies on cancer, infectious disease, heart disease, diabetes and many others. One of the greatest challenges in metabolomics research is that the analysis of the data is very difficult. Multiple different processing steps are needed to get from the raw data as delivered by the instrument - primarily nuclear magnetic resonance (NMR) spectroscopy or mass spectrometry (MS) - to the final results the researcher is interested in, i.e. quantitative and statistically significant differences in particular metabolites between samples. There are multiple software packages (both commercial and free) that can perform individual steps within a complete pipeline, but there is very little good software that makes it easy to perform a full analysis. In this project, we will build such software for data generated from NMR or MS, using a software framework called Galaxy.
Galaxy has been designed to construct a web interface on top of other software packages, enabling different (previously disconnected) packages to be joined together into an easy-to-use pipeline. Joining modules together requires data files in a standardized format as the input and output of each step, so we will also work within international organizations to help agree on a universally applied standard format to be used in our pipeline and by other software developers working in metabolomics. Our pipeline will make it much easier for scientists to analyse their data and, in particular, to compare or integrate data coming from the two complementary techniques (NMR and MS) to get a more complete picture of the system being studied. This will enable many more researchers - who currently lack detailed knowledge in metabolomics - to embrace and exploit this powerful technology. Lastly, we will make it easier for scientists to put their data into public databases when they publish their research, enabling other scientists to verify their findings and in some cases re-analyse their data in their own labs.

Impact Summary

Impact on health and society: The overall purpose of the project is to make data analysis for metabolomics more straightforward. Metabolomics is a technique increasingly used in human, animal and plant research, and as such, there is the potential for longer term (indirect) impacts, for example through facilitating biomarker discovery and the understanding of molecular mechanisms in fields including ageing, human and environmental health, food safety, industrial biotechnology, bioenergy and synthetic biology.
Economic impact: The facilitation of public data deposition has the potential for long term (indirect) economic impact, since it provides the opportunity for data sets (often collected at great expense) to be re-purposed or re-analysed, fostering new research areas or in some cases reducing the requirement to collect new data.
Staff development: The postdocs involved will have the opportunity to work as part of an international network (for example working with the EBI, COSMOS, MSI and PSI) in a cutting edge software project. The PIs will benefit through exchange of skills and expertise between partners (the team has strong expertise in software engineering, MS, NMR, data analysis and statistics).
Committee Research Committee C (Genes, development and STEM approaches to biology)
Research Topics Technology and Methods Development
Research Priority X – Research Priority information not available
Research Initiative Tools and Resources Development Fund (TRDF) [2006-2015]
Funding Scheme X – not Funded via a specific Funding Scheme