Award details

Remote streaming 3D visualisation platform for raw and analysed data from biological mass spectrometry repositories

Reference BB/K016733/1
Principal Investigator / Supervisor Professor Andrew Dowsey
Co-Investigators / Co-Supervisors Mr Henning Hermjakob
Institution The University of Manchester
Department Medical and Human Sciences
Funding type Research
Value (£) 120,280
Status Completed
Type Research Grant
Start date 08/07/2013
End date 07/01/2015
Duration 18 months

Abstract

We propose a 3D visualisation platform for cross-validating complete experimental designs of raw and analysed data from proteomics and metabolomics mass spectrometry (MS). Current tools are critically constrained by how they load and handle these massive datasets, which severely limits productivity and precludes the integrated comparison of whole experiments. By designing a visualisation-centric raw data representation for standards-compliant MS data, we will demonstrate, for the first time, GPU-accelerated streaming 2D/3D interactive visualisations from local storage without delays. Since memory overhead issues are also mitigated, novel visualisation schemes integrating results and raw data across complete experiments will be possible, greatly facilitating the quality control, verification, validation and expert interpretation of MS analyses. Furthermore, through development of biologically-driven signal compression, we will demonstrate real-time remote visualisation across the Internet, showing the potential for public raw data repositories such as PRIDE to enable immediate and seamless visual access to data from publications and web catalogues. This would lead to substantially improved facility, accessibility and re-use of these strategic community data resources. The compression format will have ramifications for storage of MS data in general. We will provide lossless encoding so that the complete original datasets can also be reconstructed exactly but with the space-saving benefits of our domain-specific compression. We will work with the PSI on an industry-standard raw data representation with pluggable codecs. The visualisation platform will be aimed at standard PCs with consumer-grade GPU cards. It will be integrated into PRIDE Inspector using OpenGL, with GPU computation employed for terrain rendering and decompression. We will define an API that allows modular re-use as the interactive visualisation subsystem of Proteosuite and potentially other tools.
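
The lossless requirement above, that the original dataset can be reconstructed exactly from the compressed form, can be illustrated with a minimal round-trip sketch. This is not the project's domain-specific codec, which the abstract does not specify; it swaps in generic delta encoding followed by zlib deflate, and it assumes intensities are stored as integer counts. The function names `encode_counts` and `decode_counts` are hypothetical.

```python
import struct
import zlib


def encode_counts(counts):
    """Delta-encode integer counts, then deflate the packed deltas.

    Assumption: intensities are integers that fit in 64 bits. This is a
    generic stand-in, not the compression scheme proposed in the abstract.
    """
    deltas = []
    previous = 0
    for value in counts:
        deltas.append(value - previous)  # store the change, not the value
        previous = value
    raw = struct.pack(f"<{len(deltas)}q", *deltas)  # little-endian int64
    return zlib.compress(raw, 9)


def decode_counts(blob):
    """Invert encode_counts exactly: inflate, unpack, then cumulative sum."""
    raw = zlib.decompress(blob)
    deltas = struct.unpack(f"<{len(raw) // 8}q", raw)
    counts = []
    running = 0
    for delta in deltas:
        running += delta
        counts.append(running)
    return counts


if __name__ == "__main__":
    spectrum = [0, 5, 5, 7, 1000, 3]
    assert decode_counts(encode_counts(spectrum)) == spectrum
```

Because every step is invertible, the decoded list equals the input exactly; how much space is saved depends entirely on the data, so no compression ratio is claimed here.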

Summary

Biologists increasingly wish to understand the complex interactions between the building blocks of genes, metabolites and proteins that control the function of every living organism. The field of systems biology has emerged to overcome the deficiencies of the traditional reductionist approach, which has identified the building blocks themselves and many of the individual interactions but has not been able to deduce how systems of these blocks act and react in unison. The application of systems biology is widespread, as it promises to revolutionise our understanding of healthy processes in plants, animals and humans, as well as how they break down under disease and how this breakdown can be averted. Often the systems biology approach starts with a 'snapshot' of a particular biological sample. Mass spectrometry is a pervasive technique for gaining a snapshot of the proteins or metabolites in a sample, and it does this by ionising the sample and then measuring each constituent compound's mass and quantity based on the resulting charge. This is often not enough to separate out the sample fully and therefore a preceding phase of liquid or gas chromatography is used to provide an initial separation. Due to technical and biological variations, it is necessary to analyse the sample a number of times to get reliable readings. Interesting biochemicals can also be broken up into characteristic fragments and these measured, which often gives a confident identification of that biochemical. All this has led systems biology to become a progressively computational discipline. Since the datasets are becoming so large, however, there is a danger that the process becomes more and more opaque and inaccessible to mass spectrometry practitioners, and so more likely to be used as a 'black box'.
It is therefore vitally important that tools and platforms are available that allow expert user verification, validation and interpretation of results by checking the raw data acquired; otherwise bias, errors and false assumptions in processing will be routinely overlooked. The massive datasets pose a challenge, however, as existing tools are slow to load and process the data for visualisation, which severely limits productivity, and limited memory precludes the integrated comparison of whole experiments. Part of the reason for this is that existing data formats have not been designed for streamlined retrieval of regions of interest, or of data at the varying levels of detail necessary for fast, efficient visualisation. We propose to design such a representation for standards-compliant data, and from that we will demonstrate interactive 3D visualisations from local storage for the first time without delay. Since memory overhead issues are also mitigated, novel visualisation schemes integrating results and raw data across complete experiments will be possible, greatly facilitating the quality control, verification, validation and expert interpretation of MS analyses. Furthermore, through development of specialised image compression, we will demonstrate real-time remote visualisation across the Internet, in a manner similar to Google Earth but for the first time extended for the demands of mass spectrometry visualisation. The European Bioinformatics Institute at Hinxton, Cambridge, has through the ProteomeXchange consortium recently launched raw data deposition into its PRIDE public data repository, which stores a vast number of publicly funded experiments from around the world. As of September 2012, it holds 324 million mass spectra.
Our remote visualisation platform will demonstrate the potential for immediate and seamless raw data access linked by online publications and web resources, which would lead to substantially improved facility, accessibility and re-use of these strategic community data sources.
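
The idea of retrieving regions of interest at varying levels of detail, in a manner similar to Google Earth, can be sketched as a simple resolution pyramid. This is an illustration only and not the representation the project proposes: it assumes the raw data has already been binned into a non-empty rectangular grid of intensities (rows as retention time, columns as m/z), and it assumes each coarser level keeps the maximum of every 2x2 block so that peaks stay visible when zoomed out. The names `downsample_max`, `build_pyramid` and `read_region` are hypothetical.

```python
def downsample_max(grid):
    """Halve both dimensions, keeping the maximum of each 2x2 block."""
    rows, cols = len(grid), len(grid[0])
    coarse = []
    for r in range(0, rows, 2):
        coarse_row = []
        for c in range(0, cols, 2):
            block = [
                grid[rr][cc]
                for rr in range(r, min(r + 2, rows))
                for cc in range(c, min(c + 2, cols))
            ]
            coarse_row.append(max(block))
        coarse.append(coarse_row)
    return coarse


def build_pyramid(grid):
    """Return [full resolution, half, quarter, ...] down to a 1x1 grid."""
    levels = [grid]
    while len(levels[-1]) > 1 or len(levels[-1][0]) > 1:
        levels.append(downsample_max(levels[-1]))
    return levels


def read_region(levels, level, row_start, row_stop, col_start, col_stop):
    """Fetch a region given in full-resolution coordinates (stop exclusive)
    at the requested level of detail, touching only that level's cells."""
    scale = 2 ** level
    r0, r1 = row_start // scale, -(-row_stop // scale)  # floor, ceiling
    c0, c1 = col_start // scale, -(-col_stop // scale)
    return [row[c0:c1] for row in levels[level][r0:r1]]


if __name__ == "__main__":
    grid = [
        [1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16],
    ]
    levels = build_pyramid(grid)
    print(read_region(levels, 1, 0, 2, 2, 4))  # coarse view of the top-right quadrant
    print(read_region(levels, 0, 1, 3, 1, 3))  # full detail of the centre
```

A viewer built on this pattern would request a coarse level for the overview and finer levels only for the area currently on screen, so the amount of data read scales with the viewport rather than with the size of the dataset.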

Impact Summary

As well as the academic beneficiaries, the proposed research has prospective impact for the mass spectrometry industry. The visualisation platform will increase the capacity for validation, cross-validation and re-use of mass spectrometry data. This will make commercial mass spectrometry instrumentation, which requires considerable capital and running costs, more attractive. The proposed visualisation platform could be seen to be in competition with software products from vendors and instrument manufacturers, particularly when integrated into Proteosuite. However, since our software is distributed with a permissive licence allowing for its unrestricted re-use in other software packages, both free and commercial, we hope that our work will aid commercial software products similarly and therefore raise the level of the whole field. There is considerable potential in this application for providing indirect benefits to UK public health, quality of life and environmental sustainability. Our stated aim is to break the limitations current tools have with loading and handling massive datasets so that quality control, verification, validation and expert interpretation of mass spectrometry analyses can be facilitated, and accessibility, facility and re-use of community raw data sources such as PRIDE can be improved. This improvement will filter down to the public through reductions in the resources, costs and overheads required for environmental, biological and biomedical discoveries and the characterisation of those discoveries. Since the platform will enable improved quality control and diagnostics, it has the potential to characterise interfering effects and false discoveries, therefore avoiding subsequent misallocation of resources.
The PDRA employed on this grant will be encouraged to spearhead public dissemination and will benefit from the unique intensive cross-disciplinary interaction at CADET and EBI that brings together proteomics, metabolomics, bioinformatics and data warehousing expertise, working towards the same goal.
Committee Research Committee C (Genes, development and STEM approaches to biology)
Research Topics Technology and Methods Development
Research Priority X – Research Priority information not available
Research Initiative Tools and Resources Development Fund (TRDF) [2006-2015]
Funding Scheme X – not Funded via a specific Funding Scheme