Award details

Pipeline for interpretation and storage of organelle proteomics data

Reference BB/G024618/1
Principal Investigator / Supervisor Professor Kathryn Lilley
Co-Investigators / Co-Supervisors Mr Henning Hermjakob, Dr Wolfgang Huber, Dr Lennart Martens
Institution University of Cambridge
Department Biochemistry
Funding type Research
Value (£) 118,713
Status Completed
Type Research Grant
Start date 01/01/2010
End date 31/03/2011
Duration 15 months

Abstract

Determining the sub-cellular location(s) of a protein is essential to elucidating cellular processes. Most organelles cannot be purified free of significant contamination from other organelles with similar physical properties. Several high-throughput proteomics methods have recently emerged that overcome the need for pure organelles. These methods rely on characterizing the distribution pattern of organelles among partially enriched fractions generated by various technologies. For all of these methods, two data analysis stages are essential: first, appropriate normalisation of quantitative data and removal of system bias; second, robust multivariate processing and a statistical assessment of the confidence in the results, which are required to match distribution and enrichment patterns to those of known organelle markers. To date there is no easily accessible, streamlined software suite that allows data analysis and the capture of data and metadata in a standardized way, so that they can be easily accessed and interpreted by the community. PRIDE is now the global repository for proteomics data, recommended by major journals for deposition of relevant datasets. PRIDE does not have the functionality to handle deposition of organelle proteomics datasets. Here we aim to produce an open source organelle proteomics pipeline that can be used in a user-friendly, explanatory and scientifically sound manner. The culmination of the project will be a suite of open source software into which raw data will be loaded. After data normalisation, a choice of statistical tests will be available, with clear explanations of how these tests operate, to allow clustering of data to reveal assignments to organelles, where possible. These assignments, together with the raw data and details of the experimental design and starting samples, will then be captured by PRIDE, allowing storage of the complete information about the experiment and aiding publication and data sharing.

Summary

Organelle proteomics is an emerging field within the study of proteins. The proteome is the set of proteins expressed by a cell, or found in a biological fluid, at a given time and under given circumstances. Within the cells of more complex organisms such as fungi, plants and animals, many proteins are found in specific subcellular structures called organelles, where they carry out their function. Determining the sub-cellular location(s) of a protein is very desirable to biologists for two reasons. Firstly, it can help elucidate the protein's role in the cell, as proteins are spatially organised according to their function, and location is an important determinant of the specificity of their molecular interactions. Secondly, it refines our knowledge of cellular processes by pinpointing certain activities to specific organelles. Unfortunately, most organelles cannot be purified away from contaminants well enough to yield an accurate catalogue of the proteins of any given organelle. Several high-throughput methods involving quantitative strategies have recently emerged that overcome the need to produce a pure organelle for analysis. Each of these methods relies on quantitative proteomics to characterize the distribution pattern of organelles among partially enriched fractions generated by various separation technologies, and has the potential to discriminate between genuine organelle residents and contaminants without preparation of pure organelles. For all of these methods, two data analysis stages are essential: the first deals with appropriate normalisation of quantitative data and removal of system bias; the second involves robust multivariate processing of the signals and a statistical assessment of the confidence in the results, which is required to match distribution and enrichment patterns to those of known organelle markers.
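The two analysis stages described above can be illustrated with a minimal sketch. This is not the project's software: the function names, the sum-to-one normalisation, the nearest-centroid matching, the `min_margin` threshold and all intensity values below are assumptions chosen only to make the idea concrete. A real pipeline would use more robust normalisation and a proper statistical confidence measure.

```python
from math import sqrt


def normalise_profile(intensities):
    """Stage 1 (assumed form): scale a protein's fraction intensities to sum to 1."""
    total = sum(intensities)
    if total <= 0:
        raise ValueError("profile has no signal")
    return [x / total for x in intensities]


def centroid(profiles):
    """Mean profile of a set of equally long, normalised marker profiles."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]


def euclidean(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_organelle(profile, marker_centroids, min_margin=0.05):
    """Stage 2 (assumed form): match a profile to the closest marker centroid.

    Returns (organelle, margin). The organelle is None when the two closest
    centroids are nearly equidistant, i.e. the assignment is not confident.
    Needs at least two organelles in marker_centroids.
    """
    ranked = sorted(
        (euclidean(profile, c), name) for name, c in marker_centroids.items()
    )
    (best_distance, best_name), (second_distance, _) = ranked[0], ranked[1]
    margin = second_distance - best_distance
    return (best_name if margin >= min_margin else None), margin


# Hypothetical marker proteins: intensities across four enriched fractions.
raw_markers = {
    "ER": [[10, 30, 50, 10], [12, 28, 48, 12]],
    "Mitochondrion": [[50, 30, 10, 10], [48, 32, 12, 8]],
}
marker_centroids = {
    name: centroid([normalise_profile(p) for p in profiles])
    for name, profiles in raw_markers.items()
}

# A protein whose distribution resembles the ER markers.
er_like, er_margin = assign_organelle(
    normalise_profile([22, 58, 102, 18]), marker_centroids
)

# A protein spread evenly between both patterns: left unassigned.
ambiguous, ambiguous_margin = assign_organelle(
    normalise_profile([30, 30, 30, 10]), marker_centroids
)

print(er_like, ambiguous)
```

Leaving ambiguous proteins unassigned, rather than forcing every protein into an organelle, reflects the point made above that assignments need an accompanying assessment of confidence.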
Fully curated data sets containing information about experimental design, data manipulation and assignment of proteins to subcellular locations would be of immense value to biologists. To date there is no easily accessible, streamlined software suite that allows data analysis and the capture of data and metadata in a standardized way, so that they can be easily accessed and interpreted by the community at large. In this proposal, three groups that already have successful and fruitful collaborations in place and highly complementary areas of expertise, Lilley (organelle proteomics), Huber (statistics, software), and Hermjakob and Martens (PRIDE database), all within the Cambridge area, aim to produce an organelle proteomics pipeline for use by the growing organelle proteomics community. The pipeline will aid not only data analysis but also data storage and the presentation of data for submission to all the major journals. Its output will be easily accessible to a wide variety of biologists and will facilitate data sharing among a growing cohort of scientists.
Committee Research Committee C (Genes, development and STEM approaches to biology)
Research Topics Technology and Methods Development
Research Priority X – Research Priority information not available
Research Initiative Tools and Resources Development Fund (TRDF) [2006-2015]
Funding Scheme X – not Funded via a specific Funding Scheme