Award details

Toolkit for Interpretation of Organelle Proteomics Data

Reference BB/H024247/1
Principal Investigator / Supervisor Professor Kathryn Lilley
Co-Investigators / Co-Supervisors Dr Matthew Trotter
Institution University of Cambridge
Department Biochemistry
Funding type Research
Value (£) 120,841
Status Completed
Type Research Grant
Start date 15/11/2010
End date 14/05/2012
Duration 18 months

Abstract

Organelle proteomics is an up-and-coming field, as the functions of proteins and cellular mechanisms are clearly linked to their subcellular location. Such is the growing prominence of the field that the Human Proteome Organisation (HUPO) is holding its annual Barbados Workshop in 2010 on the subject. A major frustration within the field is that the collection of high-quality data is very resource-intensive. The data sets produced are extremely rich sources of information which, to date, have not been analysed to their full potential because of a lack of suitable statistical workflows. Analysis techniques pioneered by the applicants' laboratories have demonstrated the potential of such datasets to produce robust organelle proteomes. The aims of the proposal are to: i) collate data sets from many organelle proteomics approaches, including LOPIT, an approach developed in the PI's laboratory; ii) further develop semi-supervised pattern recognition algorithms to assign protein-organelle membership, resolve protein association with multiple organelles, and identify changes in protein-organelle association across experimental conditions, and also develop a freely available toolkit for organelle proteomics data analysis in the R statistical language; iii) apply the above to the data sets from i) and deposit the findings in PRIDE and in specialist organism databases such as LOCATE, SUBA and FlyMine. The above work plan will be expedited by a multidisciplinary team including Kathryn Lilley, developer of the LOPIT technology, and Matthew Trotter, a statistician who has worked extensively with gene and protein expression data. The organelle proteomics and cell biology communities will be the major beneficiaries of this work. The long-overdue need for the proposed work is exemplified by letters of support not only from very productive collaborators of the applicants, but also from two of the top organelle proteomics laboratories in the world.

Summary

Organelle proteomics is an up-and-coming field, as the functions of proteins and cellular mechanisms are clearly linked to the subcellular locations of the proteins. Such is the growing prominence of the field that the Human Proteome Organisation (HUPO) is holding its annual Barbados Workshop in January 2010 on the subject, in an attempt to move the field forward so that optimal protocols and data analysis methods are disseminated through the proteomics and cell biology communities. A major frustration within the field is that the collection of high-quality data is very time-consuming and resource-intensive. The data sets produced are in general extremely rich sources of information; however, to date they have not been analysed to their full potential because of a lack of suitable statistical workflows. For example, a recent paper published in MCP (Andreyev et al., Mol. Cell. Proteomics, online 2009) contains a very rich dataset which has only been superficially mined to produce a limited organelle marker data set, while the rest of the data is not fully analysed and languishes in a spreadsheet within the supplemental data. Basic analytical strategies, pioneered by the applicants' laboratories, have shown the potential of such datasets to produce robust organelle proteome lists containing organelle-specific annotations of proteins of unknown localization. Even the applicants' substantial organelle proteomics datasets, however, have only been analysed to a limited extent with established statistical approaches. In this proposal we aim to create more sophisticated statistical tools, building on what has already been established by the applicants, which will enable assignment of proteins to subcellular locations using semi-supervised pattern recognition algorithms.
These will lead to assignment of protein-organelle membership, resolution of protein association with multiple organelles, and identification of changes in protein-organelle association across multiple experimental conditions. These tools will be produced as freely available software for analysis of standard organelle proteomics data generated by the most common approaches. Application of these novel statistical approaches will lead to the creation of optimal organelle proteomics datasets, which will themselves be deposited in PRIDE, a publicly accessible proteomics data repository. In summary, the proposal will create a much-needed tool to allow robust analysis of organelle proteomics datasets, and enable re-analysis of existing very rich data sets so that optimal mining of these data is achieved. It will also offer optimal tools for analysis of future organelle proteomics datasets, which are starting to be produced in earnest by the proteomics and cell biology communities. The above work plan will be expedited by a multidisciplinary team which includes Kathryn Lilley, developer of the organelle proteomics technologies, and Matthew Trotter, a bioinformatician and statistician.

Impact Summary

Organelle proteomics is an up-and-coming field, as the functions of proteins and cellular mechanisms are clearly linked to the subcellular locations of the proteins. The data sets produced are extremely rich sources of information which, to date, have not been analysed to their full potential because of a lack of suitable statistical workflows. Analysis techniques pioneered by the applicants' laboratories, using principal components analysis and hierarchical clustering, have shown the potential of such datasets to produce robust organelle proteomes. In order to perform the proposed work, we will: i) collate data sets from many organelle proteomics approaches; ii) develop semi-supervised pattern recognition algorithms to assign protein-organelle membership, resolve protein association with multiple organelles, and identify changes in protein-organelle association across experimental conditions, and also develop a freely available toolkit for organelle proteomics data analysis in the R statistical language; iii) apply the above to the data sets from i) and deposit the findings in PRIDE and in specialist organism databases such as LOCATE, SUBA and FlyMine. The above work plan will be expedited by a multidisciplinary team which includes Kathryn Lilley, developer of the LOPIT technology, and Matthew Trotter, a statistician who has worked extensively with gene and protein expression data. Who will benefit from this research? The organelle proteomics community will be the major beneficiary of this work. The long-overdue need for the proposed work is exemplified by letters of support not only from very productive collaborators of the applicants, but also from two of the top organelle proteomics laboratories in the world. Moreover, cell biologists, both academic and within the pharmaceutical sector, will also benefit, as this proposal underpins the interface between modern 'omics technologies and more classical cell biological methodologies.
How will they benefit from this research? The main benefit will be a pipeline of robust analytical methods that enables optimal mining of high-throughput organelle proteomics datasets. Furthermore, approaches will be further developed to enable characterisation of sets of proteins whose correlated change in subcellular location upon specific perturbation will give insight into cellular mechanisms. Additionally, fully characterised organelle proteomics datasets will be deposited in publicly accessible databases, and subcellular location information will be communicated to organism-specific databases. What will be done to ensure that they have the opportunity to benefit from this research? The statistical tools produced will be implemented in the R statistical programming environment (www.r-project.org) in order to synchronise with existing efforts to provide open-source R scripts for handling raw LOPIT output to the Bioconductor suite of bioinformatics software (BBSRC: BB/G024618/1). Manuscripts will be written which describe not only the novel statistical approaches developed but also their demonstration by re-analysis of existing data and of novel datasets produced within the applicants' laboratory during the course of the funding period. It is envisaged that these manuscripts will be submitted to high-impact journals with a large general readership, such as Nature Methods and Nature Biotechnology. KSL is invited to give numerous talks at all the top proteomics conferences worldwide, and she will endeavour to publicise the work described here at such events. KSL and MWBT have recently submitted an FP7-Infrastructure proposal in collaboration with other top proteomics laboratories in Europe. A large portion of that proposal is given over to forming transnational training facilities. KSL and MWBT intend to offer organelle proteomics data analysis training as part of that proposal.
Committee Research Committee C (Genes, development and STEM approaches to biology)
Research Topics Technology and Methods Development
Research Priority X – Research Priority information not available
Research Initiative Tools and Resources Development Fund (TRDF) [2006-2015]
Funding Scheme X – not Funded via a specific Funding Scheme