Award details

Development and Dissemination of e-Protein - a Distributed Annotation Pipeline for Proteome annotation using Grid technology

Reference BB/D524291/1
Principal Investigator / Supervisor Professor Michael Sternberg
Co-Investigators / Co-Supervisors Professor John Darlington
Institution Imperial College London
Department Biological Sciences
Funding type Research
Value (£) 19,965
Status Completed
Type Research Grant
Start date 01/09/2005
End date 31/12/2005
Duration 4 months

Abstract

e-Protein (www.e-protein.org.uk) is a BEP-1 GRID pilot project entitled "a distributed pipeline for structure-based proteome annotation using GRID technology". The project involves three groups: Imperial College London (Professors Sternberg and Darlington), UCL (Professor Jones and Dr Sorensen) and the EBI (Professor Thornton and Dr Birney). The aim of e-Protein is to provide a structure-based annotation of the proteins in the major genomes, linking resources at the three sites by GRID technology (Fig 1). We are on track to deliver the project. Highlights of the work to date are: the development and analysis of databases of proteome annotation at Imperial (3D-Genomics) and UCL (GTD and Gene3D); the development of databases providing functional annotation of proteins using structural information (EBI); the development of protein DAS, which provides a single web-based portal to access the different proteome annotation databases; the demonstration of inter-site distributed computing for proteome annotation using the Jyde software protocol developed at UCL; and the development of the ICENI protocol (at Imperial) for capturing the workflow of the proteome annotation pipeline and mapping it to multiple Grid resources, providing the capability of true resource brokering. The project is funded by the BBSRC/DTI, runs for 39 months and employs 6 PDRAs. At each site we have one protein bioinformatician and one computer scientist. The first posts started in May 2002 and will end in early September 2005, whilst other posts started later. This proposal requests support for three postdoctoral workers, each for four months, to ensure that the full team works together for four months after September.
This funding will enable us to undertake the following topics in the further development and dissemination of the e-Protein project: the incorporation of three-dimensional structural models into the 3D-Genomics structural annotation database at Imperial; the extension to all protein sequences of the functional annotation of possible ligand binding regions, based on data from crystallised protein structures at the EBI; the dissemination to the community of the Jyde software for distributed use of computing resources; the incorporation into ICENI of features related to remote database accessibility and use of OGSA-DAI (Open Grid Services Architecture/Data Access and Integration); and the dissemination of protein DAS into BioSapiens, the EU Network of Excellence for genome annotation (http://www.BioSapiens.info).

Summary

Proteins are biological molecules that form the machinery of life and are involved in numerous biological processes, such as the breakdown of food to provide energy and the defence of a cell against disease. A protein is a giant molecule, typically with more than 1,000 atoms. The chemical formula of a protein is called its sequence and refers to the order of the component amino acid residues along a linear chain. Proteins adopt complex three-dimensional (3D) structures, and the location of the atoms can be revealed experimentally. Knowledge of the 3D structure of a protein and its function often provides major insight into biological processes. In addition, this knowledge is of substantial benefit to the design of novel drugs. As a result of the genome projects we have over 200 sequenced genomes, ranging from human to those of pathogens. To interpret this biological sequence information it is necessary to use computational methods, the area known as bioinformatics. Groups at Imperial College London, University College London and the European Bioinformatics Institute, Hinxton have each developed databases that provide information about the likely structure and function of the protein sequences. In a previous project these groups were supported to develop a computational approach to link these databases, a project known as e-Protein (www.e-protein.org). The development of each database requires extensive computational resources. A second aim of e-Protein was to develop computational methods to share the computing resources at the different sites. This is the concept of the computational Grid. This proposal aims to disseminate major developments arising from the e-Protein project. In particular, the software to share databases is to be developed to a stage where it can form the interface for a European project that links a large number of bioinformatics resources (BioSapiens - http://www.BioSapiens.info).
In addition, the software to share computing power will be developed further and made available to the community. Alongside these projects, work will continue on extending the coverage of information provided by the component databases.
Committee Closed Committee - Engineering & Biological Systems (EBS)
Research Topics X – not assigned to a current Research Topic
Research Priority X – Research Priority information not available
Research Initiative EDF (e-science Development Fund) (EDF) [2003-2005]
Funding Scheme X – not Funded via a specific Funding Scheme