Award details

CCPNGrid: A framework for high throughput computing in NMR spectroscopy

Reference BB/D006384/1
Principal Investigator / Supervisor Professor Ernest Laue
Co-Investigators / Co-Supervisors
Institution University of Cambridge
Department Biochemistry
Funding type Research
Value (£) 59,807
Status Completed
Type Research Grant
Start date 01/02/2006
End date 31/01/2007
Duration 12 months

Abstract

Nuclear Magnetic Resonance (NMR) spectroscopy has become a key tool for determining the 3D structure of biomolecules. The two main steps that determine the speed with which biomolecular NMR data can be processed are the extraction and analysis of information from the NMR spectra, and the subsequent 3D structure calculation. The software available to perform these steps is not as well developed as it is in, for example, X-ray crystallography, which limits the application of NMR. This project intends to provide the UK NMR community with the means to execute state-of-the-art 3D structure calculation and validation software, so that the quality and scientific value of structural coordinates from NMR can be improved. The main aim of this project is the creation of a framework in which novel computational methods that require computing resources not typically available in NMR laboratories can be automatically executed via the Grid, using data stored in the data model provided by the Collaborative Computing Project for the NMR community (CCPN). Whilst this project will provide a central calculation facility for small NMR laboratories, it will also enable larger NMR groups with their own compute clusters to install the framework for internal use. In this pump-priming application we will implement automated NOE assignment and NMR structure calculations using the ARIA software package, which uses the Crystallography and NMR System (CNS) for structure calculations. Once a framework for executing calculations on high performance computing facilities and clusters of workstations has been established, this resource will be extended to other software being implemented in the CCPN project (e.g. CLOUDS and Inferential Structure Determination, validation programs like QUEEN, and other software being developed within the EU Extend-NMR project). The development within this project is shared by three groups.
The EDL group at the University of Cambridge has a long history of NMR studies of biomolecules and NMR methods/software development, and is central to this project through its coordination of the CCPN project. The software framework developed by CCPN will be used to handle and validate the NMR and molecule information required for the structure calculations. The Cambridge eScience Centre will play a key role in the project by providing expertise for the implementation of calculations, initially on the High Performance Computing Facility (HPCF) and CamGrid at Cambridge, and later at other locations on the Grid. In particular, they will create a workflow tool that can handle the different steps involved in NMR structure calculations. The Macromolecular Structure Database group at the European Bioinformatics Institute is part of the world-wide Protein Data Bank (wwPDB), and will provide expertise in handling molecular data for creating topology files for the calculations, and in upgrading the RECOORD database. There are several immediate benefits resulting from this project: 1) a resource will be provided for the NMR community for automated structure calculation and validation using the latest protocols; 2) a tool will be developed to generate topology files for ARIA and CNS, which will be especially useful for scientists working with protein complexes; 3) the workflow tools developed as part of this project at the Cambridge eScience Centre can be used in a wider context (e.g. to set up other projects in computational chemistry); and 4) the RECOORD database, which contains PDB entries that have been recalculated with the latest structure calculation protocols, can be further extended and automatically updated. In the long term, it will be possible to make available all the calculation and validation protocols (e.g. CLOUDS and Inferential Structure Determination) that are being implemented in the CCPN project.
This will provide the NMR community with an invaluable resource to calculate and analyse protein structures.

Summary

Proteins are the workhorses of a living organism. They are involved in many functions, and without them life as we know it could not exist. In a human body, for example, certain proteins transport oxygen in the blood, others defend us against bacteria and viruses, and still others help to digest food. All proteins are composed of amino acids. These amino acids are the building blocks of proteins, and they are connected to each other to form a long chain. There are 20 naturally occurring types of amino acid, each with a different shape. Because each protein has a unique amino acid sequence, how the protein chain folds in 3D space is also unique. For example, a part of the chain can fold back on itself (beta-hairpin), or it can fold into a coil-like structure (alpha-helix). These structural elements then interact with each other to form the complete fold of the protein. To understand how a protein works, we need to know how this long chain of amino acids folds. This can be determined using two techniques: X-ray crystallography and Nuclear Magnetic Resonance (NMR) spectroscopy. This proposal is directed at NMR. With NMR, you can determine which atoms in the protein are close to each other in space. For example, if the protein chain forms a circle, you can determine that the atoms of the first amino acid are close in space to the atoms of the last amino acid. NMR experiments produce a lot of this type of distance information, and a lengthy calculation on a computer is necessary to determine the exact fold of the protein chain. These calculations basically convert distance information into three-dimensional coordinates. This is called a 'structure calculation', and it can be done in many different ways. These structure calculations are also quite complex and require a lot of expertise to set up on a computer. We propose to set up the latest and most sophisticated structure calculation software on a set of fast computers and to run it automatically.
This software would be available over the internet to researchers, so that they can use state-of-the-art software with little effort. They could also install it in their own laboratories if they have sufficiently fast computers of their own. Even with the best software, it is still possible that there are problems with the results of the structure calculation. This can be due to mistakes made when analysing the NMR data, or simply because there was not enough information to get a good answer when the calculation was started. For this reason, we will also automatically run validation programs that analyse the structures resulting from the calculation. This validation will help the researcher find out whether the results are scientifically correct. Finally, we can use the calculation setup to recalculate old structures. The Protein Data Bank (PDB) stores the structures of proteins that were calculated by people all over the world. Different scientists can, however, calculate structures in very different ways, and it can be difficult to directly compare the structures to each other. Recalculating the structures using the same program will improve the quality of the structures. They will also be more consistent with each other, and it will be easier to compare them directly.
Committee Closed Committee - Biomolecular Sciences (BMS)
Research Topics X – not assigned to a current Research Topic
Research Priority X – Research Priority information not available
Research Initiative EDF (e-science Development Fund) [2003-2005]
Funding Scheme X – not funded via a specific Funding Scheme