Award details

Automated building of carbohydrate molecules using X-ray crystallography data

Reference BB/K008153/1
Principal Investigator / Supervisor Professor Keith Wilson
Co-Investigators / Co-Supervisors
Institution University of York
Department Chemistry
Funding type Research
Value (£) 159,088
Status Completed
Type Research Grant
Start date 10/04/2013
End date 09/04/2015
Duration 24 months

Abstract

The project is made up of four steps.
1. Devise a test suite of datasets which are representative of the problems faced in building sugars in macromolecular structures, and a library of well refined carbohydrate structures. These will be mined from the PDB and assembled into a curated library. The test suite will be augmented by structures previously determined at YSBL.
2. Explore the conformational space of a given carbohydrate. An initial model will be obtained from the library above, from online resources such as the GLYCAN database, or from the supplied description. Local conformational variability around any position in the molecule will then be explored using a library of disaccharide fragments, or if necessary a grid search of ring conformations and linkage torsions.
3. Fitting of the model into the electron density will be accomplished by finding a start position, either using a YSBL-developed tool to detect the electron density fingerprint of pyranose rings for carbohydrate complexes, or from the location of the glycosylated residue for glycoproteins. The conformational search will be used iteratively to add successive units from the start position, using a look-ahead search.
4. A non-interactive version of this software will be used to generate an ensemble of solutions. These will be forwarded to refinement in the 'refmac' software and evaluated on the basis of the X-ray data using difference density maps, and stereochemistry using the MolProbity score. Scoring functions for filtering the list of conformations before the slow refinement step will be examined.
Graphical user interfaces will be developed for the widely used and freely distributed 'Coot' software for building macromolecular structures, and for the CCP4 suite.
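The grid search of linkage torsions mentioned in step 2 can be illustrated with a minimal sketch. It is not the project's implementation: the function names, the step size and the `toy_score` function are all hypothetical, and the toy score simply stands in for a real measure of fit to the electron density by peaking at an arbitrary target torsion pair.

```python
import heapq
import itertools


def angular_difference(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def grid_search_torsions(score, step=30, keep=5):
    """Score every (phi, psi) pair on a regular grid and keep the best few.

    `score` is any callable taking (phi, psi) in degrees and returning a
    number where larger means better. Hypothetical interface for illustration.
    """
    angles = range(-180, 180, step)
    candidates = (
        (score(phi, psi), phi, psi)
        for phi, psi in itertools.product(angles, angles)
    )
    return heapq.nlargest(keep, candidates)


def toy_score(phi, psi, target=(-75.0, 120.0)):
    """Placeholder for a density-fit score; the target values are arbitrary."""
    return -(
        angular_difference(phi, target[0]) ** 2
        + angular_difference(psi, target[1]) ** 2
    )


# Keep the three best-scoring torsion pairs on a 15-degree grid.
best = grid_search_torsions(toy_score, step=15, keep=3)
for value, phi, psi in best:
    print(f"phi={phi:5d}  psi={psi:5d}  score={value:8.1f}")
```

Keeping several top-scoring pairs rather than a single best one mirrors the ensemble idea in step 4, where a pool of candidate conformations is filtered before the slower refinement step.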

Summary

Carbohydrate molecules are an essential part of the living world, making up the sugars in our food, the fibres in clothes and the cell walls of green plants. The interaction of carbohydrates with other biological molecules, and in particular proteins, is an important part of many biological processes. Understanding this interaction is important for understanding how cells work and interact with one another, as well as being important to diverse bio-technologies such as the breaking down of fibrous landfill waste and the development of biological washing powders. Many proteins secreted by higher organisms have carbohydrates built directly into their structure and those incorporated into the cell membrane contribute to cell-cell recognition. X-ray crystallography - essentially an extremely powerful microscope - allows us to see the atoms in the 3D structures of biological macromolecules such as proteins. This knowledge is vital to an understanding of how such molecules carry out their tasks. Protein-sugar complexes and glycoproteins can be studied using crystallography, but while the protein can often be seen fairly clearly in the resulting 3D structure, the sugar is frequently blurry because carbohydrates tend to be rather flexible. Interpreting the magnified image in terms of atoms and bonds can be a time-consuming project, and the results somewhat subjective. The aim of this project is to provide an automated method for performing this interpretation. While automation is of value in that it frees up researcher time to concentrate on the scientific problems, a more important benefit is that it allows many possible interpretations of the magnified electron density image to be explored. In difficult cases this larger starting set of models is more likely to contain the correct answer than a single model built by a crystallographer. The different models can then be ranked to choose the best one.
The structure of the protein is easily built by known methods, leaving a 'blob' of unaccounted-for electron density into which the sugar must be placed. An initial set of possible structures for the sugar will be determined using existing web-based software and the Protein Data Bank (PDB). Dr. Cowtan at the University of York has previously written computer software which successfully identifies the sugar rings in nucleic acids (including DNA, which carries the genetic information) from their electron density alone. The 'fingerprint' technique involves the identification of shapes which are always present when the sugar is present. This approach will be modified to identify sugar rings in the carbohydrates. Each possible ring shape will be tested against the X-ray result, and neighbouring rings will be linked together. This will give a large pool of possible structures which can be ranked by automatic scoring methods based on the 3D X-ray maps and the plausibility of the chemistry. The resulting methods will be applied to two problems: the building of (1) carbohydrate molecules (such as enzyme substrates) crystallised in complex with proteins and (2) the sugars which are an integral part of glycoproteins. The software will be implemented in the ubiquitous 'Coot' graphical model building software, and will thus be made available to the whole user community, both academic and commercial. The result will be a more reliable and more objective interpretation of sugars in 3D structures of macromolecules from living organisms, which in turn will enable greater understanding of the roles of these sugars in essential biological processes.

Impact Summary

The structures of mammalian proteins, which may be glycosylated, are increasingly used by the pharmaceutical industry. There is also pressure to ensure that therapeutic compounds adopt the correct glycoforms. Finally, the development of biofuels is driving interest in cellulose-digesting enzymes. As a result there is significant industrial interest in carbohydrates. Industrial users in all of these fields depend on accurate structural studies including carbohydrate chains, and some are involved in performing such studies. Consumers of the structural studies will draw more accurate conclusions if the source data is also more accurate. Companies performing such studies will see direct benefits from automation in the form of reduced labour and reduced error rates. This is of particular relevance to pharmaceutical applications, given that FDA regulations now require the extensive characterization of the glycoform profiles of therapeutic glycoproteins. The CCP4 suite already has significant commercial impact (~100 commercial licensees paying £9,500 per annum). YSBL has played a significant role in this impact, with two YSBL-originated developments (the 'refmac' and 'coot' software) being the most-used tools in their field. We will actively engage this user base. This will be achieved as follows: CCP4 developers, including the York group, conduct an annual meeting with structural biologists at GSK to guide future developments. Plans and developments will be presented at this meeting. There are occasional visits to other customers. Users from two further commercial licensees responded positively to an initial message placed on the CCP4 bulletin board asking if there was a demand for carbohydrate building software. These users will be contacted before the grant begins to clarify their use cases, and site visits will be arranged to ensure that the software can address their needs.
The full list of CCP4 commercial licensees will be contacted and offered the chance to provide input on the project. From the responses from all these groups, a small group of user champions will be identified. These will receive test versions of the software as it is developed to ensure user feedback. They will be invited to attend a progress and steering meeting around the middle of the grant. Once the software has achieved a basic level of usability, the developer will participate in CCP4 site visits to interested commercial users to demonstrate and train staff in the use of the software. This represents a greater level of engagement with commercial users than has been normal in previous CCP4 projects, which is appropriate given the commercial relevance of this project. In addition to these direct links with commercial users, the PDRA will attend CCP4 workshops to teach users how to apply the software to their problems. These workshops have proven to be vital in bringing together users and developers and ensuring that the software addresses real world problems, and most importantly will be aimed at both academic and commercial laboratories. The resulting software will be added to the COOT package (for the interactive tools) and to the CCP4 suite (for non-interactive tools). Both packages are in use world-wide and are available on Windows, Linux and Mac OS platforms, providing a direct distribution channel to the vast majority of macromolecular crystallographers. Both are updated with major version releases roughly every year. COOT provides nightly releases to ensure users can access the latest developments; the CCP4 suite is on track to do the same in the near future. As a result, once the software has been added to these packages it will within months be available to most of the user community.
Committee Research Committee D (Molecules, cells and industrial biotechnology)
Research Topics Structural Biology, Technology and Methods Development
Research Priority X – Research Priority information not available
Research Initiative X - not in an Initiative
Funding Scheme X – not Funded via a specific Funding Scheme