Award details

CCP4 Advanced integrated approaches to macromolecular structure determination

Reference: BB/S007105/1
Principal Investigator / Supervisor: Professor Daniel Rigden
Co-Investigators / Co-Supervisors:
Institution: University of Liverpool
Department: Institute of Integrative Biology
Funding type: Research
Value (£): 374,669
Status: Current
Type: Research Grant
Start date: 01/10/2019
End date: 30/09/2023
Duration: 48 months

Abstract

This proposal incorporates four related work packages. In WP1 we will expand on our work using established and novel metrics of data quality and consistency to quantify the relationship between diffraction data quality and map quality. The tools will be used to optimise approaches to structure determination from multi-crystal or serial crystallography data, enabling optimal selection of the collected data and full use of all the information in structural refinement. WP1 will also develop and implement methods for electron diffraction data collection, integration and refinement. WP2 will generalise the use of shift-field refinement, extend it to hybrid refinement approaches, and develop new software libraries to enhance and speed up protein structure model building and refinement across a wide resolution range. In WP3 we will develop and implement contact prediction methods for use in crystallography. These methods will help identify protein domain boundaries and define new approaches to search models. The contact prediction approach will also be used to validate molecular replacement (MR) solutions and to assist in the interpretation of crystallographically derived protein:protein contacts. In WP4 we will develop a model of electron scattering from macromolecular samples to support software development and experimental design. These models will be used to develop and implement new scaling algorithms for electron diffraction data within DIALS.

Summary

Proteins, DNA and RNA are the active machines of the cells which make up living organisms, and are collectively known as macromolecules. They carry out all of the functions that sustain life, from metabolism through replication to the exchange of information between a cell and its environment. They are coded for by a 'blueprint' in the form of the DNA sequence in the genome, which describes how to make them as linear strings of building blocks. In order to function, however, most macromolecules fold into a precise 3D structure, which in turn depends primarily on the sequence of building blocks from which they are made. Knowledge of a molecule's 3D structure allows us both to understand its function and to design chemicals to interfere with it. Thanks to advances in molecular biology, a number of projects, including the Human Genome Project, have determined the complete DNA sequences of many organisms, from which we can now read the linear blueprints for many macromolecules. As yet, however, the 3D structure cannot be predicted from knowledge of the sequence alone. One way to "see" macromolecules, and so to determine their 3D structure, is to crystallise the molecule under investigation and then image it with suitable radiation. Macromolecules are too small to see with normal light, so a different approach is required. With an optical microscope we cannot see objects that are smaller than the wavelength of light, roughly one millionth of a metre; atoms are about 1000 times smaller than this. X-rays, however, have a wavelength about the same as the size of atoms. For this reason, to resolve the atomic detail of macromolecular structure, we image macromolecules with X-rays rather than with visible light. The process of imaging the structures of crystallised macromolecules is known as X-ray crystallography.
X-ray crystallography is like using a microscope to magnify objects that are too small to be seen with visible light. Unfortunately, X-ray crystallography is complicated because, unlike a microscope, there is no lens system for X-rays. Additional information and complex computation are therefore required to reconstruct the final image. This information may come from known protein structures, using the Molecular Replacement (MR) method, or from other sources including Electron Microscopy (EM). Once the structure is known, it is easier to pinpoint how macromolecules contribute to the living cellular machinery. Pharmaceutical research uses this as the basis for designing drugs that turn molecules on or off when required. Such drugs are designed to interact with the target molecule to either block or promote the chemical processes it performs within the body. Other applications include protein engineering and carbohydrate engineering. The aim of this project is to improve the key computational tools needed to extract a 3D structure from X-ray and electron diffraction experiments. It will provide continuing support to a Collaborative Computational Project (CCP4, first established in 1979), which has become one of the leading sources of software for this task. The project will help make efficient and effective use of the synchrotrons that produce the X-rays used in most crystallographic experiments. It will also extend to the use of electron microscopes, which have gained much recent publicity following the award of the 2017 Nobel Prize in Chemistry to researchers in this field. It will provide more powerful tools that allow users to exploit information from known protein structures even when the match to the unknown structure is very poor. Finally, it will allow structures to be solved even when only poor-quality or very small crystals are obtained.

Impact Summary

The importance of macromolecular crystallography in general, and of CCP4 in particular, is set out in the Pathways to Impact section. Contact predictions obtained from evolutionary covariance analysis are available for an ever-increasing number of proteins and protein families. Although predictive algorithms continue to improve, current predictions are widely available and already good enough to improve the efficiency of protein X-ray crystallographic structure determination. WP3 is concerned with facilitating the use of contact predictions throughout the structure determination pipeline. It encompasses four different elements, presented here in pipeline order. In structural biology, proteins are often expressed heterologously, with regions that hinder structural studies removed and/or particular sub-sequences expressed in isolation. Accurate identification of domain boundaries is therefore essential. Since domains typically represent self-contained folding units, contact prediction information provides a direct evolutionary readout of which residues lie within a given domain. The improved domain boundary identification methods we will develop here will avoid the wasted effort associated with incorrect boundary estimates. The second element of WP3 concerns novel approaches to Molecular Replacement (MR), the predominant mode of crystallographic structure solution. Predicted contact maps will be used to select from libraries of mid-sized search models, derived in a number of ways, extending MR to more difficult cases such as novel or divergent folds. The impact will be the computational solution of otherwise intractable targets by MR. The third element addresses tracing of sequence into electron density and its validation.
We will enable flexible visualisation of contact prediction information in Coot and CCP4MG. This will help users interpret regions poorly defined by electron density, and will also allow predictions to be represented in figures for publication. These changes will be complemented by the incorporation of contact information into the automated chain-tracing software Buccaneer and by enhancements to ConKit that provide a contact-based validation method. Together, they will bring contact predictions to the crystallographic coalface, enabling faster and more accurate interpretation of density. Finally, WP3 will improve the CCP4 tool PISA so that it uses the direct evolutionary readout of biological significance provided by contact predictions. This will better distinguish physiologically important protein interfaces from mere lattice contacts. The impact here will be less ambiguity about the biological meaning of crystal structures. The software developed in WP3 will be added to the CCP4 suite. The CCP4 suite is used worldwide and is available on Windows, Linux and macOS platforms, providing a direct distribution channel to macromolecular crystallographers. The automated update mechanism of CCP4 will give the user community fast access to new developments. Although the focus in WP3 is on structural bioinformatics for crystallographic ends, some elements are relevant to other communities. With medium- to low-resolution cryo-EM structures, it is commonly found that the density does not allow unambiguous tracing of the sequence register throughout. Contact prediction will be integrated into software shared between the crystallographic and cryo-EM communities (Coot, CCP4MG, Buccaneer and ConKit). As a result, contact-based assistance with, and validation of, sequence register will be available to both. The impact will be a reduction in register errors and improved confidence in the accuracy of regions less well defined by electron density.
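Contact-based validation, including checking of sequence register, can be illustrated with a minimal sketch under stated assumptions. The sketch counts how many of the most confident predicted contacts a model satisfies. It then repeats the count after offsetting the sequence-to-model mapping by a few residues. This is a generic illustration, not the ConKit, Coot or Buccaneer implementation. The function names are hypothetical, as is the input format (a residue-number-to-C-beta-coordinate dictionary). The 8 Å C-beta cutoff follows the common convention for defining a residue contact and is used here as an assumption.

```python
import math
from typing import Dict, Iterable, List, Tuple

Coord = Tuple[float, float, float]
# (residue_i, residue_j, probability) for one predicted contact
Prediction = Tuple[int, int, float]


def contact_precision(
    predicted: Iterable[Prediction],
    cb_coords: Dict[int, Coord],
    top_n: int,
    cutoff: float = 8.0,  # assumed C-beta contact threshold in Angstrom
    shift: int = 0,
) -> float:
    """Fraction of the top_n predicted contacts satisfied by a model.

    cb_coords maps residue number -> C-beta coordinate (C-alpha for Gly).
    With shift=s, sequence residue i is looked up at model residue i + s,
    which mimics a register error of s residues.
    """
    ranked = sorted(predicted, key=lambda c: c[2], reverse=True)[:top_n]
    evaluated = satisfied = 0
    for i, j, _prob in ranked:
        a, b = cb_coords.get(i + shift), cb_coords.get(j + shift)
        if a is None or b is None:
            continue  # residue absent from the model under this mapping
        evaluated += 1
        if math.dist(a, b) < cutoff:
            satisfied += 1
    return satisfied / evaluated if evaluated else 0.0


def best_register_shift(
    predicted: List[Prediction],
    cb_coords: Dict[int, Coord],
    top_n: int,
    max_shift: int = 3,
) -> int:
    """Shift (in residues) whose mapping best satisfies the predictions.

    Ties are broken in favour of the smallest absolute shift.
    """
    return max(
        range(-max_shift, max_shift + 1),
        key=lambda s: (
            contact_precision(predicted, cb_coords, top_n, shift=s),
            -abs(s),
        ),
    )


# Toy two-strand hairpin: residue i (1-10) pairs with residue 21 - i, 5 A apart.
coords = {k: (3.8 * k, 0.0, 0.0) for k in range(1, 11)}
coords.update({k: (3.8 * (21 - k), 5.0, 0.0) for k in range(11, 21)})
predicted = [(i, 21 - i, 0.9) for i in range(1, 8)]
print(best_register_shift(predicted, coords, top_n=7))
```

In the toy hairpin, all seven predictions are satisfied at zero shift and none at shifts of one to three residues, so the sketch prints 0. For a real model, a register error might show up as a non-zero best shift, or as a local drop in precision over the affected region.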
Finally, accurate domain boundary information facilitates bioinformatics tasks such as distant homology detection and ab initio structural modelling. Thus, the domain parsing improvements developed here will also benefit the structural bioinformatics community.
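The argument that domains are self-contained folding units, so that predicted contacts cluster within domains, can be illustrated with a minimal sketch. It assumes a single split of a sequence into two contiguous domains. Each candidate boundary is scored with a normalized cut: the number of contacts crossing the boundary, divided by the total contact "volume" on each side. The function name, the 6-residue minimum sequence separation and the 20-residue minimum domain length are illustrative assumptions. This is not the domain boundary method that WP3 will develop.

```python
from typing import Iterable, Optional, Tuple

Contact = Tuple[int, int]  # 1-based residue indices (i, j) of a predicted contact


def split_two_domains(
    contacts: Iterable[Contact],
    length: int,
    min_domain_len: int = 20,  # assumed minimum domain size
    min_seq_sep: int = 6,      # assumed minimum |i - j| for an informative contact
) -> Optional[Tuple[int, float]]:
    """Return (boundary, score) for the best single cut, or None.

    Residues 1..boundary form domain A and boundary+1..length form domain B.
    The score is a normalized cut; lower means a cleaner split.
    """
    # Keep only in-range contacts far enough apart in sequence, ordered i < j.
    pairs = [
        (min(i, j), max(i, j))
        for i, j in contacts
        if abs(i - j) >= min_seq_sep and min(i, j) >= 1 and max(i, j) <= length
    ]
    best: Optional[Tuple[int, float]] = None
    for b in range(min_domain_len, length - min_domain_len + 1):
        intra_a = sum(1 for i, j in pairs if j <= b)
        intra_b = sum(1 for i, j in pairs if i > b)
        inter = len(pairs) - intra_a - intra_b  # contacts crossing the cut
        # Volume = sum of contact degrees of the residues on each side.
        vol_a = 2 * intra_a + inter
        vol_b = 2 * intra_b + inter
        if vol_a == 0 or vol_b == 0:
            continue  # one side has no contacts, so the cut is uninformative
        score = inter / vol_a + inter / vol_b
        if best is None or score < best[1]:
            best = (b, score)
    return best


# Synthetic example: two 50-residue blocks with dense internal contacts
# and two long-range contacts between them.
contacts = [(i, i + k) for start in (1, 51) for k in (6, 10)
            for i in range(start, start + 50 - k)]
contacts += [(10, 60), (20, 80)]
print(split_two_domains(contacts, length=100))
```

On this synthetic input the lowest-scoring cut falls at residue 50, between the two blocks of dense contacts. Real predictions are noisy, and multi-domain proteins can have discontinuous domains, which a single contiguous cut like this cannot represent.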
Committee: Research Committee D (Molecules, cells and industrial biotechnology)
Research Topics: Structural Biology, Technology and Methods Development
Research Priority: X – Research Priority information not available
Research Initiative: X – not in an Initiative
Funding Scheme: X – not Funded via a specific Funding Scheme