Award details

Advancing capability in high performance protein structure and function prediction through optimisation of IntFOLD

Reference: BB/T018496/1
Principal Investigator / Supervisor: Professor Liam McGuffin
Co-Investigators / Co-Supervisors:
Institution: University of Reading
Department: School of Biological Sciences
Funding type: Research
Value (£): 731,921
Status: Current
Type: Research Grant
Start date: 02/11/2020
End date: 01/05/2026
Duration: 66 months

Abstract

Protein structure prediction servers are a practical way to bridge the sequence-structure gap. For most known protein sequences, these tools can be used to build accurate 3D models, which help us to understand protein functions. However, wider acceptance of 3D protein models depends on the availability of free, easy-to-use servers that integrate accurate methods for estimating confidence in model quality. IntFOLD is a high-performance, publicly available web server for the prediction of protein structures and functions, with world-class model quality estimates. The IntFOLD resource is unique in offering a fully integrated toolset for protein fold recognition, 3D model quality estimation, disorder prediction, domain prediction and the prediction of protein-ligand binding sites. IntFOLD server predictions have been independently evaluated in numerous blind CASP experiments and in the continuous CAMEO project, where its component methods have ranked top in several categories. The server is world-leading in providing template-based models with built-in accuracy self-estimates, which allow users to gauge the confidence of their predictions accurately. Models and predictions from IntFOLD have been applied across a diverse range of life science specialisations, with impacts on agriculture, food, biotechnology and health. The server has grown in popularity both in the UK and globally, owing to its high performance, ease of use and the relevance of the models it produces. High demand on our resources has created a bottleneck, and we need investment in dedicated staff and resources to deliver a step-change improvement in meeting user needs and in user experience. In this project we aim to make major advances in IntFOLD's capacity and utility so that we can expand our user base on a more sustainable footing. We will develop and integrate more computationally intensive, high-performance methods, exploiting timely advances in contact prediction and deep learning, in order to maintain our international lead.

Summary

One of the major challenges in biology is to understand how proteins fold into the different shapes specified by their sequences of amino acid building blocks. If we know how proteins fold, then we can understand what they do and how they work together as the fundamental molecular machines in all living systems. Our research aims to improve our ability to understand protein structures and how they function. This information can help us tackle a wide range of urgent problems, such as securing future food supplies, producing new medicines and sources of energy, and ensuring healthier people, plants and animals. Proteins are the most important components of every living cell, and they come in thousands of different shapes and sizes. Genes contain the code for making these many different protein molecules. We have very efficient machines for analysing genes and collecting genetic sequence data. We have already collected the genetic sequences of thousands of living things, from bacteria to plants and animals, but there are still many more to investigate. The amount of available genetic information is increasing at an ever faster rate, and we are making strides in decoding this information to understand what the encoded proteins do. There are several different types of experiment that can reveal the shapes, or structures, of proteins. Unfortunately, determining the structure of just one protein experimentally can take many years and can be very expensive. As a result, there are now large gaps in our knowledge of what proteins look like and how they work together. To make full use of the genetic information we are collecting, we need to close these gaps and complete the puzzle. Fortunately, we have developed a computer software system called IntFOLD that models the structures of proteins many times faster and more cheaply than physical experiments.
The IntFOLD software uses our existing knowledge of protein sequences and structures to help fill in the missing information about new sequences. By learning from what we already know, the software can predict the shapes of new proteins. We can then build virtual models of the molecules and see where all of their atoms are likely to be in three dimensions, which helps us to understand how the molecules combine to form biological machines. This transformative project will substantially enhance our IntFOLD software, making it even more useful and promoting it to more biologists in the UK and around the world. The software has already been used hundreds of thousands of times by thousands of researchers worldwide. The models produced by IntFOLD have supported new research into molecular mechanisms, diseases and the evolution of proteins across all kingdoms of life. We now need to improve IntFOLD to make its models more precise, which will make them more useful still. We also need to include more predictions about how proteins assemble, which will improve our understanding of their functions. To effect this step change, we will need to employ a dedicated postdoctoral researcher to help develop the new IntFOLD and to make it available to researchers worldwide. Computer speed and capacity are essential to keep up with the growth in demand, so we are also requesting funding to keep our hardware up to date.

Impact Summary

The main non-academic beneficiaries of IntFOLD include the agricultural and food industries, clinicians and veterinarians, the pharmaceutical and biotech industries, students studying life sciences, university applicants and the general public. This project will help us to realise the potential of next-generation sequencing by providing world-class tools that underpin research in the BBSRC priority areas of agriculture and food security, bioscience for health and industrial biotechnology. The IntFOLD server allows us to better exploit the wealth of available genomic data by improving our understanding of the structures and functions of the proteins encoded by genes. The biotech and pharmaceutical industries will benefit from access to improved structural and functional bioinformatics tools for exploiting protein sequence data. Improved 3D models of proteins can, for example, help to improve the drug design and development process, as well as benefiting many other industrial processes relevant to food security, bioenergy and health. Ultimately, progress in 3D modelling with IntFOLD will help to enhance UK scientific output, which will translate into improved clinical and veterinary care and increased food security, resulting in better health and greater economic competitiveness. Lecturers, postgraduates and undergraduates will be provided with enhanced tools for teaching and learning about the predicted structures and functions of proteins. Because the outputs from the software can be represented in a visually striking and intuitive way, they will be used at outreach events, such as open days, visit days and public lectures, to show the general public how theoretical models can help solve real-world practical problems. This will enhance public understanding of the use of computational tools in modern data-driven biology.
The following quotes from letters of support for IntFOLD highlight the breadth of its impact across sectors that fit with BBSRC strategic priorities:
Agriculture and food security: "These results have underpinned our work into enlightening stress-associated proteins and transcription factors that play roles in transformation mechanisms in crop plants." - Prof. F. Gruel, Istanbul University. "The discovery of the importance of RNase-like effectors, which turn out to be the major determinants of avirulence in wheat and barley disease resistance genes is likely to continue to influence advances in this area." - Prof. P. Spanu, Imperial College London.
Bioscience for health: "I have been using this powerful platform to inform our mutagenesis studies of VZV [the causative agent of chickenpox and shingles] glycoproteins as part of our structure-function studies related to pathogenesis." - Dr S. Oliver, Stanford University School of Medicine. "...the in silico analysis demonstrated distortions in the structure of the mutated enzyme or in its stability, causing the protein to malfunction, and resulting in a lethal pathology in newborn lambs." - Prof. L. Monteagudo, University of Zaragoza.
Biotechnology: "MBio is currently involved in research in human and animal food nutrition, fungal biotechnology and horticulture...Notably the IntFOLD server allowed us to observe a deepening in the catalytic cleft of one of the chitinases accounting for a difference in substrate-specificity pattern of that enzyme..." - Dr K. Dwyer, Senior Scientist, MBio, Monaghan Biosciences.
Students studying life sciences: "I also teach an undergraduate and graduate bioinformatics course where I use ModFOLD as an example to teach the students how to discriminate the models that they build. Another tool from your lab that we have recently been using is IntFOLD and it has been especially useful in analyzing the novel sequences we are studying." - Dr S. M. Singh, City University of New York.
Committee: Research Committee C (Genes, development and STEM approaches to biology)
Research Topics: Structural Biology, Technology and Methods Development
Research Priority: information not available
Research Initiative: Bioinformatics and Biological Resources Fund (BBR) [2007-2015]
Funding Scheme: not funded via a specific funding scheme