Award details

Further development of the PSIPRED server into an integrated tool for systems biology and functional genomics researchers

Reference BB/I026014/1
Principal Investigator / Supervisor Professor David Jones
Co-Investigators / Co-Supervisors
Institution University College London
Department Computer Science
Funding type Research
Value (£) 302,892
Status Completed
Type Research Grant
Start date 13/09/2011
End date 12/09/2014
Duration 36 months

Abstract

The Jones Group at University College London has maintained a suite of web-based tools, built on a number of cutting-edge protein structure prediction methods, since 1999. The methods allow users to predict a variety of protein structural features, including secondary structure, natively disordered regions, protein domain boundaries and 3D models of tertiary structure. More recently we have been developing new services to assist users in predicting gene function and protein-protein interactions, all of which we believe are vital developments to make PSIPRED more useful to systems biologists. The current web servers employ a number of features to help users become familiar with the software, e.g. online tutorials and a common look and feel. However, we have until now stopped short of fully integrating the suite of tools; this will be addressed in the proposed project. These developments would result in the only single server worldwide that provides all of the following prediction services to biologists: comparative modelling, fold recognition, ab initio (new fold) prediction, transmembrane protein structure prediction, disorder prediction, domain boundary prediction, binding hotspot prediction, ligand binding site prediction, and several novel approaches to gene function prediction. In addition to maintaining and improving the usability of the PSIPRED services, we also plan to add important new functionality. The main area we wish to address is dealing efficiently with high-throughput sequencing data, providing users with functional and structural information relating to sequence variations in large data sets. In addition, we will develop new approaches to predicting ligand-binding sites and new transmembrane prediction tools.

Summary

The completion of the first draft of the human genome in 2001, after years of effort, was heralded as a major breakthrough that would finally enable researchers throughout the world to answer intriguing and elusive questions relating to the mechanisms that govern complex biological processes. Now the genome of a human can be sequenced in a matter of weeks, and we will soon have the complete genomes of many thousands of different organisms. The hope is that the information generated from this explosion of genome data worldwide will be harnessed to further our understanding and applied to beneficial and therapeutic use through computer-aided biological research. Most genes code for specific proteins which have useful functions in the body. Proteins are essentially strings of simpler molecules, called amino acids, and these strings can self-assemble into a complex 3-D structure as soon as the protein is formed by the protein-making machinery (ribosomes) in the cell. It is this unique structure which determines the precise chemical function of the protein (i.e. what it does in the cell and how it does it). By firing X-rays at crystallised proteins, scientists can determine their structure, but this process can take many months or even years. With hundreds of thousands of proteins for which the native structure is unknown, it is not surprising that scientists want to find a clever shortcut to working out the structure of proteins. We, like many other scientists, have been trying to 'crack the code' of protein structure, i.e. working out the rules which govern how the protein finds its unique structure and then trying to program a computer with these rules to allow scientists to quickly 'predict' what the structure of their protein of interest might be. The PSIPRED service is a collection of Web servers maintained at UCL which does just this: it allows biologists to predict protein structure from amino acid sequence.
Over the years it has helped many thousands of scientists with their work by providing these services, and we now wish not only to upgrade and maintain these existing servers but also to implement new methods which allow the structures of even the most difficult proteins to be deduced by computer simulations. More recently, for example, we have been building upon the original PSIPRED service to cover other important problems in biology. Probably the biggest of these problems is the prediction of the biological function of sequenced genes. Relationships between protein structure and function have been well documented over the last 30 years; however, the diversity and complexity presented by nature pose several challenging problems. Gene products from different species may exhibit matching biological functions but show little or no sequence similarity, perhaps due to convergent evolution. It may be that, although there is little overall structural and sequence similarity between two proteins, key properties of the active sites (e.g. overall charge or approximate shape) are conserved, allowing similar functions to be carried out. Analyses of functional regions within protein structures on a large scale will not only allow the development of more reliable genome annotation tools but also enhance the knowledge base of the biological role of proteins at a cellular level. Such understanding will be a key stepping stone in the development of techniques and pharmaceuticals to target diseased genes and their products as well as proteins from pathogenic organisms.

Impact Summary

SUMMARY OF RESOURCE

This proposal is to maintain and further develop a set of Web-accessible tools and services that has been developed at UCL (the PSIPRED server portal). This portal provides a wide variety of tools to the general biomedical research community, and is available for use by both academic and commercial researchers. In many independent tests, these tools have proven to be amongst the very best worldwide, and are even used by other resources around the world as part of their own pipelines and workflows.

IMPACT OVERVIEW

The PSIPRED portal was used a total of 183,000 times in the last year, and had nearly 85,000 unique visitors. Users are spread across the globe, with 22% of users coming from the US and 21% from the UK. This testifies to the importance of this resource, particularly to the UK bioscience community. Users also come from a wide variety of scientific research areas. Based on our user support enquiries and user surveys, we can identify users in areas across the whole BBSRC remit, e.g. bio-energy, ageing research, biotechnology, synthetic biology, vaccine design, plant biology, animal health and even nanotechnology. In summary, the immediate beneficiaries of this research are the broad community of experimental biologists needing additional functional or structural clues for proteins of interest. Both academic and industry scientists will benefit in a similar way, as the results of this research will be available freely to all users. Commercial scientists with sensitive data will be able to license the software through UCL Business so that they can exploit the resource without revealing their research interests to other users. Being able to determine even some clue as to the structure or function of uncharacterised proteins can have significant impact in the broad variety of areas mentioned above.
Beyond industrial applications of this research, filling in the major gaps in our knowledge of what the full complement of genes and their products do, and how the proteins interact, can have wider implications for understanding the workings of healthy cells and how they age. Ultimately this work can make a contribution to our overall understanding of how life processes arise from interactions between a relatively small number of genes in our genomes and the genomes of other organisms. We also note that many users of our servers use the resources for teaching purposes. It is clearly vital that, for maximum impact, the next generations of graduates and postgraduates in the biosciences be trained in advanced computational biology techniques. We are therefore pleased that our tools, because of our focus on good quality visual output and speed of returning jobs, find use in teaching laboratories around the world.
Committee Research Committee D (Molecules, cells and industrial biotechnology)
Research Topics Structural Biology, Technology and Methods Development
Research Priority X – Research Priority information not available
Research Initiative Bioinformatics and Biological Resources Fund (BBR) [2007-2015]
Funding Scheme X – not Funded via a specific Funding Scheme