Award details

The Dundee Resource for Protein Structure Prediction and Sequence Analysis

Reference: BB/J019364/1
Principal Investigator / Supervisor: Professor Geoffrey Barton
Co-Investigators / Co-Supervisors: none listed
Institution: University of Dundee
Department: School of Life Sciences
Funding type: Research
Value (£): 633,871
Status: Completed
Type: Research Grant
Start date: 01/03/2013
End date: 31/08/2018
Duration: 66 months

Abstract

Although there are many challenges in managing large sequence datasets, the major hurdle is to use the raw sequence data to inform our knowledge and understanding of biological systems. Accurate and reliable software tools that make structural and functional predictions from the sequence data are an essential prerequisite to that understanding. In this proposal we will build robust support for the secondary structure prediction server "JPred", which performs up to 95,000 predictions per month for scientists in 140 countries. We will also build support for the sequence analysis pipeline "TarO" and for the protein kinase database and kinase classification pipeline "Kinomer" by re-engineering all three servers into the new Dundee Resource for Protein Structure Prediction and Sequence Analysis. We will achieve this goal by wrapping components of all three web servers as "web services" in the new Java Bioinformatics Analysis Web Services (JABAWS) system for portable web services. JABAWS will allow programmatic access to tools that are currently available only through a web page, and will also make them accessible to the popular Jalview workbench for visualisation and analysis of protein sequences. JABAWS can also be deployed on a laptop or in the Amazon cloud computing service. Since annotations of complete proteomes are valuable in understanding the function of all proteins coded for by a genome, we will run all the tools developed here on complete proteomes and maintain the results as a database that can be accessed through a web site and a DAS (Distributed Annotation System) server. The users of the new integrated resource will be very diverse, from experimental biologists with little knowledge of computing to bioinformaticians who write their own software. Accordingly, we will develop extensive manuals and e-learning materials to inform and educate potential users at all levels, and we will run regular training courses.

Summary

This resource application is focused on supporting and maintaining computer tools and techniques developed at the University of Dundee that are in daily use by thousands of biological scientists throughout the UK and the world. The resource will not only ensure that these tools are readily available to all scientists, but also improve the ability of scientists and students to use them through better interfaces, regular face-to-face training courses and on-line materials. The tools focus on the analysis of protein sequences and structures, which are briefly introduced here. The plans to make a plant, animal or micro-organism are encoded in the molecule DNA and are known as its genome. The genome can be represented as a long word made up of four different letters (A, C, G, T). The genome ranges in length from a few thousand letters for a virus to several billion letters for plants and animals. The genome is divided up into regions called genes, which are translated by complex molecular machines into other molecules such as proteins. Humans and other animals have 20,000-30,000 genes that code for proteins, and each protein is made up of a sequence of 20 different amino acid types joined together in a chain. Protein sequences from an organism vary in length from a few amino acids to several thousand and can be represented as a word made up of 20 different letter types. The protein chain folds up into a complex three-dimensional shape that is defined primarily by its sequence. The shape of the protein, its "conformation", dictates the biological function of the protein, so understanding the conformation of a protein is vitally important to understanding its function. Over recent years there have been huge advances in the technology used to sequence DNA, and so the genomes of many different organisms have been determined. As a consequence, the sequences of several million proteins are now known, but fewer than 100,000 have had their detailed three-dimensional structures worked out.
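The representation of sequences as words over a fixed alphabet can be illustrated with a minimal Python sketch. It is an illustration only: the names `DNA_ALPHABET`, `PROTEIN_ALPHABET`, `is_valid_sequence` and `letter_counts` are assumptions made for this example and are not part of JPred, TarO, Kinomer or JABAWS.

```python
# Illustrative alphabets; these names are assumptions for this sketch,
# not identifiers from any of the tools described in this record.
DNA_ALPHABET = frozenset("ACGT")
# One-letter codes for the 20 standard amino acids.
PROTEIN_ALPHABET = frozenset("ACDEFGHIKLMNPQRSTVWY")


def is_valid_sequence(sequence: str, alphabet: frozenset) -> bool:
    """Return True if the sequence is non-empty and uses only letters in the alphabet."""
    return bool(sequence) and set(sequence.upper()) <= alphabet


def letter_counts(sequence: str) -> dict:
    """Count how often each letter occurs in the sequence (case-insensitive)."""
    counts = {}
    for letter in sequence.upper():
        counts[letter] = counts.get(letter, 0) + 1
    return counts
```

Note that the four DNA letters also occur in the protein alphabet, so a check of this kind cannot by itself tell a DNA sequence from a protein sequence; it only rejects words containing letters outside the chosen alphabet.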
The computational tools that will make up this resource help to bridge this information gap by classifying protein sequences and making predictions of protein structure that can guide biologists to design more efficient and effective experiments. The main objectives of the proposal are to provide support, maintenance and training for the popular JPred protein structure prediction server, which performs up to 95,000 predictions monthly for scientists in 140 countries, the TarO protein sequence analysis suite, and the Kinomer protein kinase classifier and database. The new resource will integrate these individual tools with a consistent look-and-feel and make them available to researchers and tool developers in new ways over and above a conventional web site. Web sites are good for humans to interact with, but less useful for computer software to interface with. Since our tools are useful for large analyses that might be done on many thousands of proteins, the new resource will also support a novel "web services" interface to the tools. Web services allow a program or application to be run remotely from within another program. For example, I might have a program running on my desktop computer, but call for an intensive calculation to be done on a remote high-performance computer system. We have recently developed a new method to deploy web services (called JABAWS) that makes installation of web services straightforward. A key part of the new resource will be to add more complex applications such as structure prediction to JABAWS and so make them readily available to programmers and end-users.

Impact Summary

The Dundee Resource will support a set of tools that will be widely used by the international biological sciences community. This will have an impact on all areas of academic BBSRC research, as well as on research funded by the MRC and other research councils that involves genome or protein sequences. Users of the Dundee Resource include academics across all biological subject areas and researchers in the pharmaceutical, agrochemical, agricultural and animal breeding industries, where the analysis of protein sequences and their functional context is important to the economic success of the company. As such, the Dundee Resource will have both Economic and Societal impacts by increasing the speed, accuracy and depth of inference possible from sequence data and so increasing the competitiveness of its users in academia and industry. Improved competitiveness of the users of the resource across such a wide range of academic and industrial domains is likely to lead to improved competitiveness for the UK. The new ProteoCache will significantly accelerate the speed at which scientists can access key information about their proteome of choice and apply this in experimental design or interpretation. The Dundee Resource, particularly when coupled with the Jalview sequence analysis workbench, will also be important in teaching students in life sciences disciplines both basic and advanced sequence analysis. This educational role will enhance the knowledge and expertise of future generations of biologists and technologists working in academia and industry across all molecular life sciences disciplines in the UK. Further beneficiaries will be attendees at the annual training workshop that will be run to teach potential users both the scientific background to the methods in the Dundee Resource and the practical use of the tools on their specific problems. The training workshops will be open to graduate students, postdocs, academics and members of industry.
For those who cannot attend the workshops, the on-line e-learning materials will provide similar information backed by informal email support. The Dundee Resource for Protein Structure Prediction and Sequence Analysis is aimed at accelerating scientific discovery and maximising the benefit of investment in sequence data generation. However, when coupled with visualisations in Jalview, some of the tools, such as secondary structure and disorder prediction, could also be explained to schoolchildren and the general public. We have experience of public outreach through the annual "Doors Open Day" at Dundee and through the development of our GenomeScroller exhibit (www.genomescroller.org), which provides an exciting backdrop against which to explain the human genome, how big it is and how much (and how little) is understood about how it functions. In Year 2 of this grant and subsequent years, we will display and explain outputs of the new Dundee Resource alongside GenomeScroller in order to introduce a new audience to the power and excitement of bioinformatics research.

Committee: Research Committee C (Genes, development and STEM approaches to biology)
Research Topics: Structural Biology
Research Priority: X – Research Priority information not available
Research Initiative: Bioinformatics and Biological Resources Fund (BBR) [2007-2015]
Funding Scheme: X – not Funded via a specific Funding Scheme