BBSRC Portfolio Analyser
Award details
Delivering accurate structural bioinformatics to the yeast community with the HHprY database
Reference
BB/M011801/1
Principal Investigator / Supervisor
Dr Timothy Levine
Co-Investigators / Co-Supervisors
Institution
University College London
Department
Institute of Ophthalmology
Funding type
Research
Value (£)
69,624
Status
Completed
Type
Research Grant
Start date
01/05/2015
End date
30/04/2018
Duration
36 months
Abstract
The eukaryotic model system that is understood in the most detail is budding yeast. However, the value of systems-wide "omics" experiments is limited by the lack of information on the likely function of ~1/6 of the yeast proteome, as 1000 proteins have no discernible homology that points to possible function. Many other proteins that do have homologs elsewhere appear to be lacking key domains, so it is not clear if the yeast protein is an ortholog. Current approaches to detect yeast proteins all involve iterating to make sequence profiles and searching for significant matches in yeast. We have found two ways of enhancing detection of homologous domains in yeast. The first is to initiate searches with yeast sequences. Reversing the direction of search adds information because iterative searches are non-commutative. The second enhancement is to change from profile-sequence to profile-profile searches, which are known to be more sensitive. Our pilot work showed that these two advances will likely add ~1000 new domains to the yeast proteome, and will reduce the proportion of proteins with no functional homologue from ~1/6 to 1/10. In this project we will create the first ever proteome-wide profile-profile map of all domains in yeast. We have already accumulated all the data for the map in a database of >100,000 searches, so in the project the first main task will be to parse this database for matches to make a draft of where the matches are. The second task will be to create readily interpretable diagrams and linked descriptions of each match. This will include developing explicit rules on including matches of borderline significance. The third task will be dissemination of results to maximise access to our data. The final task will be automating the pathway for maintenance and upgrade. The new resource will benefit not only genomic work in academia and pharma, but also individual researchers working on the yeast proteins we annotate or their homologs in fungi, crops and humans.
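The first two tasks described above (drafting a map of matches, then applying explicit rules to borderline matches) could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Hit` record, the `classify` and `draft_map` helpers, the two cut-off values and the protein and family names are all hypothetical and are not taken from the project or from any search tool's output format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One match from a profile-profile search (hypothetical record layout)."""
    query: str          # yeast protein used to start the search
    target: str         # matched family or structure profile
    start: int          # first matched residue in the query (1-based)
    end: int            # last matched residue in the query (inclusive)
    probability: float  # assumed probability-like score, in percent


# Assumed cut-offs for illustration only; the project's real rules are not given.
CONFIDENT = 95.0
BORDERLINE = 50.0


def classify(hit: Hit) -> str:
    """Label a hit as confident, borderline or rejected using the assumed cut-offs."""
    if hit.probability >= CONFIDENT:
        return "confident"
    if hit.probability >= BORDERLINE:
        return "borderline"
    return "rejected"


def draft_map(hits):
    """Group all non-rejected hits by query protein, ordered along the sequence."""
    domain_map = {}
    for hit in hits:
        label = classify(hit)
        if label == "rejected":
            continue
        region = (hit.start, hit.end, hit.target, label)
        domain_map.setdefault(hit.query, []).append(region)
    for regions in domain_map.values():
        regions.sort()  # tuples sort by start position first
    return domain_map


if __name__ == "__main__":
    example_hits = [
        Hit("YFG1", "familyA", 120, 300, 97.2),
        Hit("YFG1", "familyB", 10, 90, 62.0),
        Hit("YFG1", "familyC", 310, 350, 20.0),
        Hit("YFG2", "familyA", 5, 180, 99.0),
    ]
    for protein, regions in draft_map(example_hits).items():
        print(protein, regions)
```

Keeping borderline matches in the map with an explicit label, rather than discarding them, leaves the decision about how to display them to a later step.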
Summary
The understanding of cells has increased with new technology that has developed from genome sequencing. Experiments are run by robots to produce huge sets of results. As a result, our understanding of living cells is now so detailed that we can easily imagine a future where an entire organism is understood at the molecular level. The most likely candidate to be this organism is baker's (or brewer's) yeast, which was the pioneer cell type for many revolutionary experiments, including the first to have its genome sequenced. Because of the surprising degree of similarity at the molecular level between yeast and man, ground-breaking discoveries in yeast often reveal much about equivalent events in human cells. Proteins are the major players that do things inside cells. So one way to understand any organism is to classify what its proteins do. In some cases pure proteins can be studied, but this is too challenging to do for every protein, and so another way to classify proteins is needed. Using genome sequences, we can very easily determine the sequence of the proteins coded by the genes. We can then look at the sequence of each protein in turn to find out if it is similar to a protein whose function we already know. Proteins whose sequences are similar, even if one is in yeast and another in human, are then said to be in a single protein family. As the families get bigger a new phenomenon occurs from looking at all the sequences together: we often find subtle patterns that the proteins share. The patterns are very useful, because often we can use the patterns to find even more sequences, slightly more distantly related but still in the family. This approach is the one that has been applied universally to all new genomes and it helps identify what many of the proteins are doing. But it is far from universally successful. For yeast proteins there is a problem of perspective. The place where we typically start looking at a protein family is in humans. 
However, there are very many sequenced genomes for other animals, particularly vertebrates. So the patterns we find are very strongly biased to the vertebrate members, and sometimes the similarity shown by the yeast family member is too vague to be noticed. A second problem is that the whole approach of using a family to find a new member has now been rendered out of date. A new approach is to work out for a new protein what proteins are in its close-knit family among other closely related species, and to use this family to find the pattern of shared sequence. Then, instead of using the pattern to find another sequence, the pattern is compared only to other patterns. Because each pattern holds within it much more information than one sequence can, this approach sees far more subtle similarities, so it ends up identifying more distant relationships that we could not see before. We suspected that comparing patterns would increase what is currently known about the relationships between yeast proteins and proteins in other well understood organisms, including humans. In a sample of 130 proteins (2% of yeast's total) we found over 20 new relationships for at least part of the protein - one new piece of information for every six proteins. This ratio rose to one in three for proteins where no family relationship had been known previously. Finding these new relationships is a considerable step towards the complete mapping of this model organism. We will now carry out our analysis for the whole yeast genome and create a web resource for yeast researchers to freely access. No genome-wide analysis of patterns has been done before. The patterns will be made and compared by computers, with minimal input from the research team. A major part of the project will be raising awareness of our results by linking them to the most prominent web resource used by yeast
Impact Summary
Industry
There is huge interest in specific organisms and biological pathways in industry, either to use (micro)organisms to make specific biological molecules or to study larger organisms with economic value, particularly in the food chain. The approaches open to academia are also used in industry, so our work will also have a big impact there.
Micro-organisms as factories and biologicals including Biofuels
Yeast itself is being actively investigated as a bioreactor for production of high value biologicals such as edible polyunsaturated oils, industrial lubricants, diesel fuel and drugs. High throughput genetic approaches are used to identify blockages preventing better expression of the desired pathway in the yeast cell. However, in all these cases genes of unknown function are a barrier to understanding the data. With 15% of hits in this category, a key portion of results can only be framed in terms of other similar experiments. Being able to factor in data from a different dimension, the structural homology to protein families with a range of likely functions, may make a data set much more interpretable. Other microorganisms, in particular other fungi, share proteins with yeast, so our findings will be particularly helpful there.
Crops (and fungi again)
Plants and yeast are only distantly related, yet there are still domains of unknown function that they share, so advances in yeast will inform advances in crop biology and hence lead to economic benefit. An example of this is the family of SRPBCCs that we have discovered using HHsuite, which has multiple as yet undiscovered members not only in fungi and man, but also plants. The project as a whole will reveal links that advance understanding of how economically important plants deal with stresses including infestations and drought, and so further benefit the agricultural community.
Increased food security will come not only by increasing botanical knowledge, but also through increased understanding of fungi, which cause >$200Bn in losses per year world-wide, a figure that would be even greater if not for repeated antifungal treatment. In the UK alone fungicides prevent £200M/y of wheat wastage by septoria leaf blotch. However, resistance is increasing, and to ensure food security we must develop new antifungal strategies. Therefore, beneficiaries will include agriculture overall.
Human health and well-being
Better identification of structural homologies in yeast will also have implications for human biology. If HHprY works well enough to be a proof of principle, it could be applied to humans, where many proteins of completely unknown function (e.g. c9orf72) are being linked to disease by modern genetics, where disease loci are easily tracked by sequencing.
Industrial Structural Bioinformatics
Companies need to understand their own, highly valuable big datasets, and so they too need to eliminate problems of proteins of unknown function. Therefore work that we are carrying out here will be a proof of principle for similar projects in industry, particularly where companies have invested in technology to carry out genome-wide experiments. Taking just the single area of biofuels, the need to work in microorganisms with large numbers of proteins of unknown function is severe. The best models are specialised fungi (e.g. Yarrowia, which is more adapted to make diesel than budding yeast) and various algae, which have ecological advantages for the development of biofuels. All these species have many proteins of unknown function, so an HHpr-Alg or HHpr-yarrow would be an attractive proposition.
* * * * *
Impact on society
The eventual understanding of a single cell in all its detail is an achievement that science can only dream of at present. The impact of reaching this goal at some stage in the future will probably be even bigger for society than for science.
This cell is likely to be budding yeast, and the work of this project will contribute to that.
Committee
Research Committee C (Genes, development and STEM approaches to biology)
Research Topics
Microbiology, Structural Biology
Research Priority
X – Research Priority information not available
Research Initiative
Bioinformatics and Biological Resources Fund (BBR) [2007-2015]
Funding Scheme
X – not Funded via a specific Funding Scheme