Award details

Evolutionary Analysis of Non-Coding RNA Genes and Gene Families

Reference BB/D012139/1
Principal Investigator / Supervisor Professor Jotun Hein
Co-Investigators / Co-Supervisors Dr Rune Lyngsoe
Institution University of Oxford
Department Statistics
Funding type Research
Value (£) 201,999
Status Completed
Type Research Grant
Start date 01/01/2006
End date 30/06/2009
Duration 42 months

Abstract

With the recognition of an ever widening range of functional, structural and regulatory roles, non-coding RNA genes are receiving increased interest, in particular in the context of extracting information from genome sequencing projects. By far the most promising approach toward identifying novel non-coding RNA genes is the use of comparative methods. A better understanding of the constraints governing the evolution of non-coding RNA genes will be of tremendous assistance in improving comparative methods. Another interesting question about the evolution of non-coding RNA, relating to the origin of life, is the origin of present-day non-coding RNAs. Proteins are translated from messenger RNA in a codon-wise manner, which normally leads to a three-base periodicity in the evolutionary constraints acting on protein-coding DNA. This three-base periodicity is absent in non-coding RNA evolution. Instead, much more complex constraints arise from the necessity of preserving the base pairings in the functional structure. We will study how this constrains the evolution of non-coding RNA and how these constraints can assist in recognising conserved structure. This project will explore improvements to the modelling of RNA sequence evolution. Such modelling needs to account for evolution of the sequence - i.e. mutations, insertions and deletions of bases, and their effects on structure - as well as evolution of the structural role of individual bases. One consideration in the model development will be the feasibility of computationally making inferences under the model. Inferring simultaneous evolution of sequence and structure is a complex problem, yet we need to do this efficiently to make genome-scale analyses of the evolution of non-coding RNAs possible. Hence, an important task will be the development of efficient methods for comparative analysis of non-coding RNAs.
We will also explore approaches using much simpler summary statistics, similar to the Ka/Ks ratio used for protein-coding DNA evolution, for representing the constraints on non-coding RNA gene evolution. This will lead to a better understanding of the evolutionary behaviour of structural elements. To explore the long-term evolution of non-coding RNA gene families we will develop tools for detecting distant homologies. These tools will be based on probabilistic representations of families of homologous non-coding RNAs with a shared structure, known as covariation models, and methods for comparing these models. Finally, we will investigate possible improvements to the prediction of RNA secondary structure, both in terms of power and efficiency. This will be useful in its own right, but more importantly we will gain experience that can be transferred to the comparative approaches outlined above. One outcome of this project will be several novel and improved software tools for analysis of RNA sequences and genomes. These will be made accessible and freely available.

Summary

Recent years have seen an increased appreciation of the importance of parts of the genome that are transcribed from DNA to RNA but not translated from RNA to proteins, also known as non-coding RNA. Though probably not utilised in the same abundance as proteins, RNA molecules are now known to undertake core cellular functions such as catalysing biological processes, regulating the production of other genes, and providing structural support. Due to the absence of the translation step and the fundamentally different principles governing structure formation for RNA molecules and proteins, the ways non-coding RNA and protein-coding RNA evolve are quite different. The codon-based encoding of proteins results in a three-base periodicity of selective constraints on protein-coding RNA. Non-coding RNA is not translated, and thus not subject to codon-induced constraints. However, the structures of RNA molecules have very strong base pairing interactions, similar to those seen in the Watson-Crick base pairing of the DNA double helix. Preservation of this base pairing results in structure-dependent constraints on the evolution of non-coding RNA genes. We will investigate the evolution of non-coding RNA genes, in particular how local structure affects rates of mutation, and attempt to develop improved models for describing this type of evolution. A direct benefit of an improved understanding of RNA evolution will be better methods for recognising when the evolution separating two related sequences seems to have been governed by the types of constraints particular to non-coding RNA genes. This will assist in identifying non-coding RNA genes shared between related genomes, and hence in our ability to make sense of the genome sequences that become available.
Due to the versatility of RNA in mediating biological processes, a replicability similar to DNA that is not possessed by proteins, and the presence of RNA in some of the most fundamental cellular functions, it has been hypothesised that life may have started with RNA or RNA-like molecules, the so-called RNA world. To what extent are modern non-coding RNAs remnants from the RNA world, rather than of more recent origin? A few known families of non-coding RNA genes have been identified throughout most types of living organisms, but many families seem to be quite close-knit, lacking distant relatives. This could indicate a more recent origin, but could also be due to our inability to recognise distant relatives as such (or even failure to identify distant relatives as non-coding RNA genes). We intend to address this problem by developing a novel method for recognising distant relatives by comparing representations describing the variability over the entire family of a non-coding RNA gene. As stated above, by far the most important source of constraints on the evolution of non-coding RNA genes is believed to be conservation of base pairing in the molecular structure of the RNA. It is thus of great importance for this project to be able to infer the base pairs of the functional structure, also known as the secondary structure, with high fidelity. As a stepping stone towards the main goals of this proposal we will thus also look into possible ways to improve secondary structure inference. Software tools from both this and other parts of the project will be made freely available for other researchers to use, either as source code or as web servers.
Committee Closed Committee - Genes & Developmental Biology (GDB)
Research Topics Structural Biology, Technology and Methods Development
Research Priority X – Research Priority information not available
Research Initiative X - not in an Initiative
Funding Scheme X – not Funded via a specific Funding Scheme