BBSRC Portfolio Analyser
Award details
Computational prediction and analysis of long non-coding RNAs
Reference
BB/J01589X/1
Principal Investigator / Supervisor
Dr Anton Enright
Co-Investigators / Co-Supervisors
Dr Matthew Davis
Institution
EMBL - European Bioinformatics Institute
Department
Enright Group
Funding type
Research
Value (£)
357,790
Status
Completed
Type
Research Grant
Start date
01/07/2012
End date
30/06/2014
Duration
24 months
Abstract
Recent advances in genome sequencing and high-throughput functional genomics have shown that the genome is pervasively transcribed. Non-coding RNA in particular has recently come into the limelight as a platform for novel layers of gene regulation that have been largely overlooked. Work on microRNAs and piwi-RNAs has shown how the expression of large numbers of molecules (mRNAs and transposons) can be targeted and regulated by very short RNA molecules via a complex system of RNA-binding proteins and other molecules. This proposal focuses on long non-coding RNAs (lncRNAs), which are longer than 200 nt and lack a functional open reading frame. While a number of these molecules have been studied over the years, only relatively recently have high-throughput sequencing and expression analysis shown how many non-protein-coding transcripts are actively transcribed. Many functions have been proposed for these molecules, including antisense regulation and the blocking of regulatory regions. We propose to develop a computational system for the detection, characterisation and functional analysis of these molecules from next-generation sequencing data. This system will process sequence reads from RNA-seq experiments, clean and filter the reads, and assemble overlapping reads into likely lncRNA transcripts. These candidate molecules will be categorised according to their genomic context, and we will attempt to detect cases where they may regulate other transcripts via antisense binding. The computational part of the proposal aims to produce a detailed computational pipeline and web resource for the analysis of lncRNAs. The experimental part of the proposal aims to validate these candidates, obtain phenotypic information from knockouts and identify bound protein complexes which may mediate their function. We will assess the developmental profiles of lncRNAs across timecourses from Drosophila embryonic development and mouse erythroid development.
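The working definition in the abstract (a transcript longer than 200 nt that lacks a functional open reading frame) can be illustrated with a minimal sketch. This is not the pipeline proposed in the award: the function names, the 100-codon ORF cutoff, and the choice to scan only the forward strand for ATG-to-stop ORFs are all assumptions made for illustration.

```python
# Illustrative sketch only. Names and thresholds are hypothetical and are
# not taken from the award record.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_codons(seq: str) -> int:
    """Return the length, in codons, of the longest ATG-to-stop ORF.

    Only the forward strand is scanned (all three frames), and an ORF is
    counted only if it is closed by a stop codon. The stop codon itself
    is not included in the count.
    """
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None
    return best


def is_candidate_lncrna(seq: str, min_length_nt: int = 200,
                        max_orf_codons: int = 100) -> bool:
    """Apply the two criteria: longer than 200 nt, no long ORF.

    max_orf_codons is an assumed cutoff standing in for "lacks a
    functional open reading frame".
    """
    return len(seq) > min_length_nt and longest_orf_codons(seq) < max_orf_codons
```

For example, under these assumptions `is_candidate_lncrna("A" * 250)` is true, while a 456 nt transcript carrying a 151-codon ORF is rejected. A real system would also need to consider the reverse strand, coding-potential scoring, and known protein domains.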
Summary
The sequencing of the human genome has created a new era in biological research. Understanding our genome and how it is regulated is one of the great challenges for science, and has the potential to help improve lives and our ability to treat diseases. The advent of this genomic age has heralded rapid changes in the field of biology. One surprise from the initial sequencing of the genome was the relative scarcity of genomic regions which can be read to produce proteins via RNA intermediates. Proteins are the building blocks of cells, and many important molecular machines are composed of proteins. The non-protein-coding part of the genome was previously dismissed in some circles as largely containing 'junk DNA'. In the last ten years a number of breakthroughs in genome analysis and genome sequencing have shed light on many hitherto unknown aspects of biology carried out by these non-protein-coding regions. Novel technologies such as genome tiling arrays and high-throughput RNA sequencing have shown that although large portions of the genome may not code for protein sequences, they are still read as RNA messages. The discovery of small RNA molecules such as small interfering RNAs and microRNAs illustrated that many of these non-coding messages are processed within cells and used to regulate other genes (both protein-coding and non-coding). Within testes and oocytes (the germline), another class of small RNAs called piwi-RNAs was discovered and shown to have an important role in protecting the genome as it passes from one generation to the next. Recently, attention has turned to larger non-coding transcripts called long non-coding RNAs (lncRNAs). We know that the genome encodes many long RNA molecules which do not appear to encode proteins. A central dogma of biology has always been that DNA is read into RNA messages which subsequently encode proteins.
This elegant view of molecular biology is still largely true, but the last ten years of research have revealed many hidden layers of gene regulation at the level of both DNA and RNA. Discovering how different classes of molecules work together is vital to our understanding of how our genome is regulated and how cells and organisms function, and has tremendous implications for our understanding of development and disease. In this proposal we aim to build a computational system that will be able to detect candidate lncRNAs from RNA sequence data obtained from experimental samples. We aim to collect, score and characterise these molecules and to present them in a web interface for further analysis. We will use computational biology to attempt to find cases where these molecules may interact with each other, with protein-coding genes or with the genome itself to control gene regulation. Using computers allows us to work with a large quantity of data quickly and efficiently; however, laboratory experiments are required to confirm and expand these results. We will work with a mouse laboratory and a fruit fly (Drosophila melanogaster) laboratory to confirm our findings and to test the importance of these molecules by knocking them out. We will study what happens to these molecules as the embryo develops and as red blood cells develop, to see how their spatial and temporal expression is regulated. We will also attempt to discover what other molecules (such as proteins) may be binding to them. We believe that this project has the potential to greatly increase our understanding of these elusive molecules and of the organisation of the genome, and to assist ourselves and others in elucidating their roles in biology, health and disease.
Impact Summary
The main impact of this research will be to broaden our knowledge of the genome and its regulatory mechanisms. Our beneficiaries will include biologists, clinicians and scientists working in industry. The development of computational tools and resources has the potential to increase their productivity by making their own scientific analysis quicker, easier and more reliable. This proposal requires very specific skills and will contribute to the knowledge economy of the United Kingdom through the training of researchers and students in our laboratory and through training courses that we run, including EMBO and Wellcome Trust Advanced courses. The European Bioinformatics Institute outreach team promotes our science and research to the general public, school children and university students. We participate in school visits and open days in Cambridge and at our campus. Through events such as these we aim to interest children and adults alike in science and its benefits. It is possible that a breakthrough made during this research would be patentable or possible to commercialise. We have a technology transfer office at EMBL Heidelberg which is in a position to assess these situations and make recommendations where appropriate. We will explore commercialisation opportunities if they arise during this research and do not impact on our open-source, open-data policies described in the case for support. Long non-coding RNAs have already been shown to play roles in disease. We have a number of collaborations with clinical groups studying heart disease, obesity and cancer. Any finding of medical relevance obtained during this project will be assessed and discussed with clinical groups at Addenbrooke's Hospital (University of Cambridge) and other clinical institutions. We also have links to industry and pharmaceutical companies, and such research may be of real benefit to their own ongoing research programmes.
It is likely that any therapeutic or diagnostic outcomes of research such as this would be of great benefit to society at large.
Committee
Research Committee C (Genes, development and STEM approaches to biology)
Research Topics
Technology and Methods Development
Research Priority
Technology Development for the Biosciences
Research Initiative
X - not in an Initiative
Funding Scheme
X – not Funded via a specific Funding Scheme