Award details

RNA proposal title - The RNAcentral database of non-coding RNAs

Reference BB/J019321/1
Principal Investigator / Supervisor Dr Paul Kersey
Co-Investigators / Co-Supervisors Dr Guy Cochrane
Institution EMBL - European Bioinformatics Institute
Department Ensembl Genomes
Funding type Research
Value (£) 610,959
Status Completed
Type Research Grant
Start date 01/10/2012
End date 31/03/2016
Duration 42 months

Abstract

We will create a federated database and associated web portal, RNAcentral, to accession, store and represent non-coding RNA sequence data. A database repository (using the Oracle Relational Database Management System) will be constructed as an extension to the European Nucleotide Archive, and new tools will be developed to facilitate the submission of RNA sequences. In addition to direct submission, the repository will be populated through import pipelines based on agreed standards for data representation, developed with expert databases that have agreed to support the project (initially gtRNAdb, HGNC, lncRNAdb, miRBase, Modomics, piRNAbank, Pombase, Refseq, Rfam, the Ribosomal Database Project, RNAdb, sRNAmap, SRPDB, tmRDB, the tmRNA website and VEGA). A web portal will be developed (using the Drupal open-source content management system) providing access to the submitted and imported sequences and links out to the expert resources' own sites. A data warehouse (using a common biological data warehousing tool such as BioMart or InterMine) will also be developed, and bulk downloads of sequence sets will be provided. In the second period of the project, we will develop further pipelines to identify redundancy among submissions, assign submitted sequences to defined families, and (with the aid of prediction tools such as Rfam and RNAmmer, and in collaboration with genomic and model organism resources) systematically provide complete sets of non-coding RNA annotations across all complete genomes. The resources developed under this proposal will serve as the core infrastructural component of a wider international initiative to coordinate work on functional RNAs.

Summary

In molecular biology, the central dogma says that the genes in DNA code for RNA. RNA molecules are then translated into proteins, the mini-machines that carry out the main processes in the cell. It has only recently become apparent that many genes code for RNAs that are not translated into proteins and that carry out important functions in the cell as RNA. These molecules are often known as non-coding RNAs. Much of the focus in biology has been on DNA and proteins, but recently there has been a surge of interest in non-coding RNAs. In fact, the mini-machine that makes proteins from RNA, called the ribosome, has itself been shown to be an RNA-based machine. Non-coding RNAs have also been shown to be widely involved in regulating the levels of other genes. Research and innovation in the area of non-coding RNAs, and in molecular biology more generally, are hampered by the lack of an authoritative and comprehensive resource collecting together all known non-coding RNAs. There are over 20 different online databases that contain information about different types of RNA molecules. Each of these resources makes its information available in a different way. The scattered nature of these resources makes it nearly impossible for biologists to discover what is known about non-coding RNAs related to their research area. In this proposal, we will create a new online resource to collect together information about non-coding RNAs. This resource, called RNAcentral, will be a central warehouse for holding many types of information. The most important information stored is the sequence of each RNA. Many existing RNA resources (called RNAcentral Expert Databases) will provide their data to RNAcentral using software and interfaces that will be created as part of this proposal. One specific expert database, called miRBase, based at the University of Manchester, will test out the systems for providing data to RNAcentral.
RNAcentral will hold the common information about each type of RNA. For more specialised information, RNAcentral will provide links back to the RNAcentral Expert Databases. To make the RNAcentral resource cost-effective, we will reuse and modify code that is already in use by the European Nucleotide Archive and Ensembl Genomes. These two databases are based at the European Bioinformatics Institute near Cambridge. By the end of this project, researchers from around the UK and the rest of the world will have access to a single resource of RNA sequence information. This information will be freely available in a variety of ways, including via a website and as a downloadable database.

Impact Summary

RNAcentral will provide an underpinning resource contributing indirectly to all BBSRC strategic objectives: food security, biofuels, industrial biotechnology and human health. It will be used by members of diverse life science research communities, ranging from bioinformaticians, to experimental biologists, to academic clinicians. RNAcentral will have an important impact in applications such as biotechnology, therapeutics, agriculture and ecology. The need for RNAcentral has become critical through the huge growth in discovery of non-coding RNAs from next-generation sequencing. By capturing and disseminating this valuable knowledge, we will be directly addressing the BBSRC's enabling themes: data-driven biology, systems approaches to the biosciences and synthetic biology. A fundamental part of the latter two themes is a complete "parts list" for each genome, and RNAcentral will help move science towards that goal and allow researchers to find all RNA genes in an organism easily. RNAs hold great hope for ever-wider clinical and biotechnological applications. For example, microRNAs have been implicated as diagnostic signatures for cancer, snoRNAs in the major Prader-Willi phenotypes, bacterial small RNAs in pathogenicity, plant small RNAs in hybrid necrosis, and ribozymes in the cleavage of specific target RNAs. Improved annotation of and access to RNA data will therefore improve the discovery and utilisation of novel RNA targets for diagnostics and drugs. There is intense research in the field of RNA-based therapeutics, which hold some promise to improve health and welfare internationally. A number of commercial organisations manufacture experimental resources, for example microarrays, based on up-to-date gene annotation. Some resources have also been made available for specific classes of non-coding RNA gene; for example, several companies make microRNA detection kits.
The companies themselves will therefore benefit from improved annotation of non-coding RNAs, and these resources underpin experimental studies in commercial and academic organisations. Along with the more clinical aspects described above, RNAcentral will help to foster wealth creation through innovative application of RNA sequence information. Non-coding RNAs such as ribosomal RNAs have long been used as a tag to identify species. Application of high-throughput sequencing has opened up opportunities to understand biodiversity on an unprecedented scale. Better understanding of biodiversity and how it is changing will enhance our ability to manage and conserve the world's great natural genetic resources. Having all known non-coding RNA sequences in a single resource will allow a much easier overview of the growth and impact of RNA data. For example, one will be able to compare the number of RNA genes with the number of protein-coding genes in a genome. This will allow policy makers and funders to better gauge the scale of support needed to maximise output compared to other priorities.
Committee Not funded via Committee
Research Topics X – not assigned to a current Research Topic
Research Priority X – Research Priority information not available
Research Initiative Bioinformatics and Biological Resources Fund (BBR) [2007-2015]
Funding Scheme X – not Funded via a specific Funding Scheme