Award details

Rfam: Towards a sustainable resource for understanding the genomic functional ncRNA repertoire

Reference BB/M011690/1
Principal Investigator / Supervisor Dr Alex Bateman
Co-Investigators / Co-Supervisors Dr Robert Finn
Institution EMBL - European Bioinformatics Institute
Department Sequence Database Group
Funding type Research
Value (£) 427,988
Status Completed
Type Research Grant
Start date 21/01/2015
End date 04/11/2018
Duration 45 months

Abstract

This proposal concerns the Rfam database and associated web portal, which uses covariance models to describe RNA families and annotates these families with functional information. We will continue to create new families and will examine our coverage of the RNA sequence database RNAcentral to identify ncRNAs that are not covered in Rfam, using this information to direct new family building. We will also update and improve our functional annotation of ncRNAs by attaching Gene Ontology terms to families and by using software tools to automatically propagate our annotations to the Gene Ontology Consortium. This will improve functional annotation for ncRNAs, and exporting the annotations to the Gene Ontology Consortium will propagate them to a wide range of resources, ensuring their maximum utility. To deal with the data deluge that risks hampering many bioinformatic resources, we will move to producing family alignments based on sequences from completed genomes only. This will result in smaller families that are more biologically relevant, as the absence of a match in related organisms will represent a true gene loss and not incomplete sequence data. We will produce new visualisation tools using technology such as BioJS to take advantage of this new information. To increase the sustainability of our resource, we will develop software tools and associated training materials to allow users to build their own covariance models and submit them to us for propagation throughout the community.

Summary

In molecular biology, the central dogma says that genes encoded in a genome code for RNA, which is then translated into the proteins carrying out the main processes of the cell. But RNA is not just an intermediate step between genes and protein. Instead, RNA is capable of performing a number of tasks that are essential for life - for example, the ribosome (the machine responsible for synthesizing proteins from RNA) is an RNA-based machine, and RNA plays important roles in regulating the levels of other genes. These functional RNAs are known as non-coding RNAs (ncRNA). RNA research has lagged behind that of proteins, in part due to the difficulties in working with RNAs experimentally and computationally. Compared with protein science, the field of RNA biology is poorly served with resources that can aid research. Rfam is one of the largest and most authoritative sources of ncRNA information, and provides a central portal of information covering a wide variety of ncRNA types. We use statistical models to group related non-coding RNAs into families. We then provide information on their function, as well as tools which other scientists can use to discover related non-coding RNAs in their samples of interest. A primary use of our database is to identify ncRNAs in DNA sequences. This allows scientists to map the positions of ncRNAs and study how ncRNAs have evolved between related organisms, giving clues to their function. We aim to facilitate this further by providing families of ncRNAs from organisms which have had their entire genome sequenced. These organisms are generally those which are of interest to scientists because of their role in disease (e.g. pathogenic bacteria), their economic importance (e.g. bread wheat, a major source of human nutrition), or because they occupy an important biological niche (e.g. humans).
We'll also provide researchers with tools and training to build their own RNA families, allowing them to study RNAs which are of particular interest to them. Not only is it important to be able to identify an ncRNA, it's also important for us to tell our users what the function of an ncRNA is. To this end, we are improving the functional annotation of our RNA families by using structured language terms that are easily parseable by both humans and computers. This means that our large data sets can be mined quickly, allowing researchers to build up a picture of how ncRNAs interact with the rest of the cell's components and understand more about the roles ncRNAs play in biological systems. All our information is freely available via the Rfam website and as a downloadable database. We also export our data to other resources, such as databases concerned with a specific organism, and more general RNA databases such as RNAcentral.

Impact Summary

Rfam supports researchers working across all BBSRC strategic priorities, but primarily food, nutrition and health, and data-driven biology. It will be used extensively by the life sciences community, including bioinformaticians, wet-lab researchers and clinicians. The huge growth in data produced by new sequencing technologies means that it is now more important than ever that researchers have access to tools to help them interpret their data. Rfam is the only resource currently capable of identifying a wide range of ncRNA homologs in sequence data and therefore plays a key role in both data-driven and systems biology. Our move to genome-centric annotation means that Rfam will provide comprehensive annotation of ncRNAs in many organisms, which will be of great benefit to many model organism resources, such as PomBase, FlyBase and the more comprehensive Ensembl and Ensembl Genomes. ncRNA information is frequently missing from genome annotation; Rfam's data can ensure scientists have a more complete picture of the "parts list" involved in constructing each genome. As with Rfam, many of the resources benefiting from Rfam's data are based in the UK, thus contributing to the UK's international reputation as a leader in bioscience. We have only recently begun to understand the role ncRNAs play in health and disease. For example, microRNAs are deregulated in cancer, snoRNAs are silenced in Prader-Willi syndrome and plant microRNAs play important roles in immune responses against viruses. There are significant research efforts into RNA-based therapeutics, which are promising tools to improve health and welfare. Rfam is a crucial resource for such work, allowing similarities in RNAs between organisms to be studied and providing researchers with search tools to identify previously unknown ncRNA homologs. A major aim of this proposal is to develop tools to allow researchers to create their own Rfam families, thus creating a set of community-based curators for Rfam.
Historically, family building has been a specialised job performed by experienced Rfam curators; however, the software is maturing to a point where family creation by non-specialists is feasible. Thus, a major impact of our work will be the transfer of knowledge and skills to a wide range of RNA researchers and the provision of bioinformatic tools they can use to further their work. This approach also enables researchers to develop a more multidisciplinary approach to the understanding of RNA function.
Committee Research Committee C (Genes, development and STEM approaches to biology)
Research Topics X – not assigned to a current Research Topic
Research Priority X – Research Priority information not available
Research Initiative Bioinformatics and Biological Resources Fund (BBR) [2007-2015]
Funding Scheme X – not Funded via a specific Funding Scheme