Award details

A Reference Transcript Database for improved analysis of RNA-seq data from barley

Reference BB/R014582/1
Principal Investigator / Supervisor Professor Robbie Waugh
Co-Investigators / Co-Supervisors Professor John Brown
Institution University of Dundee
Department School of Life Sciences
Funding type Research
Value (£) 318,088
Status Completed
Type Research Grant
Start date 01/10/2018
End date 30/09/2020
Duration 24 months

Abstract

Post-transcriptional mRNA processing, particularly Alternative Splicing (AS), generates multiple transcript isoforms per gene. AS occurs in up to 70% of intron-containing plant genes. AS isoforms can either be targeted for degradation or can encode proteins with different functions. In Arabidopsis, a combination of ultra-deep RNA-seq and new computational methods of analysis generated transcript-specific expression datasets that allow us to interpret the contribution of individual transcript isoforms to overall patterns of gene expression. The new methods required the development of a Reference Transcript Dataset (RTD) - ultimately a library of all transcript isoforms present within the cells and tissues of an organism. We generated AtRTD2 comprising over 82k non-redundant transcripts for the 34k Arabidopsis genes. The accurate quantification of individual transcripts and AS events signals a step change in plant transcriptome analysis. Here, a barley RTD will be constructed from full-length transcript datasets generated by PacBio Iso-seq, supplemented and error-corrected by deep Illumina paired-end RNA-seq data. We will sequence RNA from twenty tissues, including plants exposed to biotic and abiotic stress. The RTD will be made immediately available to the barley research community to allow unified analysis/re-analysis of new/existing RNA-seq data and to aid the design of new experiments (e.g. time-courses of infection or abiotic stress). Transcript-specific data will identify genes regulated at the level of transcription, AS, or both. We will identify novel genes and mechanisms of regulation which contribute to the complex transcriptome re-programming responsible for the response of a plant to environmental or developmental cues. The new data will provide novel insights into genes/transcripts that control phenotypes and, where appropriate, causal variants that can be used to develop genetic markers for use in crop improvement.

Summary

The term 'gene expression' refers to the biological process by which a gene gives rise to a protein. In eukaryotes, gene expression is complex. The DNA sequence of the gene is first copied into a precursor messenger RNA (pre-mRNA) by the process of transcription, and the pre-mRNA is then subjected to several processing steps to form a mature messenger RNA (mRNA) that is the template for synthesis of the corresponding protein. The post-transcriptional processing steps can generate different mRNA transcripts from the same gene (i.e. transcript isoforms), effectively modulating individual transcript abundance and potentially protein function. Having multiple transcript isoforms from a single gene is problematic in terms of 1) defining the expression levels of individual transcript isoforms and how they change under different conditions, and 2) determining their characteristics, such as whether they encode protein isoforms or not. As gene expression data is widely used to derive biological inference, for example, by grouping genes according to common patterns of expression, failure to take account of the relative abundance of alternative transcripts will unavoidably generate false conclusions. In this project, we focus on the development of a resource/tool that will allow the accurate detection and quantification of mRNA transcript isoforms in barley. The tool will enable high resolution analysis of dynamic changes in gene expression at the individual transcript level and, as a recognised and accessible reference, will help unify and structure such analyses across a research community. One of the main approaches scientists use to associate genes with functions is to monitor patterns of gene expression: i.e. where and when genes are switched on or off, and at what level.
Current approaches provide an overall measure of gene expression by counting the frequency of occurrence of very specific sequences that correspond to a given mRNA relative to the whole population of mRNAs in a particular sample and transforming these counts into relative abundance levels. However, these methods are unable to distinguish the abundance of individual isoform variants, in particular those that determine protein levels, structures and activities. We call the tool a 'Reference Transcript Database' or RTD. The RTD is effectively a library of all of the transcript isoforms that exist in a diverse range of tissues from a single organism. By using the RTD in gene expression studies we can identify and determine the abundance of different transcript isoforms easily and quickly, and these can be used in subsequent functional analyses. We focus this project on the crop plant, barley, a model for the small grain Triticeae cereals that include wheat and rye. The RTD will allow effects on global and specific gene expression to be easily analysed at the transcript level in plants subjected to a range of conditions or treatments, improving our community's ability to explore and understand a wide range of biological processes. The RTD will be refreshed and maintained longer term by the barley and computational sciences groups at the James Hutton Institute.
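
The difference between a gene-level measure and transcript-level quantification can be illustrated with a small sketch. It is not the project's pipeline: it assumes per-transcript read counts have already been estimated against a reference set of transcripts, and the gene names, transcript names, lengths and counts are all hypothetical. It converts counts to transcripts per million (TPM), sums them per gene, and reports each isoform's share of its gene.

```python
from collections import defaultdict


def tpm(counts, lengths):
    """Convert per-transcript read counts to transcripts per million (TPM)."""
    # Length-normalise first so long transcripts are not over-represented.
    rates = {t: counts[t] / lengths[t] for t in counts}
    total = sum(rates.values())
    if total == 0:
        return {t: 0.0 for t in counts}
    return {t: 1e6 * r / total for t, r in rates.items()}


def gene_totals(transcript_tpm, transcript_to_gene):
    """Sum transcript TPM values to obtain a single value per gene."""
    totals = defaultdict(float)
    for t, value in transcript_tpm.items():
        totals[transcript_to_gene[t]] += value
    return dict(totals)


def isoform_fractions(transcript_tpm, transcript_to_gene):
    """Return each transcript's share of its gene's total abundance."""
    totals = gene_totals(transcript_tpm, transcript_to_gene)
    fractions = {}
    for t, value in transcript_tpm.items():
        gene_total = totals[transcript_to_gene[t]]
        fractions[t] = value / gene_total if gene_total > 0 else 0.0
    return fractions


# Hypothetical example data: geneA has two isoforms, geneB has one.
TRANSCRIPT_TO_GENE = {"geneA.1": "geneA", "geneA.2": "geneA", "geneB.1": "geneB"}
LENGTHS = {"geneA.1": 1000, "geneA.2": 1000, "geneB.1": 1000}
CONTROL = {"geneA.1": 90, "geneA.2": 10, "geneB.1": 100}
STRESS = {"geneA.1": 10, "geneA.2": 90, "geneB.1": 100}

if __name__ == "__main__":
    for name, counts in (("control", CONTROL), ("stress", STRESS)):
        values = tpm(counts, LENGTHS)
        print(name, "gene level:", gene_totals(values, TRANSCRIPT_TO_GENE))
        print(name, "isoform share:", isoform_fractions(values, TRANSCRIPT_TO_GENE))
```

In this made-up example the gene-level value for geneA is the same in both conditions, while the dominant isoform switches from geneA.1 to geneA.2. A change of this kind is invisible to a gene-level measure and is only detectable when individual isoforms are quantified.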

Impact Summary

We envisage two significant primary impacts: the first will be on the ability of barley researchers exploring gene function by RNA-seq based expression analysis to more accurately analyse and interpret their data. The second will be on anyone who refers to the reference barley genome sequence because the RTD will be a key informational resource for experimentally supported genome annotation. As such the primary beneficiaries and users will be the research sector, both academic and industrial. The first version of the barley RTD will be generated and made available to the community before the end of this 24 month project. We believe the main challenge to maximising impact will be to raise awareness of the value of the RTD and promote its adoption by the research community. While the appended letters of support demonstrate community support and awareness, we are conscious that RTD development needs to be done quickly so that research groups can use, plan and design RNA-seq experiments with an RTD and transcript isoform analysis pipeline firmly in mind. The main Impact Objectives are therefore to:
1) Inform the barley community of the value of using the RTD well ahead of a primary release, allowing groups to design and plan RNA-seq experiments with this in mind.
2) Inform the barley community of the value of transcript-isoform specific expression data for identifying genes regulated by post-transcriptional processes such as AS.
3) Release the RTD to the community as soon as possible through standard communication channels including community websites and social media.
To achieve these objectives:
1. The PI/Co-I will ensure community awareness by contacting barley research group leaders with details of the project and how it will benefit them, and the Co-I will describe the development and advantages of the Arabidopsis AtRTD2 at a meeting of barley researchers early in the programme.
2. The PIs/Co-Is will present regular updates of progress at national and international conferences and meetings (e.g. Monogram, PAG) as well as invited seminars.
3. The initial barley RTD will be released to collaborating groups (see letters of support) for validation as soon as possible and subsequently made widely available on our websites prior to publication. As an RTD is not a static entity, we will release versioned updates (with change logs) over time (updated RTDs will be essential for our own research as well as that of the community).
4. We will train and mentor the PDRA and encourage their participation in public engagement activities.
Committee Research Committee B (Plants, microbes, food & sustainability)
Research Topics Crop Science, Plant Science, Systems Biology, Technology and Methods Development
Research Priority X – Research Priority information not available
Research Initiative Bioinformatics and Biological Resources Fund (BBR) [2007-2015]
Funding Scheme X – not Funded via a specific Funding Scheme