Award details

Development of SUPPA for alternative splicing analysis from RNA-seq in plants across multiple conditions

Reference: BB/N022807/1
Principal Investigator / Supervisor: Professor John Brown
Co-Investigators / Co-Supervisors:
Institution: University of Dundee
Department: School of Life Sciences
Funding type: Research
Value (£): 91,650
Status: Completed
Type: Research Grant
Start date: 01/10/2016
End date: 14/11/2017
Duration: 13 months

Abstract

The regulation of gene expression is essential to plant growth and development. Alternative splicing (AS) generates more than one transcript isoform per gene and occurs in up to 60-70% of intron-containing genes in plants. In Arabidopsis, using ultra-deep RNA-seq and new computational methods of analysis, we have generated dynamic, transcript-specific data that allow us to interpret the contribution of individual transcript isoforms to overall gene expression. However, a comprehensive analysis of AS requires computational tools that can handle complex AS events and identify statistically significant differences in AS among tens of thousands of transcripts across two or more conditions. We are the first plant group to apply SUPPA (a programme developed to analyse AS transcripts in human cancers) to plant RNA-seq data. The programme performs well for a number of genes and AS events but handles complex AS events poorly. It was also developed for the analysis of binary systems (e.g. cancer cells versus normal cells) and needs substantial modification to handle multiple conditions, account for variability among biological replicates and analyse differential AS. We have taken initial steps to demonstrate that SUPPA can be modified and have identified further areas for improvement. This proposal has three significant advantages. First, it is a collaboration with Prof. Eduardo Eyras (Barcelona), the developer of SUPPA. Second, we have probably the most extensive RNA-seq time-course dataset in plants, both in the resolution of the time series (26 time points) and in the depth of the RNA-seq data (ultra-deep sequencing). Third, we have extensive validation data generated on the same RNA samples, which will be used to test and assess the planned modifications and their application to multiple time-point data. The output will be improved versions of the SUPPA software applicable to the analysis of RNA-seq data from plant and crop species (as well as from animals and humans).
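SUPPA derives its AS measures from transcript abundances rather than directly from reads, which is why transcript-level quantification matters here. The sketch below illustrates the idea under a stated assumption: the percent spliced-in (PSI) value of an event is taken as the summed abundance (TPM) of the transcripts carrying the inclusion form, divided by the summed abundance of all transcripts that define the event. The function name `event_psi`, the transcript IDs and the TPM values are hypothetical; this is an illustration, not SUPPA's code.

```python
"""Minimal sketch: PSI of one AS event computed from transcript abundances.

Assumption (illustrative only): PSI = summed TPM of transcripts carrying the
inclusion form / summed TPM of all transcripts that define the event.
Names and values are hypothetical.
"""
from typing import Dict, Iterable, Optional


def event_psi(tpm: Dict[str, float],
              inclusion: Iterable[str],
              exclusion: Iterable[str]) -> Optional[float]:
    """Return PSI in [0, 1], or None if no defining transcript is expressed."""
    inc = sum(tpm.get(t, 0.0) for t in inclusion)  # inclusion-form abundance
    exc = sum(tpm.get(t, 0.0) for t in exclusion)  # exclusion-form abundance
    total = inc + exc
    if total == 0:
        return None  # PSI is undefined when neither form is detected
    return inc / total


# Hypothetical gene with three isoforms: tx1 and tx3 retain an intron,
# tx2 splices it out.
tpm = {"tx1": 12.0, "tx2": 6.0, "tx3": 2.0}
psi = event_psi(tpm, inclusion=["tx1", "tx3"], exclusion=["tx2"])
print(round(psi, 2))  # 0.7
```

Returning `None` for an unexpressed event, rather than 0, keeps "not measurable" distinct from "fully excluded", which matters once PSI values are compared across conditions.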

Summary

Genes are the repositories of hereditary information and proteins are the machines that carry out the functions of living cells. The term 'gene expression' usually refers to the process by which a gene gives rise to a protein. In eukaryotes, gene expression is complex: when protein-coding genes are expressed, the DNA sequence is first copied into a precursor messenger RNA (pre-mRNA) by transcription. The pre-mRNA undergoes several processing steps to form mature messenger RNAs (mRNAs), which direct synthesis of the corresponding protein (translation). An extremely important RNA processing step called alternative splicing (AS) generates different mRNA transcripts (i.e. isoforms) from the same gene and thereby modulates transcript and protein levels and functions. The majority of protein-coding genes undergo AS, and the relative amounts of AS isoforms change dynamically as cells and organisms develop and grow. High-throughput methods such as RNA sequencing (RNA-seq) can now generate data on tens of thousands of transcripts from cells, from particular developmental stages or under different conditions. To analyse the dynamic changes in transcripts and AS, and to understand how they are regulated, we need computational tools that accurately measure these different mRNA transcript isoforms in such large datasets. The tool we will develop here will enable high-resolution analysis of dynamic changes in gene expression at the level of individual transcripts and AS events. Being able to distinguish the abundance of different transcript isoforms is important because one of the main approaches scientists use to associate genes with functions is to monitor gene expression: i.e. where and when genes are switched on or off, and at what level. RNA-seq technology and the programmes used to analyse the data are continually being improved.
Significant recent advances include the release of computational programmes that can very quickly quantify transcript isoform abundance (e.g. Sailfish, Salmon, Kallisto) and generate measures of AS (e.g. SUPPA) from large datasets. We have been using these programmes to analyse RNA-seq data from Arabidopsis. We also have an excellent experimental system that allows us to validate the RNA-seq results. Detailed comparisons have helped us to identify a number of discrepancies and issues with the SUPPA programme, for example cases where it does not accurately report complex AS events. We have taken initial steps in improving SUPPA with good success, and here we aim to modify SUPPA in a number of ways so that it 1) accurately measures AS, and 2) can be applied to experiments with multiple conditions, such as time-course series. This will allow clustering of AS responses, correlation of AS indices with gene expression, and the building of splicing networks to understand the regulation of AS. Research groups in the UK and around the world use RNA-seq to analyse gene expression in plants and animals. Currently, there are many limitations in quantifying transcripts and AS dynamically; the improved version of SUPPA will deliver this capability. It will be relevant not only to Arabidopsis but also to other plant and crop species, and to animal and human studies.
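Extending a two-condition comparison to a time series largely means repeating comparisons across many pairs of conditions while accounting for variability among biological replicates. The sketch below is a deliberately simplified illustration, not SUPPA's algorithm: it assumes PSI values are already available per event, condition and replicate, uses PSI differences between replicates of the same condition as an empirical background, and scores the change in mean PSI (ΔPSI) between two conditions against that background. The functions `replicate_background` and `delta_psi_test`, the event names and the values are all hypothetical.

```python
"""Simplified sketch of differential splicing between two conditions.

Assumptions (illustrative only, not SUPPA's implementation): PSI is known per
event, condition and biological replicate; absolute PSI differences between
replicates of the same condition form an empirical background distribution.
All names and data are hypothetical.
"""
from itertools import combinations
from statistics import mean
from typing import Dict, List, Tuple

# psi[event][condition] -> PSI for each biological replicate
PsiTable = Dict[str, Dict[str, List[float]]]


def replicate_background(psi: PsiTable) -> List[float]:
    """Absolute PSI differences between all replicate pairs within each condition."""
    diffs: List[float] = []
    for conditions in psi.values():
        for reps in conditions.values():
            diffs.extend(abs(a - b) for a, b in combinations(reps, 2))
    return diffs


def delta_psi_test(psi: PsiTable, cond_a: str,
                   cond_b: str) -> Dict[str, Tuple[float, float]]:
    """Return {event: (delta_psi, empirical_p)} for cond_b relative to cond_a."""
    background = replicate_background(psi)
    results: Dict[str, Tuple[float, float]] = {}
    for event, conditions in psi.items():
        dpsi = mean(conditions[cond_b]) - mean(conditions[cond_a])
        # Count background differences at least as large as the observed change;
        # the +1 pseudocounts keep the empirical p-value above zero.
        extreme = sum(1 for d in background if d >= abs(dpsi))
        results[event] = (dpsi, (extreme + 1) / (len(background) + 1))
    return results


# Two hypothetical time points (T1, T2), three biological replicates each.
psi = {
    "event1": {"T1": [0.20, 0.22, 0.18], "T2": [0.70, 0.68, 0.73]},
    "event2": {"T1": [0.50, 0.55, 0.48], "T2": [0.52, 0.49, 0.54]},
}
for event, (dpsi, p) in delta_psi_test(psi, "T1", "T2").items():
    print(f"{event}: dPSI={dpsi:+.2f}, p={p:.3f}")
```

For a time course, the same test could be applied to consecutive time points, with multiple-testing correction across events. A fuller method would probably also take expression level into account, since PSI estimated for lowly expressed genes tends to be noisier.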

Impact Summary

The main impact of this work will be the programme to analyse AS in RNA-seq data from both plants and animals. In particular, it will benefit researchers examining gene function and global expression changes at the transcript level. The main beneficiaries and users are therefore in the research sector, both academic and industrial. The main challenge to maximising the impact of the new software tool is to raise awareness of its potential and utility among the people most likely to use it and whose research would benefit from it. This will be done in a timely fashion so that other researchers can begin to use the software to re-analyse existing RNA-seq datasets and to analyse new and future RNA-seq experiments. The main Impact Objectives are to: 1) inform the alternative splicing community about the software for AS analysis of RNA-seq data, and its uses and applications, while it is being developed and tested; 2) release the new SUPPA software to the community as soon as it is finalised and disseminate it further through websites used regularly by the community. To achieve these objectives: 1. the PIs/Co-Is will ensure community awareness by contacting research groups in the plant and animal AS community with details of the programme and how it will benefit them; 2. the PIs/Co-Is will present progress on the development at national and international conferences and meetings, such as the annual UK RNA and EURASNET meetings, as well as in invited seminars; 3. the new SUPPA will be released to relevant groups as soon as it is finalised and made widely available on our websites following publication; 4. public engagement activities; 5. training and mentoring of the PDRA.
Committee: Research Committee A (Animal disease, health and welfare)
Research Topics: Plant Science, Technology and Methods Development
Research Priority: Research Priority information not available
Research Initiative: Tools and Resources Development Fund (TRDF) [2006-2015]
Funding Scheme: not funded via a specific funding scheme