Award details

What determines protein abundance in plants?

Reference BB/T002182/1
Principal Investigator / Supervisor Professor Frederica Theodoulou
Co-Investigators / Co-Supervisors Dr Keywan Hassani-Pak, Professor Kathryn Lilley, Professor Richard Mott, Dr Gancho Slavov
Institution Rothamsted Research
Department Plant Sciences and the Bioeconomy
Funding type Research
Value (£) 3,354,456
Status Current
Type Research Grant
Start date 01/10/2020
End date 30/09/2025
Duration 60 months

Abstract

This project aims to understand how protein abundance is controlled in plants and to determine the phenotypic consequences of proteomic variation, together with genotypic, structural, epigenotypic and transcriptomic variation. We propose an integrated programme of quantitative trait loci (QTL) analysis of an Arabidopsis multiparental advanced generation intercross (MAGIC) population. Firstly, we will determine all variation in the 19 MAGIC founders, and interactions between different 'omic layers, via a comprehensive set of assays. Long-read sequencing of 18 founders' genomes will be performed for comparison of structural variation relative to the 19th founder, the Col-0 reference. We will measure epigenetic marks of cytosine DNA methylation and chromatin accessibility by ATAC-seq. Transcript abundance and regulatory RNA species will be analysed by RNA-seq and protein translation and abundance quantified by Ribo-seq and proteomics, respectively. Next, a holistic experimental and computational analysis of 400 Arabidopsis MAGIC RILs (recombinant inbred lines) will be used to understand the regulatory networks controlling protein expression and dissect the relative contributions of genotype (including small-scale variation and large-scale structural rearrangements), epigenotype, RNA transcription, protein synthesis and protein degradation. We will use statistical and machine learning (ML) approaches to construct different types of molecular networks and identify causal mediators. Co-expression analysis will also identify novel physical complexes and sets of proteins that participate in common processes. Selected networks and complexes will be tested experimentally. Whole plant phenotyping of the MAGIC lines will be performed and used together with the molecular data to interrogate the predictive ability of different 'omic layers across a range of phenotypes. Finally, data and knowledge generated will be shared with the community through a user-friendly web resource.

Summary

Proteins are the workhorses of the cell: they facilitate chemical reactions, act as gene switches and have structural roles. For cells to work efficiently, proteins need to be produced in the right place, at the right time and in the right amount. They also need to be removed when no longer needed. Crick's Central Dogma states that coding sequences of DNA are transcribed into mRNAs, which in turn are translated into proteins. There are many levels at which this process is regulated and there are still many gaps in our knowledge. We expect both inherited and environmental differences between individuals to play important roles in the control of proteins. This project seeks to use the model plant, Arabidopsis thaliana, to answer fundamental questions about the control of protein expression, including which mechanisms are important and how they interact in a complex multi-cellular organism. We also aim to determine to what extent the protein content of a given cell, tissue or organ predicts observable traits (the phenotype) of the plant. To address these questions, we have designed an integrated programme of experiments and sophisticated mathematical analysis around a genetically variable population of Arabidopsis (known as the MAGIC population). This is a powerful genetic resource for mapping sections of DNA that correlate with variation in a trait (known as quantitative trait loci, QTL), to identify causal variants and dissect the regulation of genome expression. We will characterise and compare the following different processes that potentially influence protein expression in the MAGIC lines:
1. Structural variation within the genome (including small-scale variation and large-scale structural rearrangements).
2. Chromatin accessibility, a measure of the availability of a given region of DNA for transcription.
3. Chemical modifications to DNA that do not involve a change in DNA sequence, known as epigenetic marks, which often indicate environmental perturbation.
4. mRNA abundance.
5. Protein abundance.
It is important to take a holistic approach, because the amount of any given protein in an individual is determined by the balance of these processes. Much effort has been spent studying gene transcription, because it is relatively easy to measure on a genome-wide scale. However, evidence suggests that transcription is a poor predictor of protein abundance, because the control of translation and protein degradation are important, particularly in plants. Less research has been done on measuring translation, protein amount and protein breakdown but advances in technology now let us do so. Although it is relatively straightforward to measure genomic structural variation and epigenetic marks such as DNA methylation, their impact on protein expression is unclear. Therefore, we are in an exciting position to provide enormous insight into protein regulation. The power of this project derives from innovative computational analysis that will enable us to apportion the relative contributions of genotype, transcription, protein synthesis and protein degradation and identify networks controlling protein expression. Because collecting genome-scale data from many samples is expensive and time-consuming, we will use novel statistical methods to get more information without significantly increasing sample size, including combining different layers of information. This will be the first study of this kind on this scale. As well as depositing our data in public repositories, our findings will be made available to the academic community via a user-friendly knowledge discovery and gene mining resource. The approaches developed in this project will provide valuable fundamental insights that will be applicable to other organisms and which will also pave the way to future crop improvement.

Impact Summary

The project's immediate beneficiary will be the academic community. The project will deliver academic impact through fundamental discoveries, big data, novel methodology and trained personnel. The knowledge base and technology developed will benefit projects across the range of BBSRC's strategic priorities, especially 'agriculture and food security', 'bioenergy and industrial biotechnology' and "exploiting new ways of working". "Application of computational and mathematical techniques to high-quality, quantitative biological data" is at the heart of the project. Although this proposal employs the model plant, Arabidopsis thaliana, knowledge and techniques we develop apply to other plants and indeed to other organisms. Even in models such as Arabidopsis, much of the protein coding genome is not characterised functionally; we anticipate that this project will play an important role in assigning new gene functions. Academic beneficiaries not only include plant scientists and researchers with an interest in proteostasis but also synthetic biologists via provision of a knowledge base for selective manipulation of proteins and pathways. The project will provide new concepts and rich data sets for the genetics, epigenetics, proteomics and plant science communities, which will be made available through public repositories and peer-reviewed journals. In addition, a key goal is to make a comprehensive, integrated and interoperable data resource accessible to the community in a timely fashion. Therefore, we will build a knowledge graph for gene mining and knowledge discovery in Arabidopsis that can be accessed via a user-friendly webapp (KnetMiner) or programmatic access (RDF and Neo4j servers) and integrate this with published Arabidopsis data. A collateral benefit is the generation of high-quality proteogenomic data. 
Information about structural variation, alternative splicing, alternative start sites and novel proteoforms will improve genome annotation and inform the development of databases for searching peptide mass spectrometric data. Thus, we anticipate that the work will drive innovations in bioinformatics. Arabidopsis has been important in establishing statistical genetics methodology and we will develop novel predictive methodology integrating multi-'omic data, using information about both predicted and validated molecular interactions across 'omic layers. These analyses will inform future translational research. This collaborative project combines a unique skills base, providing a framework within which early career researchers can be cross-trained in a range of key laboratory and data science skills. Such trained researchers will be of benefit to the academic and industrial sectors. Skills and new methodology will also be shared through running workshops and training courses. Longer term, the project has potential to contribute towards wealth creation and deliver environmental benefits, for example through plant breeding. In addition to identifying candidate genes, pathways and regulatory networks for manipulation in crop species, the cross-platform QTL analysis in Arabidopsis leads the way to comparable approaches in crops that will feed directly into breeding pipelines and establish the UK bioscience community as a leader in translating genomics into crop improvement solutions. This will be expedited by the existence of MAGIC populations for a number of important crop species (rice, wheat, chickpea), several of which have been co-developed by RM.
The annual turnover from UK plant breeding is estimated to be in the region of £200 to £230 million ("The UK Plant Breeding Sector and Innovation" The Intellectual Property Office, 2016), and the worldwide value of plant breeding is far greater, particularly when considering the indirect benefits of yield and quality improvements and increased resilience to changing climatic conditions.
Committee Research Committee B (Plants, microbes, food & sustainability)
Research Topics Plant Science
Research Priority X – Research Priority information not available
Research Initiative Longer and Larger Grants (LoLas) [2007-2015]
Funding Scheme X – not Funded via a specific Funding Scheme