Award details

Resequencing Arabidopsis thaliana

Reference BB/F019793/1
Principal Investigator / Supervisor Dr Ewan Birney
Co-Investigators / Co-Supervisors
Institution EMBL - European Bioinformatics Institute
Department Ensembl Group
Funding type Research
Value (£) 182,817
Status Completed
Type Research Grant
Start date 01/02/2009
End date 31/01/2011
Duration 24 months

Abstract

This project will resequence the genomes of 17 accessions of Arabidopsis thaliana using the Solexa platform. The accessions are the founders of a panel of ~1000 recombinant inbred lines (HSRILs), of which 763 have been bred and which we will deposit in the A. thaliana seed stock centre as a public resource (BBSRC grant BB/D016029/1). We are genotyping the lines and founders in Autumn 2007 using 1536 Illumina SNPs. As the genome of each HSRIL is a mosaic of the 19 founders, we can impute the mosaic from the SNP genotypes using a Hidden Markov Model (1). Knowledge of the mosaic structures, combined with the sequences of the 19 founders, will enable us to impute the sequence of each of the 763 lines; hence we can perform whole genome association and test every imputed SNP for association with any phenotype measured across the lines. Using the Solexa sequencing platform, millions of 35 bp reads can be generated in one run at low cost. Results of resequencing Bur-0 by Detlef Weigel's lab suggest that, with 3 runs of a Solexa instrument (or 1.5 paired-read runs per genome), over 80% of the genome and of the SNPs/short indels will be recovered accurately. All the data will be made publicly available, through Ensembl and by adapting the GSCANDB database (2). This was developed for human and mouse whole genome association mapping, and will be modified to display the genome scans of the A. thaliana mapping panel, incorporating gene annotations to identify candidate genes and link into external Arabidopsis genome resources. It will also provide a mechanism for collaborators to publish their genome scans.

(1) Mott R, Talbot CJ, Turri MG, Collins AC, & Flint J (2000) Proc Natl Acad Sci U S A 97, 12649-12654
(2) Taylor M, Valdar W, Kumar A, Flint J, & Mott R (2007) Bioinformatics 23, 1545-1549
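
The mosaic inference described in the abstract can be illustrated with a small sketch. It is not the method of reference (1): it swaps in a plain Viterbi decoder over haploid 0/1 SNP calls as a simplified stand-in. The function name `viterbi_mosaic` and the parameters `switch_prob` and `error_prob` are hypothetical, and the sketch assumes at least two founders, a uniform chance of switching founder between adjacent SNPs, and a fixed genotyping error rate.

```python
import math


def viterbi_mosaic(line_genotypes, founder_genotypes,
                   switch_prob=0.05, error_prob=0.01):
    """Return the most likely founder index at each SNP (toy model).

    line_genotypes: list of 0/1 alleles observed in one inbred line.
    founder_genotypes: list of per-founder 0/1 allele lists (same length).
    switch_prob, error_prob: illustrative values, not taken from the source.
    """
    n_founders = len(founder_genotypes)
    n_snps = len(line_genotypes)
    log_stay = math.log(1.0 - switch_prob)
    # A switch is spread evenly over the other founders (assumption).
    log_switch = math.log(switch_prob / (n_founders - 1))
    log_match = math.log(1.0 - error_prob)
    log_mismatch = math.log(error_prob)

    def emit(founder, snp):
        same = founder_genotypes[founder][snp] == line_genotypes[snp]
        return log_match if same else log_mismatch

    def trans(prev, cur):
        return log_stay if prev == cur else log_switch

    # Uniform prior over founders at the first SNP.
    score = [-math.log(n_founders) + emit(f, 0) for f in range(n_founders)]
    backpointers = []
    for snp in range(1, n_snps):
        new_score, pointers = [], []
        for f in range(n_founders):
            best_prev = max(range(n_founders),
                            key=lambda g: score[g] + trans(g, f))
            new_score.append(score[best_prev] + trans(best_prev, f)
                             + emit(f, snp))
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)

    # Trace back from the best final state.
    state = max(range(n_founders), key=lambda f: score[f])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

With three toy founders and a line whose first half matches founder 0 and second half matches founder 1, the decoder places a single switch at the midpoint. A probabilistic treatment that reports founder probabilities at each locus, rather than one best path, would use the forward-backward algorithm instead.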

Summary

Variation in traits such as flowering time, height and leaf colour between plants of the same species can be explained in part by differences between their DNA sequences, and this fact can be used to identify genes responsible for many phenotypes of importance in agriculture. One way of doing this requires a genetic reference population, which is a set of inbred lines of plants of the same species (i.e. varieties of plants that breed true and contain little or no genetic variation within each variety but which differ between varieties) whose genome sequences are known, at least approximately, and on which the phenotype of interest, such as flowering time, is measured. Then, by correlating the differences observed between the phenotypes measured across the lines with differences between their DNA sequences, it is possible to find DNA changes that may be responsible for the phenotypes, and hence identify the responsible genes. Because all flowering plants have a common ancestor and share similar genes, understanding gene function in one plant species can often be translated to another. Therefore, by working with a simple model plant, the thale cress Arabidopsis thaliana, which is easy to grow and has a short generation time, it is possible to discover gene function and then apply this information to agriculturally important crops, to improve yields to the benefit of mankind. We have developed a reference population of 763 Arabidopsis inbred lines, shortly to be expanded to over 1000 lines. They have been bred by repeatedly crossing 19 existing varieties of this plant that were collected from the wild and from across the world. The lines have been inbred (called 'selfing') for several generations until each line has a fixed DNA sequence which is a random mosaic of the 19 founders. Each line is a different mosaic.
We have begun to use these lines to find the genes responsible for traits such as flowering time, but in order to make the best use of them we need to know the genome sequence of each line. Fortunately we don't need to sequence each of the 763 lines, which would be costly. Instead we can infer their sequences from the 19 founder genomes because we know the mosaic structure of the 763. Recent technological improvements make it possible to sequence genomes much more cheaply and quickly. The genome of Arabidopsis thaliana is about 120 million bases long and can now be sequenced in about a day. We propose to sequence the genomes of 17 founders (the other two genomes are already sequenced) and make these data publicly available. We will develop software and statistical methods so that DNA variation between the 19 genomes can be used to help identify functionally important variations in the 763 lines. We have already distributed the lines by depositing their seeds in the A. thaliana stock centre so that others can use this resource. The genomes of each of the founders will also be of interest for studies of evolution and population genetics. They will be annotated by the Ensembl Plants team at EBI and the annotations displayed on the Ensembl genome browser.
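
The idea of inferring a line's sequence from the founder genomes can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `impute_line_sequence`, the segment format and the founder labels are hypothetical, and the founder sequences are assumed to share one coordinate system (as if each had been aligned to a common reference with substitutions only).

```python
def impute_line_sequence(mosaic, founder_sequences):
    """Stitch together one line's sequence from its founder mosaic.

    mosaic: list of (start, end, founder_name) half-open segments that
        tile the chromosome from position 0 without gaps or overlaps.
    founder_sequences: dict mapping founder_name to its sequence string,
        all in the same coordinates (a simplifying assumption).
    """
    pieces = []
    expected_start = 0
    for start, end, founder in sorted(mosaic):
        if start != expected_start:
            raise ValueError(
                f"segments must be contiguous; gap or overlap at {start}")
        # Copy this stretch of the chromosome from the contributing founder.
        pieces.append(founder_sequences[founder][start:end])
        expected_start = end
    return "".join(pieces)


# Toy example with two made-up founders on a 10-base chromosome.
toy_founders = {"founder_A": "AAAAAAAAAA", "founder_B": "CCCCCCCCCC"}
toy_mosaic = [(0, 4, "founder_A"), (4, 10, "founder_B")]
toy_sequence = impute_line_sequence(toy_mosaic, toy_founders)
```

In this toy case the imputed sequence takes its first four bases from `founder_A` and the remaining six from `founder_B`. Real founder genomes also differ by insertions and deletions, which this sketch does not handle.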
Committee Closed Committee - Genes & Developmental Biology (GDB)
Research Topics Plant Science
Research Priority X – Research Priority information not available
Research Initiative X - not in an Initiative
Funding SchemeX – not Funded via a specific Funding Scheme