Award details

A tool for haplotype reconstruction from polyploid genomes

Reference BB/L018535/1
Principal Investigator / Supervisor Professor Sophien Kamoun
Co-Investigators / Co-Supervisors
Institution University of East Anglia
Department Sainsbury Laboratory
Funding type Research
Value (£) 101,587
Status Completed
Type Research Grant
Start date 01/06/2014
End date 31/05/2015
Duration 12 months

Abstract

In this proposal, we aim to develop a method that reconstructs the haplotypes of polyploid species from short sequencing reads using a recently developed linkage method. In this method, haplotypes are constructed from SNP linkage: short reads from next-generation sequencing are aligned to the reference genome and connected using heterozygous SNPs, creating local haplotypes. This is done within partial genomic regions called sliding "windows"; as the window moves along each chromosome, it generates many local haplotypes. The local haplotypes are ranked by scores calculated from their frequency in each window, and, based on this ranking, they are assembled into major haplotypes. For a diploid organism, two homologous chromosomes are expected, so the two major haplotypes with the highest scores are selected and all others are excluded.
For polyploid species, we need to improve (1) the process of excluding major haplotypes and (2) the concatenation of local haplotypes. When phasing a diploid genome, the maximum number of major haplotypes is limited to two, corresponding to the paternal and maternal haplotypes, whereas for polyploid species more than two haplotypes must be considered. We will relax this limit and adjust the maximum number of major haplotypes according to the pre-determined ploidy level, incorporating Bayesian estimation of haplotype frequency into this adjustment. Because variation in recombination rate will influence the expected number of haplotypes, for allopolyploid species we will use homoeologous SNP information to improve the accuracy with which local haplotypes are concatenated. We will test and refine the algorithm using simulated data. Finally, we will apply the developed algorithm to real crop and crop pathogen data, including sequences from autotetraploid potato, allohexaploid wheat and the polyploid plant pathogen Phytophthora infestans.
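The windowing, ranking and ploidy-based selection steps described above can be illustrated with a minimal Python sketch. It is not the project's implementation, and every name in it is hypothetical. The read representation (a dict from heterozygous SNP index to allele), the three-SNP window moved one SNP at a time, the use of a symmetric Dirichlet posterior mean as the "Bayesian" frequency score, and the greedy overlap-matching concatenation are all assumptions chosen for illustration. The one change from the diploid case is that `ploidy` replaces the fixed limit of two major haplotypes.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical read representation: heterozygous SNP index -> allele
# (0 = reference, 1 = alternative) for each SNP the aligned read covers.
Read = Dict[int, int]
Haplotype = Tuple[int, ...]


def local_haplotypes(reads: List[Read], start: int, size: int) -> Counter:
    """Count local haplotypes in the window of SNPs [start, start + size).

    Only reads covering every SNP in the window are used, so each such read
    links the window's heterozygous SNPs into one local haplotype."""
    snps = range(start, start + size)
    counts: Counter = Counter()
    for read in reads:
        if all(s in read for s in snps):
            counts[tuple(read[s] for s in snps)] += 1
    return counts


def posterior_frequencies(counts: Counter, alpha: float = 1.0) -> Dict[Haplotype, float]:
    """Posterior-mean frequencies under an assumed symmetric Dirichlet(alpha)
    prior over the local haplotypes observed in the window."""
    total = sum(counts.values()) + alpha * len(counts)
    return {h: (c + alpha) / total for h, c in counts.items()}


def major_haplotypes(counts: Counter, ploidy: int, alpha: float = 1.0) -> List[Haplotype]:
    """Rank local haplotypes by score and keep at most `ploidy` of them
    (a diploid-only method would hard-code 2 here)."""
    freqs = posterior_frequencies(counts, alpha)
    ranked = sorted(freqs, key=lambda h: (-freqs[h], h))
    return ranked[:ploidy]


def phase(reads: List[Read], n_snps: int, ploidy: int, size: int = 3) -> List[Haplotype]:
    """Slide the window one SNP at a time and greedily extend each haplotype
    with the best-ranked local haplotype whose alleles agree on the overlap."""
    if size < 2:
        raise ValueError("windows must overlap by at least one SNP")
    first = major_haplotypes(local_haplotypes(reads, 0, size), ploidy)
    paths = [list(h) for h in first]
    for start in range(1, n_snps - size + 1):
        candidates = major_haplotypes(local_haplotypes(reads, start, size), ploidy)
        for path in paths:
            if len(path) != start + size - 1:
                continue  # path stalled earlier: no consistent extension was found
            overlap = tuple(path[-(size - 1):])
            for h in candidates:
                if h[:-1] == overlap:
                    path.append(h[-1])
                    break  # take the best-ranked consistent candidate only
    return [tuple(p) for p in paths]


def example_reads() -> List[Read]:
    """Simulated tetraploid: four haplotypes over five SNPs with unequal read
    depth, each read spanning three SNPs, plus one erroneous read."""
    truth = [(0, 0, 0, 0, 0), (1, 1, 1, 1, 1), (0, 1, 0, 1, 0), (1, 0, 1, 0, 1)]
    reads: List[Read] = []
    for depth, hap in zip([5, 4, 3, 2], truth):
        for start in range(3):
            reads += [{s: hap[s] for s in range(start, start + 3)} for _ in range(depth)]
    reads.append({0: 1, 1: 1, 2: 0})  # sequencing error, observed only once
    return reads


if __name__ == "__main__":
    print(phase(example_reads(), n_snps=5, ploidy=4))
    print(phase(example_reads(), n_snps=5, ploidy=2))
```

Run as a script, the first call recovers the four simulated haplotypes and drops the single erroneous read, while the second, with the limit set to 2 as in the diploid case, keeps only the two best-supported haplotypes. With a symmetric prior the ranking is the same as ranking by raw counts; the prior changes only the frequency estimates, which a fuller method could use when deciding how many haplotypes to keep. The greedy matching also fails when two homologues carry identical alleles across a window overlap, a situation where extra information, such as the homoeologous SNPs mentioned above for allopolyploids, would be needed.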

Summary

A number of plant, animal, fungal and oomycete genome sequences have been determined over the last decade. This unprecedented leap forward in genome-wide knowledge has given population geneticists tools to address fundamental questions about population dynamics at substantially higher resolution. Once high-quality genome sequences are available, researchers can rapidly reconstruct the genomes of individuals within the same species using next-generation sequencing technology and efficient alignment methods. Comparing individuals within a population then enables rapid gene mapping and allows complex questions about population dynamics to be addressed at the whole-genome scale. However, methods that exploit next-generation sequencing for genome resequencing, population genetics and gene mapping remain limited for polyploid species, which include several important crops such as wheat (hexaploid) and potato (tetraploid), as well as crop pathogens such as Phytophthora infestans. These limitations arise because polyploid genome sequences are more complex to handle than those of diploid organisms. Because association mapping and population studies rely on haplotype reconstruction, polyploidy has been a major constraint on progress in both basic and applied research. Current methods cannot reconstruct reliable haplotypes for polyploid organisms from short-read sequences, so there is an important need for methods that accurately recreate haplotypes from short sequencing reads. In this proposal, we aim to develop a method that reconstructs the haplotypes of polyploid species using a recently developed linkage method. On completion of this project, the haplotype reconstruction algorithm will enable full exploitation of resequencing data for a variety of polyploid species, including important crops and pathogens.
The method will also be applicable to ancient DNA, which tends to be extremely fragmented and therefore cannot be analysed by long-read sequencing. Reconstructing haplotypes from ancient DNA will help elucidate past epidemics and the evolutionary history of several organisms, including important crops.

Impact Summary

The PI will manage the impact plan. The PI has an excellent track record in communicating the outcomes of his research to a broad audience and in sharing tools, resources and associated code in a free and open manner (e.g. crowdsourcing of ash dieback genomics). The Sainsbury Laboratory (TSL) has a dedicated communications office for releasing information to the general public through websites and the media. The PI will oversee the impact activities and, whenever necessary, will seek the assistance of other expert staff at TSL (researchers and administrators). Where impact activities include outreach or press releases, the relevant office at TSL will be involved.
Members of the research group are actively involved in a range of outreach activities. The PI regularly gives talks to public audiences (Science Café, Friends of JIC, Linnean Society etc.) on issues such as food security and plant pathology. He contributes to relevant debates using social media tools such as Twitter (@kamounlab). His more than 2,200 Twitter followers ("tweeps") include members of the general public, policy makers, teachers, journalists, farmers and agribusiness. He also manages a Scoop.it blog on "Plants and Microbes" that has received ~100,000 page views in just two years.
The PDRA for this project will gain skills, knowledge and experience from the research itself and from wider training. This will contribute to their future economic activity in the public and/or private sectors. Given the nature of the project, the PDRA is likely to develop skills that should prove highly attractive in the job market. The resulting innovation and training will help produce the next generation of skilled bioinformatics scientists, with benefits beyond the immediate outcomes of this project.
Committee Research Committee B (Plants, microbes, food & sustainability)
Research Topics Crop Science, Microbiology, Plant Science, Technology and Methods Development
Research Priority X – Research Priority information not available
Research Initiative Tools and Resources Development Fund (TRDF) [2006-2015]
Funding Scheme X – not Funded via a specific Funding Scheme