Award details

RAPIER: from RAD sequencing to population genetics and evolutionary modelling

Reference BB/K004212/1
Principal Investigator / Supervisor Professor Mark Beaumont
Co-Investigators / Co-Supervisors Professor Julian Gough
Institution University of Bristol
Department Mathematics
Funding type Research
Value (£) 84,162
Status Completed
Type Research Grant
Start date 01/10/2012
End date 30/09/2013
Duration 12 months

Abstract

In the medium term, sequencing a reduced representation of the genome is the only feasible way to apply next-generation sequencing technology to the population genetic analysis of non-model organisms. It is difficult, however, to organise the plethora of resulting fragments into aligned sequences that can be compared across individuals in order to quantify nucleotide frequencies. This is especially true if no reference genome is available. Currently only one software package attempts to deal with this problem. This pioneering effort has a number of flaws, however, which this project addresses. We will develop a clustering algorithm that builds short sequence alignments from all the data in a sample, rather than on a per-individual basis as in the current package. This algorithm will take into account sequencing error as measured by Phred quality scores. Sets of orthologous sequences will be identified, and genotypes will be output together with their posterior probabilities under an error model. In addition, we will develop software for inter-conversion between file formats, so that researchers can use the output with a number of different population genetic analysis packages. Furthermore, we will complete the initial stages of a new software package for inferring the parameters of a model of diverging populations with gene flow. This package will be based on approximate Bayesian computation (ABC), using recent enhancements to the method, and will take sequencing error into account when estimating gene frequencies.
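The genotype-calling step described in the abstract can be illustrated with a standard textbook error model. The sketch below is a hedged illustration, not the project's algorithm. It assumes a diploid, biallelic site and converts each read's Phred score Q into an error probability 10^(-Q/10). It also assumes that a miscalled base is equally likely to be any of the other three bases, and it uses a Hardy-Weinberg prior with a hypothetical allele frequency `p_alt`. All function names are illustrative.

```python
import math


def phred_to_error(q: float) -> float:
    """Convert a Phred quality score Q (assumed > 0) to the error probability 10^(-Q/10)."""
    return 10 ** (-q / 10)


def base_likelihood(base: str, allele: str, err: float) -> float:
    """P(observed base | true allele).

    Assumption: a miscall is equally likely to be any of the three other bases.
    """
    return 1 - err if base == allele else err / 3


def genotype_posteriors(reads, ref: str, alt: str, p_alt: float = 0.5) -> dict[str, float]:
    """Posterior probability of each diploid genotype at one biallelic site.

    reads: list of (base, phred_quality) pairs covering the site.
    p_alt: assumed alternative-allele frequency, used for a Hardy-Weinberg prior.
    """
    if not 0 < p_alt < 1:
        raise ValueError("p_alt must lie strictly between 0 and 1")
    # Genotype label -> (allele 1, allele 2, Hardy-Weinberg prior probability)
    genotypes = {
        ref + ref: (ref, ref, (1 - p_alt) ** 2),
        ref + alt: (ref, alt, 2 * p_alt * (1 - p_alt)),
        alt + alt: (alt, alt, p_alt ** 2),
    }
    log_post = {}
    for name, (a1, a2, prior) in genotypes.items():
        total = math.log(prior)
        for base, q in reads:
            err = phred_to_error(q)
            # Each read comes from either chromosome copy with probability 1/2.
            total += math.log(0.5 * base_likelihood(base, a1, err)
                              + 0.5 * base_likelihood(base, a2, err))
        log_post[name] = total
    # Normalise in log space to avoid underflow at high coverage.
    top = max(log_post.values())
    weights = {g: math.exp(v - top) for g, v in log_post.items()}
    norm = sum(weights.values())
    return {g: w / norm for g, w in weights.items()}


if __name__ == "__main__":
    reads = [("A", 30)] * 5 + [("G", 30)] * 5
    print(genotype_posteriors(reads, ref="A", alt="G"))
```

With five A reads and five G reads at Q30, the heterozygote AG receives nearly all of the posterior mass. A single discordant G read at Q10 is more readily attributed to sequencing error than the same read at Q40, which is the sense in which Phred scores enter the genotype posterior.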

Summary

Individual organisms belonging to the same species differ from one another genetically. The pattern of genetic variation, and how it covaries with phenotype, is highly informative about the past evolutionary history of the population, and also provides insights into gene function. Population genetic analysis has proved very useful in a number of fields: for example, in genome-wide association studies to look for disease genes, in epidemiological analysis, in forensics, and in elucidating the history and evolution of human populations. Except in model organisms such as humans, it has until recently been very expensive to analyse enough of the genome to make accurate estimates of the quantities of interest. The development of Next-Generation Sequencing (NGS) technology has made it possible to analyse a very large number of genes (regions of the genome). However, NGS by itself is a broad tool, better suited to the analysis of whole individual genomes, which is still relatively expensive. Population genetic analysis instead requires a sample of genes from across the genome that can be compared across individuals. The RADseq method has been developed to do this. It works by sequencing the regions of the genome adjacent to a particular short motif, the recognition site of a restriction enzyme (for example, CCTGCAGG). Because every fragment starts at a copy of the same motif, the same region can be compared across individuals. The challenge is that such a motif typically occurs many thousands of times in a single genome, yielding many fragments that need to be sorted out. Computer software has been developed to do this, but because the technique is very new, the current method has a number of inherent problems and biases. This project aims to fix many of these problems by taking a more rigorously statistical approach. We will develop new publicly available software that makes it much easier to apply NGS methods in population genetics.
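The way RADseq uses a motif to pick out comparable regions can be sketched as an in-silico digest. This is a minimal illustration with several assumptions. The genome is taken to be a single forward-strand string. The example motif CCTGCAGG is used, and it happens to be palindromic, so it reads the same on both strands. The function names and the fixed tag length `tag_len` are hypothetical choices, not part of the project's software. The sketch finds every occurrence of the motif and extracts a fixed-length tag reading away from the site on each strand, so that the same locus yields the same tag in different individuals.

```python
# Hypothetical sketch: in-silico RAD tag extraction around a restriction motif.
MOTIF = "CCTGCAGG"  # example motif from the text; it is palindromic
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(genome: str, motif: str = MOTIF) -> list[int]:
    """Return the start position of every occurrence of motif in genome."""
    sites = []
    pos = genome.find(motif)
    while pos != -1:
        sites.append(pos)
        pos = genome.find(motif, pos + 1)  # step by one so overlaps are not missed
    return sites


def rad_tags(genome: str, motif: str = MOTIF, tag_len: int = 12) -> list[str]:
    """Extract one tag on each side of every motif occurrence.

    Each tag begins at the motif and reads away from the site, mimicking reads
    sequenced outward from a restriction site. Tags that would run off either
    end of the sequence are skipped. Assumes tag_len >= len(motif).
    """
    tags = []
    for start in find_sites(genome, motif):
        end = start + len(motif)
        # Downstream tag on the forward strand.
        if start + tag_len <= len(genome):
            tags.append(genome[start:start + tag_len])
        # Upstream tag, read on the reverse strand; because the motif is
        # palindromic, this tag also begins with the motif.
        if end - tag_len >= 0:
            tags.append(reverse_complement(genome[end - tag_len:end]))
    return tags


if __name__ == "__main__":
    # Two toy "individuals" differing by one SNP (T -> C) downstream of the site.
    ind_a = "T" * 10 + "ACGTTGCAAT" + MOTIF + "GATTACAGGA" + "T" * 10
    ind_b = "T" * 10 + "ACGTTGCAAT" + MOTIF + "GATCACAGGA" + "T" * 10
    for name, genome in [("A", ind_a), ("B", ind_b)]:
        print(name, find_sites(genome), rad_tags(genome))
```

Both toy individuals have the site at position 20. Their downstream tags differ only at the SNP (CCTGCAGGGATT versus CCTGCAGGGATC), which is the kind of variation RADseq is designed to score across individuals. In a real genome, thousands of such sites produce thousands of tags that must then be grouped into loci.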

Impact Summary

The main impact of this research will be that the software tool generated in this project will allow far greater use of NGS methods in population analysis. This will have a number of benefits outside academia. Livestock and crop breeding technologies will benefit, particularly for organisms for which reference genomes have not yet been produced. The software will improve the identification of genetic markers, which will be useful for QTL identification, the construction of high-density linkage maps, and targeted back-crossing in breeding. The software tool will also have an impact on decision makers in conservation and wildlife management. For example, improved generation of multiple genetic markers will increase the precision with which hybrid individuals are detected, helping to control the effects of introgressive invasions. In addition, improved markers will allow better assessment of levels of inbreeding depression, by comparing current levels of genetic variation with inferred past levels. Veterinarians and clinicians will benefit because improved marker development for novel disease organisms will allow epidemiological models to be fitted more accurately to NGS data. An increase in the number of genetic markers will give agricultural decision makers a better understanding of the routes by which certain pests have arrived in a country. The genetic markers can also be used to compare different models of demographic history. There are also more indirect and long-term benefits through improved identification of the functional roles of genes involved in local adaptation. Genes identified as having adaptive value in a particular organism from a genome-wide scan can be investigated further and their properties analysed. In this way, novel modes of action and regulatory pathways may be discovered, which may improve our understanding of gene action in humans, with potential medical applications.
Committee Research Committee C (Genes, development and STEM approaches to biology)
Research Topics Technology and Methods Development
Research Priority X – Research Priority information not available
Research Initiative Tools and Resources Development Fund (TRDF) [2006-2015]
Funding Scheme X – not Funded via a specific Funding Scheme