BBSRC Portfolio Analyser
Award details
Phylogeographic inference using genomic sequence data under the multispecies coalescent model
Reference
BB/P006493/1
Principal Investigator / Supervisor
Professor Ziheng Yang
Co-Investigators / Co-Supervisors
Dr Daniel Dalquen
Institution
University College London
Department
Genetics Evolution and Environment
Funding type
Research
Value (£)
398,654
Status
Completed
Type
Research Grant
Start date
01/05/2017
End date
30/04/2020
Duration
36 months
Abstract
We will improve the maximum likelihood and Bayesian MCMC methods developed by the PI and collaborators for analysis of genomic sequence data from multiple species to infer the species phylogeny under the multispecies coalescent model. These methods are superior to existing heuristic methods in that they accommodate ancestral polymorphism and incomplete lineage sorting, gene tree-species tree conflicts, and uncertainties and errors in gene trees due to limited information in the sequence data. We will extend our program 3S to develop a maximum likelihood method of species tree inference under the multispecies coalescent model with introgression, which is expected to be very useful for inferring species phylogenies when the species are closely related and introgression is common. We will extend our Bayesian MCMC program BPP to implement sophisticated mutation models (such as GTR+G) and to relax the molecular clock so that the method can be applied to distantly related species. We will implement and evaluate novel MCMC proposal kernels to improve the mixing efficiency of the transmodel MCMC algorithms. We will parallelize the program to make efficient use of modern multi-processor, multi-core computer hardware.
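As an illustration of the gene tree-species tree conflict mentioned above, the following is a minimal sketch (not code from 3S or BPP) of the standard three-species result under the multispecies coalescent without introgression. It assumes one sequence per species, species tree ((A,B),C), divergence times measured in expected mutations per site, and a population size parameter theta = 4Nμ for the ancestor of A and B. The function and parameter names are hypothetical.

```python
import math


def gene_tree_probs(tau_root, tau_inner, theta_inner):
    """Gene-tree topology probabilities for species tree ((A,B),C).

    Assumptions (illustrative only): one sequence per species, no
    introgression, tau_* in expected mutations per site, and
    theta_inner = 4*N*mu for the population ancestral to A and B.
    """
    if theta_inner <= 0 or tau_root < tau_inner:
        raise ValueError("need theta_inner > 0 and tau_root >= tau_inner")

    # Internal branch length in coalescent units: two lineages coalesce
    # at rate 2/theta per unit of mutational time.
    t = 2.0 * (tau_root - tau_inner) / theta_inner

    # Probability that the A and B lineages fail to coalesce on the
    # internal branch (incomplete lineage sorting).
    p_no_coal = math.exp(-t)

    # If they fail to coalesce, all three topologies are equally likely
    # in the root population.
    return {
        "((A,B),C)": 1.0 - (2.0 / 3.0) * p_no_coal,
        "((A,C),B)": p_no_coal / 3.0,
        "((B,C),A)": p_no_coal / 3.0,
    }


if __name__ == "__main__":
    for topology, p in gene_tree_probs(0.02, 0.015, 0.01).items():
        print(f"{topology}: {p:.4f}")
```

With a short internal branch relative to theta, the two mismatching topologies together carry substantial probability, which is why methods that assume a single shared tree across loci can be misled.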
Summary
Our evolutionary history is written in our genomes. By comparing DNA sequences from different species or from multiple individuals of the same species, we can work out how the species are related, when they diverged from each other, whether there was introgression between the species, and whether the population size of a species went through a bottleneck or other demographic changes. DNA sequences can also be used to identify species and delineate species boundaries. To address such exciting questions, powerful statistical methods and computational algorithms are necessary. In this project we will develop new statistical models and computer algorithms for efficient analysis of genomic sequence data within two well-established statistical frameworks: maximum likelihood and Bayesian inference. We will develop a maximum likelihood method for estimating the species tree that accommodates the random process of biological reproduction and genetic sequence evolution, as well as introgression or hybridisation, which may be common between closely related species, especially during species radiations. We will introduce significant improvements and extensions to our Bayesian model-comparison approach to delimiting species using genomic sequence data. We will implement sophisticated models to describe the evolutionary process of DNA sequences and to allow changes in the evolutionary rate among lineages, so that the program can be applied to estimate species phylogenies for distantly related species, such as different orders of mammals. We will parallelize the program to improve computational efficiency.
Impact Summary
Delimiting species boundaries and inferring species phylogenies are of vital importance to assessing current biodiversity, to understanding the impact of environmental and societal changes on species extinctions, and to developing effective conservation policies. The methods developed in this project for delimiting and identifying species provide powerful tools for analysis of genomic datasets, and results obtained from such analyses will be critical to effective decision making concerning biodiversity management and conservation. Because the methods can identify species, they are also useful for tracking illegal wildlife trade.
Committee
Research Committee C (Genes, development and STEM approaches to biology)
Research Topics
Systems Biology
Research Priority
X – Research Priority information not available
Research Initiative
X - not in an Initiative
Funding Scheme
X – not Funded via a specific Funding Scheme