Award details

Phylogeographic inference using genomic sequence data under the multispecies coalescent model

Reference BB/P006493/1
Principal Investigator / Supervisor Professor Ziheng Yang
Co-Investigators / Co-Supervisors Dr Daniel Dalquen
Institution University College London
Department Genetics Evolution and Environment
Funding type Research
Value (£) 398,654
Status Completed
Type Research Grant
Start date 01/05/2017
End date 30/04/2020
Duration 36 months

Abstract

We will improve the maximum likelihood and Bayesian MCMC methods developed by the PI and collaborators for analysing genomic sequence data from multiple species to infer the species phylogeny under the multispecies coalescent model. These methods are superior to existing heuristic methods in that they accommodate ancestral polymorphism and incomplete lineage sorting, gene tree-species tree conflicts, and uncertainties and errors in gene trees due to limited information in the sequence data. We will extend our program 3S to develop a maximum likelihood method of species tree inference under the multispecies coalescent model with introgression, which is expected to be very useful for inferring species phylogenies when the species are closely related and introgression is common. We will extend our Bayesian MCMC program BPP to implement sophisticated mutation models (such as GTR+G) and to relax the molecular clock so that the method can be applied to distantly related species. We will implement and evaluate novel MCMC proposal kernels to improve the mixing efficiency of the transmodel MCMC algorithms. We will parallelize the program to make efficient use of modern multi-processor, multi-core computer hardware.

Summary

Our evolutionary history is written in our genomes. By comparing DNA sequences from different species or multiple individuals of the same species we can work out how the species are related, when they diverged from each other, whether there was introgression between the species, and whether the population size of a species went through a bottleneck or other demographic changes. DNA sequences can also be used to identify species and delineate species boundaries. To address such exciting questions, powerful statistical methods and computational algorithms are necessary. In this project we will develop new statistical models and computer algorithms for efficient analysis of genomic sequence data within two well-established statistical frameworks: maximum likelihood and Bayesian inference. We will develop a maximum likelihood method for estimating the species tree that accommodates the random process of biological reproduction and genetic sequence evolution, as well as introgression or hybridisation that may be common between closely related species, especially during radiative speciations. We will introduce significant improvements and extensions to our Bayesian model-comparison approach to delimiting species using genomic sequence data. We will implement sophisticated models to describe the evolutionary process of DNA sequences and to allow changes in the evolutionary rate among lineages so that the program can be applied to estimate species phylogenies for distantly related species, such as different orders of mammals. We will parallelize the program to improve the computational efficiency.

Impact Summary

Delimiting species boundaries and inferring species phylogenies are of vital importance to assessing current biodiversity, to understanding the impact of environmental and societal changes on species extinctions, and to developing effective conservation policies. The methods developed in this project for delimiting and identifying species provide powerful tools for analysis of genomic datasets, and results obtained from such analyses will be critical to effective decision making concerning biodiversity management and conservation. Because the methods can identify species, they are also useful for tracking illegal wildlife trade.
Committee Research Committee C (Genes, development and STEM approaches to biology)
Research Topics Systems Biology
Research Priority X – Research Priority information not available
Research Initiative X - not in an Initiative
Funding Scheme X – not Funded via a specific Funding Scheme