Award details

Detecting signatures of natural selection in the human genome with geographically explicit models

Reference BB/H008691/1
Principal Investigator / Supervisor Dr Paul Flicek
Co-Investigators / Co-Supervisors
Institution EMBL - European Bioinformatics Institute
Department Vertebrate Genomics
Funding type Research
Value (£) 146,934
Status Completed
Type Research Grant
Start date 01/09/2010
End date 31/08/2013
Duration 36 months

Abstract

We propose to exploit the recently available datasets on worldwide human genomic diversity to test for possible targets of natural selection in the genome. We will first develop a demographic, geographically explicit inference framework for the analysis of genetic data. Using this tool, we will reconstruct the expansion out of Africa by anatomically modern humans, taking into account climatic changes over the last 100,000 years. We will then run stochastic simulations within this well-parameterised demography to characterise genomic regions likely to have been affected by natural selection. The analyses will be run on the 650k SNPs already typed for the HGDP-CEPH panel (~1,000 individuals from 51 populations) and subsequently on larger datasets, which will be sourced from ongoing dense re-sequencing projects. To get further insights into the underlying selective forces, plausible targets of natural selection will be tested for their spatial association with environmental variables such as climate and diseases. We will also expand our approach to investigate natural selection on human mitochondrial DNA (mtDNA). Our group has recently uncovered strong new evidence that worldwide mtDNA diversity has been partly shaped by climate. We will sequence complete mtDNA genomes for 1,400 individuals belonging to 76 populations (the HGDP-CEPH panel and 25 Amerindian and Siberian populations previously genotyped at a large number of neutral autosomal loci). We will then investigate whether the current geographic distribution of mitochondrial haplotypes is compatible with our understanding of past human migrations as inferred from nuclear markers. Our demographic, spatially explicit model will provide a formal framework to test whether the association between some haplotypes and temperature that we detected in our previous work can be explained by stochastic events, or whether selection has to be invoked.

Summary

Modern sequencing techniques have provided us with very large genetic datasets, on a scale that was hard to imagine only a couple of years ago. As these datasets comprise human populations from the entire globe, it is tempting to look at the geographic distribution of genetic variants and try to find explanations for why some variants are more common in some places than in others. After all, we have known for a long time that sickle cell anaemia is found in regions where malaria was prevalent, as it can confer resistance to the deadly disease. So, could we find other important genetic variants that have been affected by natural selection by examining their geographic distribution? While this approach sounds promising, it raises the issue of being able to distinguish between those patterns that truly reflect past and present selection, and patterns that might have simply arisen by chance. In this project, we propose to develop a population genetics framework that will allow us to reconstruct the spread of anatomically modern humans around the globe, taking into account past changes in climate and the shape of continents. By knowing how and when people got to different parts of the world, we will then be able to distinguish which genetic variants have geographic distributions too extreme to be the result of mere chance, and thus have been the target of natural selection. Besides looking for regions under selection in the nuclear genome, we will also consider the small amount of genetic material contained in the mitochondria, small organelles that act as the biochemical powerhouses in our cells. Mitochondrial DNA is arguably the most widely used source of information for reconstructing past human history, but such reconstructions rely on the assumption that mitochondrial DNA has not been affected by natural selection.
Our new framework, together with the better geographic coverage of mitochondrial genetic variability that will be achieved in this project, will allow us to test the assumption of neutrality and to find any deviation that should be taken into account in future work on human settlement history.

Impact Summary

The research proposed here comprises four different objectives, which are likely to appeal to different parts of the scientific community and of wider society. We intend to fill a major gap in the toolbox of population biologists with an eco-geographic inference framework. This should be of interest to human population biologists. So far, however, the framework has met with the most enthusiasm from population biologists outside the human genetics community. Despite very limited publicity, we have been approached by numerous groups working on organisms as diverse as plant pathogens and marine mammals. We wish to encourage the use of the framework by making it freely available and producing extensive and user-friendly documentation. We also hope that the approach will be adopted by epidemiologists in the longer term. Our reconstruction of human settlement history should provide a richer, more detailed picture of human evolution over the last 100,000 years. We expect the results to be of interest to our colleagues in human genetics as well as to anthropologists and archaeologists. This is also a topic of interest to the general public. In addition to peer-reviewed publications aimed at the academic community, we wish to engage with a wider audience. To this end, we are planning to produce a series of interactive Flash applets capturing the main results. These will be made available through our websites but will also be used in talks and exhibitions. The new analyses on selection in the human genome should again appeal to scientists and non-scientists alike. This part of the project is really a leap into the unknown and it is thus difficult to make plans on how to publicize the results. Our methodology, combined with the extraordinary increase in human genomic data, should provide us with unprecedented power, making it likely that we will identify previously unsuspected genes of interest.
The appeal of the results, in particular to the general public, will largely depend on the new genes we identify. Irrespective of the results, we expect that the wider community of geneticists will be interested because of the novelty of the approach and the high statistical power of the analysis. Selection in the mitochondrial genome is a completely different situation from the genome-wide data mining, as we will test a very specific hypothesis. We have previously shown that mitochondrial diversity correlates with minimum temperature and have identified two plausible SNPs that make sense from a functional perspective. The manuscript was reviewed by Nature, Science and PLoS; the reviewers eventually rejected it in all three instances, mainly because they felt that the results had such far-reaching consequences that not the slightest doubt could be allowed to exist. Indeed, probably over 80% of the literature on human settlement history relies on inferences from mtDNA, and a correlation with climate would require revisiting it entirely. While we ran considerable controls, we were unable to perform the final control analysis, as this requires matched samples for mtDNA and neutral genomic markers, which we did not have. Our proposed research will remedy this problem: it will either show that the previous results stemmed from a sampling artifact or an unknown complex demographic mechanism, or confirm our original results. In the latter case, this would arguably constitute one of the most important results in human population genetics and would lead to several paradigm shifts, such as reconsidering the pervasive notion of an 'out of Africa bottleneck'. We have no doubt that such a result would significantly impact large parts of the scientific community and generate considerable media attention.
Committee Research Committee A (Animal disease, health and welfare)
Research Topics X – not assigned to a current Research Topic
Research Priority X – Research Priority information not available
Research Initiative X – not in an Initiative
Funding Scheme X – not Funded via a specific Funding Scheme