Award details

Analysis of quantitative genetic traits in a huge data set

Reference BB/N006178/1
Principal Investigator / Supervisor Dr John Hickey
Co-Investigators / Co-Supervisors Professor William George Hill, Professor Michael Watson, Professor John Woolliams
Institution University of Edinburgh
Department The Roslin Institute
Funding type Research
Value (£) 657,912
Status Completed
Type Research Grant
Start date 01/05/2016
End date 30/04/2019
Duration 36 months

Abstract

This project aims to harvest scientific benefits from a 15-year, billion-dollar pig breeding program. We will analyse the genome sequences, pedigrees and phenotypes of 325,000 pigs in order to:
- Analyse the genetic basis of 25 quantitative traits at the molecular level.
- Explain the covariance between traits.
- Quantify the extent to which huge datasets help us answer these questions.

Our objectives are:

(1) The genetic basis of quantitative traits. For each of the 25 traits we will count the number of quantitative trait variants that can be mapped and analyse how they are distributed across the genome (e.g. randomly or in clusters), which types of variant (e.g. indels or SNPs) control them, and which kinds of genome element (e.g. coding versus noncoding) contain the variants. We will analyse the interactions between the mapped variants that control each trait to quantify the degree to which they show additivity, dominance, or epistasis. We will also quantify the joint distribution of allele frequencies, ages, and effect sizes, and how and by how much the genetic variation changes across many generations. We will quantify the degree to which the contributions to genetic variation differ among the 11 related populations and the 19 generations of our data set, and the extent to which they compare with what is known of other species.

(2) What kinds of mechanisms cause traits to co-vary? We will measure the correlation between traits (locally and genome wide) and identify the extent to which pleiotropy and linkage disequilibrium contribute, the distribution of the magnitude and sign of joint effects of genomic regions on pairs of traits, and the degree to which these variants are new or old, common or rare, lie in each type of functional region, and have large or small effect sizes.

(3) We will quantify the extent to which huge data sets, with and without functional annotation information, help us address these questions.

Summary

The genetic basis of quantitative trait variation and covariation is central to human genetics, evolutionary biology, and plant and animal breeding. In medical genetics many diseases, including schizophrenia, heart disease and cancer, are complex traits with continuous phenotypes and liabilities, which have multiple genome variants contributing genetic variance. In evolutionary biology fitness is largely due to such quantitative traits (e.g. fecundity, longevity). In plant and animal breeding most of the economically important traits are quantitative traits (e.g. milk, meat, and grain yields, environmental footprint, fecundity).

Huge datasets are needed for statistical genomics because many variants (probably thousands), which can be clustered together, contribute to any individual quantitative trait, and their effects can combine in complex ways (additive, dominant, epistatic). Moreover, important portions of the genetic variance of quantitative traits are controlled by variants that are rare, have small effect sizes or are highly correlated with other variants. The effects of such quantitative trait variants can only be separated when very powerful statistical models are used in very large data sets.

We will analyse the genetic basis of 25 quantitative traits at the molecular level by creating and analysing a dataset containing genome sequences, pedigrees and trait records of 325,000 pigs from the world's biggest commercial breeding programme. The dataset will be created and analysed using imputation and analysis algorithms based on those that we developed to support the breeding programme. The size of the dataset and the quality of the data will allow us to address three big questions:

1. Which genome variants control which quantitative traits, how do they control them, and how do the multiple variants that control a single trait interact?
2. What kinds of mechanisms cause traits to co-vary? To what extent do pleiotropy and linkage disequilibrium contribute? What is the distribution of the magnitude and sign of joint effects of genomic regions on pairs of traits?
3. To what extent do huge data sets help us address these questions?

For the first time we have the technology to generate genome sequence data for hundreds of thousands of individuals at low cost and the computer power to store and analyse such data. The aim of this project is to harvest scientific benefits from a 15-year, billion-dollar pig breeding program. Our previous projects asked how statistical genomics helps animal breeding; this project asks how animal breeding helps statistical genomics.

Impact Summary

(i) The academic community. Scientifically, the project constitutes a step change in genetics research because this is the first data set of this scale with whole genome sequence data. As outlined in the section "Academic Beneficiaries", several benefits will accrue to the academic community (animal, plant, human and evolutionary geneticists and other fields that develop and utilise large scale computational methods). This impact will be delivered via publication in journals, presentations at conferences, seminars, and by making data and software available.

(ii) Animal breeding companies, breed societies, and levy boards. The biological insights about quantitative traits will guide these organisations in their efforts to turn genetic variance in traits into response to selection in a way that is sustainable. The quantification of the power of a huge data set combined with functional annotation will guide them in their investments in data for the coming years. The software and scripts that we will use to generate and analyse the data in this project will be made available to these organisations.

(iii) The entire chain of users of pig products. The entire chain of users of pig products, including meat packers, processors, retailers and consumers, will benefit because the knowledge generated will equip PIC and other pig breeding companies with tools to deliver a higher quality product, which costs less, and is more environmentally friendly, healthier and suited to individual requirements of stakeholders in the supply chain.

(iv) Plant breeding organisations. The methods, data sets of this scale, and biological insights are also highly relevant to plant breeding organisations. Therefore the benefits to plant breeding organisations, in the developed and developing world, will be similar to those outlined for animal breeding companies, breed societies, and levy boards.

(v) Commercial sequence and genotype providers. Companies providing SNP or sequence data will be able to open up a completely new market based on low cost provision of huge volumes of sequence data.

(vi) The UK Treasury will benefit from increased tax revenues through increased profitability of PIC, the pork supply chain, other UK agricultural users should they adopt the method, and UK based sequence and genotype providers.

(vii) UK science infrastructure and capacity. The proposed methods and data set will provide a platform for increased R&D capabilities in the UK, maintaining its scientific reputation and associated institutions, with increased capability for sustainable agricultural production. The proposed research will be embedded within training courses that the PI is regularly invited to give, and the post-docs working on the project will have the opportunity to be trained at a world-class institute in a cutting edge area of research while interacting with a leading commercial partner.

(viii) Policy. Sequence data is expensive, but the research and practical benefits are potentially large. Therefore much investment will be made in sequence data in the coming years. The outcomes from this project will guide these investments. This will be particularly relevant for projects such as the Genomics England project, which is spending >£300 million to sequence 100,000 individuals.

(ix) Society. All members of society who work to improve or depend upon the competitiveness and sustainability of agriculture will benefit from the downstream practical applications outlined above. The application of the outcomes by breeding organisations will lead to faster and more sustainable genetic progress, leading to healthier food, and food production that is more resource efficient and affordable. Increased efficiencies in agriculture have direct societal benefits in greater food security with less environmental impact. The knowledge will feed into educational programs.

Committee Research Committee C (Genes, development and STEM approaches to biology)
Research Topics X – not assigned to a current Research Topic
Research Priority X – Research Priority information not available
Research Initiative X - not in an Initiative
Funding Scheme Industrial Partnership Award (IPA)