Award details

A general method for the imputation of genomic data in crop species

Reference BB/R002061/1
Principal Investigator / Supervisor Dr John Hickey
Co-Investigators / Co-Supervisors
Institution University of Edinburgh
Department The Roslin Institute
Funding type Research
Value (£) 316,259
Status Completed
Type Research Grant
Start date 01/10/2017
End date 30/09/2020
Duration 36 months

Abstract

The project will develop and test a toolkit to impute dense genomic information in diploid crop breeding populations. Dense genomic information allows geneticists to unravel the genetics of traits using genome-wide association studies, and breeders to speed up genetic improvement using genomic selection and genomics-assisted breeding. The project will develop a hybrid imputation algorithm that combines heuristics to exploit the information in crop pedigrees with improvements to an existing HMM-based imputation algorithm, PlantImpute.

1. We will develop heuristic imputation algorithms that exploit the information in crop pedigrees, that correct pedigree errors and that generate approximate physical maps of the genome.
2. We will develop HMM algorithms that integrate with the heuristics to produce a hybrid imputation algorithm for crops that combines the speed of heuristic algorithms with the flexibility and robustness of HMM models.
3. We will package the software and apply it to a number of specific case study datasets and breeding programs in KWS.

The developed algorithms will be implemented in a single stand-alone software package for a range of OS environments that can be run either locally or remotely. The method will be tested on a range of species and scenarios. It will represent the first generic imputation method developed specifically for crops. We will evaluate the algorithm on a range of real data sets from the KWS breeding programs (wheat, maize, sugarbeet) with a range of designs and genotyping technologies.

Summary

The project will develop and test a toolkit to impute dense genomic information in crop breeding populations. Dense genomic information allows geneticists to unravel the genetics of traits using genome-wide association studies, and breeders to speed up genetic improvement using genomic selection and genomics-assisted breeding. These methods are most powerful when the density of genomic information is very high and the numbers of individuals genotyped are very large, but the cost of collecting genotype information to build such datasets is prohibitive. A flexible and effective imputation toolkit will make it possible to build such datasets cheaply using imputed data.

In a genetics and genomics context, imputation is the prediction of an unknown genotype in one individual from the known genotypes of other individuals (to give a trivial example, if individuals 'X' and 'Y' are known to have genotypes AA and CC respectively, then their offspring 'Z' is imputed to be AC). The value of imputation is that, when combined with high-density genotype information from a few individuals, high-density information can be imputed for many individuals that have been genotyped at low density, which vastly reduces the cost of datasets of dense genomic information.

The project has three parts:

1. We will develop heuristic imputation algorithms that exploit the information in crop pedigrees, that correct pedigree errors and that generate approximate physical maps of the genome. Existing heuristic imputation algorithms, which were designed for livestock, do not work on crops because crop pedigrees are more complex than livestock pedigrees and crop data are of many different types, whereas livestock data are fairly homogeneous in type.
2. We will develop probabilistic algorithms that integrate with the heuristic algorithms to produce a hybrid imputation algorithm for crops that combines the speed of heuristic algorithms with the flexibility and robustness of probabilistic algorithms. Existing probabilistic algorithms are too slow and require too much memory to work well with crop data.
3. We will package the software and apply it to a number of specific case datasets and breeding programs in KWS, which is one of the world's four leading crop-breeding companies.
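
The trivial parent-offspring example given in this summary (parents AA and CC, offspring imputed as AC) can be sketched in a few lines of Python. This is only an illustrative sketch of the simplest pedigree-based rule, not the project's algorithm or any part of its software; the function name `impute_offspring` and the two-letter genotype strings are assumptions made for this example.

```python
from typing import Optional


def impute_offspring(parent_x: str, parent_y: str) -> Optional[str]:
    """Impute an offspring genotype from two parental genotypes.

    Hypothetical helper for illustration only. Genotypes are two-letter
    strings such as "AA" or "AC". The offspring is imputed only when both
    parents are homozygous, because only then is its genotype fully
    determined; otherwise None is returned.
    """
    if len(parent_x) != 2 or len(parent_y) != 2:
        raise ValueError("genotypes must have exactly two alleles")

    x_homozygous = parent_x[0] == parent_x[1]
    y_homozygous = parent_y[0] == parent_y[1]
    if not (x_homozygous and y_homozygous):
        # A heterozygous parent could pass on either allele, so this
        # simple rule cannot decide the offspring genotype.
        return None

    # The offspring inherits one allele from each parent; sort the
    # alleles so that "AC" and "CA" are reported in one canonical form.
    return "".join(sorted(parent_x[0] + parent_y[0]))


# Parents 'X' (AA) and 'Y' (CC) give offspring 'Z' = AC.
print(impute_offspring("AA", "CC"))
```

When either parent is heterozygous the function returns None, which is the kind of ambiguous case where additional pedigree or population information is needed.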

Impact Summary

Despite the dramatic reduction in the costs of high-density SNP genotyping platforms and in the cost per nucleotide of NGS-based sequencing and re-sequencing applications in crop plants, imputation has a major role to play in the development of cost-effective genotyping and sequencing strategies and in error correction. This project will develop a practical tool enabling genotype imputation in a wide variety of crops and a wide variety of scenarios, opening up the potential for generating significant volumes of genomic information at low cost. It will develop fundamental scientific knowledge, primarily in bioinformatics applied to genomics. The outcomes will be beneficial for:

(i) The academic community. Scientifically, the project constitutes novel imputation and map building methods that will be suited to a wide variety of scenarios in crops. This will enable the generation of large volumes of genomic information at low cost and will have the flexibility to handle different types of genomic information. This will enable larger and hence more powerful experiments than currently feasible, and greater ability to combine data obtained with old technologies with those obtained with new technologies. The direct application of the method will benefit researchers in plant genetics. Methodological developments will benefit human, animal, and evolutionary geneticists concerned with imputation. Major research efforts in crops are continuing to develop effective genotyping and genome reduction technologies (e.g. SNP or GBS platforms). The imputation algorithm developed in this proposal will complement and add value to these efforts.

(ii) Breeding companies and organisations, and levy boards. As indicated by the attached letters of support, the crop breeding industry, both in the UK and internationally, will benefit directly. In particular, the developed methods will be key to the cost-effective implementation of Genomic Selection-based breeding strategies in UK breeding programmes and will help sustain the long-term viability of the UK breeding industry.

(iii) Commercial sequence and genotype providers. Companies providing SNP, GBS, or sequence data will be able to use imputation to add value to the data that they generate.

(iv) Society. All members of society who work to improve or depend upon the competitiveness and sustainability of agriculture will benefit from the downstream practical applications outlined above. The application of the algorithm by breeding organisations will lead to faster and more sustainable genetic progress, leading to healthier food, and food production that is more resource efficient and affordable. Increased efficiencies in agriculture have direct societal benefits in greater food security with less environmental impact.

(v) UK science base. The proposed algorithm will provide a platform for increased R&D capabilities in the area of imputation and plant breeding and genetics in the UK, maintaining its scientific reputation and associated institutions, with increased capability for sustainable agricultural production. By underpinning the cost effectiveness of marker-based crop genetic experimental studies, it will help ensure that UK funding agencies obtain maximum value from their research investment and that the studies they support will have optimised power.

(vi) Training. The proposed research will be embedded within training courses that the PI is regularly invited to give, and the post-doc working on the project will have the opportunity to be trained at a world-class institute in a cutting-edge area of research.

(vii) Policy. Genomic data is expensive, but the research and practical benefits are potentially large. Therefore, much investment will be made in genomic data in the crops sector in the coming years. To maximise efficiency of investment, a co-ordinated national and perhaps international effort may be needed. The method to be developed in this proposal could enhance and underpin such an effort.
Committee Research Committee B (Plants, microbes, food & sustainability)
Research Topics Crop Science, Plant Science, Technology and Methods Development
Research Priority X – Research Priority information not available
Research Initiative LINK: Responsive Mode [2010-2015]
Funding Scheme X – not Funded via a specific Funding Scheme