Award details

Development of GPGPU tools for modelling complex phenotypes

Reference BB/K000195/1
Principal Investigator / Supervisor Professor Albert Tenesa
Co-Investigators / Co-Supervisors Dr Alan Gray
Institution University of Edinburgh
Department The Roslin Institute
Funding type Research
Value (£) 108,731
Status Completed
Type Research Grant
Start date 02/07/2012
End date 31/08/2013
Duration 14 months

Abstract

In recent years, genome-wide association studies (GWAS) have allowed an unprecedented search for genetic variants contributing to complex traits. GWAS have genotyped thousands of human and animal genomes with very dense single nucleotide polymorphism arrays and correlated genetic variation with phenotypic variation. Despite the arguable success of GWAS, for most complex traits most of the standing genetic variation remains unidentified. Although this 'missing heritability' problem is currently most apparent in human studies, it is very likely that the same problem will arise in wild, farm and companion animals as data become available. One strategy to identify the 'missing heritability' is to fit non-additive genetic models. Fitting these models is computationally intensive, and we lack tools fast enough to perform global and unbiased searches of the genome in a reasonable amount of time. We will exploit the power of Graphics Processing Units (GPUs) to address one of the most important unanswered questions in complex trait genetics: where is the missing genetic variation hidden? We have developed an analytical approach to identify quantitative trait and disease susceptibility loci, i.e. to capture genetic variation at functional genomic regions. Our approach estimates genomic relationships among individuals at a particular position of the genome from the observed genotypes, and fits the individuals' additive genetic value at that position as a random effect in a linear mixed model framework. However, current tools are slow, which makes global epistatic searches and the estimation of empirical significance thresholds impossible with our analytical approach. We estimate that the proposed project will deliver a five-fold increase in performance over our current CPU software implementation.
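
As an illustration of the first step described above (estimating genomic relationships among individuals at one position of the genome from observed genotypes), the following is a minimal sketch in plain Python. It is not the project's software. It assumes that a "position" is a window of SNPs, that genotypes are coded as allele counts 0, 1 or 2, and that relationships are computed with a commonly used centred cross-product estimator; the abstract does not state which estimator the authors use, and the function name `regional_grm` is hypothetical.

```python
from typing import List


def regional_grm(genotypes: List[List[int]]) -> List[List[float]]:
    """Genomic relationship matrix for one genomic region (illustrative only).

    genotypes[i][j] is the allele count (0, 1 or 2) of individual i at SNP j
    within the region. The estimator used here is an assumption, not taken
    from the award record.
    """
    n = len(genotypes)
    m = len(genotypes[0])

    # Allele frequency at each SNP, estimated from the sample itself.
    freqs = [sum(row[j] for row in genotypes) / (2.0 * n) for j in range(m)]

    # Scaling factor: sum over SNPs of 2p(1 - p).
    scale = sum(2.0 * p * (1.0 - p) for p in freqs)
    if scale == 0.0:
        raise ValueError("all SNPs in the region are monomorphic")

    # Centre each genotype by its expected value 2p.
    centred = [[row[j] - 2.0 * freqs[j] for j in range(m)] for row in genotypes]

    # G = Z Z' / scale, where Z is the centred genotype matrix.
    return [
        [sum(centred[a][j] * centred[b][j] for j in range(m)) / scale
         for b in range(n)]
        for a in range(n)
    ]


# Three individuals, three SNPs in the region (made-up toy genotypes).
toy = [[0, 1, 2],
       [2, 1, 0],
       [1, 1, 1]]
G = regional_grm(toy)
```

In a mixed-model analysis, a matrix like `G` would serve as the covariance structure of the regional random effect. The cross-product over individuals and SNPs is the kind of dense linear algebra that maps naturally onto GPUs, which is the motivation the abstract gives for the project.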

Summary

We are investigating how genes make some people or animals more susceptible to certain diseases (e.g. cancer) or better at production traits (e.g. milk yield) than others. In the long term this research could be used to predict which diseases people and animals are prone to and at what age they are likely to develop them. With this information, better drugs and preventative treatments could be developed. This will also help to improve food production and safety for an increasing human population. To investigate this, we take samples from a large number of diseased and healthy people or animals. The genomes of these two groups are then studied, and particular parts of the genome (called genes) are pinpointed as contributing to the differences between the groups. Making those comparisons requires complex mathematical and statistical models. We have developed statistical methods that model the traits of animals and people as a function of their genetic make-up and help us identify which genes contribute to the differences between groups. However, these methods require a large number of calculations that take a long time to complete on standard computer processors (or clusters of them). This research proposal will develop software tools that speed up these calculations substantially and hence will help us achieve our scientific aims more quickly. The software tools will run on Graphics Processing Units (GPUs), the fast processors used in graphics cards that allow people to play fast and fun computer games. We will use the same programming 'tricks' and technology used by the computer games industry to understand how genes work, and how they interact with each other to make people or animals more or less prone to disease.

Impact Summary

Impact on the academic community: The proposed research will benefit complex trait geneticists working in model organisms, humans, livestock, and wild and companion animals. It will help them to identify the genes and loci that code for and control complex traits and diseases. This in turn will help to understand how genes interact with each other and with the environment. Identifying the genes that contribute to particular traits (e.g. diseases) makes it feasible to study the molecular mechanisms that lead to them. Molecular biologists will be primary beneficiaries of the successful application of our tools to complex traits in humans and animals.

Impact on the industry: Our research will help the breeding industry to maintain a competitive advantage through improved breeding schemes. Identifying the loci contributing to production traits will help to build better prediction models and hence achieve higher genetic gains. It will also help to maintain sustainable food (protein) production and reduce the environmental burden of the livestock industry. Our tools will allow the discovery of genes associated with disease onset and progression. Mechanistic insights generated by the discovery of those genes will help the pharmaceutical industry to select candidate chemical compounds, thereby increasing the success rate of potentially useful compounds and speeding up drug discovery and development.

Impact on human and animal health: Predicting phenotypes is important in human disease: better prediction models will lead to better screening strategies, allocation of resources and intervention strategies, hence informing public health policy. Our methods will help to understand the genetic architecture of complex diseases in livestock and companion animals. This will help to develop better screening programmes, improve public health policies and facilitate the development of better therapeutics.

Impact on users: The impact on users will be tremendous; our GPU code is likely to be a hundred times faster than available software. This means that global searches for epistasis would become feasible and that empirical significance thresholds could be obtained. Neither of these analyses is currently feasible.

Timescale: Uptake of the software is likely to be quick because the results from genome-wide association studies have, to a degree, not fulfilled their original expectations, and there is a need to try new approaches to identify the missing genetic variation.

Committee Research Committee A (Animal disease, health and welfare)
Research Topics Technology and Methods Development
Research Priority Technology Development for the Biosciences
Research Initiative Tools and Resources Development Fund (TRDF) [2006-2015]
Funding Scheme X – not Funded via a specific Funding Scheme