Award details

Visual Interactive Pedigree ExploreR (VIPER)

Reference BB/H023879/1
Principal Investigator / Supervisor Dr Andrew Law
Co-Investigators / Co-Supervisors Dr Trevor Paterson
Institution University of Edinburgh
Department The Roslin Institute
Funding type Research
Value (£) 40,981
Status Completed
Type Research Grant
Start date 01/07/2010
End date 31/12/2011
Duration 18 months

Abstract

Pedigree genotype data produced from animal breeding experiments are the basis for the genetic mapping of markers and phenotypes that underpin selective breeding programmes. To be useful for such work the pedigree genotypes need to be free from error, but the size of the datasets means that some pedigree errors, mis-typings and sample mis-identifications are inevitable. Current tools for identifying such errors may show where these problems manifest, but tracing the cause of the errors is more complex and, in current text- and table-based tools, becomes intractable, especially when multiple errors may be at work. To this end, we propose a new Information Visualisation (IV) tool to aid geneticists. IV is the use of graphical and interactive techniques to display and query abstract data sets such as pedigree genotypes. A first phase will develop a tool that shows the pedigree and associated marker data in an intuitive graphical representation and integrates it with existing back-end data cleaning algorithms to show where errors occur in a pedigree. A second phase will incorporate interactive techniques for dynamic feedback that allow geneticists to hypothesise as to the source of data errors within a pedigree and view the effect on the state of the erroneous data. For example, this may include reassigning parent animals, changing specific marker values or masking entire sets of markers. Ultimately a geneticist will be able to arrive at a set of actions that produce an error-free data set. The outcome of this research project will therefore be a tool that allows a geneticist to interactively clean up pedigree genotype datasets. Beneficiaries will include the geneticists who produce the initial data and other specialists who will be able to use the cleaned data set for their own analyses and research.

Summary

This project aims to produce a tool to remove errors in animal pedigree information caused by administrative and data handling faults. Large amounts of animal pedigree and characteristic data are logged and stored during the course of animal breeding studies. However, to be of any use for further programmes or analysis the data needs to be as free of error as possible. Errors in data storage, such as recording the wrong father for an animal or an unnoticed change in associated gene data, are easy to introduce when hundreds or thousands of individual animals are being dealt with. Unfortunately, while it is relatively easy to process this data to detect the existence of errors, finding and correcting the cause of the errors is more difficult. For example, it isn't straightforward to know whether an error is in the pedigree (i.e. the child-parent relationships) or in the characteristics associated with the animals. An animal may be recorded as having a certain characteristic that, on examination, could not possibly have been inherited from its two recorded parents. So is the recording of one or both of the parents wrong, the recording of the characteristic in the child animal incorrect, or the characteristic in one of the parent animals wrong? To answer this question, further examination of the problem animals' relations in the pedigree is necessary. In a text or spreadsheet-based document this quickly becomes tedious and confusing, even when the operations to detect and show errors in the data are available. If we were to switch to a more graphical, user-friendly style of displaying the data, it would be easier to follow relationships in the pedigree. If we then added the capability to show interactively where errors occur and where they might originate, we would have a way of examining the pedigree data and asking questions that would clear up or narrow down errors.
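The inheritance check described above can be illustrated with a minimal sketch. It assumes diploid inheritance, a single marker, and genotypes stored as unordered allele pairs; the function names and the example animals are hypothetical, and this simple parent-offspring test is not the project's own data cleaning algorithm.

```python
def is_consistent(child, sire, dam):
    """Return True if the child's genotype could come from the recorded parents.

    Each genotype is a pair of alleles such as ("A", "B"); None means the
    animal has not been typed for this marker, so it cannot contradict anything.
    """
    if child is None:
        return True

    def can_give(parent, allele):
        # An untyped parent could have passed on any allele.
        return parent is None or allele in parent

    a, b = child
    # One allele must come from the sire and the other from the dam.
    return (can_give(sire, a) and can_give(dam, b)) or (
        can_give(sire, b) and can_give(dam, a)
    )


def find_inconsistencies(pedigree, genotypes):
    """List animals whose genotype cannot be inherited from their recorded parents.

    pedigree maps animal -> (sire, dam); genotypes maps animal -> allele pair.
    """
    return [
        animal
        for animal, (sire, dam) in pedigree.items()
        if not is_consistent(
            genotypes.get(animal), genotypes.get(sire), genotypes.get(dam)
        )
    ]


# Hypothetical example data: two offspring of the same recorded parents.
pedigree = {"calf1": ("bull", "cow"), "calf2": ("bull", "cow")}
genotypes = {
    "bull": ("A", "A"),
    "cow": ("A", "B"),
    "calf1": ("A", "B"),
    "calf2": ("B", "B"),  # needs a B from the sire, but the bull is A/A
}

print(find_inconsistencies(pedigree, genotypes))
```

The check flags calf2 but cannot say whether the fault lies in calf2's genotype, the bull's genotype, or the recorded sire, which is exactly the ambiguity the summary describes.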
Such a way of displaying and interacting with data is called Information Visualisation (IV). Unlike human family trees, most recorded animal pedigrees have a large degree of in-breeding as scientists and breeders try to encourage certain characteristics through selective breeding. This makes the drawing of animal pedigrees more complex, as two individuals may end up being related through two or more routes. By extending current IV techniques for this type of data, this project will make the interface less complex by interactively showing only selected individuals and their relationships. On top of this, scientists will also wish to view the characteristics associated with the animals, and again the complexity can be reduced by viewing only a handful of characteristics at a time. Even so, one male animal can easily sire dozens of children who are in turn related to dozens of female parents and then in turn again may have children of their own - and there may be several characteristics at a time that a scientist is interested in exploring for these animals. Methods for seamlessly moving from showing one part of a pedigree to another will be developed to help scientists explore massive pedigrees. Once an initial interface is built, a means for exploring errors by asking 'what-if' questions will be developed. Possibilities include the ability to 'mask out' problem individuals or problem characteristics to see what effect that has on the pedigree and errors, or to actually edit information and recalculate the effect on the pedigree again. The ability to redo and undo past actions will be needed, and in the end the scientist will produce a set of actions that lead to a clean data set, or as close as can be achieved. Throughout the course of the project the work will be tested with scientists who use pedigree data.
In the end we will produce a tool that will benefit scientists who work with pedigrees by allowing them to readily clean their data and so share it usefully with other scientists.
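The 'what-if' workflow outlined in this summary (masking, editing, undo and redo, ending with a recorded set of actions) can be sketched as follows. This is only an illustration under stated assumptions: the class `WhatIfSession`, its methods, the single-marker error count and the example animals are all hypothetical and are not taken from the VIPER tool itself.

```python
def count_errors(pedigree, genotypes):
    """Count animals whose genotype cannot come from their recorded parents.

    Assumes one diploid marker; missing genotypes are treated as untyped.
    """
    def can_give(parent, allele):
        g = genotypes.get(parent)
        return g is None or allele in g

    errors = 0
    for animal, (sire, dam) in pedigree.items():
        g = genotypes.get(animal)
        if g is None:
            continue
        a, b = g
        ok = (can_give(sire, a) and can_give(dam, b)) or (
            can_give(sire, b) and can_give(dam, a)
        )
        errors += 0 if ok else 1
    return errors


class WhatIfSession:
    """Hypothetical editing session that records actions for undo and redo."""

    def __init__(self, pedigree, genotypes):
        self.state = {"pedigree": dict(pedigree), "genotypes": dict(genotypes)}
        self.done = []    # applied actions, oldest first
        self.undone = []  # undone actions, available for redo

    def _set(self, table, key, value):
        # None means "no entry": remove the key instead of storing None.
        if value is None:
            self.state[table].pop(key, None)
        else:
            self.state[table][key] = value

    def _do(self, table, key, new):
        old = self.state[table].get(key)
        self._set(table, key, new)
        self.done.append((table, key, old, new))
        self.undone.clear()  # a fresh action invalidates the redo history

    def mask_genotype(self, animal):
        self._do("genotypes", animal, None)

    def reassign_parents(self, animal, sire, dam):
        self._do("pedigree", animal, (sire, dam))

    def undo(self):
        table, key, old, new = self.done.pop()
        self._set(table, key, old)
        self.undone.append((table, key, old, new))

    def redo(self):
        table, key, old, new = self.undone.pop()
        self._set(table, key, new)
        self.done.append((table, key, old, new))

    def errors(self):
        return count_errors(self.state["pedigree"], self.state["genotypes"])


# Hypothetical example: calf2 is inconsistent with its recorded sire.
pedigree = {"calf1": ("bull", "cow"), "calf2": ("bull", "cow")}
genotypes = {"bull": ("A", "A"), "bull2": ("B", "B"), "cow": ("A", "B"),
             "calf1": ("A", "B"), "calf2": ("B", "B")}

session = WhatIfSession(pedigree, genotypes)
session.mask_genotype("calf2")                      # hypothesis 1: mis-typing
session.undo()                                      # reject it
session.reassign_parents("calf2", "bull2", "cow")   # hypothesis 2: wrong sire
print(session.errors(), session.done)
```

Storing each action with its old and new value keeps undo and redo cheap, and the `done` list doubles as the record of actions that led to the cleaned data set.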

Impact Summary

The primary beneficiaries of this project will be the target user group for the pedigree visualisation tool (VIPER), i.e. all animal breeders and geneticists currently engaged in generating pedigree genotype data for genetic analyses for whatever purpose (e.g. the calculation of linkage associations and generation of multi-locus genetic linkage maps). As well as allowing visual exploration of pedigree genotype datasets, VIPER will provide the means for cleansing genetically inconsistent genotype datapoints from the data. This data cleaning is essential prior to downstream processing in genetic analyses, and will thus enable the sharing of valid datasets between researchers. For example, species resource databases such as ResSpecies require that any uploaded datasets are genetically consistent prior to submission. Animal breeding studies underpin a diverse range of research areas including research into animal and human health (e.g. inherited disorders, disease susceptibility, infection immunity and host-pathogen interactions) as well as the inheritance of other economically important traits in agriculture, such as yield and quality. As such, VIPER will be of potential benefit for any animal breeding programme in agricultural, scientific and medical research, and potential users will span the academic and commercial sectors. Several research groups working at and collaborating with The Roslin Institute will benefit immediately and directly, and the tool will be freely available to any researcher worldwide who wishes to use it for similar data cleaning tasks. To support this, the tool and documentation will be made freely available on a project website and through the ResSpecies website, and the research will be disseminated through presentations at appropriate international conferences and publication in scientific journals of Genetics, Genomics and Bioinformatics research.
The pedigree visualisation tool, coupled with the ResSpecies genetic inference algorithm, will be applicable to genetic studies of any species exhibiting diploid inheritance, and implementation of a modular design will allow the tool to be modified to operate with other genetic systems, increasing the potential user base to all genetic research communities. The successfully implemented pedigree visualisation model itself, designed as a reusable software module, should be useful in other contexts where display of pedigree information is required. For example, it could be used for visualisation of extended human pedigrees or crop breeding pedigrees. The research involved in implementing a successful interactive pedigree visualisation will contribute to the wider field of Information Visualisation (IV), outwith the immediate biological domain. It will therefore be appropriate to present these results to the IV research community through conference presentations and publication in IV journals.
Committee Research Committee C (Genes, development and STEM approaches to biology)
Research Topics Animal Health, Technology and Methods Development
Research Priority X – Research Priority information not available
Research Initiative Tools and Resources Development Fund (TRDF) [2006-2015]
Funding Scheme X – not Funded via a specific Funding Scheme