Award details

Visual Interactive Pedigree ExploreR (VIPER)

Reference: BB/H023909/1
Principal Investigator / Supervisor: Professor Jessie Kennedy
Co-Investigators / Co-Supervisors: Dr Martin Graham
Institution: Edinburgh Napier University
Department: Computing
Funding type: Research
Value (£): 79,656
Status: Completed
Type: Research Grant
Start date: 01/08/2010
End date: 31/01/2012
Duration: 18 months

Abstract

Pedigree genotype data produced from animal breeding experiments are the basis for the genetic mapping of markers and phenotypes that underpin selective breeding programmes. To be useful for such work the pedigree genotypes need to be free from error, but the size of the datasets means that some pedigree errors, mis-typings and sample mis-identifications are inevitable. Current tools for identifying such errors may show where these problems manifest, but tracing the cause of the errors is more complex and, in current text- and table-based tools, becomes intractable, especially when multiple errors may be at work. To this end, we propose a new Information Visualisation (IV) tool to aid geneticists. IV is the use of graphical and interactive techniques to display and query abstract data sets such as pedigree genotypes. A first phase will develop a tool that shows the pedigree and associated marker data in an intuitive graphical representation and integrate it with existing back-end data cleaning algorithms to show where errors occur in a pedigree. A second phase will incorporate interactive techniques for dynamic feedback that allow geneticists to hypothesise as to the source of data errors within a pedigree and view the effect on the state of the erroneous data. For example, this may include reassigning parent animals, changing specific marker values or masking entire sets of markers. Ultimately a geneticist will be able to arrive at a set of actions that produce an error-free data set. The outcome of this research project will therefore be a tool that allows a geneticist to interactively clean up pedigree genotype datasets. Beneficiaries will include the geneticists who produce the initial data and other specialists who will now be able to use the cleaned data set for their own analyses and research.

Summary

This project aims to produce a tool to remove errors in animal pedigree information caused by administrative and data handling faults. Large amounts of animal pedigree and characteristic data are logged and stored during the course of animal breeding studies. However, to be of any use for further programmes or analysis the data needs to be as free of error as possible. Errors in data storage, such as recording the wrong father for an animal or an unnoticed change in associated gene data, are easy to introduce when hundreds or thousands of individual animals are being dealt with. Unfortunately, while it is relatively easy to process this data to find the existence of errors, finding and correcting the cause of the errors is more difficult. For example, it isn't straightforward to know if an error is in the pedigree (i.e. the child-parent relationships) or in the characteristics associated with the animals. An animal may be recorded as having a certain characteristic that, on examination, could not possibly have been inherited from its two recorded parents. So is the recording of one or both of the parents wrong, the recording of the characteristic in the child animal incorrect, or the characteristic in one of the parent animals wrong? To answer this question further examination of the problem animals' relations in the pedigree is necessary. In a text or spreadsheet-based document this quickly becomes tedious and confusing, even when the operations to detect and show errors in the data are available. If, however, we were to switch to a more graphical, user-friendly style of displaying the data then it would be easier to follow relationships in the pedigree. If we added on top the capability to interactively show where errors occurred and what could have caused them, we would have a way of examining the pedigree data and asking questions that would clear up or narrow down errors.
Such a way of displaying and interacting with data is called Information Visualisation (IV). Unlike human family trees, most recorded animal pedigrees have a large degree of in-breeding as scientists and breeders try to encourage certain characteristics through selective breeding. This makes the drawing of animal pedigrees more complex as two individuals may end up being related through two or more routes. By extending current IV techniques for this type of data this project will make the interface less complex by interactively showing only selected individuals and their relationships. On top of this the scientists will also wish to view some display of the characteristics associated with the animals, and again the complexity can be reduced by viewing only a handful of characteristics at a time. Even so, one male animal can easily sire dozens of children who are in turn related to dozens of female parents and then in turn again may have children of their own - and there may be several characteristics at a time that a scientist is interested in exploring for these animals. Methods for seamlessly moving from showing one part of a pedigree to another will be developed to help scientists explore massive pedigrees. Once an initial interface is built, a means for exploring errors by asking 'what-if' questions will be developed. Possibilities include the ability to 'mask out' problem individuals or problem characteristics to see what effect that has on the pedigree and errors, or to actually edit information and recalculate the effect on the pedigree again. The ability to redo and undo past actions will be needed, and in the end the scientist will produce a set of actions that lead to a clean data set, or as close as can be achieved. Throughout the course of the project the work will be tested with scientists who use pedigree data.
In the end we will produce a tool that will benefit scientists who work with pedigrees by allowing them to readily clean their data and so share it usefully with other scientists.
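
The inheritance check and the 'what-if' masking described in this summary can be illustrated with a minimal Python sketch. It is not the project's implementation: the data layout (a dictionary of child to recorded parents, and a dictionary of per-animal marker genotypes), the function names and the WhatIfSession class are all assumptions made for illustration. A genotype is treated as an unordered pair of alleles, and a child is flagged when its pair cannot be formed by taking one allele from each recorded parent.

```python
from itertools import product


def consistent(child, sire, dam):
    """True if the child's allele pair can be formed from one sire allele and one dam allele."""
    target = tuple(sorted(child))
    return any(tuple(sorted(pair)) == target for pair in product(sire, dam))


def find_errors(pedigree, genotypes, masked_animals=frozenset(), masked_markers=frozenset()):
    """Return (child, marker) pairs that fail the inheritance check.

    Trios containing a masked or ungenotyped animal are skipped, as are masked markers.
    """
    errors = []
    for child, (sire, dam) in pedigree.items():
        trio = (child, sire, dam)
        if any(a in masked_animals or a not in genotypes for a in trio):
            continue
        for marker, child_gt in genotypes[child].items():
            if marker in masked_markers:
                continue
            sire_gt = genotypes[sire].get(marker)
            dam_gt = genotypes[dam].get(marker)
            if sire_gt is None or dam_gt is None:
                continue  # cannot check a marker that a parent lacks
            if not consistent(child_gt, sire_gt, dam_gt):
                errors.append((child, marker))
    return errors


class WhatIfSession:
    """Hypothetical session that records masking actions with undo and redo."""

    def __init__(self, pedigree, genotypes):
        self.pedigree = pedigree
        self.genotypes = genotypes
        self._done = []    # applied actions, e.g. ("animal", "bull")
        self._undone = []  # actions available for redo

    def mask(self, kind, name):
        """Apply a masking action; kind is "animal" or "marker"."""
        self._done.append((kind, name))
        self._undone.clear()  # a new action invalidates the redo history

    def undo(self):
        if self._done:
            self._undone.append(self._done.pop())

    def redo(self):
        if self._undone:
            self._done.append(self._undone.pop())

    def errors(self):
        """Recalculate the errors that remain under the current set of masks."""
        animals = {name for kind, name in self._done if kind == "animal"}
        markers = {name for kind, name in self._done if kind == "marker"}
        return find_errors(self.pedigree, self.genotypes, animals, markers)

    def actions(self):
        """The ordered list of actions currently applied."""
        return list(self._done)


# Illustrative data only: the calf's B/B genotype at marker M1 cannot come from an A/A sire.
pedigree = {"calf": ("bull", "cow")}
genotypes = {
    "bull": {"M1": ("A", "A"), "M2": ("C", "D")},
    "cow": {"M1": ("A", "B"), "M2": ("C", "C")},
    "calf": {"M1": ("B", "B"), "M2": ("C", "D")},
}

session = WhatIfSession(pedigree, genotypes)
print(session.errors())
session.mask("marker", "M1")
print(session.errors())
```

Note that the check only reports that the trio is inconsistent at a marker; it cannot say whether the sire, the dam or the child record is at fault, which is the question the interactive exploration is meant to help answer. The list returned by actions() corresponds to the set of actions a scientist would arrive at for a cleaned data set.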

Impact Summary

The main beneficiaries of this project will be geneticists dealing with large pedigree genotype datasets, who will have a useful tool that will enable them to interactively find and eliminate the cause of error in their data. Previously, errors in such data have reduced its usability and shareability between researchers, and weeding them out is laborious, difficult and time-consuming work. Geneticists can be found in a wide range of establishments, both publicly and privately funded, ranging from research institutes such as Roslin to commercial animal breeding concerns that generate and turn over millions of pounds through livestock yield and quality improvements. Whilst the target of the research is currently animal pedigrees, researchers in other domains with breeding programmes have shown interest in the planned results of VIPER, for example the Scottish Crop Research Institute (SCRI) at Dundee. Further beneficiaries include user interface developers who will gain new techniques for visualising complex pedigree-style structures; we envisage that one of the benefits could be improved data representation for genealogy software that allows members of the general public to explore their family trees and the associated inherited characteristics in families. From a public policy perspective, the need to develop data checking and cleaning tools and mechanisms for data submitted to public data repositories could be demonstrated by the use and success of this tool. This is especially important as such repositories become centralised locations for sourcing research data. The primary means of communicating outputs from the research will be through a project website to enable interested parties to download usable versions of the prototype pedigree cleaning tool along with requisite instructions. Publications such as journals and conferences in the bioinformatics and visualisation areas will be the appropriate conduits to disseminate research findings.
Geneticists will be contacted to volunteer as testers of the software, both formally and informally, as development of the tool must be responsive to the needs of those most likely to utilise it. Other fora are also available, including public and industrial outreach events such as university open days and student recruitment fairs, where visualisation-based tools make attractive visual talking points. Edinburgh Napier University is also a member of SICSA, a collection of Scottish computer science departments that aims to publicise and commercialise research, and where we frequently present our work. The Scottish Bioinformatics Forum (SBF) is also an avenue for exposing research to other interested researchers, and Edinburgh Napier has conducted workshops there in the past. It is expected that the software will be released under an open-source licence; however, this still leaves scope for licensing and producing bespoke versions of the application with enhanced or tailored capabilities for interested parties in future. Both Edinburgh Napier and Roslin's parent institute, the University of Edinburgh, have full-time commercialisation teams that would be used in this case to advise on suitable licensing and contracting terms. The Edinburgh Napier PI has experience of working on commercialising software, supervising a Proof of Concept award for micro-array analysis and visualisation.
Committee: Research Committee C (Genes, development and STEM approaches to biology)
Research Topics: Animal Health, Technology and Methods Development
Research Priority: Technology Development for the Biosciences
Research Initiative: Tools and Resources Development Fund (TRDF) [2006-2015]
Funding Scheme: X – not Funded via a specific Funding Scheme