Award details

Visual Exploration of Species-referenced Repositories (VESpeR)

Reference BB/K004115/1
Principal Investigator / Supervisor Professor Jessie Kennedy
Co-Investigators / Co-Supervisors Dr Martin Graham
Institution Edinburgh Napier University
Department Computing
Funding type Research
Value (£) 106,904
Status Completed
Type Research Grant
Start date 01/08/2012
End date 31/01/2014
Duration 18 months

Abstract

Visualisation techniques have been recognised as one of the major directions for future research in handling and querying biological data, offering the ability to find patterns and outliers in data that traditional query interfaces cannot match. A case in point is the multitude of species-referenced databases, covering everything from genomic to biodiversity data, which are linked by taxonomic classifications and hold geographic and temporal data alongside other facets. Many online databases hold collections of such data, often in archive format, but visual querying tools are invariably limited to a map interface showing spatial distribution, neglecting the fact that biologists may wish to query or explore other facets of the data, such as the classification or temporal distribution. Add to this the problem of many complementary databases using different taxonomic classifications to reference their specimens, and much of the potential utility of this data remains unused. We therefore propose to develop a suite of web-based visualisation components for the taxonomic, temporal and geographic aspects of these data sets that can be placed directly into the workflow of biologists who use such data. These components will be co-ordinated so that selections and actions in one component are reflected in the data shown in the others. Further, we will build a novel cross-taxonomy viewer that will allow users to crosswalk between different classifications, so that they can accurately match specimens in data from different sources. These components will allow biologists to perform tasks such as sanity-checking data, viewing patterns in geographical, taxonomic or temporal aspects in an inter-related context, and accurately viewing data even when it spans conflicting taxonomic classifications. This work will thus make a significant contribution to the efficiency and usability of online catalogues for both the providers and the end-users of the data they hold.

Summary

There exist today a multitude of biological databases containing a wide array of information about different species, including specimen records in museum databases, occurrence information, genome sequence and expression data, and image data, to name a few. A common feature of these databases is that the information normally corresponds to a particular species (or taxon), and so the databases tend to employ some taxonomy to structure the information and provide access to the data. However, as yet there is no common taxonomy used across these databases that would enable reliable linking between them. Matching species across databases is therefore challenging: different databases can and do use different classifications, which use different names to represent the same underlying species or taxa. Tools that aid integration of data from these sources will benefit biologists, allowing them to incorporate additional data into their analyses while improving both the quality of the data and the accuracy of their results. The utility of visualising data is well established for tasks such as the presentation of information. Visualisations are also effective for a range of other tasks, such as acting as ad-hoc error checks on data; for example, a record of a lion placed in the middle of the Pacific Ocean in a geographical plot clearly suggests an error in the data. However, the true advantage of visualisation lies not in static presentation but in allowing users to interactively explore and view the effects of changes to constraints and variables, yet suitable tools are frequently not available to biologists where they could be most useful. This project will build on the biological standards developed for taxonomic information and develop a set of web-based visualisation tools for use by a wide range of biologists and other end-users of these databases, to help them clean, explore and compare the data they contain.
The resulting tools will have a wide-ranging impact on the quality of the data made available and on its accessibility to many different users.

Impact Summary

VESpeR's work will affect individuals and groups who supply and use the data stored in large online species resource databases. We see three main benefits of visualisation: as a presentation and communication medium, as a means of error-checking, and as a tool for knowledge discovery. As a presentation medium, the main non-academic beneficiaries of these tools will be the users of species-referenced databases such as GBIF, Catalogue of Life, Barcode of Life and the EMBL database, to whom the data will be communicated through graphical visualisations. Information Visualisation (IV) is becoming commonplace and is now used by the public at large; for example, IV techniques are frequently used on financial websites to track shares and to communicate voting results. The IV techniques developed in this project will add to the range of user interface techniques available for communicating and exploring information. In this way, visualisation can be seen as the channel through which data is presented to any and all potential users of the datasets we are targeting, and so it contributes to increasing public awareness of species-related information, including biodiversity and its associated effects. Projects with a large visualisation component, such as VESpeR, also provide attractive visual material for public engagement at exhibitions and open days. Such projects also appeal to undergraduate and Masters students looking for rewarding projects, thereby increasing training in this important area. To engage business, we participate in outreach events, for example those organised through the Scottish Informatics and Computer Science Alliance (SICSA), of which Edinburgh Napier is a member. These events attract many delegates from industry seeking collaborative ventures with academic research. We have attended all of these events, and our posters and demonstrations have attracted commercial interest.
We also plan to disseminate the research described in this proposal at similar bioinformatics events such as VIZBI, an annual meeting organised by biologists to promote visualisation to the biological community, at which Prof. Kennedy is this year's keynote speaker. Similarly, in late 2010 Prof. Kennedy spoke at the BBSRC/AHRC Workshop 'The challenges of Visualising Biological Data', held in Bristol. The work will strengthen existing links between Edinburgh Napier and the partner and supporting institutions, specifically GBIF in Copenhagen and Reading University, which hope to deploy the visualisations resulting from this project. The PI will be responsible for extending the network of collaborators, building new relationships and forming new partnerships. In addition, the proposed visualisation tools will enhance collaboration between existing providers of species resource databases, such as those in the i4Life project, by allowing them to understand more easily the overlap and differences in the content of their repositories, thereby improving provision for the end-users of these databases. Using visualisation as an error-checking medium will allow cleaner and more precise data sets to be stored in the databases and thus reduce the potential for error in onward analyses. Biodiversity data is used for a wide range of non-academic purposes, such as conservation planning, eco-tourism, public outreach, infrastructure planning, and land management and planning processes, and it follows that fewer errors in the data will lead to fewer errors in subsequent decision-making on such issues. It is also in these fields that any knowledge discovery made using the visualisations may result in statutory policy, for example on biodiversity, which in turn affects the general public as a whole. The tools developed will allow better exploitation of data across these different repositories by helping to reconcile species references.
Committee Research Committee C (Genes, development and STEM approaches to biology)
Research Topics Technology and Methods Development
Research Priority X – Research Priority information not available
Research Initiative Tools and Resources Development Fund (TRDF) [2006-2015]
Funding Scheme X – not Funded via a specific Funding Scheme