Award details

A lightweight genome browser for data integration, exploration and interactive figures

Reference BB/K015427/1
Principal Investigator / Supervisor Professor Tim Hubbard
Co-Investigators / Co-Supervisors
Institution Wellcome Trust Sanger Institute
Department Informatics
Funding type Research
Value (£) 85,000
Status Completed
Type Research Grant
Start date 08/04/2013
End date 07/10/2014
Duration 18 months

Abstract

In the course of this project, we plan to develop the existing Dalliance genome-browser code base and documentation. Our objectives are to:
1. Maximise the value of genomic data by encouraging visualisation, exploration, and understanding. To this end, we will provide tools which allow these data to be seen in context.
2. Facilitate genome exploration by making it easy to embed genome browser components in web pages, scientific papers, database front ends, and web interfaces for analysis tools.
3. Make it easier to customise the browser to fit the styling of its enclosing application, and to interact seamlessly with the enclosing application's user interface, increasing the breadth of ways in which users can interact with these data, and promoting experimentation in the user interface space.
4. Make it easier to discover and integrate new biological data sets using established technologies for metadata publication and sharing, encouraging the combination of data in new ways.
5. Support all the important formats that are being used to share such data. We have a particular focus on compact and efficient formats which make access to large datasets practical, and make access to genomic data more feasible for those with limited network bandwidth.
6. Maximise the performance of the browser, especially when accessing large datasets, to facilitate interactivity and data exploration.

Summary

Genome browsers are important tools for visualising, interpreting, and understanding the wealth of information now available about genome structure, its functions, and the variations that occur between individuals. In research environments, they are frequently used for visually checking the quality of new data, looking for patterns and correlations, and inspecting the results of analysis tools. Browsers are also important for education and, especially with the arrival of direct-to-consumer (DTC) genetic testing, may also be of interest to the general public. Historically, the most commonly used genome browsers have been relatively static web applications, displaying a portion of the genome as a single image which must be reloaded to move or zoom the view, leaving major barriers to interactivity. The alternative is heavy-client desktop software, which is more interactive but requires installation. Dalliance takes advantage of newer features of web browsers to build a fully interactive genome browsing tool as a web application. Because Dalliance does all the data integration and drawing work in code that runs within the end users' web browsers, it is relatively straightforward to add an instance to a new web page without any server-side work. The Dalliance instance can be just one of many elements on a complex web page. This opens up the possibility of creating many kinds of applications which combine spatial views of genomic data with other information. We are already seeing a number of cases where academic and industrial researchers have used Dalliance as a browser component of their website, including for the interpretation of DTC genetic tests (see letters of support). Dalliance can even be thought of as allowing the creation of interactive figures, which can accompany database records, analysis results, or publications.
We propose a programme of further development work, rewriting the main rendering code to improve performance, and splitting the user interface into components. This means embedders have the flexibility to offer a simplified interface (e.g. one more appropriate to an interactive figure), a fully-featured interface with a complete set of navigation controls and tools to control the integration of additional datasets, or anything in between. We will add support for additional genomic data formats, notably better support for genetic variation data. We will also update the metadata model, and support additional metadata sources such as the "track hub" structures used by the UCSC browser. This will enable easy integration of existing data from the ENCODE project, and offer a simple route for sharing biological datasets with enough metadata to allow display in a genome browser with descriptions and track selection user interfaces. Finally, we plan to organise a federated data workshop to interact with developers and encourage the development of federated solutions for sharing biological data. This will be modelled on the popular DAS workshop that has been organised at Hinxton in the past, but with a broader aim of supporting and encouraging data integration in a variety of formats, notably the modern style of indexed binary data and track-hub-style metadata, which allows discoverable data to be published with extremely low requirements for server hardware or administration.

Impact Summary

Genome browsers are vital tools for both academic and commercial exploitation of genomic data. The ability to visualise experimental biological datasets against existing genome annotation is a fundamental requirement for interpretation and analysis. The proposed development work will generate software that will lower the barriers to the creation of custom visualisation and analysis tools. The open availability of the software code will ensure its use is maximised. Since genome-sequence-based assays have become pervasive in biological research, this work has the potential to assist a large fraction of individuals who carry out biological research. This includes those working in the pharmaceutical industry, where such analysis is important when using model organisms or humans in drug development and in the analysis of genomic differences between human individuals as part of efforts to interpret gene-disease relationships. It will directly benefit researchers developing tools that need to include a visualisation of data against genome coordinates by greatly simplifying the task of including that component. It will also enable a much wider group of researchers who are not specialist bioinformatics developers to construct web pages presenting their data in the form of interactive figures containing a fully functioning genome browser. As genomics becomes more widely adopted in medicine, there will be new visualisation requirements for healthcare systems. Direct-to-consumer genetic test companies already provide browsers as part of their service. Both NHS and commercial groups developing analysis and decision support systems for doctors and patients that incorporate genomic data will similarly benefit through a simplified development path. Visualisation of biological information on a genome coordinate system has become such a fundamental principle of biology that it will become an important concept to teach in schools as well as in undergraduate classes.
Simplified representations may also make good interactive displays for museum presentations about biology or health. Dalliance will bring the construction of interactive genome browser displays within reach of these groups by simplifying it to the level of creating a stand-alone web page. All these groups will benefit from the availability of Dalliance through the popular GitHub platform, making it easy for third-party developers to follow development, stay up to date, and, if they wish, submit their own changes for integration back into the main branch of development. Dalliance is released under a BSD-style license, which means that it can be freely embedded into other systems, regardless of the licensing model of the enclosing application. Because Dalliance is an open source project, individual components can be used in other contexts. This will benefit all these groups by encouraging collaboration and code-sharing. To maximise the impact of this project, it is vital to engage developers from all sectors that might benefit from including genomics support in their applications. As well as engaging users through journal, conference, and web presence with extensive documentation, we plan to organise a federated data workshop to interact with developers and encourage the development of federated solutions for biological data. We see this workshop as a valuable opportunity to meet existing embedders, discuss new projects, and strengthen the community that is growing up around these technologies.
Committee Research Committee C (Genes, development and STEM approaches to biology)
Research Topics Technology and Methods Development
Research Priority X – Research Priority information not available
Research Initiative Tools and Resources Development Fund (TRDF) [2006-2015]
Funding Scheme X – not Funded via a specific Funding Scheme