Award details

BBR GenomeHubs - agile genome databasing for neglected organisms of agricultural, development and biodiversity importance

Reference BB/R015325/1
Principal Investigator / Supervisor Professor Mark Blaxter
Co-Investigators / Co-Supervisors
Institution University of Edinburgh
Department Sch of Biological Sciences
Funding type Research
Value (£) 362,520
Status Completed
Type Research Grant
Start date 01/10/2018
End date 30/06/2019
Duration 9 months

Abstract

Genome databasing is critical in ensuring that the costly results of genome scale analyses are available to the research community. Richly featured genome database solutions are also substrates for novel research activity - aggregating data across projects and species, and asking and answering important questions. Several independent solutions are available for genome databasing. These are tailored to fit their research communities - e.g. the human and model organism communities have rich database tools for dense data analysis. We propose to leverage investment in one of these - the Ensembl database and data visualisation system - to allow communities working on non-model, less well-resourced organisms to benefit from this high-quality toolset. We have developed routines that make the establishment and population of an Ensembl database much easier than previously, developed new visualisations and engineered a simple-to-manage data sharing/enquiry system called GenomeHubs. We now propose to develop GenomeHubs in several ways, with guidance and feedback from the several communities we collaborate with. Specifically, we will track changes in the underlying Ensembl codebase, ensuring that GenomeHubs remain current. We will develop routines for deposition of data, collated and normalised in GenomeHubs, into the public databases of record, through the ENA. We will develop new visualisation and data interrogation toolkits that use the underpinning Ensembl database structure to extract new composite data types, build new views on the data, and federate searches across database instances. We will provide Galaxy and virtual machine instances of the GenomeHubs pipelines so that users can access them easily. We will build and support our user communities through workshops, and encourage other developers to build plugins and other extensions of the GenomeHubs code.

Summary

Building the first draft of the human genome cost around £2.5 billion. New sequencing technologies mean the cost of resequencing a human has fallen more than a million-fold. This reduction in cost also transforms genomics approaches to many other biological questions. Genomics is now commonly applied to diverse goals from crop and livestock improvement, through pathogen and parasite surveillance, to biodiversity assessment. Many research communities are now able to generate reference genomes for their target species, compare genomes across suites of related species and sequence many individuals of the same species to investigate how variation between genome sequences affects biology. With these benefits come the challenges of managing a deluge of data, of analysing the data to answer questions, and of making the data and results available to others. For raw sequence data, deposition in "databases of record" (internationally-supported systems that collect, collate and store data for posterity) is standard. However, many discoveries are based on intensively analysed data - raw sequence is "assembled" to predict the whole genome sequence, genes are predicted in this genome sequence, and their functions are inferred by a range of annotation tools. Capturing these analyses in databases of record is strongly encouraged, but is technically difficult. For a few species, researchers have developed dedicated genome exploration databases that collect and collate not only sequence but also annotation and functional data, and present it in a way that facilitates integration. These databases require considerable expertise and effort to set up, maintain and keep current with the latest scientific developments. Thus, for the majority of species, and especially species of interest to the developing world, dedicated databases do not exist and communities lack the resources to plug this gap.
During a previous BBR project, we developed an approach to genome databasing, named GenomeHubs, that removes the barriers to creating and maintaining a dedicated genomics resource for any species group. We do this by greatly simplifying the process of importing data into, and hosting an instance of, the most comprehensive genome database platform, Ensembl. Using the carefully-engineered Ensembl system, we have developed tools that standardise data from diverse sources, run automated analyses, import analysis results back into the database and visualise the genome and annotations through a web interface. In this proposal we will develop GenomeHubs further to make it straightforward for researchers to run all the steps needed to assemble and annotate any genome or set of genomes, run standard analyses on them, and share the results with the wider community. We will add new analyses and visualisations and we will help users through collaboration and training in the setup and use of GenomeHubs. This application is being made in tandem with one to the BBSRC BBR Global Challenges Research Funding call, which will work with Lower and Middle Income Country (LMIC) scientists to develop and exploit GenomeHubs for their needs. Genomics is being increasingly applied to problems of the developing world, in particular improvement of crop plants and local farm animals, understanding and combating infectious disease, and biodiversity conservation. This project will work very closely with the GCRF GenomeHubs outreach project, bringing the technology to LMIC researchers and supporting their use of GenomeHubs. We will link research communities, promote data sharing and enhance the pooling of resources and understanding to solve shared problems. We will develop collaborations with key scientists in LMICs who will act as Ambassadors for GenomeHubs, and collaborate closely with LMIC researchers to develop new code, new visualisations and new analytic tools for GenomeHubs to meet their requirements.

Impact Summary

The genomics revolution is transforming the impact of genetic approaches to understanding the functioning of organisms, and exploiting genomics information is an essential component of transformation of basic knowledge into societal impact. However, fragmentation of data outputs from groundbreaking projects makes synthesis difficult, and transferring inferences drawn from one species to another problematic - every finding risks becoming an anecdote and not part of a coherent narrative. This existing problem and future risk define the areas in which we hope our GenomeHubs project will have impact. We expect GenomeHubs to have a lasting impact on the practice of genomics studies. This impact is predicated on the ease with which research communities will be able to collaborate to build portals for their high-dimensional data, and thus make it accessible for reuse and reanalysis. In the absence of the GenomeHubs system it is very likely that the status quo - of raw data (mostly) making it to databases of record, but the majority of analysed data being lost to science, and unavailable to society - will continue to the detriment of society and science. To assure the impact of our project we propose a program of outreach and education components that will make the community aware of the problem, elicit debate about the likely solutions, and deliver training in the practice of data integration and reuse. For academic research teams, we will offer support in establishing and maintaining GenomeHubs tailored to their needs. We will build skills in gathering and interpreting community needs, and in delivering tailored solutions. We will build robust and agile systems such that researchers can establish new GenomeHub databases with the minimum of background knowledge in systems administration or bioinformatics.
We will assist academics in using GenomeHubs by developing training materials and running week-long summer schools in genome assembly, annotation and databasing, with support for PhD students to attend free of charge. For SMEs we will offer a system they can be secure in installing and running to analyse their own data, mirror from other sites to explore public data, or collaborate with academic researchers to merge datasets and reap the benefits of shared analyses. We envision impacts in the areas of pest and pathogen control (understanding the comparative genomics and systems biology of loci involved in pathogenesis, and of loci that may be targets for control strategies), in crop species improvement (both plant and animal), in understanding the effects of interventions such as chemical treatment on organisms, and in developing monitoring tools for adverse and beneficial effects of interventions. For state and NGO stakeholders, we will offer a validated source of comparative and integrated data on biodiversity and on organisms' responses to change and challenge, and a platform on which they can base policy and intervention decisions. We will also ensure the best outcome for public investment in genomics, by making it more likely that funded projects deposit data openly for others to reuse. The loss of scientific "capital", in the form of inaccessible knowledge derived from raw data, is significant, and GenomeHubs embody a strong statement that this capital is valuable. Training delivered through summer schools and other methods will be accessible to researchers, students, and scientists in SME, NGO and governmental organisations. The training will show these users how best to exploit GenomeHubs and the data they contain to promote their own agendas.
Committee Research Committee C (Genes, development and STEM approaches to biology)
Research Topics Technology and Methods Development
Research Priority X – Research Priority information not available
Research Initiative Bioinformatics and Biological Resources Fund (BBR) [2007-2015]
Funding Scheme X – not Funded via a specific Funding Scheme