Award details

DanioPeaks: A Central Resource for Standardised Annotation and Re-annotation of Whole-Genome Data for the Model Vertebrate Zebrafish

Reference BB/N023358/1
Principal Investigator / Supervisor Professor Boris Lenhard
Co-Investigators / Co-Supervisors Professor Ferenc Mueller, Dr Fiona Wardle
Institution Imperial College London
Department Institute of Clinical Sciences
Funding type Research
Value (£) 139,933
Status Completed
Type Research Grant
Start date 07/02/2017
End date 06/02/2018
Duration 12 months

Abstract

The DanioPeaks project consists of two main components. First, it generates a computational pipeline for collecting and reprocessing all published zebrafish next-generation sequencing (NGS)-based epigenome datasets, totalling 16.2 TB. Second, it comprises the coordinated activity of three UK laboratories to manage the pipeline and to integrate it with two international consortium efforts, ZENCODE-ITN and DANIO-CODE, which aim to standardise zebrafish epigenomics. DanioPeaks will retrieve all published zebrafish epigenome datasets into the DANIO-CODE Data Coordination Centre. DanioPeaks will then outsource processing of the data to DNAnexus, which runs the ENCODE processing pipelines. These pipelines include remapping of all zebrafish data. Next, the processed data from dozens of laboratories will be presented uniformly as track hubs hosted at Imperial College London. Finally, the tracks will be made publicly available and mirrored at the DANIO-CODE track hub by ZFIN (the Zebrafish Information Network, a widely used community resource) at the University of Oregon. The investigators will also initiate networking among the PIs of the associated consortia, manage network meetings and publicise DanioPeaks activities to the zebrafish and broader genomics communities. As a result of the DanioPeaks activity, zebrafish epigenome datasets will be freely and conveniently available in various genome browsers, including Ensembl, UCSC and ZENBU. They will serve more than 1,000 zebrafish laboratories worldwide (over 60 of them in the UK) and support cross-species analysis. The project's outcomes are the processed datasets and a bioinformatics pipeline for future data generation and submission. Together they represent an important resource for comparative genomics experts, for human geneticists seeking zebrafish models of disease, and for toxicologists and epigeneticists.

Summary

We address the scientific demand that stems from recent developments in high-throughput genomics, including the landmark ENCODE project and the new UK 100,000 Genomes Project. That demand is the need for a suitable vertebrate model that enables high-throughput in vivo functional testing of hypotheses generated by genome-scale annotation projects. Zebrafish is one of the best models for studying the structure and function of genomes in vertebrate development and disease. It offers abundant, transparent and externally developing embryos and larvae, the large biomass that is crucial for high-throughput methods, fast assays of gene loss of function, a reference genome sequence and thousands of genetic mutants. However, zebrafish will not fulfil its potential unless its genome is comprehensively annotated for functional coding and non-coding elements, as the human and mouse genomes are. DanioPeaks addresses this problem by developing a bioinformatic annotation and re-annotation pipeline and by providing a genomics resource for the wider genomics community.

DanioPeaks aims to develop a processing pipeline for the analysis of all published NGS datasets available for zebrafish (more than ten thousand), using the established, standardised protocols of ENCODE and modENCODE. It will secure the computational power and analysis tools needed to remap and reanalyse up to 16.2 TB of NGS experimental data against the most recent (final) version of the zebrafish genome sequence. This will make the data comparable and available for meta-analysis by the wider scientific community. DanioPeaks will collect all raw zebrafish NGS data into a single database by uploading them to the zebrafish Data Coordination Centre. The raw data will be processed with the ENCODE processing pipeline and mapped to the GRCz10 genome assembly. Secondary analysis (feature and peak calling) will then be carried out. The results will be submitted to the ZFIN-based track hub for visualisation in genome browsers (e.g. Ensembl).
The outcomes will be a community repository, a publicly accessible epigenome resource and a multicentre genome resource paper in a major genomics journal, reporting new biology identified from the reanalysed zebrafish data.

Impact Summary

In addition to the academic beneficiaries described in the previous section, the implementation of this project and the research generated from it will also have a wider impact on society and patient groups in the longer term. These beneficiaries include:

1. Patient groups. The research in this proposal will feed into research programmes that identify disease loci. Recent genome-wide association (GWA) and other studies suggest that the majority of disease-causing SNPs are located in genomic regulatory regions. To date these regions have been understudied, and very few have been verified functionally. Zebrafish could change this, as it is the only vertebrate model with the high-throughput capability to screen the regulatory function of these regions. Identifying such functional elements will be beneficial because it can lead to better diagnostic tests and potentially therapies.
2. The wider public. The wider public, and schoolchildren in particular, will benefit from the work in this proposal and the activities of the staff employed on it. In collaboration with the Imperial College Public Engagement team, the PIs will hold a workshop for secondary-school children in West London and their teachers. These events have the potential to inspire children to study science at A level and university. In the longer term, they may go on to apply this knowledge in a wide range of STEM careers that enhance the UK's knowledge economy and global competitiveness.
Committee Research Committee A (Animal disease, health and welfare)
Research Topics Technology and Methods Development
Research Priority X – Research Priority information not available
Research Initiative Tools and Resources Development Fund (TRDF) [2006-2015]
Funding Scheme X – not funded via a specific funding scheme