Award details

The Animal Functional Genomics Resource

Reference BB/N019563/1
Principal Investigator / Supervisor Dr Paul Flicek
Co-Investigators / Co-Supervisors Ms Laura Clarke
Institution EMBL - European Bioinformatics Institute
Department Vertebrate Genomics
Funding type Research
Value (£) 543,722
Status Completed
Type Research Grant
Start date 01/11/2016
End date 31/10/2020
Duration 48 months

Abstract

The major goal of the proposed Animal Functional Genomics Resource (AFGR) is to maximize the usefulness of functional genomic data for farmed and companion animal species that are publicly archived in DNA databases. The AFGR will index all the animal functional data available in EMBL-EBI's European Nucleotide Archive (ENA). Experiments that meet our metadata standards will be processed through our standard analysis pipelines for RNA-Seq, ChIP-Seq and methylation analysis, which the AFGR will use to analyse the data and perform quality control (QC) on it. Results that pass our QC metrics will then be distributed to the community and, where appropriate, passed to Ensembl to improve the gene and regulatory annotation of genomes. We are involved in the FAANG metadata and data sharing committee (MDS), which is developing standards for sample, experimental and analysis metadata, maintained in GitHub so that versions are tracked, to ensure that the minimal metadata needed for data analysis are recorded in a well-structured manner. These standards use ontologies such as Uberon, the Cell Type Ontology and the Animal Trait Ontology for Livestock to provide specific descriptions of different sample and experimental attributes. To ensure that the data being collected are useful for downstream analysis, we will establish stringent quality metrics to filter out anomalous datasets. These quality metrics will be generated as part of our standard analysis pipelines and presented to the community through both the FTP site and the data portal, so that users can browse and filter data based on different QC criteria. All stored metadata will be fully indexed to allow comprehensive searching. We will also build a data browser with views on the data, so that users can easily browse the raw data and analysis files. This browser will also have a RESTful API, allowing programmatic access to the same metadata, enabling bulk queries and allowing other groups to build services on top of our data.
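The filtering step described above, in which users select datasets by metadata and QC criteria, can be illustrated with a minimal Python sketch. It is an assumption-laden illustration only: the record fields, the QC metrics used (mapped read fraction, total read count) and the default thresholds are hypothetical and do not represent the AFGR's actual schema, QC metrics or API.

```python
from dataclasses import dataclass


@dataclass
class AnalysisRecord:
    # Hypothetical fields for illustration; not the actual AFGR/FAANG schema.
    accession: str
    species: str
    assay_type: str
    tissue: str
    mapped_read_fraction: float  # hypothetical QC metric
    total_reads: int             # hypothetical QC metric


def passes_qc(record, min_mapped=0.7, min_reads=10_000_000):
    """Return True if a record meets the illustrative (hypothetical) QC thresholds."""
    return (record.mapped_read_fraction >= min_mapped
            and record.total_reads >= min_reads)


def filter_records(records, species=None, assay_type=None, **qc_thresholds):
    """Keep records that match the metadata filters and also pass QC."""
    return [
        r for r in records
        if (species is None or r.species == species)
        and (assay_type is None or r.assay_type == assay_type)
        and passes_qc(r, **qc_thresholds)
    ]


# Placeholder records; accessions and values are invented for the example.
records = [
    AnalysisRecord("run_1", "Gallus gallus", "RNA-Seq", "liver", 0.92, 25_000_000),
    AnalysisRecord("run_2", "Gallus gallus", "RNA-Seq", "muscle", 0.41, 30_000_000),
    AnalysisRecord("run_3", "Sus scrofa", "ChIP-Seq", "liver", 0.88, 15_000_000),
    AnalysisRecord("run_4", "Gallus gallus", "RNA-Seq", "brain", 0.85, 5_000_000),
]

chicken_rnaseq = filter_records(records, species="Gallus gallus", assay_type="RNA-Seq")
print([r.accession for r in chicken_rnaseq])  # prints ['run_1']
```

Passing the thresholds as keyword arguments mirrors the idea of letting users filter on different QC criteria: for example, `filter_records(records, species="Gallus gallus", min_reads=1_000_000)` would also admit `run_4`.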

Summary

Research on domesticated animals has important socio-economic impacts, including underpinning and accelerating improvements in the animal sector of agriculture (animal breeding and animal health), contributing to human and veterinary medicine by providing animal models, and improving animal health and welfare. The chicken also serves as a model for all other avian species, and is therefore important in the fields of embryology and development, neurobiology and behaviour, and the ecology and evolution of natural populations. The genome is the entire DNA content of an organism. For the genome sequence to be useful, it needs to be annotated with the locations of genes and their regulatory elements along the DNA sequence. Information on the location of at least the coding genes (that is, genes that make proteins) is now available for many economically important farmed and companion animals, and efforts are underway in many parts of the world to increase our knowledge of non-coding regions (i.e. non-coding RNAs and regulatory elements). In addition, the advent of new DNA sequencing technologies means that it is now possible to sequence many individuals, compare differences in their DNA sequence and gene content, and relate these to their physiological differences. It is also possible to generate 'functional' data by sequencing ("assay by sequencing"). Functional sequence data tells us which genes are active, which genes code for proteins and which genes code for regulatory RNAs. It can also tell us about other features within the DNA sequence that are responsible for switching genes on or off, for example in specific tissues or in response to signalling molecules. Functional sequence data is therefore very important in showing how differences in DNA sequence between individuals can affect gene activity and are therefore likely to affect phenotypes, such as production or disease resistance traits.
Some DNA databases contain functional data for farmed and companion animals, often in raw form (e.g. sequence reads); however, this data is most useful when it has been checked for quality and processed further, for example by assigning it to specific genes and transcripts. It would also be preferable if more data were submitted to these databases by the research community around the world. Our proposed research aims to survey the data available for these animals in the public DNA databases and check it for quality. We will also work to ensure that future datasets submitted to the databases have as much useful information associated with them as possible, such as breed, sex and tissue type. We will also define quality standards for such data and improve data discoverability by drawing together datasets from disparate projects into a cohesive collection that is accessible both programmatically and via a website.

Impact Summary

WHO WILL BENEFIT FROM THIS RESEARCH?
The immediate and direct beneficiaries will include scientists engaged in the international collaborative efforts to characterise the genomes of domesticated animals, in particular members of the Functional Annotation of Animal Genomes (FAANG), Genome10K and numerous target-species (chicken, pig, sheep, cattle, etc.) genome consortia. The integrated functional genome resources will also be useful to a wide range of scientists engaged in research on domesticated animals in agricultural, biomedical or animal health and welfare contexts. The resources will also benefit scientists characterising the human and other genomes by providing access to high-quality functional genomics datasets for a number of vertebrates, along with associated software tools. The animal breeding and animal health sectors will also benefit from the project outputs through increased knowledge and access to new tools and resources. More generally, the research will benefit scientists concerned with understanding the regulation of gene expression and the genetic determinants of phenotypes. These data will also be of value to other international consortia, including ENCODE, Epigenome Roadmap, FANTOM, the International Human Epigenome Consortium and Blueprint, for comparative analyses.
HOW WILL THEY BENEFIT FROM THIS RESEARCH?
The project will benefit the international FAANG and numerous target-species (chicken, pig, sheep, cattle, etc.) genome consortia by developing a set of computational tools and resources to access functional genomics datasets. In particular, the project and its resources will meet the FAANG project's need for a Data Coordination Centre. These new tools and resources will facilitate the detailed annotation of these genomes, such as defining transcripts, genes and regulatory regions.
These annotations in turn will allow researchers and industry to annotate genome variation within specific populations of target species to uncover causal relationships from sequence to molecular phenotype to macro-phenotype. Computational solutions and resources will provide (i) access to available functional genomics datasets generated on farmed and companion animal species; (ii) high-quality metadata to facilitate further analysis of these data; (iii) quality metrics for these data, to enable and encourage researchers to reuse the datasets to answer new biological questions; and (iv) a single resource assembling diverse functional genomics datasets, available either for direct download (raw data, metadata and QC metrics) from dedicated FTP sites or through a data portal for searching and filtering specific subsets of data. This project will also provide training within and between collaborating labs to improve skills and foster new ways of working between biologists and computational scientists. Training and outreach will be core activities within the proposed project, thus facilitating the delivery of impact. We will develop training materials and organise training workshops for postgraduate students and for early-stage and experienced researchers. Our outreach activities will encompass not only scientific conferences but also events targeted at industry, e.g. through the Knowledge Transfer Network, and direct engagement with our industrial collaborators.
Committee Research Committee C (Genes, development and STEM approaches to biology)
Research Topics X – not assigned to a current Research Topic
Research Priority X – Research Priority information not available
Research Initiative Bioinformatics and Biological Resources Fund (BBR) [2007-2015]
Funding Scheme X – not Funded via a specific Funding Scheme