Award details

Establishing the infrastructure for functional annotation of farmed animal genomes

Reference: BB/M01844X/1
Principal Investigator / Supervisor: Professor Alan Archibald
Co-Investigators / Co-Supervisors: Professor David Burt, Professor Mario Caccamo, Ms Laura Clarke, Dr Robert Davey, Professor Federica Di Palma, Dr Paul Flicek, Dr Andrew Law, Dr Timothy Stitt, Professor Michael Watson
Institution: University of Edinburgh
Department: The Roslin Institute
Funding type: Research
Value (£): 1,900,845
Status: Completed
Type: Research Grant
Start date: 01/01/2015
End date: 30/06/2016
Duration: 18 months

Abstract

High quality, annotated reference genomes are essential for contemporary biological research. Reference genome sequences have been established for farmed animal species, including chicken, cattle, sheep, goat, pig, turkey, duck, dog, horse and, most recently, tilapia. However, unannotated genome sequences are not immediately useful. We have used the Ensembl system to establish high quality annotations of these genomes based on the data available at the time, mostly cDNA and comparative data from other species. However, these annotations are incomplete: the transcriptomes lack details of transcript isoforms and transcriptional start sites and, more critically, regulatory regions are essentially absent from the annotation of all these genomes. We will establish the data infrastructure critical to the coordinated functional annotation of animal genomes and to the success of the international "Functional Annotation of ANimal Genomes" (FAANG) project. The goals of FAANG are to provide experimental data to comprehensively annotate transcriptomes and regulatory regions in farmed animals. The infrastructure will comprise substantial hardware and compute capacity at Roslin and TGAC, allied to existing hardware at EMBL-EBI, together with software to enable the functional annotation of farmed animal genomes and the linking of genotype (sequence) to phenotype. The infrastructure will support a Data Coordination Centre (DCC) and Data Analysis Centres (DACs). The DCC will store data and analyses after quality control checks. The DACs will seek to minimise redundant analysis of data. We will establish standard analysis pipelines across multiple sites to ensure that distributed analyses are comparable. Bioinformatics pipelines for quality control of assay-by-sequence data (ChIP-Seq, RNA-Seq and methylation studies), tools for validation of sample identity and a number of primary analyses (of transcriptomes, epigenetic marks and other regulatory features) will be developed.

Summary

Research on domesticated animals has important socio-economic impacts, including underpinning and accelerating improvements in agriculture, improving animal health and welfare, contributing to medical research and increasing our understanding of natural and wild animal populations. High quality annotated genome sequences (the "sequence of DNA nucleotides that make up the genetic material of an organism") provide an important framework for discovering genetic variation ("genotype") in that sequence that is linked to variation in the characteristics ("phenotype") of an animal. Genome sequences are available for many domesticated animals, including poultry (chicken, turkey and duck), livestock (cattle, pig, goat and sheep), fish (cod, tilapia and salmon) and companion animals (dog and horse). Today, DNA sequencing is both rapid and relatively cheap, and this has allowed the development of a wide range of assays based on generating short or long sequences. High quality, annotated genomes are critical to the analysis of these assays, including RNA-seq analysis of a cell's RNA ("read off the genome, encoding information to synthesise proteins or regulate gene expression"), Methyl-seq analysis of the location of covalent modifications (methyl groups) to the genome, and ChIP-seq analysis of proteins bound to the genome that activate or repress gene expression. Identifying the functional elements within the genome that code for proteins or non-coding RNAs, or that regulate gene expression, is essential for understanding the phenotypic consequences encoded in the genome. Annotated genome sequences are freely available online through Ensembl, NCBI and UCSC. We have used the Ensembl system to establish high quality annotations of animal genomes based mostly on cDNA and comparative data from other species. Whilst 70-90% of protein coding elements can be identified, there is little information on non-coding RNA genes, many of which are suspected to regulate gene expression.
Comparing the number of annotated RNAs read from genes in the human, mouse and domesticated animal genomes shows that the complexity of RNAs in domesticated animals is underestimated. Even less is known of regulatory sequences, which is a significant barrier to understanding the link between genotype and phenotype. The importance of this challenge is recognised by the EU-US Animal Biotechnology Working Group, which is promoting the need for transnational, coordinated functional annotation of animal genomes. Following a workshop of over 100 scientists in San Diego in 2014, an international consortium, "Functional Annotation of ANimal Genomes" (FAANG), was launched. The FAANG project aims to provide experimental data to comprehensively annotate domesticated animal genomes. We will establish the data infrastructure to support these goals. The infrastructure will comprise hardware and compute capacity at Roslin, TGAC and EMBL-EBI, together with software to enable the functional annotation of animal genomes. The infrastructure will support a Data Coordination Centre (DCC) and Data Analysis Centres (DACs). The DCC will store data and analyses from the FAANG consortium, subject to quality control checks. The DACs will seek to minimise redundant analysis of data. Standard bioinformatics pipelines for quality control of assay-by-sequence data (ChIP-Seq, RNA-Seq and Methyl-seq), tools for validation of sample identity and a number of primary analysis pipelines (to map the locations of protein coding and non-coding RNAs and of regulatory features) will be developed. A high quality annotated genome is a key source of information and is critical for contemporary research in the biological sciences. It is valuable not only to academic researchers but also to scientists working in the animal breeding, animal health and pharmaceutical industries. This project is concerned with the infrastructure for delivering high quality annotated reference genomes to enable research on economically important animals.

Impact Summary

Who will benefit? The primary beneficiaries of this proposal to establish the data infrastructure for developing the Functional Annotation of ANimal Genomes (FAANG) resources will be researchers in academia and industry in the UK and throughout the world. The initial meeting of over 100 scientists in San Diego in January 2014, convened to discuss the FAANG concept, indicates the level of interest in the resources to be developed and managed by the FAANG consortium. Research on domesticated animals has important socio-economic impacts, including underpinning and accelerating improvements in the animal sector of agriculture, contributing to medical research by providing animal models, improving animal health and welfare and informing understanding of natural and wild animal populations. The world's leading animal breeding companies, some of the largest of which are UK companies, have in-house genetics expertise. These companies therefore have the expertise to exploit the information captured and disseminated through the FAANG consortium. Evidence of the value of animal genomes to the pharmaceutical sector is provided by the sector's recent investments in sequencing the pig and dog genomes, and by the use of the chick as a developmental model underpinned by an annotated genome sequence. Suppliers of species-specific 'omics tools, such as expression arrays, SNP chips and proteomics systems, will benefit from access to annotated genomes, with links to features (e.g. probes) on their products. There are potential indirect benefits to the wider public through addressing the food security agenda, as discussed below.
How will they benefit? The proposed FAANG resources will enable research to dissect the genetic control of economically important (and complex) traits in farmed animals, including feed efficiency and susceptibility to infectious diseases. By enabling genetics research in farmed animals, these resources will facilitate advanced genetic improvement of these species.
For example, there will be opportunities to develop more realistic and powerful models to predict breeding values from genome-wide genotypes produced either from high density SNP panels or from full or partial genome sequences. Genetic improvement of farmed animals is a key means of addressing the food security agenda. The utility of 'omics technology products, such as expression microarrays and SNP chips, is greatly enhanced when the features on these products can be linked to a well-annotated genome sequence and other information sources. For example, probe sets for Affymetrix arrays and SNPs on Affymetrix and Illumina chips can be linked to annotated genes and genome locations respectively, enabling more effective use of these products. Well-annotated genomes also facilitate the design of capture probes for exome sequencing; current developers of such products include Agilent and Roche NimbleGen. In particular, these sectors will be able to probe both the coding and the less well characterised non-coding RNA genes. Academic and other researchers will benefit from the ability to link the read-out from assay-by-sequence assays to an annotated genome sequence; without such a frame of reference, these assays are of limited value. For example, all RNA-seq studies need a comprehensive set of gene models to assay at the level of specific transcript isoforms, not just at the gene level. The impacts on research will be delivered within the timeframe of the proposed project and will continue thereafter on an international scale. There will be direct impacts on staff through training and outreach activities in bioinformatics. The indirect impacts, for example on the food security agenda and the wider public, will take longer. However, the time to impact for genetic tests for susceptibility to inherited or infectious diseases in animals, with their positive impacts on animal welfare, can be short: within 1 to 3 years.
Committee: Not funded via Committee
Research Topics: Animal Health
Research Priority: X – Research Priority information not available
Research Initiative: Bioinformatics and Biological Resources Fund (BBR) [2007-2015]
Funding Scheme: X – not Funded via a specific Funding Scheme