Award details

Ensembl - adding value to animal genomes through high quality annotation

Reference BB/S020152/1
Principal Investigator / Supervisor Dr Peter Harrison
Co-Investigators / Co-Supervisors Dr Paul Flicek, Dr Fergal Martin
Institution EMBL - European Bioinformatics Institute
Department Genome Assembly and Annotation
Funding type Research
Value (£) 378,425
Status Completed
Type Research Grant
Start date 01/08/2019
End date 31/07/2022
Duration 36 months

Abstract

High quality annotated genomes are essential resources for life sciences research. Draft reference genome sequences have been established for several farmed and domesticated animals: cattle, goat, pig, sheep; chicken, duck, turkey; dog, horse; rainbow trout, salmon, tilapia. Substantially improved genome assemblies have been established (for goat, pig, cattle, sheep, water buffalo, chicken) using long read sequencing technologies. There are gaps in the annotation of these genomes in terms of transcript complexity, non-coding genes, pseudogenes and regulatory sequences. Moreover, the pseudo haploid genome sequence of one individual provides an incomplete view of a species' genome. Scientists are generating more and better genome sequences for additional species and individuals within a species. Researchers, especially in the FAANG and FAASG consortia, are generating functional data for annotation of coding, non-coding and regulatory sequences. We will analyse and annotate farmed and domesticated animal genomes as they are released, exploiting the growing volumes of functional data (short and long read RNA-seq / transcript sequences; ChIP-seq; ATAC-seq; CAGE; bisulfite sequence) to identify coding genes, non-coding genes and regulatory sequences. We will acquire data from re-sequencing projects to characterise genetic variation within species (SNPs, indels, structural variants) and display this variation in its genomic context. We will run comparative genomics analyses both between species and within species. We will disseminate the resulting richly annotated genome sequences freely via the Ensembl Genome Browser and via an API for power users. These annotated genomes will provide an integrated view of functional sequences (coding, non-coding and regulatory) and sequence variation for a single or multiple individuals for key farmed and domesticated animals. To maximise use of this resource we will provide demonstrations and both online and face-to-face training.

Summary

This project will deliver high quality up-to-date annotated genomes for key farmed and domesticated animals to enable research on these economically and socially important species. Research on domesticated animals has important socio-economic impacts, including underpinning and accelerating improvements in the animal sector of agriculture, contributing to medical research by providing animal models, improving animal health and welfare and informing understanding of natural and wild animal populations. High quality annotated genome sequences are key resources to enable such research. The sequence of almost all genes (a reference genome sequence) has been determined for major farmed and domesticated animal species such as cattle, goats, sheep, pigs, chickens, ducks, turkeys, dogs and horses as well as for several important fish species, including cod, rainbow trout, salmon and tilapia. However, the strings of billions of bases (symbolised as four letters A, C, G, T) that constitute these genome sequences are not particularly useful or understandable on their own. Once a genome has been sequenced, it needs to be 'annotated' (i.e. explanatory notes need to be added to identify key features within the genome sequence) in order for research scientists to make sense of it. Annotating reference genome sequences with features such as where the coding and regulatory parts of genes are located, and the bases which differ between individuals within a species (genetic variants) greatly enhances the value and utility of the genome sequence. Visualising the genome sequences complete with annotations in a freely accessible manner further improves the value of the information. Ensembl provides a means for researchers to look at or 'browse' the annotated genome information. 
The databases and tools provided by Ensembl have been shown to be a powerful and effective means of annotating the complex genomes of animal species including humans, mice and more recently farmed and domesticated animals. Enabled by advances in genome sequencing technologies and associated computational methods scientists around the world are generating more and better genome sequences. As the genome sequence of a single individual does not completely represent the genetic make-up of a species, scientists are also sequencing multiple individuals within a species. Individual research groups and international consortia are also generating sequence information that can be used in the annotation and analysis pipelines that we will run to identify both coding and regulatory sequences. We will use these data to annotate the genomes of farmed and domesticated animals, including aquaculture species. We will run comparative analyses to compare genomes both between species and between individuals within a species. These richly annotated genome sequences, which are in effect maps of where the coding gene content and regulatory sequences are located, will be made freely available to the scientific community and others via the Ensembl Genome Browser mounted on the World Wide Web as well as via an Application Programming Interface for power users. We will also provide between and within species comparative views. The annotated genomes that we will deliver are valuable not only to academic researchers, but also to scientists working in industry, including those in the animal breeding, animal health and pharmaceutical sectors. Keeping this information up-to-date, by characterising new genome sequences and integrating new data as it becomes available, is essential for reference genome sequences to remain current and useful.

Impact Summary

Who will benefit?

We anticipate the beneficiaries of the Ensembl genome portal for farmed and companion animals to be:

(i) the academic research community: The primary beneficiaries from this proposal for development and maintenance of Ensembl resources for farmed and companion animals will be researchers in academia and industry in the UK and across the globe.

(ii) animal breeding companies: The world's leading animal breeding and aquaculture breeding companies, of which some of the largest are UK companies, have in-house genetics expertise. Thus, these companies have the expertise to exploit the information captured and disseminated through Ensembl resources.

(iii) owners of farm and companion animals and other stakeholders: Research on domesticated animals has important socio-economic impacts, including underpinning and accelerating improvements in veterinary research and agriculture and improving animal health and welfare. Suppliers of species-specific 'omics tools such as expression arrays and SNP chips will also benefit from access to annotated genome sequences.

(iv) science infrastructure and capacity: The Ensembl project is one of three systems worldwide concerned with delivering annotated genome sequences for a large number of species to the scientific community. As such the Ensembl genome portal makes a considerable and continuing contribution to maintaining science infrastructure and capacity.

(v) society: The Ensembl portal provides direct benefits to specific demographics of society through the provision of outreach activities and training, as well as to society more widely.

How will they benefit?

(i) the academic research community: The Ensembl genome portal adds value to animal genomes via high quality annotation. High quality annotated reference genome sequences are essential resources for contemporary research in the biological sciences. The Ensembl browser and associated annotation tools and databases have been shown to be a robust and effective means for making genomic information useful to a wide range of users.

(ii) animal breeding companies: The proposed Ensembl resources, especially the genetic variation resources, will enable researchers to dissect the genetic control of economically important (and complex) traits in farmed animals including feed efficiency and susceptibility to infectious diseases. This enabling of genetics research in farmed animals and aquaculture species will facilitate advanced genetic improvement. Genetic improvement of farmed animal species is a key means of addressing sustainable food production for the animal agriculture and aquaculture sectors.

(iii) owners of farm and companion animals and other stakeholders: In companion animals the benefits will be improved tools for selective breeding to minimise inherited diseases and inbreeding and to improve animal welfare. The utility of 'omics technology products for this purpose such as expression microarrays and SNP chips is greatly enhanced when the features on these products can be linked to a high quality annotated genome sequence and other information sources.

(iv) science infrastructure and capacity: The Ensembl genome portal for farmed and companion animals itself provides a valuable resource underpinning science infrastructure and capacity. In addition, Ensembl has developed a training programme including demonstrations, online tutorials and workshops in the use of the genome portal. This programme trains PhD students, Post Docs and research scientists to develop their skills in genome annotation, genome browsing and importantly how to interpret and understand their own data.

(v) society: The demographic of society most likely to benefit from the training opportunities the Ensembl project can provide is students who are interested in developing skills in bioinformatics. The project will benefit society more widely by providing a resource that contributes to enhancing sustainable food production.

Committee Research Committee A (Animal disease, health and welfare)
Research Topics Animal Health
Research Priority X – Research Priority information not available
Research Initiative Bioinformatics and Biological Resources Fund (BBR) [2007-2015]
Funding Scheme X – not Funded via a specific Funding Scheme