Award details

WormBase ParaSite

Reference BB/P024610/1
Principal Investigator / Supervisor Professor Matthew Berriman
Co-Investigators / Co-Supervisors
Institution Wellcome Trust Sanger Institute
Department Pathogen Variation
Funding type Research
Value (£) 235,898
Status Completed
Type Research Grant
Start date 15/01/2018
End date 14/07/2021
Duration 42 months

Abstract

WormBase ParaSite is a database that provides rapid access to new high-throughput genomic and related data from parasitic flatworms and roundworms (helminths). These data include genome sequence, gene expression data, and regulatory data. They are generally produced using massively parallel nucleotide sequencing strategies, and need to be integrated and interpreted to inform parasitology. A major challenge is to provide structural and functional annotation on the genome assemblies, to update this automatically as new experimental evidence becomes available, and to maintain tracking between successive versions so that researchers can continue their work as the reference data sets improve. ParaSite is mostly implemented through the re-use (and, where necessary, the extension) of database technologies developed elsewhere. These include the MAKER pipeline (and other tools such as RepeatMasker and Rfam) for genome annotation, tools derived from lepbase for representation of genome quality, the Ensembl software stack for genome data management and preparation, and the BioMart data warehousing tool, which provides high-performance data discovery and retrieval for common use cases centred on genes. Both Ensembl and BioMart provide a web interface written in Perl and served through mod_perl embedded in an Apache web server, while using MySQL (a common relational database management system) as the underlying data store. Increasingly, we are supporting the direct incorporation of data stored in binary, indexed file formats (e.g. BAM and CRAM for sequence alignments), which simplifies the database build process and improves performance. We are using the emerging track hub technology to organise these files so that users can locate and filter the data of interest.

Summary

Parasitic worms (helminths) cause a massive economic burden, with agricultural losses in the UK exceeding £100 million per annum. Across the globe, helminths are also responsible for long-term, chronic diseases in humans. The UK is a leader in research and development targeting helminths, despite global investment being disproportionately low compared with the impact of infections. Helminths are diverse - the term covers both roundworms and flatworms - and no single model species can capture the range of disease-causing mechanisms involved. Researchers are therefore inherently interested in making comparisons between species. The genomes of more than 30 species are now published and many more are available. Alongside their genomes, large-scale functional genomics datasets have been produced describing key life cycle transitions for more than 10 species. To drive helminth research into the genomic era, we established WormBase ParaSite in 2014. The resource now contains more than 100 draft genomes of helminths. Genes and genomes can be explored, enabling a greater understanding of helminth biology to accelerate the development of new strategies for helminth control. In its first two years, there have been 8 public releases of the resource, and in 2015 the website was accessed by 29,000 unique users. In addition to accessing gene structures and functional annotation for draft genomes, users are able to examine evolutionary relationships between genes and look for the differences and similarities between species that may underpin differences and similarities in helminth biology. The resource provides fast and intuitive interfaces for browsing and searching, and contains an interface for extracting custom datasets. Several workshops have been organised to provide training in its use. This proposal will fund the maintenance and improvement of WormBase ParaSite. We intend to incorporate all publicly available nematode and flatworm genome assemblies as they become available.
Due to changes in sequencing technology, research groups will produce new and better versions of existing gene sequences. However, these sequences will in many cases not be annotated, so we will provide an automated way to annotate naked genomes with consistent gene structures and functional descriptions. Defining gene families will remain a critically important activity, and we will increase the speed, accuracy and scalability with which evolutionary histories can be inferred. We will also greatly improve the way in which data from large-scale studies on gene expression or genome variation are incorporated into the resource. In particular, a new Gene Expression Atlas will be included for interactive exploration of gene expression data. To help identify new drug targets or opportunities to re-use existing drugs, WormBase ParaSite will include links to target and chemistry data (by linking to the ChEMBL database). We will also enable users to query available phenotypic data. In addition to the new features, we will frequently update the site to provide rapid access to new data. We will continue to provide training on the use of the resource and maintain a live and responsive helpdesk.

Impact Summary

Across the globe, parasitic worms (helminths) cause a massive economic burden and are responsible for long-term, chronic diseases. Helminths are therefore studied with the aim of killing or controlling them. For pathogens with smaller genomes, particularly viruses, bacteria, and protozoa, access to genome data has transformed the way research is conducted, has led to major insights into the spread of infections and drug resistance, and has led to the development of new drugs and vaccine candidates. A similar transformation is starting to take place in helminth research: rapid changes in sequencing technologies have driven down costs, and large-scale data on genomes and gene expression are becoming available. WormBase ParaSite was established in 2014 to enable the helminth research field to accelerate by exploiting the rapid growth in available data. Through assisting helminth researchers, ParaSite will impact governments, NGOs and companies with an interest in disease control. Amongst the downstream beneficiaries of helminth research will be those suffering from infections - more than a billion people worldwide. Human infections, mainly amongst the poorest communities, can result in abdominal pain, haemophilia, stunted growth and mental development, malnutrition, fatigue, disfigurement, blindness, circulatory disorders, or liver and bladder pathologies. Some anthelmintic drugs do exist, but with an over-reliance on a small repertoire, the development and spread of drug resistance is an ever-present danger. The global agriculture industry will also benefit from new helminth control measures. In the UK, potato farming is badly affected by potato cyst nematode, and livestock are affected by gastrointestinal nematodes and liver flukes. WormBase ParaSite was launched to exploit the rapid increase in available helminth sequence data (genomes and gene expression data).
Through the organisation, analysis and dissemination of these data, WormBase ParaSite aims to: (i) provide a clear, annotated representation of the functional regions of genome sequences; (ii) transfer knowledge from well-annotated to less well-annotated genomes; and (iii) allow comparisons between helminths so that differences between genomes can be correlated with the evolution of pathogenic traits. Automatic pipelines integrate new data to ensure that users can access an up-to-date interpretation of all available data, and the use of standard data query and retrieval interfaces reduces time that would otherwise be wasted in finding and re-formatting data to make them interoperable. The new application will up-scale WormBase ParaSite to ensure that the expected flood of new data (more numerous and more contiguous genome assemblies; new expression and variation data) can be processed and made useful to helminth researchers. Another objective is to ensure rapid releases so that these data are quickly disseminated to the community; another is to provide training, in situ at prominent nodes of helminth research, to maximise the familiarity of researchers with the available data and tools. A new portal within ParaSite will be aimed directly at researchers developing drug treatments. We will use sequence similarity to identify homologues of known drug targets from other species (as curated in the ChEMBL resource). We will provide filters to allow users to select genes from parasites whose homologues have properties such as inhibition by a drug that has reached clinical trials but has no known toxicology warnings, or aggregated scores that reflect physico-chemical properties of a compound or drug. To predict new, exploitable target-compound combinations, users will be able to combine their results with relevant gene expression data (e.g. expression in a mammalian-infective stage) and with the absence of an orthologue in the parasite's host.
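The combination of filters described above can be illustrated with a minimal Python sketch. It is not the portal's implementation: the record type, its field names, the `shortlist` function and the example genes are all hypothetical, and a real query would draw these properties from ChEMBL, expression studies and orthology predictions rather than from hand-written records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical summary of one parasite gene and its homologous drug target."""
    gene_id: str                        # made-up identifier, for illustration only
    max_clinical_phase: int             # highest trial phase reached by a drug hitting the homologue (0 = none)
    has_toxicology_warning: bool        # any known toxicology warning for that drug
    expressed_in_infective_stage: bool  # expressed in a mammalian-infective stage
    has_host_orthologue: bool           # an orthologue exists in the parasite's host


def shortlist(candidates, min_phase=1):
    """Return gene IDs that pass every filter, preserving input order."""
    return [
        c.gene_id
        for c in candidates
        if c.max_clinical_phase >= min_phase   # drug has reached clinical trials
        and not c.has_toxicology_warning       # no known toxicology warnings
        and c.expressed_in_infective_stage     # relevant expression evidence
        and not c.has_host_orthologue          # target absent from the host
    ]


# Invented example records; only gene_A satisfies all four conditions.
EXAMPLE = [
    Candidate("gene_A", 3, False, True, False),
    Candidate("gene_B", 3, True, True, False),   # fails: toxicology warning
    Candidate("gene_C", 0, False, True, False),  # fails: no drug in trials
    Candidate("gene_D", 2, False, True, True),   # fails: host orthologue present
]

if __name__ == "__main__":
    print(shortlist(EXAMPLE))
```

Treating each criterion as an independent boolean test keeps the filters composable, which matches the idea of users combining drug, expression and orthology evidence in a single query.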
Committee Research Committee A (Animal disease, health and welfare)
Research Topics X – not assigned to a current Research Topic
Research Priority X – Research Priority information not available
Research Initiative Bioinformatics and Biological Resources Fund (BBR) [2007-2015]
Funding Scheme X – not Funded via a specific Funding Scheme