Award details

A bioinformatics tool for the accelerated diagnosis of multiple viral infections in crops using next generation sequencing

Reference BB/N023293/1
Principal Investigator / Supervisor Professor Lesley Torrance
Co-Investigators / Co-Supervisors
Institution University of St Andrews
Department Biology
Funding type Research
Value (£) 141,589
Status Completed
Type Research Grant
Start date 17/10/2016
End date 16/01/2018
Duration 15 months

Abstract

Viruses cause significant yield and quality losses in a wide variety of agricultural and horticultural crops, and have an important negative economic impact. Hence, plant virus diagnosis is a field of great significance in terms of the UK's food security and the agricultural economy. Next generation sequencing (NGS) of infected plant material is now a principal focus for viral diagnostics, but it requires fast and robust bioinformatics tools for host sequence and virus identification. Both of these aspects are missing in current software, which in general has been developed for clinical diagnosis. No tools for crops go beyond sequence homology for virus identification while also being usable by the diagnostician and returning results rapidly. The aim of this project is to develop a bioinformatics tool that uses mixed RNA sequence reads from infected plant material to produce a viral index in an accelerated timeframe to support disease diagnosis. Such a bioinformatics tool would have direct applications in plant health, quarantine and certification procedures. The tool will include a method for k-mer profiling, which goes beyond sequence homology, for the detection and identification of known and new viruses. The project will also explore the use of a fast alignment method for plant host extraction, when a reference genome is available. The tool will be developed within Galaxy as a workflow, making it widely available to the non-expert. By using an open workflow platform, the tool will have the potential to be used on a cloud-based Galaxy server, which makes it available to researchers without significant computing infrastructure, such as those in developing countries. We will develop and test the bioinformatics pipeline on already collected RNA-seq data from virus-infected raspberry plants and potato plant material, but the tool will be applicable to a wide variety of crop plants.

Summary

Viruses that infect crop plants in the UK cause significant losses in terms of yield and quality. In the UK, the production value of the potato harvest is £684 million per year, but the losses due to some viruses are estimated to be £30 million. Hence, the need to quickly and accurately identify virus-infected crops is of importance both economically and to ensure the continued supply of food adequate for a growing population. Current techniques for virus identification only allow the detection of a single virus or, at best, a very small number of related viruses. This makes disease diagnosis slow and expensive. Plant viruses can be identified from their genetic material, which is commonly RNA. It is possible to sequence the genetic material contained within samples extracted from an infected plant. This genetic material comprises a mixed collection of the host plant's RNA and the RNA of multiple viruses (and other organisms) that infect the plant. Technology can be used to sequence this mixed genetic material, which gives a very large data set of millions of short reads of RNA. A major difficulty is the identification of virus sequences within the mixed data set, and the ability to do this in a short enough time period to allow for successful disease diagnosis. Ideally we require a software tool that can take RNA samples and produce a list of viruses present within a matter of days rather than weeks. To date there is no software that can make successful plant virus diagnosis in a sufficiently short timeframe. The aim of the project is to develop software that will take the millions of short reads of RNA from a mixed sample and produce a list of viruses present for accurate diagnosis, so that effective disease treatments can be deployed. The software comprises two elements: (a) identification and removal of plant host RNA reads and (b) identification of known and potential new viruses.
The identification of RNA viruses in mixed RNA samples is difficult because of their high sequence variability: even if a related sequence is present in a reference database, the differences may be too great to detect the similarity by alignment. In addition, alignment methods, in which short RNA reads are aligned against a reference genome and assembled, are too slow for diagnostic purposes. In this project we will develop a bioinformatics tool that will overcome both of these problems. We will use a method known as k-mer counting to identify the viruses present. RNA sequences can be treated as character strings and divided into multiple substrings of length k. In this way a sequence can be represented by k-mer profiles, and these profiles can be compared to identify which species are present in a mixed sample. In addition, we will test the use of a fast aligner that will enable us to identify the host RNA more quickly if a reference genome is available. We will integrate these methods to create a pipeline. The tool will be delivered through Galaxy, an open platform for intensive data analysis, making it widely available to researchers. It will be designed to be used by the non-expert user. The tool will be tested on RNA sequence data from infected raspberry plants and from potato plant material. The tool will have direct applications in plant health, quarantine and certification procedures, used to stop the spread of crop diseases.
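
As an illustration of the k-mer counting idea described above, the following is a minimal Python sketch, not the project's actual implementation. It splits a sequence into overlapping substrings of length k, counts them to form a profile, and compares two profiles. The function names, the toy sequences and the choice of cosine similarity as the comparison measure are all assumptions made for this example; the award text does not state how profiles are compared.

```python
from collections import Counter
import math


def kmer_profile(seq, k):
    """Count every overlapping substring of length k in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(p, q):
    """Cosine similarity between two k-mer count profiles.

    Assumption: cosine similarity is only one possible way to
    compare profiles; it is used here purely for illustration.
    """
    dot = sum(count * q[kmer] for kmer, count in p.items())
    norm_p = math.sqrt(sum(c * c for c in p.values()))
    norm_q = math.sqrt(sum(c * c for c in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def best_match(read, references, k):
    """Return the name of the reference whose profile is most
    similar to the read's profile, plus all similarity scores."""
    read_profile = kmer_profile(read, k)
    scores = {
        name: cosine_similarity(read_profile, kmer_profile(ref, k))
        for name, ref in references.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical toy sequences, not real viral genomes.
toy_references = {
    "virus_a": "ACGUACGUACGU",
    "virus_b": "GGGGCCCCGGGG",
}
name, scores = best_match("CGUACGUA", toy_references, k=3)
print(name, scores)
```

Because each profile is built in a single pass over the sequence and no alignment is computed, this style of comparison is cheap per read. A real pipeline would need a much larger k, a reference database of viral genomes and a calibrated threshold for reporting a match.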

Impact Summary

The purpose of this project is to develop a bioinformatics tool that uses RNA sequence reads from infected plant material to rapidly produce a viral index for disease diagnosis, with applications in plant health, quarantine and certification procedures. Rapid and accurate virus detection is an essential part of efficient crop management, offering protection against economic losses due to low yield and poor quality. Viral diagnostics is a key component of ensuring security and sustainability of food production both in the UK and in developing economies with low-input farming systems. The key output from the project is the bioinformatics "pipeline". The primary impact of the project will be realised when the pipeline is used for virus identification in new and existing next generation sequence datasets by the beneficiaries worldwide. There are a large number of academic beneficiaries of this project, including researchers involved in disease diagnosis for plant quarantine and certification purposes, crop breeding and those establishing diagnostic tools for crops in developing countries (see previous section). In addition, this project will have direct impact on farmers, horticulturists and other food producers who need to monitor the ongoing health of their crops and investigate new instances of disease as they arise. The speed of use of the bioinformatics tool will enable them to deploy appropriate crop management practices in time to maximise productivity and profit. James Hutton Limited, a commercial subsidiary of the institute, has established the JHL Molecular Diagnostics Unit, which uses diagnostic tests for plant health assessment and crop genotyping. JHL has an existing customer base with excellent links to growers and agronomists. Therefore, the software pipeline developed by this project will be a major boost to the capabilities of this Unit.
Impact of the bioinformatics pipeline will be measured by recording the number of installs made from the Galaxy Tool Shed, the number of requests for collaboration using the pipeline, the number of references to the pipeline in scientific and other publications, and approaches for use of or further development of the pipeline by other research institutions and commercial companies.
Committee Research Committee A (Animal disease, health and welfare)
Research Topics Crop Science, Microbiology, Plant Science, Technology and Methods Development
Research Priority X – Research Priority information not available
Research Initiative Tools and Resources Development Fund (TRDF) [2006-2015]
Funding Scheme X – not Funded via a specific Funding Scheme