Award details

Development of computational strategies for identification and characterisation of viruses in metagenomic samples

Reference BB/M004805/1
Principal Investigator / Supervisor Dr Richard Leggett
Co-Investigators / Co-Supervisors
Institution Earlham Institute
Department Research Faculty
Funding type Research
Value (£) 307,411
Status Completed
Type Research Grant
Start date 01/12/2014
End date 30/11/2017
Duration 36 months

Abstract

The analysis of data from next-generation sequencing (NGS) of metagenomic samples has emerged as an important tool in recent years. In the past, much of this analysis involved targeted 16S ribosomal RNA gene sequencing followed by taxonomic classification. However, the increased throughput and reduced cost of NGS, combined with the limited resolution of 16S approaches, have encouraged the adoption of whole-genome shotgun approaches. While read mapping is still a useful tool for analysing these data, greater insights are possible from assembly of the reads. Metagenomic assembly, however, is a very immature field, with only a handful of assemblers having emerged. One of these is our own MetaCortex, a proof-of-concept assembly tool that has shown promising results when applied to the analysis of the virome of a species of bat from West Africa. The purpose of this project is to develop the algorithms necessary to turn this proof of concept into an efficient and sensitive assembly tool that will benefit the metagenomics community. Although we expect the tool to be applicable to a wide range of metagenomic datasets, we are targeting the particular problem of viral detection, as this is an important and under-explored area of metagenomic analysis with significant implications for animal and human health. To validate the effectiveness of the assembly algorithms, we plan to test on simulated datasets and, crucially, on new metagenomic sequence data generated for this project. These will include samples from humans, cows and insects that carry known viruses or have been artificially infected with known viruses. Additionally, we have access to a set of rodent samples collected in Africa that are expected to contain many zoonotic viruses. These will be used as a case study to demonstrate the effectiveness of the tool in real-world experiments.

Summary

Metagenomics is the study of the DNA in mixed environmental samples that contain the genomes of many different organisms. We can sequence metagenomic samples using the same next-generation sequencing technology that we use to sequence the genome of a single organism, but analysing the data is much more complicated: it is difficult to know in advance which organisms are present in a sample, and therefore difficult to know which organism a particular fragment of DNA (a 'read') has come from. Assembly is the process of joining short, overlapping reads into contigs that represent much longer fragments of DNA, enabling more useful analysis. Assembly is difficult but relatively mature when it involves DNA from a single organism. However, many of the simplifying assumptions made by assembly tools (for example, that all reads come from a single genome sequenced at roughly even coverage) are invalid for metagenomic data, making metagenomic assembly much harder and the field much less mature. The aim of this project is to develop computational algorithms for metagenomic assembly and to produce a tool that is sensitive and able to accurately differentiate between very similar species. We have targeted viral detection because this is an important area and one that is poorly served by the small number of metagenomic assembly tools that already exist. Such a tool will enable scientists to gain vital information from metagenomic samples, including understanding the mechanisms of disease in animals and humans, detecting novel viruses and monitoring the spread of viruses in order to prevent and contain outbreaks.
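To make the idea of joining reads into contigs concrete, here is a minimal sketch of one common short-read approach: build a de Bruijn graph whose nodes are (k-1)-mers and whose edges are the k-mers seen in the reads, then report contigs as maximal non-branching paths. This is a generic textbook construction written for illustration, not MetaCortex's implementation; the function names `build_debruijn` and `assemble_contigs` and the small values of k are hypothetical. It deliberately ignores sequencing errors, reverse complements and coverage, all of which a real metagenomic assembler has to handle.

```python
from collections import defaultdict


def build_debruijn(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges are k-mers in the reads."""
    succ = defaultdict(set)  # (k-1)-mer -> set of following (k-1)-mers
    pred = defaultdict(set)  # (k-1)-mer -> set of preceding (k-1)-mers
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            left, right = kmer[:-1], kmer[1:]
            succ[left].add(right)
            pred[right].add(left)
    return succ, pred


def assemble_contigs(reads, k):
    """Return contigs as maximal non-branching paths through the graph.

    Simplifying assumption: isolated cycles with no branch point are not reported.
    """
    succ, pred = build_debruijn(reads, k)
    nodes = sorted(set(succ) | set(pred))

    def simple(node):
        # Exactly one way in and one way out: the node lies inside a contig.
        return len(succ[node]) == 1 and len(pred[node]) == 1

    contigs = []
    for node in nodes:
        if simple(node):
            continue  # interior nodes are reached by walking from a branch point or end
        for nxt in sorted(succ[node]):
            contig = node + nxt[-1]
            # Extend while the path cannot branch; each step adds one base.
            while simple(nxt):
                nxt = next(iter(succ[nxt]))
                contig += nxt[-1]
            contigs.append(contig)
    return contigs


print(assemble_contigs(["ACGTTG", "GTTGCA"], 4))  # two overlapping reads
print(assemble_contigs(["ACGTA", "ACGTC"], 3))    # two reads differing at one base
```

The two overlapping reads merge into the single contig `ACGTTGCA`. Reads from two near-identical sequences instead make the graph branch where they differ, giving `["ACGT", "GTA", "GTC"]` rather than two full-length contigs. This illustrates why separating very similar species is one of the harder problems for a metagenomic assembler.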

Impact Summary

Academic impact

Techniques for assembly of metagenomic sequence data are in their infancy. As presented in the BBSRC's Review of Next Generation Sequencing, provision of assembly software for metagenomics is "highly deficient" (Conclusion 10). An important academic impact of this work will be to drive forward methods for metagenomic assembly by increasing understanding of the problems, by developing new algorithmic approaches and by encouraging best-practice techniques for analysis. The BBSRC's expert working group on metagenomics identified that the UK had failed to take full advantage of metagenomic techniques, something that is reflected in the current research highlight. This project will help to address this shortfall by supporting the establishment of a research group focused on metagenomic tools and by increasing the knowledge and expertise of UK researchers, both through training of the postdoctoral researcher and research assistant and through wider training in the use of the tool that will be developed. The project's specific focus on viral detection will benefit those working in diagnostics and surveillance, providing them with tools, techniques and knowledge to carry out their work more efficiently and effectively. This might include epidemiologists tracking viral disease in the UK and overseas, as well as those seeking to understand the often complex, interlinked nature of animal and human disease mechanisms. The development of the tools will generate new opportunities for collaborative work with R&D groups in industry and with academic institutions, particularly those also funded by BBSRC. TGAC (The Genome Analysis Centre, now the Earlham Institute) already collaborates very closely on virus work with the Pirbright Institute, the University of Cambridge Veterinary School and the Centre for Virus Research at the University of Glasgow. The two staff employed on the project will gain important knowledge of bioinformatics, metagenomics and virology. They will develop extremely valuable skills in the use of high-performance computing environments and will have further opportunities to develop their written and verbal communication skills.

Economic and societal impacts

Metagenomics is a powerful tool for the study of health and disease in animals of agricultural importance and in carriers of zoonotic infections, enabling us to understand the role viruses play in disease outbreaks and allowing interventions to be applied before outbreaks occur. An indirect impact of the project will be to inform policy makers about the circulation of pathogens and to enable them to better plan for outbreaks, both in the UK and abroad. Within a human clinical setting, metagenomics also has the potential to be a powerful diagnostic and monitoring tool. The knowledge that will come from metagenomic analyses of viral datasets could lead to economic benefits, as there is a need for cheap diagnostic tests to be developed for animals (e.g. livestock) and humans, presenting opportunities for existing and start-up biotech companies. Developing metagenomics and bioinformatics skills in the UK is vitally important, and this project will contribute towards that. We believe it will attract talented people and encourage them to consider a career in UK genomics. This project will also contribute towards the UK and BBSRC being recognised as leaders in metagenomics, bioinformatics and viral genomics.
Committee Research Committee C (Genes, development and STEM approaches to biology)
Research Topics Microbiology, Technology and Methods Development
Research Priority X – Research Priority information not available
Research Initiative X – not in an Initiative
Funding Scheme X – not Funded via a specific Funding Scheme