Award details

Adaptive sampling ('Read Until') methods in optimised nanopore sequencing technologies

Reference: BB/N018877/1
Principal Investigator / Supervisor: Dr Guy Cochrane
Co-Investigators / Co-Supervisors: Dr Ewan Birney
Institution: EMBL - European Bioinformatics Institute
Department: Sequence Database Group
Funding type: Research
Value (£): 310,681
Status: Completed
Type: Research Grant
Start date: 01/03/2017
End date: 29/02/2020
Duration: 36 months

Abstract

We propose to develop algorithms to enable adaptive sampling of DNA in real time by exploiting a unique property of nanopore sequencers: data are streamed from the nanopores, and the Oxford Nanopore Technologies MinION device allows a specific molecule to be ejected from a nanopore at any time, regardless of how completely it has been read. For this, two linked but distinct problems must be solved: the DNA molecule (represented by changes in current flow) must be mapped rapidly to a reference, and an accept/reject decision must be made based on accumulated previous mapping events. We will address both of these problems using five model cases of direct relevance to BBSRC science:

1. Rapid even coverage in bacterial genome sequencing (e.g. pathogen identification in food-borne disease)
2. Even coverage in diploid genome resequencing (e.g. marker and variant discovery in livestock welfare and breeding)
3. Sequencing of genomic regions of interest that are recalcitrant to conventional sequencing (e.g. in crop plant genomics)
4. Maximising discovery and quantification of low-abundance transcripts (e.g. in fish pathogen response transcriptomics)
5. Coordination of multi-sample sequencing in complex mixtures (e.g. in comparative metagenomics studies)

To achieve rapid matching of early read data to reference sequence we will explore several indexing/pre-computing strategies, including Fast Fourier Transform of streamed data; wavelet transform of the stream followed by indexing; and discretisation of the signal followed by suffix tree or FM-index processing. This tool would run on the laptop local to the sequencer. In contrast, the logical process for accepting or rejecting specific reads will be managed by an external server system running appropriate pipelines on the minoTour MinION analysis platform. Templates will be generated for minoTour, allowing experienced users to generate pipelines for further specific use cases.
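As a rough illustration of the "discretisation of the signal" strategy named in the abstract, the sketch below bins current values into a small alphabet and looks up the first few symbols of a read in a precomputed index of the reference. It is a toy under stated assumptions, not the project's method: a plain hash index of fixed-length windows stands in for a suffix tree or FM-index, the signal is assumed to be already segmented into one event per base, and `TOY_LEVELS`, `discretise`, `build_index` and `map_prefix` are hypothetical names with made-up current levels.

```python
from collections import defaultdict

# Hypothetical toy "pore model": one expected current level (pA) per base.
# Real pore models assign levels to k-mers rather than to single bases.
TOY_LEVELS = {"A": 80.0, "C": 95.0, "G": 110.0, "T": 125.0}


def discretise(signal, edges=(87.5, 102.5, 117.5)):
    """Map each current value to a symbol 0..3 by counting thresholds exceeded."""
    return tuple(sum(x > e for e in edges) for x in signal)


def build_index(reference, k):
    """Index every k-long symbol window of the reference's expected signal.

    A hash table of windows stands in here for a suffix tree or FM-index.
    """
    symbols = discretise([TOY_LEVELS[base] for base in reference])
    index = defaultdict(list)
    for i in range(len(symbols) - k + 1):
        index[symbols[i:i + k]].append(i)
    return index


def map_prefix(signal, index, k):
    """Return candidate reference positions for the first k events of a read."""
    return index.get(discretise(signal[:k]), [])


reference = "ACGTTGCAAC"
index = build_index(reference, k=4)

# Noisy events for the bases T, T, G, C (one event per base is assumed).
read_signal = [123.1, 126.4, 108.9, 96.2]
candidates = map_prefix(read_signal, index, k=4)
```

Exact lookup of this kind fails as soon as noise pushes one event across a bin edge, which is one reason the abstract lists several alternative transforms and index structures to explore.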

Summary

Over the last three decades, DNA sequencing has become a key technology across and beyond the life sciences. Indeed, few areas of biological research remain untouched by either the direct use of the technology or knowledge that is derived from others' work in which sequencing has been used. The technology has advanced rapidly. In the mid-2000s, a second generation of sequencing technologies, quite unlike the first, brought a step change in the rate at which sequencing machines could operate, and a corresponding vast reduction in the cost. These technologies now dominate and have led to a wealth of new and impactful scientific findings, not least as the core sequencing technology behind many thousands of animal, plant, fungal and bacterial projects. We are now on the cusp of a third wave of technology, 'nanopore' sequencing, again quite unlike those that precede it, that promises similar game-changing advances. In the 'Adaptive Sampling' project, we recognise the potential of nanopore sequencing and focus on a particular, as yet under-explored, feature of the technology that promises very significant impact. Nanopore sequencing uses microscopic pores that can be engineered and organised onto a surface. The pores allow DNA molecules to pass through one at a time from one side of the surface to the other. As they transit, the pores provide a direct read-out of the bases (A, C, G and T) that pass the inner surface of the pore. The user places a mixture of DNA molecules (fragments of a whole genome) above the pore, which then captures the end of a DNA molecule and starts to draw it through, reading its sequence as it goes. The control of the system is so refined that, if desired, a DNA molecule can be rejected from a pore before it has been fully sequenced and the capture process can start again rapidly. A key challenge for all sequencing platforms is that some parts of genomes are 'difficult to sequence' and others are not.
Because of this, to be certain that a genome sequencing experiment has captured all parts of a genome, the user must set the experiment up to read the genome many times (often 30 times), so that the difficult regions are read at least once. With Adaptive Sampling, we plan to overcome this obstacle with software that will rapidly read the early sequence from a pore, and make a decision about whether the part of the genome that is emerging from the pore has been read already or is yet to be read. Based on this, a decision can be made as to whether to reject the DNA molecule from the pore or to carry on reading to the end. The time saving to be achieved by avoiding re-sequencing in this way will be substantial, leading to far more cost-effective, rapid and 'targeted' sequencing. While our technology will be useful broadly, we will work specifically with five example challenges in which the tools will be useful. These cover detection and identification of infectious bacteria, the study of agricultural livestock, investigation of crop plant genomes, work on farmed fish to understand responses to disease-causing species and the analysis of communities of microbial species in the environment. There is substantial novelty in this approach. In previous work on Ebola virus, we have shown that rejecting reads using a prototype of our software has potential. What we now propose will be the first example, to the best of our knowledge, of a sequencing approach in which data analysis (previously something that happened after sequencing was completed) has a direct impact on the way in which the physical sequencing machine itself is operated during a sequencing experiment. As part of the project, aiming at the broadest possible benefit to the research community, we plan to publish the software and hold two workshops in which we disseminate what we have developed to technologists, genomics laboratories, research scientists and industry.
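The keep-or-reject decision described in this summary can be illustrated with a minimal sketch. It assumes the start of each read has already been mapped to a genome coordinate, and it uses a simple rule (reject once a fixed-size region has reached a target depth) chosen for illustration only; the class name `CoverageTracker`, its parameters and the rule itself are hypothetical and are not taken from the project's software.

```python
class CoverageTracker:
    """Tracks per-bin read counts and decides whether to keep a new read."""

    def __init__(self, genome_length, bin_size, target_depth):
        self.bin_size = bin_size
        self.target_depth = target_depth
        n_bins = -(-genome_length // bin_size)  # ceiling division
        self.depth = [0] * n_bins

    def decide(self, mapped_position):
        """Return 'accept' or 'reject' for a read whose start maps here."""
        b = mapped_position // self.bin_size
        if self.depth[b] >= self.target_depth:
            return "reject"  # region already read enough: eject the molecule
        self.depth[b] += 1   # count the read that will now be sequenced fully
        return "accept"


# Toy run: a 1,000-base genome in 100-base bins, aiming for 2 reads per bin.
tracker = CoverageTracker(genome_length=1000, bin_size=100, target_depth=2)
decisions = [tracker.decide(pos) for pos in (150, 160, 199, 250)]
```

In this toy run the third read falls in a bin that already holds two reads and is rejected, while the fourth read starts in a fresh bin and is kept. A real decision rule would also need to account for read length, mapping uncertainty and the time cost of ejecting a molecule.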

Impact Summary

The application of sequencing technologies underpins much of biological research today. Our approach, adaptive sampling in nanopore-based sequencing, serves to eliminate coverage bias and focus resolving power and thus has numerous beneficiaries. Within the broad UK and global academic and applied science communities these methods will benefit both those already using, and those yet to use, sequencing methods. The direct impacts of our work will be delivered as an enabling software technology that allows broad use of adaptive sampling. During the project we will specifically demonstrate the technology in five areas of biological research and application, each of which represents a challenge area for current sequencing approaches. These are the rapid sequencing of bacterial pathogens for identification, typing and resistance profiling purposes (demonstrating coverage control in bacterial genome sequencing), marker and variant discovery in livestock resequencing (even coverage in diploid genome sequencing), access to regions that are difficult to sequence in higher plants, particularly crop species (targeted genomic region sequencing), pathogen response transcriptome characterisation and profiling in farmed fish species (low-abundance transcript sequencing) and comparative metagenomics (coverage/focus control in multi-sample sequencing). We expect direct impact on groups of researchers who use sequencing approaches in these areas, including, but not limited to, those who have expressed support for the project (see letters of support). Through the capacity to eliminate coverage bias, sequencing costs will be reduced, making sequencing available to areas of research and application for which cost remains prohibitive (such as deep population biology of crops, the discovery of low frequency variant alleles for livestock breeding programmes and the profiling of expression in non-model species).
Through the ability to focus on defined regions, adaptive sampling will bring powerful methods to areas such as ecology and biodiversity (barcoding, whole-ecosystem analysis, occurrences and abundance), environmental sensing (water safety, environmental health, sentinel markers for pollution and climate change), food chain control (food species/breed/line validation, forensic tracking), border and trade control (invasive species, illegal trade in controlled species), bioenergy (investigation of new species, yield improvement), public health (environmental and zoonotic pathogen sinks, epidemiology of anti-microbial drug resistance) and animal health (surveillance, outbreak detection, transmission control). The UK has long been established at the forefront of sequencing technology and the application of adaptive sampling methods to nanopore technologies will serve to continue this trend.

Committee: Research Committee C (Genes, development and STEM approaches to biology)
Research Topics: Technology and Methods Development
Research Priority: X - Research Priority information not available
Research Initiative: X - not in an Initiative
Funding Scheme: Industrial Partnership Award (IPA)