BBSRC Portfolio Analyser
Award details
A platform for massive parallel sequencing of longPCR amplicons
Reference
BB/H023534/1
Principal Investigator / Supervisor
Dr David Littlewood
Co-Investigators / Co-Supervisors
Dr Peter Foster
Institution
The Natural History Museum
Department
Life Sciences
Funding type
Research
Value (£)
119,787
Status
Completed
Type
Research Grant
Start date
13/10/2010
End date
31/07/2012
Duration
22 months
Abstract
High throughput massive parallel sequencing of mixed amplicons requires sample-specific markers to be added to amplicon libraries whenever the identity of the original amplicons needs to be known, either as Multiplex Identifiers (MIDs) or as user-designed markers in Parallel Tagged Sequencing. Adding sample-specific markers prior to PCR or emPCR library construction is costly and time-consuming. Alternatively, up to 16 individual samples can be run concurrently on separate sections of a 454 plate, although this halves the total number of reads achievable; a mixture of MIDs and gasketed plates allows up to 192 samples to be run concurrently, but numerous additional costly steps are required prior to emPCR. We have shown that amongst many samples of longPCRs, particularly those including relatively rapidly evolving protein-coding genes (e.g. lengths of mitochondrial (mt) DNA), sequences from different species (even sister taxa) are sufficiently different from one another that contig assembly programs can be tailored to untangle and assemble mixed sequences accurately. With sufficient differences between the original longPCRs and the long read lengths offered by 454 technology, pooled samples can be multiplexed, massively parallel sequenced and reassembled without chemical tagging of individual reads. Instead, Sanger sequencing of the ends of each longPCR offers quality control and unique 500 bp identifiers with which to assign identity to reassembled contigs. Using established primer sets and readily available material, we propose to demonstrate that a mixed pooled sample of longPCRs from complete 28S rDNA and mtDNA from a diversity of parasitic helminths can be sequenced to completion with >100x coverage in a single 454 run. Bioinformatic tools building on available assembly software and scripts will be developed to optimise a pipeline for accurately reassembling the data. Simulations will be run to evaluate the limits of the approach for future applications.
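The identification step described above, in which Sanger end reads act as identifiers for reassembled contigs, might be sketched as follows. This is an illustration only, not the project's pipeline: it swaps in exact k-mer sharing for the alignment-based matching a real tool would more likely use, and the function names, the k-mer size of 12 and the 0.8 threshold are all assumptions made for the example.

```python
from typing import Dict, List, Optional, Set

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (upper-cased)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all k-mers in seq (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def tag_support(tag: str, contig_kmers: Set[str], k: int) -> float:
    """Fraction of the tag's k-mers that also occur in the contig."""
    tag_kmers = kmers(tag, k)
    if not tag_kmers:
        return 0.0
    return len(tag_kmers & contig_kmers) / len(tag_kmers)


def assign_contigs(
    contigs: Dict[str, str],
    end_tags: Dict[str, List[str]],
    k: int = 12,                # hypothetical k-mer size
    min_fraction: float = 0.8,  # hypothetical acceptance threshold
) -> Dict[str, Optional[str]]:
    """Assign each contig to the sample whose Sanger end reads it contains.

    A sample is accepted only if every one of its end reads is supported,
    so a contig must span both ends of the original amplicon. Contigs with
    no sufficiently supported sample are mapped to None.
    """
    assignments: Dict[str, Optional[str]] = {}
    for contig_id, contig_seq in contigs.items():
        # Index both strands, since assemblers may emit either orientation.
        contig_kmers = kmers(contig_seq, k) | kmers(reverse_complement(contig_seq), k)
        best_sample, best_score = None, 0.0
        for sample, tags in end_tags.items():
            if not tags:
                continue
            score = min(tag_support(tag, contig_kmers, k) for tag in tags)
            if score >= min_fraction and score > best_score:
                best_sample, best_score = sample, score
        assignments[contig_id] = best_sample
    return assignments
```

Exact k-mer matching keeps the sketch dependency-free, but it is sensitive to sequencing errors such as 454 homopolymer miscalls; an alignment-based comparison would tolerate those better.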
Summary
New generation sequencing techniques offer an unprecedented means of sequencing genes and genomes at a fraction of previous costs and at a phenomenal density of coverage. A variety of platforms offer different techniques. 454 pyrosequencing, also known as massive parallel sequencing, has the advantage of providing relatively long sequence reads (~450 nucleotides) in over 1 million individual reaction chambers on a pico-titre plate; developments are under way to capture even longer reads. When mixing templates from different sources there is a need to link sequences with their source. Two approaches are possible: (i) processing individual samples on single pico-titre plates or on individual gasketed sections of a plate (up to 16), or (ii) chemically tagging templates with unique sample-specific markers. Long lengths of DNA (up to 20,000 nucleotides) are routinely amplified with specialised polymerase chain reactions for a diversity of purposes by a wide variety of users of molecular tools. By sequencing the ends of these long amplicons using traditional methods, and by relying on bioinformatic tools to accurately unscramble the data, we propose a method that allows hundreds of long amplicons to be pooled, fragmented, massively parallel sequenced, accurately reassembled and identified, thus reducing existing costs by orders of magnitude. The technique will allow routine multiplex sequencing of longPCRs where only short fragments could be sequenced previously, or where expensive sample-specific tagging and/or cloning was required. We will test the methodology by generating longPCR amplicons from parasitic helminths, for which (i) we have a wide diversity of samples available and considerable experience of handling them, and (ii) there is wide interest and need, including diagnostics, biodiversity studies and evolutionary parasitology. Simulation studies will be used in conjunction with real data to develop, refine and test the bioinformatics pipeline for wider application.
The methodology and associated open access computer applications will be transferable to any biological system where diverse longPCR fragments are sequenced regardless of the origin of the DNA.
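As a rough check on how many amplicons one run could carry, the figures quoted above (about 1 million reads of roughly 450 nucleotides, amplicons of up to 20,000 nucleotides) can be put through a back-of-envelope coverage calculation. The sketch below assumes an idealised run in which every reaction chamber yields a usable full-length read and amplicons are pooled in equal amounts; the pool size of 200 and the function names are hypothetical.

```python
def expected_coverage(n_reads: int, read_length: int,
                      n_amplicons: int, amplicon_length: int) -> float:
    """Mean depth of coverage: total sequenced bases / total target bases."""
    return (n_reads * read_length) / (n_amplicons * amplicon_length)


def max_amplicons(n_reads: int, read_length: int,
                  amplicon_length: int, target_coverage: int) -> int:
    """Largest pool size that still reaches the target mean coverage."""
    return (n_reads * read_length) // (amplicon_length * target_coverage)


# Idealised run: 1,000,000 reads x 450 nt = 450 Mb of sequence.
# A hypothetical pool of 200 amplicons of 20,000 nt is 4 Mb of target.
print(expected_coverage(1_000_000, 450, 200, 20_000))  # 112.5
print(max_amplicons(1_000_000, 450, 20_000, 100))      # 225
```

Under these assumptions a pool of around 200 full-length amplicons stays above 100x mean coverage; uneven pooling or lower read yields would reduce that figure.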
Impact Summary
Users of longPCR amplicons for screening genetic and genomic diversity are widespread across disciplines as disparate as biomedical science, applied molecular biology, genomics, population genetics, and evolutionary biology. Researchers and fields of biological research and application that require accurate, cost-effective, high throughput sequencing of longPCRs without cloning steps or amplicon barcoding will benefit. Until now, routine sequencing of longPCRs has not been cost-effective and has taken considerable time, thus preventing routine use of these established applications and high fidelity PCR enzymes. This project will bring the cost down by at least 1-2 orders of magnitude whilst increasing speed and depth of sequencing by many orders of magnitude. Publication through open access articles, online postings to end-user groups, listservers and a dedicated project web page will promote the development of these methods and the associated bioinformatics tools. Other beneficiaries include bioinformaticians developing methods of accurate contig assembly from next generation sequencing (NGS) methods, who require dedicated code and methodologies for untangling multiplexed sequences and reassembling larger fragments accurately. The NGS community will benefit from access to open source code, commentaries and results of simulations posted on a variety of websites. Targeted audiences will be addressed through seminars and conferences in association with partners (Applied Genomics Facility, Liverpool), collaborators (Univ. Melbourne) and with direct assistance from the Natural History Museum's Press Office, Research & Consulting Office and Interactive Media teams.
We expect that Roche (454 Life Sciences) will take an active interest in the use of their platform (and longPCR kits) for the development of these tools and resources, and we will engage directly with them and other companies with alternative NGS platforms in promoting the results of the project, in line with BBSRC recommendations.
Committee
Research Committee C (Genes, development and STEM approaches to biology)
Research Topics
Technology and Methods Development
Research Priority
X – Research Priority information not available
Research Initiative
Tools and Resources Development Fund (TRDF) [2006-2015]
Funding Scheme
X – not Funded via a specific Funding Scheme