Award details

A tool for identifying causative mutations from sequencing data without a reference genome

Reference BB/M019896/1
Principal Investigator / Supervisor Professor Dan MacLean
Co-Investigators / Co-Supervisors Dr Ghanasyam Rallapalli
Institution University of East Anglia
Department Sainsbury Laboratory
Funding type Research
Value (£) 128,027
Status Completed
Type Research Grant
Start date 01/04/2015
End date 20/01/2017
Duration 22 months

Abstract

This proposal builds on exciting preliminary data showing that variant data from a forward mutant screen can be used to order contigs from an organism without a reference genome. We have implemented and tested the Genetic Algorithm To Re-order Contigs (GATROC) using a small simulated EMS mutant variant dataset and contig sequences generated from Arabidopsis. The aim of this proposal is to create a generally useful software tool that applies an advanced version of our preliminary algorithm and can deal with sequence data from a variety of sequencing technologies as well as genomes of different sizes. Additionally, we will test various extensions of the algorithm that would use data from backcrossed populations as well as variant data from polyploids. We aim to provide visualization tools that would help design markers to verify the candidate mutation. Our algorithm will be provided in various implementations, such as Galaxy pipelines, binaries for various operating systems, and an open-source release of the source code for developers, to ensure the software is as widely useful and used as possible. The tool is timely and innovative: there is not yet a tool capable of carrying out the analyses we have demonstrated are possible, yet there is clear utility in being able to detect a causative mutation without a high-quality reference, genetic markers, or a mapping population. By the end of the project we will have provided a tool that makes causative mutation detection from forward genetic screens easier and quicker in species both with and without a good reference. The tools we produce will have a direct bearing on many more groups throughout the UK and the world, particularly next-generation genetics projects with draft or pre-draft genomes, and will speed up research significantly.

Summary

Forward genetic screens are essential for identifying the target genes behind desirable traits and for applying them beneficially. Traditional map-based cloning approaches are extremely labour intensive, and years can elapse between mutagenesis and the detection of the polymorphism responsible for the phenotype. The arrival of high-throughput sequencing (HTS) technologies has raised the importance of genomics and offers a number of ways to accelerate discoveries using forward genetic screens. A primary application of HTS for genetics is the detection of DNA sequence polymorphisms among different genotypes within a species, as these polymorphisms can be directly associated with phenotypic variation. HTS approaches have accelerated forward genetic screens by increasing the rate at which mutations are mapped. An important advance is mapping-by-sequencing (MBS), which enables mapping and identification of the causal mutation in a single step by providing allele frequencies from pools and identifying causal mutations at single-nucleotide resolution. MBS requires a complete genome assembly and cannot be used in non-sequenced species or those with draft genomes. Hence, there is a need for computational tools that identify mutations directly from general whole-genome HTS datasets for organisms with a draft or pre-draft genome assembly. Even though the ability to induce mutations and to manipulate non-model genomes to test and characterise candidate mutations is available, the lack of, or limits on, genetic and genomic resources are restricting the application of HTS methods to forward genetic screens of non-model organisms. New methods are therefore needed that provide fast and cost-effective ways to order genome assemblies for causative mutation mapping using sequencing data from forward genetic screens of non-model plants and animals.
We have exciting preliminary data from an algorithm we have developed that can order contigs based on the expected density distribution of SNPs from forward genetic mutant data. We have devised a genetic algorithm that can effectively traverse the space of possible contig arrangements to find an optimum arrangement whose SNP density distribution best fits the distribution expected from the initial genetic screen. We have implemented and tested the Genetic Algorithm To Re-order Contigs (GATROC) using a small simulated dataset generated from Arabidopsis. The major objective of the work proposed here is to develop our proof-of-principle fragment arrangement algorithm so that it is applicable to sequence data from genomes of any size and to data generated by different sequencing technologies. We will also evaluate the algorithm's performance using various published studies to provide benchmarks and opportunities to extend it to other systems. We will include a variant-calling pipeline to deal with a range of sequencing technologies. Additionally, we will implement various extensions of the algorithm that would analyse data from backcrossed populations as well as variant data from polyploids such as wheat. We aim to provide visualization tools that would help design markers to verify the candidate mutation. Our algorithm will be provided in various implementations, such as Galaxy pipelines, binaries for various operating systems, and an open-source release of the source code for developers, to ensure the software is as widely used as possible.
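The contig-ordering idea described above can be illustrated with a minimal Python sketch. It is not the GATROC implementation: the triangular expected SNP profile, the squared-error fitness, the swap mutation, the order crossover, and every function and parameter name below are assumptions chosen only to show how a genetic algorithm might search over contig arrangements.

```python
import random


def expected_profile(n):
    """Hypothetical expected SNP density: one triangular peak mid-sequence."""
    mid = (n - 1) / 2
    return [1.0 - abs(i - mid) / (mid + 1) for i in range(n)]


def fitness(order, snp_counts, profile):
    """Higher is better: negative squared error between the normalised
    SNP counts taken in this contig order and the normalised profile."""
    total = sum(snp_counts) or 1
    psum = sum(profile)
    return -sum(
        (snp_counts[c] / total - p / psum) ** 2 for c, p in zip(order, profile)
    )


def mutate(order, rng):
    """Swap the positions of two randomly chosen contigs."""
    child = order[:]
    i, j = rng.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return child


def crossover(a, b, rng):
    """Keep a slice of parent a in place; fill the rest in parent b's order."""
    i, j = sorted(rng.sample(range(len(a)), 2))
    kept = a[i:j]
    kept_set = set(kept)
    rest = [c for c in b if c not in kept_set]
    return rest[:i] + kept + rest[i:]


def reorder_contigs(snp_counts, generations=200, pop_size=40, seed=0):
    """Search for a contig order whose SNP counts fit the assumed profile."""
    rng = random.Random(seed)
    n = len(snp_counts)
    profile = expected_profile(n)
    population = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(
            key=lambda o: fitness(o, snp_counts, profile), reverse=True
        )
        elite = population[: pop_size // 4]  # the best quarter survives
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = elite + children
    return max(population, key=lambda o: fitness(o, snp_counts, profile))


if __name__ == "__main__":
    # Made-up per-contig SNP counts, listed in scrambled contig order.
    counts = [3, 12, 1, 7, 9, 2, 5, 11]
    print(reorder_contigs(counts))
```

Because the best quarter of each generation is carried over unchanged, the best score never decreases between generations. A real tool would need a fitness function derived from the actual screen design and would have to handle contig length and orientation, which this sketch ignores.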

Impact Summary

A tool capable of identifying phenotype-inducing polymorphisms without a reference sequence would benefit scientists working in genetics, speeding research significantly and making it possible to work with organisms for which genomic resources are currently few. This would inspire scientists in many fields to do new analyses with species that are not currently tractable. Further, it would energise the field of forward genetics and polymorphism detection and help stimulate research into an exciting new branch of tool development. Our tool would expedite experimentation by shortening the time from sequence acquisition to causative mutation detection and help stimulate new discoveries in biotechnology and the biological sciences. Non-academic groups who would benefit from our tool include biotechnology companies, those involved in breeding plants and animals for agronomically important traits and, indirectly, the agricultural community, including farmers. The PI will take the lead on managing the impact plan. The plan will be an agenda item at monthly project meetings. The PI has an excellent track record in communicating the outcomes of his research to a broad audience. Primarily this is through publication in academic journals, but also increasingly in open forums on the internet. We provide regular project updates and code releases via channels such as Twitter and GitHub. TSL/JIC has a dedicated communications office for release of information to the general public through websites and the media. Our software will be open source and released under a non-restrictive licence to the academic community via laboratory websites, links from published articles and code-sharing sites. Discoveries made with the tool, e.g. SNPs or genes linked to disease resistance, will be covered by TSL/JIC's Technology Transfer Policy, which is based on maintaining close links with those who are able to make use of discoveries for the benefit of society.
Discoveries at TSL/JIC are monitored to establish whether they present opportunities to obtain intellectual property protection, typically through patenting. The PI will oversee the impact activities and, when necessary, will seek the assistance of other project members and expert staff at TSL/JIC. Where impact activities include technology transfer, outreach or press releases, the relevant office at TSL/JIC will be involved. The PI has prior experience in writing scientific and general articles, as well as developing websites. Postdoctoral workers will be encouraged to develop their communication skills within both the academic and non-academic communities, with the latter aimed at an understanding of the wider value of their research.
Committee Research Committee B (Plants, microbes, food & sustainability)
Research Topics Technology and Methods Development
Research Priority X – Research Priority information not available
Research Initiative Tools and Resources Development Fund (TRDF) [2006-2015]
Funding Scheme X – not Funded via a specific Funding Scheme