Award details

Genome Annotation for the Masses

Reference BB/K004204/1
Principal Investigator / Supervisor Professor Yannick Wurm
Co-Investigators / Co-Supervisors
Institution Queen Mary University of London
Department School of Biological and Chemical Sciences
Funding type Research
Value (£) 118,989
Status Completed
Type Research Grant
Start date 30/11/2012
End date 29/08/2014
Duration 21 months

Abstract

Genomes of emerging model organisms can now be sequenced at almost no cost. The major bottleneck has become obtaining accurate gene models: automated gene prediction programs incorrectly predict start sites and intron-exon boundaries, and may even miss or merge whole genes, even when large amounts of RNA sequence data are available. Gene models must therefore be fixed and refined before rigorous analyses can be performed. However, refining a single gene model can take up to several hours, so the effort remains difficult to justify beyond exceptional cases. Tasks from other research areas that require human brainpower but are similarly repetitive have been successfully crowd-sourced to members of the general public. GalaxyZoo volunteers have categorized millions of photos of galaxies and thus triggered the characterization of multiple previously unknown galaxy types and other stellar objects. Similarly, players of the FoldIt game earn points by minimizing the free energy of putative protein structures and in some cases perform better than specialized structure prediction algorithms or even expert protein modelers. Contributors to such projects may be motivated by the intellectual challenge, the desire to learn new skills or to contribute to the greater good, competition or recognition among peers, or in some cases even small amounts of financial compensation. The project proposed here takes inspiration from such crowd-sourcing initiatives. We aim to create an online game that crowd-sources gene model refinement. In doing so, our game will provide a key service to biologists by rapidly generating high-quality gene annotations at little or no cost.

Summary

The hereditary information carried by each living thing is its genome. Stored in the form of DNA sequences of As, Cs, Gs, and Ts, between 1 and 5% of the genome sequence consists of genes. These genes contain instruction sets for small protein machines that accomplish specific tasks and ultimately determine the organism's shape, size, behavior, lifespan and disease susceptibility. Determining the genome sequence of an organism is now straightforward, but understanding which genes are responsible for the unique characteristics of the organism remains challenging. This is due in particular to the difficulty of correctly finding the genes in the genome and determining which parts of their sequence encode proteins. Automatic gene identification software performs poorly, so the evidence for each potential gene model needs to be visually inspected and corrected. As a result, preparing the data for even a small research project can take months. Luckily, there is a solution. Thousands of members of the general public have used the internet to contribute their time to scientific projects such as GalaxyZoo and FoldIt, be it out of curiosity, a desire to contribute to the greater good, to gain peer recognition or simply to have fun. Results of their contributions include the identification of previously unknown galaxy types and the determination of the 3D structures of AIDS-related proteins. The proposed project uses a similar approach to encourage members of the general public to help identify genes in the genome and refine their boundaries. We are constructing a game in which contributors use pattern recognition skills to improve gene models. Contributors will be able to choose to focus their efforts on particular species (e.g., ants, humans, elephants) or research topics (e.g., cancer, immunity, longevity, taste or odor perception, behavior). They will earn points, and thus peer recognition, for their contributions, and may be acknowledged in scientific publications or even financially compensated. This project will thus allow members of the general public to have fun while helping to make the world a better place and facilitating scientific discovery.

Impact Summary

Members of the general public who use our software will gain new biological knowledge and skills. This capacity building will come from the educational material we put on the website and from the thought processes required to refine gene models. Additionally, the visibility this project gains among contributors and the general public will increase public engagement with biological research. Visibility will be obtained:
* initially through a small online advertising campaign and use of our tool in coursework, and subsequently by strongly encouraging users to advertise their participation to peers on social networks such as Facebook,
* through our international, interdisciplinary team of collaborators,
* thanks to the public relations offices at Queen Mary University of London, the Swiss Institute of Bioinformatics (SwissProt) and other collaborating institutes.
Our project will also:
* contribute toward changing organizational culture and practices by showing that crowdsourcing works,
* accelerate discoveries in fundamental bioscience, including those relating to food security and to improving human quality of life and health,
* improve the effectiveness of researchers, thereby indirectly benefiting society.
Committee Research Committee C (Genes, development and STEM approaches to biology)
Research Topics Technology and Methods Development
Research Priority X – Research Priority information not available
Research Initiative Tools and Resources Development Fund (TRDF) [2006-2015]
Funding Scheme X – not Funded via a specific Funding Scheme