Award details

EMBOSS: European Molecular Biology Open Software Suite

Reference BB/G02264X/1
Principal Investigator / Supervisor Mr Peter Rice
Co-Investigators / Co-Supervisors
Institution EMBL - European Bioinformatics Institute
Department Rice Group
Funding type Research
Value (£) 749,881
Status Completed
Type Research Grant
Start date 01/05/2009
End date 31/12/2011
Duration 32 months

Abstract

EMBOSS is an established open source bioinformatics package with over 300 applications and wrappers to third-party applications. EMBOSS is developed in C with an extensive library of over 4000 functions to manage and manipulate DNA and protein sequence data from a wide range of public and local sources. Around 40 sequence formats and 10 feature formats are supported in the current release. The results of applications are generally single files with sequences or feature annotations. We propose to add a further 100 applications to the EMBOSS suite. These will cover new functional analysis of sequence data, large-scale sequence analysis, the management of EMBOSS results as projects, management of metadata, and analysis of next generation sequencing data (both public data and data generated by users). We will add many new data structures and functions to the libraries in support of these applications. All source code (applications and libraries) will be fully documented and pass through a strict quality control system that not only tests that applications work but also checks the completeness and accuracy of all documentation. Integrating results and adding new data sources also requires us to load and manage an extended set of metadata including gene information, taxonomy, ontology terms and experimental data. By preserving such metadata, including it in outputs, and providing project summaries, we will enable users to subject their EMBOSS results to further analysis through spreadsheets and the R statistics package and to visualise their results in various genome data browsers. We will use a Moodle-based training portal at EBI to create a suite of online courses for end users (under a variety of interfaces), for developers (new applications and minor local modifications), and for systems administrators configuring EMBOSS and associated data resources.

Summary

EMBOSS is an established, innovative software package for sequence analysis. The project was started in 1996 by two bioinformatics developers (Peter Rice and Alan Bleasby) who have developed a set of over 300 programs for the analysis of DNA and proteins. EMBOSS is 'open source': the source code is available to anyone, and anyone can modify or extend it to meet their needs. Users in industry have found that EMBOSS makes their life easier compared with expensive commercial packages. The EMBOSS team has been based for the past 3 years at the European Bioinformatics Institute (EBI). This proposal will provide sustained funding to maintain and support EMBOSS for a further 3 years and to support key new developments for the benefit of the biological sciences community. Up to 100 new applications will be provided. Public data resources for whole genomes and many associated datasets will be integrated under EMBOSS interfaces. Results from large numbers of application runs will be combined and managed more easily. The technology for DNA sequencing is changing at a very rapid pace. Next generation sequencing technologies are capable of generating enough information to re-sequence the human genome in a few days and at a tiny fraction of the original cost. These instruments generate a vast amount of data, beyond the capacity of most existing software, and present new challenges in data management and processing. Core components of EMBOSS will be rewritten to allow all applications to run on this scale, and new applications will be added to analyse and interpret the results. EMBOSS has been installed at more than 25,000 sites worldwide, many of them in the UK.
As well as developing new applications and extending functionality, EMBOSS users have developed a wide variety of interfaces to run the programs through the web or as part of other specialist applications, for example the Taverna workbench - a UK e-Science project that supports long and complex workflow pipelines using EMBOSS applications, run through services provided by EBI. To support such a large number of users, and especially to provide help to those who can benefit more by making their own changes to EMBOSS, we will provide a suite of online eLearning courses with interactive tutorials, videos, printed background materials, self-marking quizzes and a set of follow-up tasks. These will be supported by the European Bioinformatics Institute's new eLearning portal.
Committee Closed Committee - Engineering & Biological Systems (EBS)
Research Topics Structural Biology, Technology and Methods Development
Research Priority X – Research Priority information not available
Research Initiative Bioinformatics and Biological Resources Fund (BBR) [2007-2015]
Funding Scheme X – not Funded via a specific Funding Scheme