Award details

Development of integrated web interfaces for Bioconductor genomic data analysis annotation and visualization tools

Reference BB/E001653/1
Principal Investigator / Supervisor Dr Alvis Brazma
Co-Investigators / Co-Supervisors Dr Wolfgang Huber, Dr Misha Kapushesky
Institution EMBL - European Bioinformatics Institute
Department Microarray Group
Funding type Research
Value (£) 90,642
Status Completed
Type Research Grant
Start date 20/11/2006
End date 19/11/2007
Duration 12 months

Abstract

Background: Bioconductor is a loosely organised set of open source tools for the analysis, visualization and annotation of diverse types of genomic data. Bioconductor modules are implemented as packages in the command-line based statistical environment R. A few formats exist for data interchange within Bioconductor (exprSet, MAList, etc.), but no single universal format has yet been accepted. The Bioconductor toolbox is widely used by sophisticated bioinformaticians in all areas of genomic data analysis; however, no uniform Bioconductor APIs exist, and no standard Web Service interfaces offering a single point of contact are available. Moreover, wet-lab biologists tend to find the command-line environment difficult to learn and use.

Proposed work: We propose to implement a set of AJAX (Asynchronous JavaScript and XML)-based web interfaces to core Bioconductor components. These interfaces will be implemented within the Expression Profiler (EP) platform, once the necessary modifications and extensions have been developed. Users will be able to upload their own data and/or analyse data available in the public microarray repository ArrayExpress. A set of RAD (Rapid Application Development) tools will be developed and distributed openly to the Bioconductor community, allowing such interfaces to be generated quickly and integrated as transparently as possible within the proposed framework. The integrated Bioconductor interfaces will also be available for programmatic access as Web Services, and a system will be developed to keep the APIs up to date with the latest developments in current Bioconductor releases.

Summary

Genomic data, particularly from microarray expression profiling studies, come in the shape of huge matrices of numbers, anywhere from 10,000 to 6,000,000 rows by hundreds to thousands of columns. These data need to be transformed, standardized, visualized, and annotated. The rows of these matrices report the activity (expression levels) of genes under various conditions. The huge data volume, together with the complexity of describing such experimental data, has led to the creation of a few major public repositories for array-based high-throughput genomics data: GEO (NCBI, USA) and ArrayExpress (EBI, Cambridge, UK). Our group at the EBI has also developed Expression Profiler (EP), a web-based platform for exploratory data analysis, which can provide some basic insights into the public data in ArrayExpress. The scientific community's work on tools for dealing with such large-scale data has concentrated on a set of open source, command-line driven tools collectively called Bioconductor. These tools, or 'packages', are developed by leaders in specialized areas of application: normalization (mathematical methods for making data from different laboratories comparable), signalling pathway analysis, clustering analysis, meta-analysis, etc. They are therefore the de facto standard for cutting-edge functional genomics analysis. At the same time, by and large the only users of Bioconductor remain sophisticated bioinformaticians, while wet-lab biologists (the experimentalists who produce the actual data) find the learning curve of the R environment too steep, the R language too complex to master, and the details of the flexible command line too daunting. Moreover, even within Bioconductor, different packages offer different, often incompatible, paradigms for data input, output and interchange.
There is a clear need to provide easy access to the power of Bioconductor for biologists involved in functional genomics and proteomics experimental research. This project proposes to use the EP analytical framework to develop a set of standard web-based interfaces, with a unified look and feel, to core Bioconductor modules; these interfaces will also make use of the ArrayExpress database. The proposed system will enable biologists to securely upload their experimental data, analyse them with the best available Bioconductor algorithms, and compare or analyse them together with related public high-throughput data in the repository. The data analysis routines will take advantage of the high-performance computing infrastructure available at the EBI, and the results will be stored within the system, accessible from anywhere in the world via a web browser. Integrating Bioconductor packages within a set of web interfaces provides a further unique advantage: the interfaces can also be accessed as Web Services, i.e. they can be incorporated into automatic data analysis workflows. In other words, even sophisticated bioinformaticians are likely to find this system useful (see attached letters of support).
Committee Closed Committee - Engineering & Biological Systems (EBS)
Research Topics X – not assigned to a current Research Topic
Research Priority X – Research Priority information not available
Research Initiative Tools and Resources Development Fund (TRDF) [2006-2015]
Funding Scheme X – not Funded via a specific Funding Scheme