Award details

Building a Next Generation Image Repository: Molecular Annotation and Cloud-based Data Processing and Analysis

Reference BB/M018423/1
Principal Investigator / Supervisor Professor Jason Swedlow
Co-Investigators / Co-Supervisors Dr Alvis Brazma, Professor Rafael Edgardo Carazo Salas
Institution University of Dundee
Department School of Life Sciences
Funding type Research
Value (£) 1,788,152
Status Completed
Type Research Grant
Start date 01/01/2015
End date 30/06/2016
Duration 18 months

Abstract

We will construct the Image Data Repository (IDR) based on hardware infrastructure located at EMBL-EBI and integrated with its existing resources for hosting and delivering large datasets to the world's scientific community. These resources will serve as the storage and archive for IDR data. OME's Bio-Formats and OMERO will be used to read, manage, serve, and link the data to EMBL-EBI's molecular and structural resources. We will build custom user interfaces and workflows for the IDR to ensure easy access to, and browsing of, the datasets it holds. To enable computational re-analysis of the data, we will extend OMERO's distributed compute capacity and make use of EMBL-EBI's Embassy system to allow virtual access to IDR data. This virtual resource will provide a 'sandbox' for processing and re-analysing data deposited in the IDR, and a working example of a next generation data repository that not only stores and manages data but also provides community services for scientific data.

Summary

Access to primary research data is vital for the advancement of the scientific enterprise. It facilitates the validation of existing observations and provides the raw materials to build on those observations. In the life sciences, there are numerous examples where members of a research community determined that a particular type of data would be useful and necessary to share. These include gene sequences, protein structural data, and gene and protein expression profiles. In these cases the community united to standardize the structure of the data and its associated metadata, and to create centralized repositories to facilitate deposition, promote discoverability, and ensure the longevity of the data. Imaging in the life sciences has undergone a revolution in recent years and is now used as a quantitative assay technology throughout the life and biomedical sciences. Imaging is used to understand the behavior of organisms, the formation of embryos, the structure and dynamics of cells, and the function and interactions of molecules that are the building blocks of life. Imaging datasets are complex, heterogeneous, and often extremely large, so they are rarely shared or published. Based on the recent development of several image data management technologies and the rapidly decreasing cost of large data storage facilities, we propose to create a resource to host, serve, and make available the original scientific image data that underpin life sciences research. Our proposal is based on open source technologies with proven utility and performance that already run online resources serving several terabytes (TB) of image data. We propose to place this resource at EMBL-EBI, which is the established home of molecular and structural life sciences data, and to interface the resource with ELIXIR, Europe's research infrastructure for life science informatics.
In particular, we will build links with established molecular and structural resources and work towards a seamless integration of these data, so that any scientist can easily browse, query and compute on genomic, structural and phenotypic data across several scales.

Impact Summary

The resource has the potential to impact all branches of basic life sciences research. If the IDR is built and delivered, its impact on the community will be substantial. Datasets that have never previously been accessible will be available for the community to search, view, mine and even process and analyze. Rich visualization and annotation will make both interactive browsing and programmatic mining possible for the first time. This project will deliver a resource valuable to scientists, funders, and journals by promoting the validation of experimental methods and scientific conclusions, comparison with new data obtained by scientists worldwide, and the re-use of data by developers of new analysis and processing tools. In particular, the IDR will provide an opportunity to test concepts and measure the true value of reproducibility in science. Finally, the IDR can serve as a model for how large, complex, multidimensional datasets can be shared with scientific communities worldwide.
Committee Research Committee A (Animal disease, health and welfare)
Research Topics X – not assigned to a current Research Topic
Research Priority X – Research Priority information not available
Research Initiative Bioinformatics and Biological Resources Fund (BBR) [2007-2015]
Funding Scheme X – not Funded via a specific Funding Scheme