Award details

'Omics Data Sharing: the Investigation / Study / Assay (ISA) Infrastructure

Reference BB/I000771/1
Principal Investigator / Supervisor Professor Susanna Sansone
Co-Investigators / Co-Supervisors Dr Philippe Rocca-Serra, Dr David Shotton, Professor Anne Trefethen
Institution University of Oxford
Department Oxford e-Research Centre
Funding type Research
Value (£) 819,749
Status Completed
Type Research Grant
Start date 01/09/2010
End date 28/02/2014
Duration 42 months

Abstract

Despite the many obvious benefits of data sharing, unifying our vast and invaluable global stores of biological data has proven elusive. Concerns over the inaccessibility of data, which leads both to lost opportunities for discovery and to unnecessary duplication of effort, are driving a focus on 'omics data sharing. In 2009, major international groups of researchers held workshops to promote better sharing of pre- and post-publication data resources. Funders share this concern, as shown by the publication of data policies aimed at improving the stewardship of billions of pounds' worth of hard-won research data, especially in 'omics research. Obstacles include the long-standing lack of software for data sharing that suits the needs of data submitters and users alike. To overcome these challenges we have designed and developed the ISA infrastructure, the first freely available, pilot-stage software suite for curating, aggregating and sharing multi-omics investigations. In this project we will complete the software suite and help our wide range of collaborators to deploy several ISA Network environments to: (i) assist in the reporting and local management of experimental metadata, (ii) enable their user communities to take up community-defined MIBBI-registered checklists, OBO ontologies and the ISA-Tab format, and (iii) facilitate submission of metadata to international public repositories. We will also continue our consensus-building standards activities, mapping and matching concepts in MIBBI to those in OBO Foundry ontologies and ensuring that all of them can be captured in ISA-Tab format and manipulated and displayed in the ISA Infrastructure. Lastly, under the umbrella of the large BioSharing Consortium, we will formalise linkages between a wide range of communities, including MIBBI, OBO Foundry and the ISA Networks, as well as journals, funders, industry, databases, biocurators and next-generation technology providers.

Summary

There is a pressing and recognised need in the biological domain for improved data sharing and unified access to data from a wide range of sources. The use of 'omics technologies (such as genomics, metagenomics, transcriptomics, proteomics and metabolomics) is now widespread, and the rate at which these technologies generate data is revolutionising the scientific landscape. This massive influx of data brings both unprecedented scientific opportunities and a range of challenges that must be met if these data, and the public investment in science that they represent, are to be fully exploited. While there are many obstacles to realising large-scale multi-omic data sharing at the community level, solutions are now within reach thanks to a range of grass-roots standardisation projects, including the 'Minimum Information for Biological and Biomedical Investigations' (MIBBI) project (http://mibbi.org/) and the Open Biological Ontologies (OBO) Foundry (http://obofoundry.org/). We propose to make our 'omics data sharing software, based on the 'Investigation / Study / Assay' (ISA) concept (http://isatab.sf.net), more widely available. The ISA concept allows the description of any 'Investigation' comprising one or more 'Studies' in which biological samples have been examined using one or more 'Assays' (technologies). The ISA concept is supported by the MIBBI community and has been used to structure a universal file format, ISA-Tab. The ISA-Tab file format leverages biologists' familiarity with, and trust of, spreadsheet-based input and manipulation of information. Descriptive experimental information (metadata) captured in ISA-Tab format is made compliant with MIBBI-registered standards (for transcriptomics, MIAME; for proteomics, MIAPE; and for genomics, MIGS/MIMS) using pre-defined extensions. ISA-Tab can also be configured to hold additional fields, allowing users to comply with emerging standards.
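The Investigation / Study / Assay hierarchy can be pictured as a simple nested data model that flattens naturally into spreadsheet-style rows. The sketch below is hypothetical: the class names, the `study_to_tab` helper and its column headers are illustrative assumptions, not the ISA-Tab specification or the ISA software's API. Real ISA-Tab splits metadata across separate investigation, study and assay files.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    # One technology applied to a study's samples (illustrative fields only)
    measurement: str
    technology: str


@dataclass
class Study:
    # A study groups biological samples and the assays run on them
    identifier: str
    samples: List[str]
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    # An investigation comprises one or more studies
    identifier: str
    title: str
    studies: List[Study] = field(default_factory=list)


def study_to_tab(study: Study) -> str:
    """Flatten a study into tab-delimited rows, one per assay/sample pair.

    Hypothetical layout in the spirit of a spreadsheet, not real ISA-Tab columns."""
    header = ["Study Identifier", "Sample Name", "Measurement Type", "Technology Type"]
    rows = ["\t".join(header)]
    for assay in study.assays:
        for sample in study.samples:
            rows.append("\t".join([study.identifier, sample, assay.measurement, assay.technology]))
    return "\n".join(rows)


# Usage: one investigation, one study, two samples, two assay technologies
inv = Investigation("INV-1", "Example multi-omics investigation", [
    Study("S1", ["sample-A", "sample-B"], [
        Assay("transcription profiling", "DNA microarray"),
        Assay("protein identification", "mass spectrometry"),
    ]),
])
print(study_to_tab(inv.studies[0]))
```

Keeping samples at the study level and technologies at the assay level is what lets a single study carry several 'omics assays over the same samples.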
The availability of this universal file format has enabled the creation of a set of tools and a database to hold data sets captured in it. The current pilot-stage ISA Infrastructure provides a complete solution for managing multi-omic metadata at the community level. A core aspect of its design is the integral use of OBO Foundry ontologies to describe investigations, rendering data descriptions unambiguous and computationally accessible. In the course of this project, we will extend the current ISA Infrastructure implementation and work with identified research communities and their bioinformatics service providers to set up 'ISA Networks' in the UK and around the globe, covering a wide range of data types. These portals will serve as 'one-stop shops' for the aggregation and display of relevant datasets at the community level. The metadata captured will support searching and data discovery across organisms, technologies and data types. The shared use of minimum information standards, ontologies and a single file format will support the exchange of data between communities and the transfer of data to and from public repositories. At the international level, we will work closely with the MIBBI and OBO Foundry communities to further unify MIBBI checklists and OBO Foundry ontologies in support of descriptions of multi-omic investigations. Because the ISA Infrastructure must evolve by consensus, it is best developed under the auspices of an international working group. We will therefore formalise the collaboration between ISA Networks and work within the data standardisation community to increase linkages between currently separate groups by launching the BioSharing Consortium (http://biosharing.org).
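To illustrate why ontology annotation makes descriptions computationally accessible, the hypothetical sketch below indexes datasets by ontology term and searches on the term accession rather than on free-text labels. The `OntologyTerm` class, the `find_datasets` function and the dataset IDs are assumptions for illustration only; the NCBI Taxonomy identifiers (NCBITaxon:10090 for Mus musculus, NCBITaxon:9606 for Homo sapiens) are real, but the ISA software's own search may work differently.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class OntologyTerm:
    label: str      # human-readable name, which may vary between submitters
    source: str     # ontology prefix, e.g. "NCBITaxon"
    accession: str  # stable identifier for the term within that ontology


def find_datasets(index: Dict[str, List[OntologyTerm]], accession: str) -> List[str]:
    """Return sorted dataset IDs annotated with the given term accession.

    Matching on the accession, not the label, means 'Mus musculus' and
    'house mouse' resolve to the same organism."""
    return sorted(
        dataset_id
        for dataset_id, terms in index.items()
        if any(term.accession == accession for term in terms)
    )


# Usage: two submitters label the same organism differently
index = {
    "DS1": [OntologyTerm("Mus musculus", "NCBITaxon", "NCBITaxon:10090")],
    "DS2": [OntologyTerm("house mouse", "NCBITaxon", "NCBITaxon:10090")],
    "DS3": [OntologyTerm("Homo sapiens", "NCBITaxon", "NCBITaxon:9606")],
}
print(find_datasets(index, "NCBITaxon:10090"))
```

A label-based search for "Mus musculus" would miss DS2; the shared accession is what makes cross-community discovery reliable.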

Impact Summary

See lead organisation form.
Committee Research Committee C (Genes, development and STEM approaches to biology)
Research Topics Technology and Methods Development
Research Priority X – Research Priority information not available
Research Initiative Bioinformatics and Biological Resources Fund (BBR) [2007-2015]
Funding Scheme X – not Funded via a specific Funding Scheme