Award details

COpenPlantOmics (COPO): a Collaborative Bioinformatics Plant Science Platform

Reference	BB/L021390/1
Principal Investigator / Supervisor	Professor Katherine Denby
Co-Investigators / Co-Supervisors	Dr Ruth Bastow, Professor James Beynon
Institution	University of Warwick
Department	School of Life Sciences
Funding type	Research
Value (£)	72,110
Status	Completed
Type	Research Grant
Start date	31/10/2014
End date	12/06/2016
Duration	19 months

Abstract

Accessibility to biological data has been hindered by a lack of standards, a lack of awareness of the benefits of (and pathways to) releasing data described by those standards, and a lack of services through which data can be easily analysed, published and retrieved. The BBSRC has recently made a large commitment to open-access data and publishing to further bioscience research in the UK. However, barriers still prevent scientists from openly depositing their data and metadata. These include a lack of interoperability between metadata annotation services, data repositories, data analysis platforms and data publishing platforms. As a result, plant scientists might not be aware that these services exist, might lack the expertise to use them, or might not see the value in properly describing their data. This project aims to build COPO, the software infrastructure needed to reach the level of interoperability plant researchers require: describing their data using community-recognised ontologies, moving data seamlessly in both directions between relevant repositories, and then publishing these data for open access. COPO will manage the hardware infrastructure at TGAC to deliver a consistent, robust staging area and database supporting uniquely accessioned artefacts that represent the corpus of data and metadata a user wants to expose. The marked-up datasets processed and published using COPO will offer greater potential for integrative analysis with existing tools such as iPlant and Galaxy. New Application Programming Interfaces (APIs) will interconnect existing tools and services and, through new RESTful user interfaces that wrap these APIs, COPO will be a single point of entry for plant researchers to disseminate their data all the way from generation to publication. Federating the TGAC iRODS data grid system with others, e.g. the Texas Advanced Computing Center's iPlant installation, will facilitate access to worldwide analytical infrastructure and data.

Summary

We live in a digital age in which we increasingly rely on interconnected resources in our daily lives. Biological science, because of the complexity of its worldwide research avenues, is typically fragmented. Even though scientific information is published in peer-reviewed articles, it is often poorly described and, until very recently, was often unavailable to the general public because of journal licensing restrictions and expensive subscriptions. The field of bioinformatics (the analysis and management of biological data using computational methods) produces many freely available tools for data analysis and exposure that are incredibly useful to researchers. However, these tools often do not interoperate well, so great effort is spent converting or tweaking datasets to fit the next tool in a bioinformatics workflow, hindering timely, accurate and reusable research. Combined with the lack of descriptive information noted earlier, this means that knowledge vital to one researcher, team or community can become, at best, irreproducible (preventing others from confirming the findings) and, at worst, unusable. Life scientists focus on investigating biological processes, which requires a great deal of time, effort and fastidiousness in experimental observation, data collection and analysis. Typically, life scientists spend more of that time defining and publishing experimental methods and results; the data behind these results are usually poorly described and largely unpublished. For computer scientists, the story is reversed: the focus is on getting to the data. This platform will bridge the gap between these two groups by providing tools and training to both life and computer scientists in the plant bioscience field, helping them put their data into the right formats and describe it uniformly for open research. To do this, the management, interoperability and curation of scientific datasets are key.
Researchers need clear guidance and help to:
- Manage their data in a concise, relevant way that allows immediate reuse by others: generating data is only one part of the picture. To back up scientific findings, data need to be made available to others so that they receive the same degree of rigour and peer review that is enforced for published material. This is not an easy task, because the tools and resources required to describe data well and make them available are typically designed for computer scientists.
- Analyse their data easily: large software development projects such as Galaxy provide access to complex analytical tools, and we are not aiming to reinvent the wheel in this regard. Instead, we aim to engage and collaborate with these existing providers to develop and exploit interfaces to their specialised software, so that descriptive and analytical tools can communicate efficiently.
This project will address these issues directly, providing tools for storing, annotating and sharing valuable information, as well as clear guidance and training. Overall, this promises to be a major boost to UK plant sciences research. The project aims to promote and build links between scientific knowledge and the tools used to generate that knowledge, addressing the lack of descriptive information about the underlying data. By doing so, we will provide a platform comprising both existing tools and novel interoperability processes, giving researchers easy access to methods for describing their work that feed directly into analytical software, thus promoting clear and robust best practice in science. Open science is vital to future generations of researchers, especially for realising the goal of transparent knowledge sharing. This project will remove the barriers that restrict researchers from making their findings freely available to everyone, in a consolidated, seamless and easy-to-use fashion.

Impact Summary

Academic, Economic and Commercial Impacts
With renewed interest and push from all areas of bioscience to promote publicly available research, the COPO project will be a pioneering national and international effort to facilitate sharing all aspects of plant research with the public. COPO aims to be the vehicle that brings together the tools required to harmonise open plant omics research. This sector has obvious ties with industry: public-domain, omics-based bioscience is a relevant and important input into industry's internal research and discovery activities. To make such bioscience data truly reusable and ensure scientific robustness, they must be uniformly annotated. Uniform annotation enables integration through equivalent terminology, increases efficiency in data production and re-use, and allows correct interpretation through the context provided by the metadata. A collaborative platform for frictionless bioinformatics, built with and for the academic and industrial community, is long overdue. Alongside data processing, industry is also working on solutions for the integration and management of large omics data sets, e.g. through efforts such as the Pistoia Alliance. Together with COPO's industry partners (Eagle Genomics), we will develop use cases for the platform in industry, propose the acceptance criteria required for commercial use, supply technical advice and support on meeting those criteria, evaluate the platform on third-party infrastructure, and maximise knowledge exchange and commercialisation.
COPO and the standards community
Expertise and knowledge gained throughout the lifetime of the project and beyond will be disseminated through a variety of channels. A direct link with the plant science community (through GARNet and the UK Plant Sciences Federation (UKPSF)) is key to the success and adoption of the platform and its associated standards.
The project will maintain a continuous dialogue, through face-to-face events as well as online tools and social media, between those working on the platform and the plant bioscience community. The several letters of support show a clear interest in working together and in using and adopting a platform that implicitly confers standards compliance. COPO will help overcome the challenges of standards fragmentation by (i) fostering the development, acceptance and implementation of reporting standards that are immediately suitable for plant research, and (ii) limiting the range and variability of standards. This will directly lower development and maintenance costs for commercial and academic developers of standards-compliant products.
Societal impacts
Historically, there has been reluctance in the plant bioscience community to adopt some standards and open-data principles, especially in the field of food sustainability and security, where openness and transparency are vital to continue improving public perception. The presentation of the research data will play a key role in opening a dialogue with the general public and will help build stronger links with sectors of society (such as school teachers) that are less familiar with scientific activities in plant research and the beneficial impact these have on their lives. It is widely recognised that the worldwide shortage of expertise and skills in biomathematics and informatics is a major risk to the future development of key areas of the life sciences. The objectives of this proposal will help attract talented staff to work with the COPO partners and offer alternative career paths.

Committee	Not funded via Committee
Research Topics	Plant Science, Technology and Methods Development
Research Priority	X – Research Priority information not available
Research Initiative	Bioinformatics and Biological Resources Fund (BBR) [2007-2015]
Funding Scheme	X – not Funded via a specific Funding Scheme