Award details

iPlant UK

Reference BB/M018431/1
Principal Investigator / Supervisor Professor David Wild
Co-Investigators / Co-Supervisors Professor James Beynon, Dr Robert Davey, Professor Anthony Hall, Professor Tony Pridmore, Dr Timothy Stitt
Institution University of Warwick
Department Warwick Systems Biology Centre
Funding type Research
Value (£) 1,776,179
Status Completed
Type Research Grant
Start date 01/01/2015
End date 31/03/2017
Duration 27 months

Abstract

New technologies such as next generation sequencing (NGS), high-throughput phenotyping and metabolite profiling have made large data sets, several terabytes in size, a common feature of modern plant biology. However, intelligent re-use and impact of this data is not always fully realised due to a lack of data storage capacity, compute power for analysis, technical skills (which often have to be self-taught or accessed via a collaborator) and limited tool sharing within the community. The NSF-funded iPlant Collaborative aims to help mitigate these problems. It provides three core services: the Data Store, for cloud-based large data storage and retrieval; the Discovery Environment (DE), for user-friendly data analysis software; and Atmosphere, a platform allowing researchers to custom-build virtual workbenches and share these with collaborators anywhere in the world. Data analysis in the DE is achieved via apps, which are built either by iPlant developers or by users. iPlant is structured as a distributed model within the US, spreading effort, expertise and resources between the Texas Advanced Computing Center (TACC), Cold Spring Harbor Laboratory, and the University of Arizona. It was designed with extension and replication in mind, and we propose taking advantage of iPlant's federation capabilities to develop a UK iPlant node at The Genome Analysis Centre (TGAC). To encourage uptake and demonstrate the power of iPlant services, three suites of tools in the areas of systems biology, image analysis and sequencing data, which are currently only suitable for use by a small number of experts, will be optimised for HPC and adapted for the iPlant environment, thus widening their applicability and user base. A small number of additional tools from the wider community will also be adapted for use in the iPlant Environment via an extended collaborative support programme.

Summary

Biology is increasingly a 'big data' science as new high-throughput technologies support faster, cheaper generation of sequencing, metabolite and image data. This enables potentially exciting breakthroughs as researchers spot undiscovered patterns and make new discoveries of biological importance. However, many individual biologists, and in some areas the community as a whole, struggle to take full advantage of the data generated because of a lack of computing resource, appropriate support and technical skill. It is not only the output of data analyses, such as models, curated datasets, or raw data, that has value to the wider community, but also the tools generated during research projects that are used to support researchers to test and validate their hypotheses. Currently these tools often remain in prototype form, for use only within the group or laboratory that generated them, because there is comparatively little standardisation and no easy means of sharing an accessible, user-friendly version of the tool. To undertake world-class bioscience, researchers therefore need to be able to store and access datasets, models and analysis tools, ideally from different locations across the globe due to the need for international collaboration. The iPlant Collaborative was funded by US agency the National Science Foundation (NSF) in 2008 to help solve these issues. The iPlant Data Store is a cloud-based storage space, accessed via iPlant's Discovery Environment (DE), a virtual work/lab bench. In the DE, users can share datasets and tools to analyse data with as many or as few people as they wish. Tools to analyse data developed by iPlant staff or built by others can be shared with the wider community, in a similar manner to 'apps' on smartphones. The iPlant Collaborative is currently distributed across three US locations; we propose to extend this into an international collaboration by building a UK iPlant node at The Genome Analysis Centre (TGAC).
TGAC provides the National Capability of computational infrastructure and as such is perfectly situated to provide the foundations for the iPlant UK node. The UK iPlant node would provide independent versions of the iPlant Data Store and DE but would also be linked to the US nodes to share resources and expertise. Physical resource alone is not sufficient for a successful infrastructure: it also needs to be used, maintained and expanded as demand increases. To demonstrate the versatility, power and value of iPlant UK, a dedicated team of programmers based at the Universities of Warwick, Liverpool and Nottingham will adapt tools that have been generated for use in a single project for wider community adoption. Three suites of tools to benefit key areas of UK plant science - sequencing, systems biology and image analysis - will be made available to the global plant research community via the iPlant DE. In less than 10 years, iPlant has built a global user base of over 18,500 users. As this continues to expand, iPlant's future sustainability must be considered. A UK iPlant node will help ensure the future existence and reliability of iPlant, spread expertise and best practice between the UK and US, allow the UK to input to the future direction of this valuable resource and provide an exemplar project to others wishing to establish future international iPlant nodes. By establishing iPlant UK and promoting access to a resource that allows users to readily store and analyse their data, this project will help support a wide range of research including genome-wide association projects exploiting natural variation in crops, predicting biological networks and pathways, and the high-throughput imaging and image analysis services that take researchers one step closer to bridging the genotype to phenotype gap.

Impact Summary

The principal beneficiaries from iPlant UK are research scientists in academia and industry, BBSRC and other funding bodies. The three suites of tools, covering systems biology, sequencing data management and image-based phenomics, will deliver the first applications to iPlant UK and in doing so will provide proof of concept and establish guidelines and best practice for future users who wish to share their own command line-based research tools via iPlant. This proposal will allow increased availability of BBSRC-funded tools for the global community and will help build a common international biological science platform that prevents duplication of effort and funding. In doing so, rational and supported reuse of data, applications and resources is encouraged. As the planned community tool development to prime and troubleshoot the system is focused on plant science applications, the main initial beneficiaries will be the plant science research community, from students to senior researchers. However, many of the tools are generic and can be used with any compatible dataset from any organism. Ultimately, iPlant UK will be a community resource for all biologists: the long-term beneficiaries will be anyone working with big data. Funding bodies will also benefit from iPlant UK. Although sharing raw data has become a standard requirement for publication in recent years, sharing tools developed for data analysis and visualisation is not typical. Where they are shared, whether through an institutional repository or a third-party open data web service such as Figshare or Dryad, their use may be limited by differences in operating systems or the expertise of new users. iPlant UK will provide the tools, guidelines and the platform for developers to share their command line-based workflows with the research community in a user-friendly way. More of the output from publicly funded UK research will therefore be accessible to the wider national and international research community. 
Although there is limited opportunity for outreach directly via the personnel requested in this project, all services from the iPlant Collaborative, including the Atmosphere cloud computing platform and the DNA Subway undergraduate teaching tool, will be promoted through invited talks and guest blog posts/articles by the PIs from the iPlant UK team.
Committee Not funded via Committee
Research Topics Plant Science, Technology and Methods Development
Research Priority X – Research Priority information not available
Research Initiative Bioinformatics and Biological Resources Fund (BBR) [2007-2015]
Funding Scheme X – not Funded via a specific Funding Scheme