Award details

Enabling UK wheat research with the CyVerse UK cyberinfrastructure

Reference BB/R000662/1
Principal Investigator / Supervisor Dr Robert Davey
Co-Investigators / Co-Supervisors Professor Christopher Rawlings
Institution Earlham Institute
Department Research Faculty
Funding type Research
Value (£) 283,383
Status Completed
Type Research Grant
Start date 15/08/2017
End date 14/08/2018
Duration 12 months

Abstract

The emergence of wheat as a reference crop model increases worldwide plant community demand for resource access. Genomes at the level of complexity of wheat require expertise in sample preparation and library construction for sequencing, algorithm design and software engineering for assembly, and biological knowledge for interpretation. Whilst existing computational resources are barely sufficient for small-scale analysis, the advent of rapid turnaround times for complete wheat genomes makes it a real problem to deliver the requisite datasets, and the tools to analyse them, in a form that is usable by researchers. High-performance computing and modern web-based infrastructure can provide resources to address these challenges, and the CyVerse project is one such "cyberinfrastructure". The presence of CyVerse UK as a dedicated e-Infrastructure platform for life science is a huge boon for UK crop researchers, allowing them to take advantage of a freely available and well-supported set of services for data storage, sharing, and analysis. Given the recent large grant awards to UK institutions for ever more complex data-driven investigations into wheat genomics, these institutions will find it increasingly difficult to keep up with computational requirements. CyVerse UK is able to meet these needs, and this project represents an expansion of existing hardware in order to proactively prepare for the deluge of wheat data that will need to be managed. We will procure and deploy 40 modern, fit-for-purpose compute nodes that can be introduced into our existing CyVerse UK infrastructure, housed in two data centres at the Earlham Institute. Each node comprises two 12-core Intel Xeon CPUs, 512 GB RAM and a local 1 TB solid state disk for fast file input/output operations. These nodes will be used for day-to-day wheat analysis pipelines provided by CyVerse UK, as well as supporting the implementation of the CyVerse Atmosphere cloud computing platform.

Summary

Bread wheat has one of the most complex plant genomes and is one of the most commercially important crops in the UK and internationally, with over 750 million tonnes harvested annually, 14 million in the UK alone. This juxtaposition creates a range of challenges for biologists and data analysts: how can a balance be found between the need for large amounts of data to answer complex biological questions about wheat genetics and the requirements for analysing this data? Furthermore, the pressing issues of climate change that we face are all too evident. We need to use modern technology to increase productivity and output for our wheat researchers, drive breeding strategies, and meet the public's nutritional needs. CyVerse represents such a technology, whereby computational resources, data storage, and analytical tools are made available through web-based graphical interfaces for end users or command line interfaces for power users or system administrators. CyVerse UK is the first implementation of the multi-million-dollar CyVerse project outside the US, and both systems are interoperable, i.e. able to share their compute and storage services without the user needing to know where their analyses will be taking place. This federation allows a reduction in shared management cost, and an increase in productivity through shared expertise and software development. The use of "the cloud" is commonplace in today's internet era. Users are moving away from storing data on their own devices and towards services hosted by third-party providers such as Google, Microsoft, and Amazon. Furthermore, these vendors also supply complete computing environments over the internet, e.g. Amazon Web Services and Microsoft Azure.
However, these resources are not designed for the scale that wheat researchers require to make the most of publicly available and personal datasets, and the costs of running such environments are unclear at best and prohibitive at worst. Therefore, through the deployment of the proposed CyVerse Atmosphere cloud computing platform in the UK, we would be able to supply virtual server resources to users "elastically", i.e. users can easily scale computing resources up and down themselves. In this way, we can provide flexible computing power whenever and wherever required to wheat researchers, labs, and breeders. These virtual wheat data analysis labs can be shared with a wider research group, even internationally, promoting collaboration and knowledge transfer.

Impact Summary

UK research supports the underpinning breeding and baking sectors, as well as the £6 billion farming industry, critical to the UK rural economy. Wheat is the most important UK crop, with annual production of over 14 million tonnes, and market values for seed and processed products of around £1.4 billion and £14 billion, respectively. More frequent climate extremes, such as increased precipitation, flooding and drought, will further affect wheat yields. There is an urgent need to address the problem of producing sufficient nutritious food for 2050, and doing so would bring significant societal and economic benefits. This project will establish guidelines and best practice for wheat researchers who wish to share their datasets and their own research tools with the wider community via the CyVerse UK infrastructure, and will initiate user-provisioned cloud computing environments that can form powerful and bespoke "virtual labs" of shared resources. This proposal will allow increased availability of BBSRC-funded tools for the UK wheat community and will integrate with the CyVerse project in the United States to form a common international biological science platform that prevents duplication of effort and funding. In doing so, it encourages rational and supported reuse of data, applications and resources. The impact delivered from this expansion of CyVerse UK to support wheat research will be seen by research scientists in academia and industry, funded by BBSRC and other bodies, who are involved in the application of bioinformatics analyses to wheat datasets. It will also impact breeders and policymakers, through the release of openly available datasets and analytical tools that power fundamental and applied research in wheat improvement. The main beneficiaries will therefore be the UK wheat research community, from students to senior researchers.
However, many of the tools that are already in use can be run with any compatible dataset arising from existing or future wheat research. Ultimately, CyVerse UK will be a community resource for all wheat biologists: the long-term beneficiaries will be anyone working with big data in the wheat domain. Funding bodies will see huge benefits from extending CyVerse UK, mostly through cost-effective provision of shared computing resources that are locally and remotely accessible to a number of UK research institutions. Although sharing raw data has become a standard requirement for publication in recent years, the wheat community needs guidance and support to carry out this daunting task. Similarly, sharing tools developed for data analysis and visualisation is not typical. Where tools are shared, whether through an institutional repository or a third-party open data web service such as Figshare or Dryad, their use may be limited by differences in operating systems or by the expertise of new users. CyVerse UK will provide the tools, guidelines and the platform for developers to share their command line-based workflows with the wheat community in a user-friendly way. More of the output from publicly funded UK wheat research will therefore be accessible to the wider national and international research community.
Committee Not funded via Committee
Research Topics Crop Science, Plant Science
Research Priority X – Research Priority information not available
Research Initiative Advanced Life Sciences Research Technology Initiative (ALERT) [2013-2014]
Funding Scheme X – not Funded via a specific Funding Scheme