Award details

Computational Developments

Reference BBS/E/T/000PR9817
Principal Investigator / Supervisor Dr Robert Davey
Co-Investigators / Co-Supervisors Mr Bernardo J. Clavijo, Dr Tamas Korcsmaros, Dr Richard Leggett, Dr Iain Macauley, Prof. Christopher Quince, Dr David Swarbreck
Institution Earlham Institute
Department Earlham Institute Department
Funding type Research
Value (£) 4,532,581
Status Current
Type Institute Project
Start date 01/04/2017
End date 31/03/2023
Duration 59 months

Abstract

State-of-the-art technologies are generating unprecedented amounts of complex data, from genomes to proteomes and transcriptomes, spanning mechanistic and functional diversity. Handling, interpreting and integrating these large-scale data into descriptive models that interpret molecular functions at a systems level requires continued development of algorithms, robust computational models, and interoperable analytical frameworks. Supported by our core capability, in this work package we will contribute to the newest developments in the data sciences and facilitate the extrapolation of meaningful signals from often noisy data. We will continue to develop efficient, reproducible and robust assembly and scaffolding algorithms, and robust statistical models to handle diverse, complex genomic and metagenomic datasets. We will improve and further develop software to facilitate orthology assignment across complex and highly divergent species. We will also incorporate machine learning approaches to integrate the conservation of functional signals across species to infer functionality and improve annotation. We are also developing new statistical and network analytical approaches that will be applied to track temporal and spatial changes across and within species to further inform phenotypic complexity. Our algorithm optimisation expertise will enable us to drive computational advances in accuracy and efficiency across our research into assembly and variant calling, annotation, and network analysis. These efforts will put the platforms in place to consistently collect and rapidly feed datasets into downstream integrative analyses, enabling the extensive and complex data interrogation processes required for bringing together multiple heterogeneous datasets.
We will also apply these strategies to investigate and implement low-power computing technologies for data acquisition and analysis that will be deployed in environmental situations at a previously unavailable scale. We will carry out fundamental research into software engineering methods to manage, share, visualise and integrate these large and complex datasets. We will develop research data management and dissemination layers, underpinned by community standards, that provide granularity and searchability for the large-scale and diverse data outputs that EI is generating, and that integrate the statistical, machine-learning, and network-based models developed under this programme. We will also build semantic knowledge graphs annotated with ontology-based descriptions in order to represent the body of information gathered through harmonised data and network integration. These will comprise metadata that describe reusable, interconnected research datasets, and we will feed these methods into appropriate information systems to enable national and international collaborative research through open multi-omics platforms. This will lay the foundations for an interconnected network of hardware and software to deliver real-time monitoring of crop development and pathogen detection and screening. Subsequent federation of these activities with national and international services will be facilitated through the EI NCG in e-Infrastructure (NC3), and interactions with the ELIXIR-UK community for pan-European life science infrastructures. This will also underpin the computational biology needs of the QIB Microbes in the Food Chain ISP in standardised and interoperable analysis systems.

Summary

unavailable
Committee Not funded via Committee
Research Topics Technology and Methods Development
Research Priority X – Research Priority information not available
Research Initiative X - not in an Initiative
Funding Scheme X – not Funded via a specific Funding Scheme