Award details

BBSRC-NSF/BIO: Integrative analysis and Visualisation of Fly Cell Atlas datasets to enable cross-species comparisons

Reference BB/T014008/1
Principal Investigator / Supervisor Professor Nicholas Brown
Co-Investigators / Co-Supervisors
Institution University of Cambridge
Department Physiology, Development and Neuroscience
Funding type Research
Value (£) 487,279
Status Current
Type Research Grant
Start date 01/04/2021
End date 31/03/2025
Duration 48 months

Abstract

This proposal comprises three main aims. The first will develop the computational analysis pipelines for scRNA-seq data in Drosophila melanogaster, including batch correction, cell clustering, marker gene detection, trajectory and differential analysis, and cell type annotation. This will create standardised workflows that can be run across the different Fly Cell Atlas (FCA) datasets, and the metadata will be curated using ontology terms and genetic feature identifiers. The annotation stage will encourage and capture curation from the fly community by scientists with expertise in various tissues and cell types. The second is to develop the fly-specific functionality of scExpression Atlas, allowing easy identification of both the raw and processed data, as well as functionality to visualise cell type expression data in FlyBase and the Drosophila resources at Harvard University. Key to this is the enhancement of FCA data visualisation: gene set enrichment analysis tools will be developed, and Anatomograms will be available as embeddable widgets, allowing specific experiments to be easily embedded by different websites. Lastly, comparative analyses will be performed using datasets from the FCA, the Mouse Cell Atlas and the Human Cell Atlas. Orthologous relationships will be used to map genes from one species to another to generate a combined, integrated dataset. Different methodologies will be explored for dataset comparison, and both mappings and ontologies will be extended and improved. The scExpression Atlas user interface will be further developed to enable users to interrogate the data across species, in addition to analysis by cell type and tissue. In collaboration with the FCA community, we will extend the scExpression Atlas APIs, download formats and associated software to allow data re-use and re-analysis and so promote Open Science.

Summary

The fruit fly, Drosophila melanogaster, has for the last century been fundamental to the study of genetics. It is used in many areas of research as the model organism of choice, as it provides the ability to study genetics in the laboratory and apply findings to human genetics. Its use as a model is due to two factors. First, its genetic code can be manipulated relatively easily in the laboratory, and this, coupled with a short life cycle, provides a means by which a gene or pathway function can be rapidly studied. Second, the vast majority of the fundamental biochemical mechanisms and pathways are conserved between fly and humans. Indeed, 75% of the genes that cause human disease are found in fly and, thus, the data collected in the fly can be used to provide insights into the same processes within humans. The emergence of a new technology, single cell RNA sequencing (scRNA-seq), has provided information as to which genes are switched on or most active within a single cell. Within the fly community this provides the ability to quickly map clusters of cells and cell types to the whole anatomy and link this to both phenotype and function. The increasing number of scRNA-seq datasets from different species has resulted in the development of the Single Cell Expression Atlas (scEA). This is a web portal which enables users to more easily visualise and interpret these data. It is anticipated that the volume of fly single cell data will increase from 10 datasets to ~100 in 2020, with a further two-fold increase in 2021. Key to the scientific exploitation of these data will be the ability of users not only to analyse the fly data effectively but also to examine the interconnections between fly data and human or mouse datasets. In this project we will provide the means by which fly datasets can be easily interpreted and also linked to mouse and human datasets via scEA.
The scEA currently hosts scRNA-seq data for over 500K assays and this includes data for the Human Cell Atlas (HCA) and Mouse Cell Atlas (MCA), amongst others. This project will enable analysis pipelines to be developed to combine the available and emerging datasets, alongside the necessary computational infrastructure to host the Fly Cell Atlas (FCA) datasets. The scEA will provide users with an easy-to-navigate web service with exploratory querying capability, in addition to data download capabilities for further data analysis. The service will be fully integrated with the established fly resources: FlyBase, Virtual Fly Brain and the Drosophila Resources at Harvard University. This project will also develop a process for annotation of the datasets. This annotation step adds scientific information to the data, which gives the user a greater level of biological understanding and so aids interpretation and analysis. The annotation will expand on the existing FlyBase anatomy ontology, a structure of controlled vocabularies used to describe the anatomy of the fly; this will ensure that there is full compatibility across new and existing resources. The scEA will develop and provide the means by which the data can be easily visualised and mined for cell types, while also providing the fly community with the ability to contribute their scientific expertise to the annotation. The scEA user interface will be further developed to provide a greater level of cross-species query capability, as the resulting FCA will be linked within scEA to the HCA, MCA and any further datasets, enabling cross-species comparisons which will aid the discovery of novel biological insights. This project aims to provide the fly community with practical solutions for connecting, re-using and reanalysing datasets and so will close the gap in translating biological discoveries in model organisms, such as the fruit fly, to humans and vice versa.
This project will make the results of this comparative analysis rapidly available to the growing user community.

Impact Summary

The fruit fly, Drosophila melanogaster, has for the last century been fundamental to the study of genetics. It is used in many areas of research as the model organism of choice, as it provides the ability to study genetics in the laboratory and apply findings to human genetics. The vast majority of the fundamental biochemical mechanisms and pathways are conserved between fly and humans. Indeed, 75% of the genes that cause human disease are found in fly and, thus, fly data provide insights into the same processes within humans. The emergence of a new technology, single cell RNA sequencing (scRNA-seq), has provided information as to which genes are switched on or most active within a single cell. These data are generating fundamental new insights into how cells differentiate into specific cell types, and what a cell type represents at the molecular level. The increasing number of scRNA-seq datasets from different species encouraged us to develop the Single Cell Expression Atlas (scEA). This is a web portal which enables users to more easily access and interpret these data. It is anticipated that Drosophila single cell data will increase from 10 datasets to ~100 in 2020, with a further two-fold increase in 2021. Key to the scientific exploitation of these data will be the effective analysis of the fly data and the ability to explore interconnections between fly data and human and mouse data. In this project we will provide the means by which fly datasets can be easily interpreted and linked to mouse and human datasets via scEA. This project will enable analysis pipelines to be developed to combine the available and emerging datasets, alongside the necessary computational infrastructure to host the Fly Cell Atlas (FCA) datasets. The scEA will provide users with an easy-to-navigate web service with exploratory querying capability, in addition to data download capabilities for further data analysis.
The service will be fully integrated with the established fly resources, FlyBase and the Drosophila Resources at Harvard University. Data sets and derived analysis results will be easily accessible in standard formats to be reused by: (1) wet-lab biologists investigating new experimental hypotheses, and comparing published datasets to their own new results; (2) computational biologists engaged in new development of analysis tools or machine learning applications where access to well curated and standardised data sets is essential. The availability of the combined Fly Cell Atlas through user-friendly interfaces at Harvard and EMBL-EBI will contribute greatly to all projects investigating transcription at the single cell level. By providing molecular signatures of each cell type, the Fly Cell Atlas data will aid the identification of the cell types altered when genes are mutated, including models of human diseases. Mapping cell types across species will permit verification of the similarities in the underlying cellular defects caused by loss of similar gene function in human and fly. Establishing a robust atlas of cell types in Drosophila will also aid projects aimed at controlling insects that are vectors of disease or agricultural pests, by providing basic knowledge of the cell types that can be used to target novel control strategies. With a rise in pesticide resistance and the negative environmental impact of pesticides, the understanding of Drosophila biology underpins development of new strategies. Functional interpretation of the genomes of disease-carrying insects and crop pests relies heavily on the extensive experimental data from Drosophila. Methods developed within this project will be applicable to biological and bioinformatics communities beyond researchers working in fly, mouse or human. 
With the dissemination of analysis tools in containerised form and their availability in public registries, we expect their usage to expand over a wider spectrum of computational biologists.
Committee Research Committee C (Genes, development and STEM approaches to biology)
Research Topics Technology and Methods Development
Research Priority X – Research Priority information not available
Research Initiative X - not in an Initiative
Funding Scheme X – not Funded via a specific Funding Scheme