Award details

Leveraging functional profiling datasets with machine learning to uncover proteins and cellular processes important for ageing

Reference BB/R009597/1
Principal Investigator / Supervisor Professor Jurg Bahler
Co-Investigators / Co-Supervisors Professor Christine Orengo, Professor John Shawe-Taylor
Institution University College London
Department Genetics Evolution and Environment
Funding type Research
Value (£) 779,213
Status Completed
Type Research Grant
Start date 01/04/2018
End date 30/09/2022
Duration 54 months

Abstract

We want to establish comprehensive sets of proteins and biological processes involved in cellular ageing in fission yeast. This project combines functional-profiling experiments (large-scale phenotyping and genetic-interaction assays) with powerful new machine-learning prediction algorithms. Our integrated approach will benefit from iterated computational predictions and experimental validation, with different techniques used in two computational stages to get the most from the rich experimental data. The first computational stage will apply Bayes Multiple Kernel Learning, informed by phenotyping and heterogeneous network/homology datasets, to rank proteins based on their predicted associations with 116 ageing-associated proteins that we recently identified. In each of 5 iterations, we will test 50 top-ranked proteins for altered lifespans in the corresponding mutants to improve the predicted ranking in the next iteration. We will then screen the top-125 validated ageing proteins for genetic interactions using Synthetic Genetic Array analyses. In the second computational stage, we will exploit our functional-profiling data, integrated with in-house homology and network data, to build deep-learning predictors for GO Biological Processes relevant to ageing. CAFA-2 recently ranked our CATH homology-based predictor top; CATH is unique in providing functional sub-families that outperform Pfam for functional purity. Combining this unique data with the functional-profiling data generated in this project will enhance the power of our predictors optimized for ageing-related processes. Deep learning is computationally expensive, but advances in computing (e.g. GPUs are ~15x faster than CPUs) and efficient code bases (e.g. TensorFlow) are helping in this respect. Furthermore, we have access to the JADE Centre for Deep Learning Computation, which provides excellent computational facilities to speed up our training and investigation of many architectures.
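The first computational stage described above (rank candidates, test a batch, feed the confirmed hits back, repeat) can be sketched roughly as follows. This is only an illustrative sketch, not the project's implementation: the kernels are combined with fixed uniform weights as a simple stand-in for the weights that Bayesian multiple kernel learning would infer, candidates are scored by mean similarity to the current ageing set, and `assay` is a hypothetical callback standing in for the lifespan experiment. All function names are assumptions made for illustration.

```python
import numpy as np


def combine_kernels(kernels, weights=None):
    """Weighted sum of kernel (similarity) matrices.

    Assumption: uniform weights by default. A real multiple kernel
    learning method would learn these weights from the data.
    """
    kernels = [np.asarray(k, dtype=float) for k in kernels]
    if weights is None:
        weights = np.full(len(kernels), 1.0 / len(kernels))
    return sum(w * k for w, k in zip(weights, kernels))


def rank_candidates(kernel, seeds, excluded):
    """Rank proteins not in `excluded` by mean similarity to the seed set."""
    seed_idx = sorted(seeds)
    scores = kernel[:, seed_idx].mean(axis=1)
    candidates = [i for i in range(kernel.shape[0]) if i not in excluded]
    # sorted() is stable, so ties keep their original index order.
    return sorted(candidates, key=lambda i: scores[i], reverse=True)


def iterate_predictions(kernels, seeds, assay, n_iter=5, batch=50):
    """Alternate ranking and (hypothetical) experimental validation.

    `assay(i)` returns True if protein i is confirmed to affect lifespan.
    Returns the enlarged seed set and the set of proteins tested.
    """
    kernel = combine_kernels(kernels)
    seeds = set(seeds)
    tested = set()
    for _ in range(n_iter):
        ranked = rank_candidates(kernel, seeds, seeds | tested)
        top = ranked[:batch]
        if not top:
            break  # nothing left to test
        tested.update(top)
        # Confirmed hits join the seed set and inform the next ranking.
        seeds.update(i for i in top if assay(i))
    return seeds, tested
```

The defaults `n_iter=5` and `batch=50` mirror the five iterations of 50 top-ranked proteins mentioned in the abstract, which gives at most 250 tested candidates. Scoring by mean similarity to the seed set is the simplest form of guilt by association; it was chosen here only to keep the loop structure visible.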

Summary

Ageing is the largest risk factor for most human diseases in developed countries, including progressive diseases such as Alzheimer's and Parkinson's, diseases like cancer that show variable rates of onset, and catastrophic system failures such as heart attack and stroke. While the study of specific disease processes has long been a major focus of research, there is a growing realization of the importance of studying the normal ageing process itself as an essential part of the problem, and of exploring ways to slow or reverse its effects. Ageing is a multi-factorial process that can be seen as an inevitable feature of the ravages of time. Recent discoveries, however, demonstrate that ageing can be modified in dramatic ways by simple interventions. For example, single gene knockouts can delay ageing and improve health late in the life of laboratory animals. The processes involved in ageing are similar in different organisms, and genetic mutations affecting these processes are associated with longevity in humans. A central challenge of ageing research, however, remains to tease out a complete and unified picture of the biological factors and processes determining lifespan. Ageing is highly complex and affected by diverse proteins and processes. Modern biological assays can simultaneously measure properties and interactions of thousands of proteins or genes, but it is challenging to make sense of such large datasets. Advances in computational data-analysis methods, called 'machine learning', provide exciting opportunities to get the most from large biological datasets and thus increase our understanding of complex processes like ageing. Machine learning can find hidden patterns in data that are too complex for humans to process. Advances in computer power, algorithms and data sizes allow recent machine-learning architectures (known as 'deep learning') to accurately find and classify intricate patterns in combined datasets of different types.
We plan to use fission yeast as a model organism, together with multi-step machine learning, to comprehensively identify biological processes with fundamental importance for ageing. Remarkably, many of these processes are similar from yeast to human, but are much easier to study in the simple yeast. Yeast cells enter a dormant, non-dividing state under limiting nutrients. Such dormant cells provide a useful system to analyse proteins and processes affecting the lifespan in this state. In previous studies, we have identified 116 proteins that, when absent, allow the yeast to live longer (long-lived knockout mutants). So these proteins are involved in ageing, and can be used to train machine-learning programs to predict new ageing proteins by a method known as 'guilt by association'. We will combine large systematic data on mutant features (phenotypes) with diverse existing data to empower the machine-learning predictor. We will test the predicted ageing proteins in the laboratory for lifespan effects in yeast, and feed this information back to the computer for it to learn more about ageing proteins. We will then use mutants of the new ageing proteins identified by the computer and confirmed in yeast to measure links with all other mutants. Such 'genetic-interaction' data provide rich information on functional relationships, which will be used to explore other, potentially more powerful deep-learning methods to predict the biological processes that are involved in ageing. We will then test the most attractive predictions with laboratory experiments. Moreover, we will make all the new data, methods and predictions available to interested scientists to help with their research. We anticipate that this project, using intimate cycles of experiments and machine-learning, will provide a valuable platform to better understand all the biological factors involved in ageing, to eventually develop interventions that extend healthy lifespan in humans.

Impact Summary

Who will benefit from this research?
This proposed research is basic by its nature, and the immediate impacts from this work relate to scientific and knowledge advancement and the development of skills, capacity and capability. In the longer term, this research has the potential to impact areas of wealth and health. Beneficiaries beyond academia therefore are the commercial private sector and the wider public.
How will they benefit from this research?
The proposed research takes state-of-the-art experimental and computational approaches to address fundamental questions relating to biological processes involved in ageing. The research will deliver increased capacity and capability in strategically relevant areas of genomics and machine learning, through the provision of inter-disciplinary training and the further development of key methods and resources. Establishment of these methods is significant as they have a wide range of applications that reach beyond basic science into fields relating to human healthy ageing, the commercial (pharmaceutical) sector and beyond. The commercial sector might benefit by recruiting highly skilled and experienced scientists trained through this project. Ultimately, the pharmaceutical sector will clearly benefit from all the experimental data that we will make publicly available through the DeepAge resource. They might also benefit by exploiting fresh drug targets (ageing-associated proteins that slow ageing when down-regulated), to effectively reduce the effects of ageing as the major risk factor for multiple diseases. The ageing population is a huge and increasing problem in our society, with enormous cost implications due to the economic and social burden of the rise in associated diseases and diminished quality of life for both patients and carers.
It is evident that any measures that promote healthy ageing will be of massive, broad ranging benefit to our society with respect to economy, quality of life, health and creative output. In the longer term, the general public may thus benefit from our fundamental contribution to the understanding of genetic mechanisms and universal principles involved in ageing-related phenotypes that will guide and empower research in more complex systems and may help to develop safe broad-spectrum, preventative measures against age-associated diseases. Immediate and concrete deliverables with respect to impact beyond academia will be in public engagement, which we recognize as an important responsibility of scientists. We already have experience and established links that will facilitate good communication and public engagement of the research outputs. Details of our specific plans and timelines with respect to public engagement are outlined in the Pathways to Impact.
Committee Research Committee C (Genes, development and STEM approaches to biology)
Research Topics Microbiology
Research Priority X - Research Priority information not available
Research Initiative X - not in an Initiative
Funding Scheme X - not Funded via a specific Funding Scheme