Award details

Developing Artificial Intelligence and Deep Learning for the analysis of correlation spectroscopy data

Reference: BB/T011831/1
Principal Investigator / Supervisor: Professor Flemming Hansen
Co-Investigators / Co-Supervisors:
Institution: University College London
Department: Structural Molecular Biology
Funding type: Research
Value (£): 133,225
Status: Completed
Type: Research Grant
Start date: 01/03/2020
End date: 28/02/2021
Duration: 12 months

Abstract

Non-uniform sampling (NUS) has become important for obtaining ultra-high-dimensional NMR spectra with high resolution. Being able to accurately reconstruct NUS NMR spectra allows high-dimensional NMR spectra to be used to characterise macromolecular machines, amongst other systems. Various algorithms have been developed to reconstruct the full dataset from sparsely sampled data; however, these are often slow and require many more sampled points than is theoretically necessary. Deep neural networks (DNNs) have developed impressively in recent years, with many highly varied applications. Currently, the biggest challenge in training complex DNNs is the availability of sufficient training data. Reconstruction and analysis of high-dimensional NUS NMR data are well suited to DNNs, because sufficient training data can easily be generated. It is therefore now time to take advantage of the immense potential of deep learning for the analysis of complex NMR spectra. During the proposed research project, DNNs will be designed and trained to analyse high-dimensional NMR spectra, such as 3D triple-resonance spectra and 4D methyl-methyl NOESY spectra. Whereas the initial objectives aim at 'old-style' reconstruction, subsequent objectives aim at providing a multi-way decomposition of sparsely sampled high-dimensional NMR spectra. Such decompositions contain the same information as the fully reconstructed spectrum, but the information is concentrated and kept in shapes. Because of the high flexibility of deep learning, and since a sufficient amount of training data can be produced, one can start to aim at entirely new perspectives for the analysis of NMR spectra where, for example, a trained DNN performs the entire analysis of a series of biomolecular NMR spectra in one single step. A further objective is to perform a combined analysis of triple-resonance NMR spectra to provide chemical shift assignments of proteins - quickly, robustly and in a single step.
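The statement that training data can easily be generated can be illustrated with a minimal sketch. It assumes the textbook model of a time-domain NMR signal as a sum of exponentially decaying complex sinusoids, restricted to one dimension for brevity. The function names, parameter ranges and random sampling schedule are hypothetical choices made for illustration and are not taken from the project itself.

```python
import cmath
import math
import random


def synthetic_fid(n_points, peaks, dwell=1.0):
    """Return a 1D signal as a sum of decaying complex sinusoids.

    peaks is a list of (amplitude, frequency, decay_rate) tuples, with the
    frequency given in cycles per dwell time (illustrative units).
    """
    fid = []
    for k in range(n_points):
        t = k * dwell
        fid.append(sum(a * cmath.exp((2j * math.pi * f - r) * t)
                       for a, f, r in peaks))
    return fid


def nus_mask(n_points, fraction, rng):
    """Choose which time points are 'recorded'; the first point is always kept."""
    n_keep = max(1, round(fraction * n_points))
    kept = {0} | set(rng.sample(range(1, n_points), n_keep - 1))
    return [k in kept for k in range(n_points)]


def training_pair(n_points, n_peaks, fraction, rng):
    """Build one (sparse input, full target, mask) example with random peaks."""
    # Assumed, arbitrary ranges for amplitude, frequency and decay rate.
    peaks = [(rng.uniform(0.5, 1.0), rng.uniform(-0.5, 0.5), rng.uniform(0.01, 0.05))
             for _ in range(n_peaks)]
    full = synthetic_fid(n_points, peaks)
    mask = nus_mask(n_points, fraction, rng)
    # Unrecorded points are set to zero in the network input.
    sparse = [value if keep else 0j for value, keep in zip(full, mask)]
    return sparse, full, mask


if __name__ == "__main__":
    rng = random.Random(0)
    sparse, full, mask = training_pair(n_points=128, n_peaks=3, fraction=0.25, rng=rng)
    print(f"{sum(mask)} of {len(full)} points kept")
```

Because every example is simulated, an arbitrary number of such input/target pairs can be produced; a realistic generator would additionally need noise, several dimensions and experiment-specific line shapes, which are left out here.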

Summary

Nuclear magnetic resonance (NMR) spectroscopy is an unprecedented technique for obtaining detailed information - at atomic resolution - about macromolecular machines in an environment similar to the cell. NMR spectroscopy has therefore become an essential tool for the characterisation of large proteins, for the discovery of new molecular interactions, and for the discovery of new drug leads. Large proteins and macromolecular machines contain thousands of atoms. High-dimensional (3D, 4D, ...) NMR spectra are therefore required in order to separate the NMR signals of the individual atoms and to facilitate the characterisation of large proteins. A common hurdle with high-dimensional NMR spectra is the time required to obtain them, because essentially a 1D NMR spectrum is required for each point within a 100x100 square (3D spectra) or within a 100x100x100 cube (4D spectra). This makes it very time-consuming and nearly impossible to obtain high-dimensional (> 3D) spectra for large proteins. During the proposed project we will leverage the power of Artificial Intelligence (AI) and Deep Learning to allow for fast acquisition of high-dimensional NMR spectra to characterise macromolecular machines. We will develop a new tool, in which a new deep neural network will be designed and trained so that the required information about the macromolecule can be extracted orders of magnitude faster than with the traditional workflow, because only a fraction of the points are recorded. For 4D spectra only about 1% of the points within the 100x100x100 cube need to be sampled. This is possible because the theoretical background of NMR spectroscopy is so well defined that sufficient training data can easily be generated to train the neural networks. 
The new tools, anchored in deep learning and AI, will not only allow for fast and accurate characterisation of molecular interactions but will also facilitate ultra-high-dimensional NMR that will allow for completely new NMR ventures and for even larger molecular machines to be characterised.
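The point counts quoted in this summary can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the figure of one second per 1D spectrum is an assumed placeholder rather than a value from the award record.

```python
def indirect_points(points_per_dim, n_dims):
    """Number of 1D spectra needed when each indirect dimension has points_per_dim points."""
    return points_per_dim ** (n_dims - 1)


def sampled_points(total_points, fraction):
    """Number of points actually recorded at a given sampling fraction."""
    return round(total_points * fraction)


def acquisition_hours(n_points, seconds_per_point):
    """Total measurement time in hours for n_points 1D spectra."""
    return n_points * seconds_per_point / 3600


if __name__ == "__main__":
    SECONDS_PER_POINT = 1.0  # assumed placeholder, not from the source

    full_3d = indirect_points(100, 3)          # 100 x 100 square
    full_4d = indirect_points(100, 4)          # 100 x 100 x 100 cube
    sparse_4d = sampled_points(full_4d, 0.01)  # about 1% of the cube

    print(f"3D: {full_3d} points")
    print(f"4D: {full_4d} points, {acquisition_hours(full_4d, SECONDS_PER_POINT):.1f} h")
    print(f"4D at 1%: {sparse_4d} points, {acquisition_hours(sparse_4d, SECONDS_PER_POINT):.1f} h")
```

Under these assumptions, sampling 1% of the 4D cube requires the same number of 1D spectra (10,000) as a fully sampled 3D spectrum on a 100x100 square.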

Impact Summary

The outcome of our proposed research will take the form of new Deep Neural Networks (DNNs), with associated optimised parameters, that extract information quickly and reliably from ultra-sparsely sampled high-dimensional NMR spectra. Different mechanisms will be in place for disseminating our new tools to the identified beneficiaries. Firstly, we will publish our new deep learning analysis tools and their applications in archives such as arXiv.org and bioRxiv.org and in peer-reviewed international journals with the highest possible impact. Such publications will be available via open access to the international research community, including academia and industry, and to the general public. As we have done previously, we will aim to combine our developments with applications to challenging biological or biochemical problems, since publications of this combined nature generally appeal to a broader audience and better showcase the strength of the developed tools. Both the PI and the PDRA will also present the results at national and international meetings; such meetings also allow for in-depth conversations that promote new collaborations. Specific impact: Researchers from both academia and the industrial sector in the fields of structural biology, protein-ligand interactions, and NMR spectroscopy will benefit directly from our research. This group of researchers can immediately incorporate our new analysis tools into their own research programmes. Specifically, our new tools will allow for substantially faster acquisition of NMR data because ultra-sparse sampling can be used, which in turn will make workflows faster and more efficient. Importantly, the automated chemical shift assignment tools that we will develop will allow non-expert NMR spectroscopists to easily analyse protein NMR spectra and obtain chemical shift assignments of proteins. 
These assignments can subsequently be used to determine structure and dynamics and to quantify interactions, such as drug-protein interactions. It is anticipated that the proposed DNNs for automated analysis of sparsely sampled NMR spectra will be particularly beneficial in industrial settings. The proposed research will also have a significant impact on the fields of artificial intelligence and deep learning. One of the biggest challenges in developing new neural network architectures is having sufficient data to properly train and cross-validate a new architecture. Using deep learning to analyse and decompose high-dimensional NMR spectra provides a unique example where (i) sufficient training data can easily be generated, since the theory behind NMR spectroscopy is well known, and (ii) large amounts of experimental cross-validation data can be generated using standard NMR experiments. Thus, the analysis of NMR spectra can form a robust test case on which new and even more elaborate deep learning architectures can be designed, improved, and cross-validated. The proposed research will indirectly benefit the general public. For example, our previous NMR-based tools and methods were used to improve the stability of coagulation factor VIII for the treatment of haemophilia A, which is of great societal impact. It is highly likely that the tools developed during the proposed research will make the analysis of NMR spectra both faster and accessible to a broader audience. NMR spectroscopy is used in many fields of science, including materials science, chemistry, and drug discovery. Our new tools will have the potential to impact all areas where NMR spectroscopy is used, which will subsequently benefit the general public.
Committee: Not funded via Committee
Research Topics: Structural Biology, Technology and Methods Development
Research Priority: X – Research Priority information not available
Research Initiative: Tools and Resources Development Fund (TRDF) [2006-2015]
Funding Scheme: X – not Funded via a specific Funding Scheme