Project overview
Macromolecules such as proteins, DNA and RNA mediate the vast majority of processes that constitute and sustain life, including photosynthesis, metabolism, exchange of information between cells, and cellular replication. These processes depend crucially on the dynamic 3D structures that macromolecules adopt. Insights from studying these structures have transformed both our understanding of living systems and our ability to use that understanding to promote health and advance biotechnology. The overwhelming majority of 3D structures have been determined experimentally by macromolecular crystallography (MX; >85% of PDB entries), with additional contributions from nuclear magnetic resonance (NMR; dynamic, but typically limited to smaller structures) and rapidly growing input from electron microscopy (EM; typically membrane protein structures and macromolecular complexes, but studied under cryogenic conditions). The database of over 200,000 experimental structures is now informing deep learning approaches to predicting macromolecular structure. In this exciting scientific landscape, MX plays a central role in discovery, while also validating and improving structure predictions and complementing the capabilities of other techniques.
The present proposal addresses current challenges in the computational aspects of MX. In MX, a crystal formed from billions of copies of the macromolecule is used to diffract X-rays or electrons; computational techniques then determine the underlying atomic structure. Knowledge of a molecule's 3D structure allows us not only to understand its function but also, critically, to design chemicals that interfere with it. Pharmaceutical research depends on the accuracy of experimental structures as the basis for designing drugs that turn molecules on or off, or tune their function when required. Whilst the experimental pipeline today is partially automated and thus tremendously successful, key challenges remain.
Interactions of macromolecules with small molecules and/or other macromolecules change their structure in ways that help to explain their function. There is now an opportunity to improve the strategies by which we capture and describe the family of structures that a macromolecule can adopt, especially with room-temperature methods. It is timely to develop tools that allow identification of the different structural states present in a single experiment or in sets of related experiments. By recasting MX as a multi-data, multi-model process, this proposal will address a weakness that has previously limited the ability of MX to define the dynamics of macromolecules, and so to infer and predict functional properties. The work proposed here will improve structure analysis from both X-ray diffraction, currently the predominant technique, and electron diffraction, a technique that can work with far smaller crystals and so extends the utility of MX. Moreover, we propose to integrate deep learning approaches into the process of structure determination and validation for proteins, carbohydrates, DNA and RNA, as well as for complexes containing one or more of these molecule types. With a multi-technique, multi-data and multi-model approach, we aim to deliver a dynamic description of macromolecules that is closer to life, and therefore more descriptive of their function.
The Collaborative Computational Project Number 4 (CCP4) was established in 1979 and continues to underpin world-class macromolecular structural science in the UK. Effective use of data collected at synchrotron, XFEL and electron microscopy facilities is at the heart of the project's mission. User communities benefiting from this research include both academia and industry. At the interface of the two, CCP4 enables discoveries that underlie the development of vaccines and therapies (including those for SARS-CoV-2) and that may equally be applied to modern challenges in biotechnology and adaptation to climate change.