AlphaFold Structure Prediction

Summary

  • What are we trying to achieve?
    We want accurate structural models of our final protein products. These models help us verify that our sequences are correct, since similar structures are known from the literature, and they let us run protein interaction simulations to gain insight into how our products behave.
  • Why is this the best method?
    DeepMind presented AlphaFold 2.0 in 2020. To date, it is the most accurate algorithm for protein structure prediction. DeepMind trained the program on over 170,000 proteins from a public repository of protein sequences and structures. The program uses a form of attention network, a deep learning technique that lets the algorithm identify parts of a larger problem and then piece them together to obtain the overall solution.
  • What does this tell us?
    • The structure of the IgY assembly, the first ever predicted.
    • The structures of the NeoFvs.
    • Models for running further simulations.

Technicals

What AlphaFold is and how it works

AlphaFold is a protein structure prediction algorithm developed by DeepMind.

Its job is simple: predict the 3D structure of a protein from its sequence. It is currently the most accurate method for predicting the structures of unknown proteins. It works by training specialized neural networks on evolutionary, physical and geometric constraints of protein structures[1], using the Protein Data Bank (PDB) as a reference database.

In the simplest terms, AlphaFold is an input-output program: input a protein sequence, and get a structure as output.

Design of the AlphaFold 2.0 algorithm, reproduced from DeepMind’s article on the same topic.

DeepMind’s AlphaFold 2.0 was installed on IISER Pune’s PARAM Brahma supercomputer. We wrote Slurm batch files to submit our sequences and run AlphaFold on the supercomputer’s GPUs.

We generated multiple structure predictions with these batch jobs.
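The batch files themselves are not reproduced on this page. As a rough illustration only, a Slurm script for one prediction could be generated along the following lines. The partition name, file paths, time limit and the `python run_alphafold.py` entry point are all assumptions and will differ on any real cluster; a real AlphaFold run also needs installation-specific flags (for example database locations) that are left out here.

```python
import shlex


def make_slurm_script(job_name, fasta_path, output_dir,
                      model_preset="monomer",
                      partition="gpu",          # hypothetical partition name
                      gpus=1,
                      time_limit="24:00:00",    # hypothetical wall-time limit
                      alphafold_entry="python run_alphafold.py"):  # assumed entry point
    """Return the text of a Slurm batch script for a single AlphaFold job."""
    # Only the sequence, output directory and model preset are set here;
    # installation-specific flags (such as database paths) are omitted.
    command = " ".join([
        alphafold_entry,
        f"--fasta_paths={shlex.quote(str(fasta_path))}",
        f"--output_dir={shlex.quote(str(output_dir))}",
        f"--model_preset={model_preset}",
    ])
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --partition={partition}",
        "#SBATCH --nodes=1",
        f"#SBATCH --gres=gpu:{gpus}",            # request GPUs for the job
        f"#SBATCH --time={time_limit}",
        f"#SBATCH --output={job_name}_%j.log",   # %j expands to the Slurm job ID
        "",
        command,
        "",
    ]
    return "\n".join(lines)


# Example: a multi-chain assembly would use the multimer preset.
script = make_slurm_script("igy", "igy.fasta", "predictions", model_preset="multimer")
print(script)
```

The printed text can be saved to a file and submitted with `sbatch`. Generating one script per sequence keeps each prediction as an independent job in the queue.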

At the beginning of the cycle, the aim of our dry lab was to propose the avian antibody IgY as a therapeutic agent. We found no crystal structure of IgY in the literature, so we ran its sequence through AlphaFold’s multimer model to obtain a structure for our then-proposed antibody.

Later, we used AlphaFold to predict the structures of our FcRn binding peptides (FcRnBPs) and our NeoFvs (scFv + FcRnBP fusions). A representative NeoFv structure is shown here.

A NeoFv, which is a fusion of an scFv with an FcRn binding peptide, can be built in several ways. The three ways in which a NeoFv can vary are described in the table below.

Variation | Representation | Description | Degrees of variation
Position | VH-x-VL, x-VH-VL or VH-VL-x | The FcRn binding peptide can either be located at the N-terminus, C-terminus or the flexible linker of the scFv. | 3
Conformation | Cyc or Lin | The peptide can either have cysteines at positions 4 and 14 forming a disulfide bond, or have valine and alanine without forming a disulfide bond. | 2
pH-dependence mutation | Y12H or none mentioned | The peptide can have either a tyrosine or a histidine at the 12th position. Introduction of histidine is hypothesized to give the peptide pH dependent binding to the FcRn receptor. | 2

This gives a total of 3 × 2 × 2 = 12 variants. We predicted structures for all of them with AlphaFold. To investigate which of these variants have the best binding parameters, we used molecular docking and molecular dynamics, for which these predicted structures were essential.
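The count of 12 can be checked by enumerating every combination of the three variations. The sketch below does this with labels taken from the table; joining them with underscores is a hypothetical naming scheme used only for illustration, and the position comments assume the scFv is written as VH, linker, VL.

```python
from itertools import product

# Labels follow the "Representation" column of the table.
POSITIONS = ["x-VH-VL", "VH-x-VL", "VH-VL-x"]  # N-terminus, linker, C-terminus
CONFORMATIONS = ["Cyc", "Lin"]                 # with or without the disulfide bond
PH_MUTATIONS = ["Y12H", None]                  # None: tyrosine kept at position 12


def enumerate_variants():
    """Return one label per NeoFv variant (hypothetical underscore naming)."""
    variants = []
    for position, conformation, mutation in product(
            POSITIONS, CONFORMATIONS, PH_MUTATIONS):
        parts = [position, conformation]
        if mutation is not None:
            parts.append(mutation)  # the mutation is only named when present
        variants.append("_".join(parts))
    return variants


names = enumerate_variants()
print(len(names))  # 12
for name in names:
    print(name)
```

Each label could then be paired with its sequence file, so that every variant is predicted exactly once.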

FcRnBPs form a beta turn

While predicting the structures of our FcRn binding peptides, we noticed that one of our predicted structures disagreed with an assumption in the existing literature[2]: the linear version of this peptide does not adopt a linear conformation.

All of our predictions for the cyclic version (which has a disulfide bond) show two antiparallel beta strands joined by a disulfide bond on one side. However, one of our predictions for the linear peptide also shows an inverted beta strand. To explore this anomaly, we looked at the sequence of our linear FcRn binding peptide:

QRFVTGHFGGLYPANG

We found that the consensus sequence involved in binding has a distribution of glycine residues that is typical of the protein secondary structures called beta turns. We confirmed this by highlighting the hydrogen bonds in the turn using the UCSF Chimera molecular visualization tool.

We hypothesized that this structure gives the sequence in the loop (the consensus sequence[3]) the ability to bind to the FcRn receptor. We therefore used this consensus sequence as the active residues in our subsequent docking simulations.
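The residue numbering used on this page can be checked directly against the linear peptide sequence. The short sketch below lists the glycine positions and derives the cyclic and Y12H forms by substitution; the helper names are hypothetical, and the derived sequences simply apply the substitutions described in the variant table (cysteines at positions 4 and 14, histidine at position 12).

```python
LINEAR_PEPTIDE = "QRFVTGHFGGLYPANG"


def residue_positions(sequence, residue):
    """Return the 1-based positions at which `residue` occurs."""
    return [i for i, aa in enumerate(sequence, start=1) if aa == residue]


def substitute(sequence, position, expected, new):
    """Replace the residue at a 1-based position, checking the original residue."""
    found = sequence[position - 1]
    if found != expected:
        raise ValueError(f"expected {expected} at position {position}, found {found}")
    return sequence[:position - 1] + new + sequence[position:]


# Glycines in the linear peptide.
print(residue_positions(LINEAR_PEPTIDE, "G"))  # [6, 9, 10, 16]

# Cyclic form: valine 4 and alanine 14 replaced by cysteines.
cyclic = substitute(substitute(LINEAR_PEPTIDE, 4, "V", "C"), 14, "A", "C")
print(cyclic)  # QRFCTGHFGGLYPCNG

# pH-dependence mutation: tyrosine 12 replaced by histidine.
print(substitute(LINEAR_PEPTIDE, 12, "Y", "H"))  # QRFVTGHFGGLHPANG
```

The `expected` argument makes the substitution fail loudly if the numbering is off by one, which is an easy mistake when moving between 0-based and 1-based positions.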

Molecular Docking

References

  1. Jumper, J., Evans, R., Pritzel, A. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
  2. Kelly, V. W. and Sirk, S. J. Short FcRn-Binding Peptides Enable Salvage and Transcytosis of scFv Antibody Fragments. ACS Chemical Biology 17 (2), 404–413 (2022).
  3. Mezo, A. R., McDonnell, K. A., Hehir, C. A., Low, S. C., Palombella, V. J., Stattel, J. M., Kamphaus, G. D., Fraley, C., Zhang, Y., Dumont, J. A. and Bitonti, A. J. Reduction of IgG in nonhuman primates by a peptide antagonist of the neonatal Fc receptor FcRn. Proc Natl Acad Sci U S A 105 (7), 2337–2342 (2008). PMID: 18272495; PMCID: PMC2268137.