Model

Modelling

Rationale

Our method of generating novel nanobodies relies heavily on the binding of green fluorescence proteins (GFPs) to a wide range of nanobodies. This was something we explored in the lab, with extensive in vitro testing of nanobody-GFP binding. However, there is a level of variability inherent to wet lab experiments, and exploration through alternative methodologies is essential to ensure the accuracy and credibility of experimental results. Hence, we elected to use molecular dynamics (MD) simulations to model the docking of all the nanobody-GFP complexes we tested in the lab. As well as helping to test the validity of our experimental findings, the results of MD simulations would greatly improve our understanding of the behaviour and structure of nanobody-GFP complexes. Furthermore, finding a methodology of accurately simulating nanobody-GFP binding would prove a useful tool in the screening of generated novel nanobodies we hope to produce.

Methodology

All modelling was performed in Python version 3.9.12 on a Linux based machine. Python offers a range of free and open-source packages which allow for the manipulation and simulation of proteins based on structural data, stored in a Protein Data Bank (PDB) file. We generated PDB files using “Phyre2” (Kelley et al. 2015), an online tool that predicts the folding and 3D structure of proteins based on an amino acid sequence. It total we generated 8 PDB files for:

We simulated the docking of each nanobody-GFP complex, meaning 12 MD simulations were performed in total.

LightDock

Our MD simulations were performed using LightDock version 0.9.3 (Jiménez-García et al. 2018). LightDock is an open source Python package, which allows for ab-initio protein-protein docking based on only 3D coordinates of two protein molecules. This is achieved via Glowworm Swarm Optimisation (GSO), an iterative process which continuously refines a model over an extended period of simulation.

The first step of a LightDock simulation is setup, and this is where most parameters of the model are set. The two most important parameters in a LightDock simulation are the number of swarms, with each swarm being an independent simulation starting at a different position around the receptor protein, and the number of glowworms, with each glowworm being a 3D axis object representing a possible ligand conformation site (Figure 1). Each swarm is composed of a number of glowworms determined by the user, and for an accurate simulation enough swarms and glowworms must be used to ensure exhaustive sampling of all possible ligand conformations. The default swarm and glowworm values in LightDock are 400 and 300 respectively, and based on a pilot run we determined this to be sufficient for exhaustive sampling of our nanobody-GFP complexes. We also used arguments in our LightDock setup to ignore OXT atoms, which decreased the runtime of our models without affecting results, and enabled backbone flexibility using an Anisotropic Network Model (ANM).

Plasmid map of backbone showing the cloning strategy.
Figure 1. The start of a LightDock simulation, with 400 glowworm swarms surrounding a receptor protein. Dots represent swarms, and the glowworms within swarms are highlighted for a single swarm at the top right of the figure. Arrows from glowworms show the three dimensions in which movement can occur between steps.

After the setup function was completed, it was time to run the model. During simulation, LightDock uses its GSO algorithm to refine the model in discrete steps. At the start of the model, swarms are distributed randomly around the receptor protein (as in Figure 1), and glowworms are distributed randomly within swarms. At each step, each glowworms moves to a better energetical position which improves its “score”, and in our case the “Dfire” scoring algorithm was used to define scores. This process leads to a movement and convergence of glow worms at each step to improve the stability and accuracy of models. For our models, we simulated models over the default 100 steps, as pilot runs of our model indicated that there was no benefit to increasing the number of steps any further. We also used the “min” argument to perform local minimisation of the best glowworm at each step. While this increased the model run time, our pilot runs showed this greatly improved the consistency of our models.

Once simulation was completed, we generated predicted models as PDB protein structure files. With 400 swarms and 300 glowworms, this led to the generation of 120,000 predicted models for each nanobody-GFP complex. We then performed intra-swarm clustering, which removed redundant models and ranked remaining models by Dfire score for each cluster. The retained models and rankings were then used in the next step of model refinement.

Haddock

After clustering LightDock models, we used HADDOCK version 2.4 (van Zundert et al. 2015) to perform further refinement. HADDOCK comprises a collection of Python scripts that perform structural calculations of protein complexes. We first used HADDOCK to remove clashes at protein interfaces, and then to assemble the top 20 models for each complex from across all generated swarms. These top 20 models were then clustered with HADDOCK a final time to locate the best model. HADDOCK clustering to find the best models was done based on energy minimisation, with lower binding energy being favoured.

Washington Collaboration

While modelling is an incredibly powerful and useful tool, it can be difficult to determine the accuracy of a model once completed. One way to solve this problem is to use multiple methods of modelling, and compare outputs. If multiple simulation programs converge at a similar model, then it is likely to be a robust result. Alternatively, if different models differ wildly in their output, it is hard to verify which, if any, is the most correct.

We therefore elected to collaborate with Washington iGEM, another iGEM team, who used “Rosetta” to repeat the MD simulations we performed in LightDock. The workflow used by Washington in Rosetta is outlined below.

For our side of the collaboration, we used “CABSdock” (Kurcinski et al. 2015), an open-source Python package, to simulate the docking between five peptides, provided to us by the Washington iGEM team, and BRAF, a protein involved in the development of skin cancer. CABSdock uses a coarse-grained protein simulation model, a method which allows for more efficient simulations with shorter runtimes. This package also supports ab initio simulation, and simulations were run based on only coordinate data of the BRAF protein in a protein databank (PDB) file, and amino acid sequences of the peptides.

Another advantage of CABSdock is that it can be run on a web-server, and multiple runs can be queued and performed simultaneously. We took advantage of the web-server, as it offered significantly shorter run times compared to running models locally. We ran the model with 100 simulation cycles, and left other parameters at default.


PyMOL Visualisation

Once the best models had been found from both LightDock and Rosetta, we input the PDB files of these models into PyMOL (Schrödinger and DeLano 2020), a molecular visualisation tool. The first thing we did was to locate interface residues, which we represented as sticks and coloured green. Next, we displayed hydrogen bonds between nanobody and GFP chains, which are represented by dashed lines. Numbers at these bonds represent polar contact distances. Finally, we added a transparent electrostatic surface around the complex, and coloured nanobodies yellow, fuGFP blue and sfGFP purple, making the identity of individual proteins in the complexes more clear.

ProtDCal Suite

After MD simulations were visualised, we used the “PPI-Affinity” tool (Romero-Molina 2022) in the “ProtDCal” suite to analyse the binding affinities of our nanobody-GFP complexes. We input a zip file with all our best simulations from LightDock, and PPI-Affinity output a score for each complex in terms of the difference in free energy between bound and unbound states.

Results

GIF animation of modelled sf-h in LightDock GIF animation of modelled fu-h in LightDock
Figure 2. sfGFP-Nanobody H complex (left) and fuGFP-Nanobody H complex (right) modelled in LightDock.
GIF animation of modelled sf-h from the Washington team in Rosetta GIF animation of modelled fu-h from the Washington team in Rosetta
Figure 3. sfGFP-Nanobody H complex (left) and fuGFP-Nanobody H complex (right) modelled in Rosetta.
GIF animation of modelled sf-2 in LightDock GIF animation of modelled fu-2 in LightDock
Figure 4. sfGFP-Nanobody 2 complex (left) and fuGFP-Nanobody 2 complex (right) modelled in LightDock.
GIF animation of modelled sf-2 from the Washington team in Rosetta GIF animation of modelled fu-2 from the Washington team in Rosetta
Figure 5. sfGFP-Nanobody 2 complex (left) and fuGFP-Nanobody 2 complex (right) modelled in Rosetta.
GIF animation of modelled sf-3 in LightDock GIF animation of modelled fu-3 in LightDock
Figure 6. sfGFP-Nanobody 3 complex (left) and fuGFP-Nanobody 3 complex (right) modelled in LightDock.
GIF animation of modelled sf-3 from the Washington team in Rosetta GIF animation of modelled fu-3 from the Washington team in Rosetta
Figure 7. sfGFP-Nanobody 3 complex (left) and fuGFP-Nanobody 3 complex (right) modelled in Rosetta.
GIF animation of modelled sf-6 in LightDock GIF animation of modelled fu-6 in LightDock
Figure 8. sfGFP-Nanobody 6 complex (left) and fuGFP-Nanobody 6 complex (right) modelled in LightDock.
GIF animation of modelled sf-6 from the Washington team in Rosetta GIF animation of modelled fu-6 from the Washington team in Rosetta
Figure 9. sfGFP-Nanobody 6 complex (left) and fuGFP-Nanobody 6 complex (right) modelled in Rosetta.
GIF animation of modelled sf-7 in LightDock GIF animation of modelled fu-7 in LightDock
Figure 10. sfGFP-Nanobody 7 complex (left) and fuGFP-Nanobody 7 complex (right) modelled in LightDock.
GIF animation of modelled sf-7 from the Washington team in Rosetta GIF animation of modelled fu-7 from the Washington team in Rosetta
Figure 11. sfGFP-Nanobody 7 complex (left) and fuGFP-Nanobody 7 complex (right) modelled in Rosetta.
GIF animation of modelled sf-AC in LightDock GIF animation of modelled fu-AC in LightDock
Figure 12. sfGFP-Nanobody AC complex (left) and fuGFP-Nanobody AC complex (right) modelled in LightDock.

Both LightDock and Rosetta predicted successful binding between all nanobody-GFP combinations. However, models differed significantly between the two programs. This is likely because we used an ab-initio approach, while Washington used data from a previously modelled complex to input likely binding sites into their model.

Despite this, the free-energy predictions from PPI-Affinity aligned with our experimental results, and showed that sfGFP-Nanobody AC and sfGFP-Nanobody H complexes had the greatest binding affinity (Table 1). Overall, it also showed that sfGFP binds more strongly to nanobodies than fuGFP.

Table 1. Binding affinities of Nanobody-GFP complex models generated using LightDock. Numbers represent free-energy scores in units ∆kcal/mol, with a lower score meaning greater binding affinity.
sfGFP fuGFP
Nanobody H -12.7 -10.7
Nanobody 2 -12 -11.4
Nanobody 3 -12.4 -12
Nanobody 6 -12.2 -11.8
Nanobody 7 -11.5 -11.2
Nanobody AC -12.9 -11.6

Team Washington Workflow for Protein Modeling Collaboration with Team Sydney Australia

Our goal for our modeling work of the collaboration was to dock several of team Sydney’s nanobodies to the GFP team Sydney provided us. We then analyzed which nanobody bound with the highest affinity to GFP. Team Sydney will then analyze whether our modeling results are similar to their simulations.

Rosetta is a command-line based software suite developed at the University of Washington. In order to dock the nanobodies to GFP, our team decided to follow and adapt a Rosetta tutorial on Protein-Protein Docking. Below is our workflow:

  1. Relaxed both the nanobody pdb and GFP pdb files provided to us by Team Sydney. Our proteins were relaxed using the following command, which was provided to our team by our PI Dr. Frank DiMaio.
    • /path/to/relax.static.linuxgccrelease -s fuGFP.pdb -relax:constrain_relax_to_start_coords - relax:coord_constrain_sidechains -relax:ramp_constraints false -missing_density_to_jump
    • /path/to/relax.static.linuxgccrelease -s nB_H.pdb -relax:constrain_relax_to_start_coords - relax:coord_constrain_sidechains -relax:ramp_constraints false -missing_density_to_jump
  2. In order to do the docking protocol, we needed to create a complex in which the nanobody we are docking is near the surface of GFP we expect it to bind to. Team Sydney provided our team with a .pdb file with one of the nanobodies near the proposed binding site of GFP. In pymol, we aligned our relaxed nanobody and GFP molecules to the provided complex so that they were in the right orientation. We then labeled the GFP protein as chain A and the nanobody as chain B using the pymol commands:
    • Alter GFP, chain = “A”
    • Alter nanobody, chain = “B”
    We then saved the correctly placed nanobody and GFP as complex.pdb.
  3. In our working directory, we added the two files below, which we adapted from the protein-protein docking tutorial, flag_local_docking and flag_local_refine. The contents of these files is located in the Supplementary information at the end of this document.
  4. We created folders within our working directory called input_files and output_files
  5. We put complex.pdb within the input_files folder
  6. We executed this command in command line
    • /path/to/docking_protocol.static.linuxgccrelease @flag_local_docking
    This command initiates the docking procedure. We chose to have Rosetta run 100 runs of docking the nanobody to GFP.
  7. The docking procedure produces 100 pdb files of docked structures in the output_files folder. We next ran a refinement procedure to further simulate the binding of the nanobody to the GFP. To run the refinement procedure, we first copied all of the pdb’s produced by docking procedure into the input_files folder. We then ran
    • /path/to/docking_protocol.static.linuxgccrelease @flag_local_refine
  8. From the refine step, we received pdb files of the refined structures in the output files. Additionally, the refine step produces a score file which contains score numbers that indicate the relative stability of the entire complex. We used the scores to identify what refined structure is the most stable.
  9. We then relaxed the most stable refine structure using the same command as described in step 1.
    • /path/to/relax.static.linuxgccrelease -s complex_local_dock_0001_local_refine_0001.pdb - relax:constrain_relax_to_start_coords -relax:coord_constrain_sidechains -relax:ramp_constraints
    • false -missing_density_to_jump
  10. We then ran steps 1-9 for all the nanobodies that team Sydney provided to us, nB_H, nB_2, nB_3, nB_6, and nB_7. In order to compare which nanobody bound to GFP most stably, we needed to compare their ddg values. Ddg gives a measure of the energy of the binding interface and can be used to assess relative binding affinities. We were advised to generate ddg values by Dr. Frank DiMaio. Dr. Frank DiMaio provided us with an xml file that we used in order to use ddg values. We created a .sh file that we used to run the xml file. We named the files ddg.sh and ddg.xml. The content of these two files is listed in the supplementary information located at the end of this pdf.
  11. We then compared the ddg values of each nanobody and found that the nanobodies ranked in this order of binding affinity, from most stable to least stable: nB_6, nB_3, nB_2, nB_7, nB_H.

    Lower ddg values indicate high binding affinity.



References

Van Zundert, G.C.P., Melquiond, A.S.J., Bonvin, A.M.J.J., 2015. Integrative Modeling of Biomolecular Complexes: HADDOCKing with Cryo-Electron Microscopy Data. Structure, 23(5), 949–960.

Jiménez-García, B., Roel-Touris, J., Romero-Durana, M., Vidal, M., Jiménez-González, D., & Fernández-Recio, J., 2018. LightDock: a new multi-scale approach to protein-protein docking. Bioinformatics (Oxford, England), 34(1), 49–55.

Kelley, L.A., Mezulis, S., Yates, C.M., Wass, M.N., and Sternberg, M.J.E., 2015. The Phyre2 web portal for protein modeling, prediction and analysis. Nature Protocols, 10(6), 845–858.

Kurcinski, M., Jamroz, M., Blaszczyk, M., Kolinski, A., and Kmiecik, S., 2015. CABS-dock web server for the flexible docking of peptides to proteins without prior knowledge of the binding site. Nucleic acids research, 43(W1), W419–W424.

Romero-Molina, S., Ruiz-Blanco, Y.B., Mieres-Perez, J., Harms, M., Münch, J., Ehrmann, M., and Sanchez-Garcia, E., 2022. PPI-Affinity: A Web Tool for the Prediction and Optimization of Protein–Peptide and Protein–Protein Binding Affinity. Journal of Proteome Research, 21(8), 1829–1841.

Schrödinger, L. & DeLano, W., 2020. PyMOL, Available at: http://www.pymol.org/pymol.