Rationale
Our method of generating novel nanobodies relies heavily on the binding of green fluorescent proteins (GFPs) to a wide range of nanobodies. We explored this in the lab through extensive in vitro testing of nanobody-GFP binding. However, wet lab experiments carry inherent variability, and checking results with an independent methodology is essential to ensure their accuracy and credibility. Hence, we elected to use molecular dynamics (MD) simulations to model the docking of all the nanobody-GFP complexes we tested in the lab. As well as helping to validate our experimental findings, the MD simulations would improve our understanding of the behaviour and structure of nanobody-GFP complexes. Furthermore, a method that accurately simulates nanobody-GFP binding would be a useful tool for screening the novel nanobodies we hope to produce.
Methodology
All modelling was performed in Python version 3.9.12 on a Linux-based machine. Python offers a range of free and open-source packages which allow for the manipulation and simulation of proteins based on structural data stored in a Protein Data Bank (PDB) file. We generated PDB files using “Phyre2” (Kelley et al. 2015), an online tool that predicts the folding and 3D structure of proteins based on an amino acid sequence. In total we generated 8 PDB files: one for each of the six nanobodies (H, 2, 3, 6, 7 and AC) and one for each of the two GFPs (sfGFP and fuGFP).
We simulated the docking of each nanobody-GFP complex, meaning 12 MD simulations were performed in total.
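The count of 12 follows from pairing every nanobody with every GFP. A minimal sketch of that enumeration is given here; the nanobody and GFP labels come from Table 1, while the file naming scheme is a hypothetical choice made only for illustration.

```python
from itertools import product

# Labels taken from Table 1; the "<name>.pdb" file names are hypothetical.
NANOBODIES = ["H", "2", "3", "6", "7", "AC"]
GFPS = ["sfGFP", "fuGFP"]


def docking_jobs():
    """Return one (gfp_pdb, nanobody_pdb) pair per nanobody-GFP complex."""
    return [(f"{gfp}.pdb", f"nanobody_{nb}.pdb")
            for gfp, nb in product(GFPS, NANOBODIES)]


jobs = docking_jobs()
print(len(NANOBODIES) + len(GFPS), "structures,", len(jobs), "docking runs")
```

Six nanobodies and two GFPs give 8 input structures and 6 × 2 = 12 complexes.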
LightDock
Our MD simulations were performed using LightDock version 0.9.3 (Jiménez-García et al. 2018). LightDock is an open-source Python package which allows for ab initio protein-protein docking based only on the 3D coordinates of the two protein molecules. This is achieved via Glowworm Swarm Optimisation (GSO), an iterative process which continuously refines a model over an extended period of simulation.
The first step of a LightDock simulation is setup, where most parameters of the model are set. The two most important parameters are the number of swarms and the number of glowworms. Each swarm is an independent simulation starting at a different position around the receptor protein, and each glowworm is a 3D axis object representing a possible ligand conformation (Figure 1). The user sets the number of glowworms per swarm, and an accurate simulation needs enough swarms and glowworms to sample all possible ligand conformations exhaustively. The default swarm and glowworm values in LightDock are 400 and 300 respectively, and a pilot run showed these to be sufficient for exhaustive sampling of our nanobody-GFP complexes. We also used setup arguments to ignore OXT atoms, which decreased the runtime of our models without affecting results, and to enable backbone flexibility using an Anisotropic Network Model (ANM).
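As an illustration of the setup step, the sketch below assembles a setup command line from the parameters described above. It only builds and prints the argument list and does not run LightDock. The flag names (`-s`, `-g`, `--noxt`, `-anm`) reflect our understanding of the LightDock 0.9 command-line interface and should be checked against `lightdock3_setup.py --help` for the installed version. The input file names are hypothetical, and which protein is treated as receptor or ligand is an assumption.

```python
import shlex


def lightdock_setup_command(receptor_pdb, ligand_pdb, swarms=400, glowworms=300):
    """Build the setup call as an argument list (nothing is executed here)."""
    return [
        "lightdock3_setup.py", receptor_pdb, ligand_pdb,
        "-s", str(swarms),     # number of swarms around the receptor
        "-g", str(glowworms),  # glowworms per swarm
        "--noxt",              # ignore OXT atoms
        "-anm",                # enable ANM backbone flexibility
    ]


# Hypothetical file names; receptor/ligand assignment is an assumption.
command = lightdock_setup_command("sfGFP.pdb", "nanobody_H.pdb")
print(shlex.join(command))
```

Keeping the command as a list makes it straightforward to pass to `subprocess.run` once the flags have been verified.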
Once setup was complete, we ran the model. During simulation, LightDock uses its GSO algorithm to refine the model in discrete steps. At the start, swarms are distributed randomly around the receptor protein (as in Figure 1), and glowworms are distributed randomly within swarms. At each step, each glowworm moves to an energetically more favourable position, which improves its “score”. In our case, scores were defined by the “DFIRE” scoring function. Step by step, the glowworms move and converge, improving the stability and accuracy of the models. We ran our simulations for the default 100 steps, as pilot runs indicated no benefit from increasing the number of steps further. We also used the “min” argument to perform local minimisation of the best glowworm at each step. While this increased the run time, our pilot runs showed that it greatly improved the consistency of our models.
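To make the stepwise refinement concrete, here is a toy version of glowworm swarm optimisation on a 2D function. It is a simplified sketch and not LightDock's implementation: the objective is a stand-in for a docking score, the neighbourhood radius is fixed rather than adaptive, and all parameter values (`rho`, `gamma`, `step_size`, `radius`) are illustrative assumptions.

```python
import math
import random


def score(pos):
    """Toy objective with one optimum at the origin (stand-in for a docking score)."""
    x, y = pos
    return -(x * x + y * y)


def gso(n_glowworms=30, steps=100, seed=0, rho=0.4, gamma=0.6,
        step_size=0.05, radius=3.0):
    """Minimal glowworm swarm optimisation with a fixed neighbourhood radius."""
    rng = random.Random(seed)
    positions = [(rng.uniform(-2, 2), rng.uniform(-2, 2))
                 for _ in range(n_glowworms)]
    luciferin = [5.0] * n_glowworms
    start_best = max(score(p) for p in positions)
    best = start_best
    for _ in range(steps):
        # 1. Luciferin update: decay the old value, add the current score.
        luciferin = [(1 - rho) * l + gamma * score(p)
                     for l, p in zip(luciferin, positions)]
        moved = []
        for i, p in enumerate(positions):
            # 2. Neighbours are brighter glowworms within the sensing radius.
            nbrs = [j for j, q in enumerate(positions)
                    if luciferin[j] > luciferin[i]
                    and 0.0 < math.dist(p, q) < radius]
            if not nbrs:
                moved.append(p)  # the locally brightest glowworm stays put
                continue
            # 3. Choose a neighbour with probability proportional to the luciferin gap.
            weights = [luciferin[j] - luciferin[i] for j in nbrs]
            q = positions[rng.choices(nbrs, weights=weights)[0]]
            # 4. Take a fixed-length step towards the chosen neighbour.
            d = math.dist(p, q)
            moved.append((p[0] + step_size * (q[0] - p[0]) / d,
                          p[1] + step_size * (q[1] - p[1]) / d))
        positions = moved
        best = max(best, max(score(p) for p in positions))
    return positions, start_best, best


positions, start_best, best = gso()
print(f"best score at start: {start_best:.3f}, best score seen: {best:.3f}")
```

In a real docking run each glowworm position encodes a ligand pose rather than a 2D point, but the loop structure is the same: update luciferin from the score, pick a brighter neighbour, step towards it.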
Once simulation was completed, we generated predicted models as PDB protein structure files. With 400 swarms and 300 glowworms, this led to the generation of 120,000 predicted models for each nanobody-GFP complex. We then performed intra-swarm clustering, which removed redundant models and ranked the remaining models by DFIRE score within each cluster. The retained models and rankings were then used in the next step of model refinement.
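The idea behind removing redundant models can be sketched with a simple greedy clustering: models are visited from best to worst score, and each one either joins the first existing cluster it is close to or starts a new cluster. This is an illustrative sketch, not LightDock's clustering code. It assumes that a higher score is better, it reduces each pose to a single 3D point instead of comparing full structures, and the example poses and cutoff are hypothetical.

```python
import math


def cluster_models(models, cutoff):
    """Greedy clustering of (name, score, coords) tuples, best score first.

    Returns clusters as lists of names; the first name in each list is the
    cluster representative (its best-scoring member).
    """
    ranked = sorted(models, key=lambda m: m[1], reverse=True)
    clusters = []  # each item: (representative_coords, [member names])
    for name, _score, coords in ranked:
        for rep_coords, members in clusters:
            if math.dist(coords, rep_coords) <= cutoff:
                members.append(name)
                break
        else:
            clusters.append((coords, [name]))
    return [members for _rep, members in clusters]


# Hypothetical poses from one swarm, each reduced to a 3D ligand centre.
swarm = [
    ("g1", 12.0, (0.0, 0.0, 0.0)),
    ("g2", 11.5, (0.5, 0.0, 0.0)),
    ("g3", 9.0, (10.0, 0.0, 0.0)),
    ("g4", 8.0, (10.2, 0.1, 0.0)),
]
clusters = cluster_models(swarm, cutoff=2.0)
print(clusters)
print("models per complex:", 400 * 300)
```

Here `g2` and `g4` are treated as redundant copies of `g1` and `g3`, so only two representatives would be carried forward from this swarm.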
HADDOCK
After clustering LightDock models, we used HADDOCK version 2.4 (van Zundert et al. 2015) to perform further refinement. HADDOCK comprises a collection of Python scripts that perform structural calculations of protein complexes. We first used HADDOCK to remove clashes at protein interfaces, and then to assemble the top 20 models for each complex from across all generated swarms. These top 20 models were then clustered with HADDOCK a final time to locate the best model. HADDOCK clustering to find the best models was done based on energy minimisation, with lower binding energy being favoured.
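Assembling the top models from across all swarms amounts to pooling the per-swarm rankings and keeping the lowest-energy entries. The sketch below shows that selection only. It does not reproduce HADDOCK's scoring or refinement, and the swarm names, model names and energies are hypothetical.

```python
import heapq


def top_models(swarm_rankings, n=20):
    """Pool per-swarm (model_id, energy) lists and keep the n lowest energies."""
    pooled = [(energy, swarm_id, model_id)
              for swarm_id, ranking in swarm_rankings.items()
              for model_id, energy in ranking]
    return [(swarm_id, model_id, energy)
            for energy, swarm_id, model_id in heapq.nsmallest(n, pooled)]


# Hypothetical energies; lower is more favourable.
rankings = {
    "swarm_0": [("model_a", -41.2), ("model_b", -35.0)],
    "swarm_1": [("model_c", -44.9)],
    "swarm_2": [("model_d", -38.7), ("model_e", -30.1)],
}
selected = top_models(rankings, n=3)
print(selected)
```

With `n=20` the same function would return the 20 lowest-energy models across every swarm of a complex.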
Washington Collaboration
While modelling is an incredibly powerful and useful tool, it can be difficult to determine the accuracy of a model once completed. One way to solve this problem is to use multiple methods of modelling, and compare outputs. If multiple simulation programs converge at a similar model, then it is likely to be a robust result. Alternatively, if different models differ wildly in their output, it is hard to verify which, if any, is the most correct.
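One common way to quantify whether two programs converged on a similar model is the root-mean-square deviation (RMSD) between matching atoms. The sketch below computes RMSD for two coordinate lists that are assumed to be already superimposed and listed in the same atom order. The coordinates are hypothetical, and no superposition step is included.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) coordinates."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum((a - b) ** 2
                  for p, q in zip(coords_a, coords_b)
                  for a, b in zip(p, q))
    return math.sqrt(squared / len(coords_a))


# Hypothetical atoms: the second model is the first shifted by 1 unit along x.
model_1 = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
model_2 = [(x + 1.0, y, z) for x, y, z in model_1]
print(rmsd(model_1, model_1), rmsd(model_1, model_2))
```

A small RMSD between the best models from two programs would support a robust result, while a large value indicates that the programs disagree on the binding pose.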
We therefore elected to collaborate with Washington iGEM, another iGEM team, who used “Rosetta” to repeat the MD simulations we performed in LightDock. The workflow used by Washington in Rosetta is outlined below.
For our side of the collaboration, we used “CABS-dock” (Kurcinski et al. 2015), an open-source Python package, to simulate the docking between five peptides, provided to us by the Washington iGEM team, and BRAF, a protein involved in the development of skin cancer. CABS-dock uses a coarse-grained protein simulation model, a method which allows for more efficient simulations with shorter runtimes. This package also supports ab initio simulation, and simulations were run based only on coordinate data of the BRAF protein in a Protein Data Bank (PDB) file and the amino acid sequences of the peptides.
Another advantage of CABS-dock is that it can be run on a web server, where multiple runs can be queued and performed simultaneously. We took advantage of the web server, as it offered significantly shorter run times compared to running models locally. We ran the model with 100 simulation cycles, and left other parameters at default.
Once the best models had been found from both LightDock and Rosetta, we input the PDB files of these models into PyMOL (Schrödinger and DeLano 2020), a molecular visualisation tool. We first located interface residues, which we represented as sticks and coloured green. Next, we displayed hydrogen bonds between nanobody and GFP chains, which are represented by dashed lines. Numbers at these bonds represent polar contact distances. Finally, we added a transparent electrostatic surface around the complex, and coloured nanobodies yellow, fuGFP blue and sfGFP purple, making the identity of individual proteins in the complexes clearer.
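The styling steps above can be written down as a short PyMOL script. The sketch below generates such a script as text, so it can be inspected without PyMOL installed. The chain identifiers, the 4 Å interface cutoff, the transparency value and the file name are assumptions, and the plain transparent surface is only a stand-in for the electrostatic surface, which is not reproduced here.

```python
def pymol_script(pdb_path, nanobody_chain="A", gfp_chain="B",
                 gfp_colour="purple", cutoff=4.0):
    """Return PyMOL commands (as text) for the styling described above."""
    nb = f"chain {nanobody_chain}"
    gfp = f"chain {gfp_chain}"
    return "\n".join([
        f"load {pdb_path}, complex",
        "hide everything",
        "show cartoon",
        f"color yellow, {nb}",
        f"color {gfp_colour}, {gfp}",
        # Interface residues: any residue with an atom near the partner chain.
        f"select interface, byres (({nb}) within {cutoff} of ({gfp})) "
        f"or byres (({gfp}) within {cutoff} of ({nb}))",
        "show sticks, interface",
        "color green, interface",
        # mode=2 restricts the distance object to polar contacts.
        f"distance hbonds, {nb}, {gfp}, mode=2",
        "show surface",
        "set transparency, 0.7",
    ])


# Hypothetical file name; use gfp_colour="blue" for an fuGFP complex.
script = pymol_script("best_model.pdb")
print(script)
```

The printed text can be saved to a `.pml` file and opened in PyMOL once the chain identifiers have been matched to the actual model.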
ProtDCal Suite
After MD simulations were visualised, we used the “PPI-Affinity” tool (Romero-Molina et al. 2022) in the “ProtDCal” suite to analyse the binding affinities of our nanobody-GFP complexes. We input a zip file with all our best simulations from LightDock, and PPI-Affinity output a score for each complex in terms of the difference in free energy between bound and unbound states.
Results
Both LightDock and Rosetta predicted successful binding between all nanobody-GFP combinations. However, models differed significantly between the two programs. This is likely because we used an ab initio approach, while Washington used data from a previously modelled complex to input likely binding sites into their model.
Despite this, the free-energy predictions from PPI-Affinity aligned with our experimental results, and showed that sfGFP-Nanobody AC and sfGFP-Nanobody H complexes had the greatest binding affinity (Table 1). Overall, it also showed that sfGFP binds more strongly to nanobodies than fuGFP.
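Table 1. PPI-Affinity free-energy scores for each nanobody-GFP complex. More negative values indicate stronger predicted binding.

Nanobody | sfGFP | fuGFP |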
---|---|---|
Nanobody H | -12.7 | -10.7 |
Nanobody 2 | -12 | -11.4 |
Nanobody 3 | -12.4 | -12 |
Nanobody 6 | -12.2 | -11.8 |
Nanobody 7 | -11.5 | -11.2 |
Nanobody AC | -12.9 | -11.6 |
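
The two conclusions drawn from Table 1 can be checked directly from its values. The short sketch below copies the scores from the table and reports the two strongest complexes and the mean score for each GFP. Only the variable and function names are introduced here.

```python
# Scores copied from Table 1 (more negative = stronger predicted binding).
TABLE_1 = {
    "H": {"sfGFP": -12.7, "fuGFP": -10.7},
    "2": {"sfGFP": -12.0, "fuGFP": -11.4},
    "3": {"sfGFP": -12.4, "fuGFP": -12.0},
    "6": {"sfGFP": -12.2, "fuGFP": -11.8},
    "7": {"sfGFP": -11.5, "fuGFP": -11.2},
    "AC": {"sfGFP": -12.9, "fuGFP": -11.6},
}


def ranked_complexes(table):
    """Return (score, nanobody, gfp) tuples, strongest binder first."""
    return sorted((score, nb, gfp)
                  for nb, row in table.items()
                  for gfp, score in row.items())


def mean_score(table, gfp):
    """Mean score across all nanobodies for one GFP."""
    return sum(row[gfp] for row in table.values()) / len(table)


ranking = ranked_complexes(TABLE_1)
print("two strongest complexes:", ranking[:2])
for gfp in ("sfGFP", "fuGFP"):
    print(gfp, "mean score:", round(mean_score(TABLE_1, gfp), 2))
```

The two most negative scores belong to sfGFP-Nanobody AC and sfGFP-Nanobody H, and every nanobody scores lower with sfGFP than with fuGFP, with means of about -12.28 and -11.45 respectively.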
Team Washington Workflow for Protein Modeling Collaboration with Team Sydney Australia
Our goal for our modeling work of the collaboration was to dock several of team Sydney’s nanobodies to the GFP team Sydney provided us. We then analyzed which nanobody bound with the highest affinity to GFP. Team Sydney will then analyze whether our modeling results are similar to their simulations.
Rosetta is a command-line based software suite developed at the University of Washington. In order to dock the nanobodies to GFP, our team decided to follow and adapt a Rosetta tutorial on Protein-Protein Docking. Below is our workflow:
Lower ddG values indicate higher binding affinity.
References
Van Zundert, G.C.P., Melquiond, A.S.J., Bonvin, A.M.J.J., 2015. Integrative Modeling of Biomolecular Complexes: HADDOCKing with Cryo-Electron Microscopy Data. Structure, 23(5), 949–960.
Jiménez-García, B., Roel-Touris, J., Romero-Durana, M., Vidal, M., Jiménez-González, D., and Fernández-Recio, J., 2018. LightDock: a new multi-scale approach to protein-protein docking. Bioinformatics, 34(1), 49–55.
Kelley, L.A., Mezulis, S., Yates, C.M., Wass, M.N., and Sternberg, M.J.E., 2015. The Phyre2 web portal for protein modeling, prediction and analysis. Nature Protocols, 10(6), 845–858.
Kurcinski, M., Jamroz, M., Blaszczyk, M., Kolinski, A., and Kmiecik, S., 2015. CABS-dock web server for the flexible docking of peptides to proteins without prior knowledge of the binding site. Nucleic acids research, 43(W1), W419–W424.
Romero-Molina, S., Ruiz-Blanco, Y.B., Mieres-Perez, J., Harms, M., Münch, J., Ehrmann, M., and Sanchez-Garcia, E., 2022. PPI-Affinity: A Web Tool for the Prediction and Optimization of Protein–Peptide and Protein–Protein Binding Affinity. Journal of Proteome Research, 21(8), 1829–1841.
Schrödinger, L. and DeLano, W., 2020. PyMOL. Available at: http://www.pymol.org/pymol.