Introduction

Modeling comprised a key element of our project. At each stage of our project, modeling helped us answer critical questions that informed our research. Our modeling efforts consisted of two complementary modalities: mathematical modeling and molecular modeling. These tools helped us determine the feasibility of AMP encapsulation, the effect of AMPs on bacterial growth rate, and the viability of creating AMP polymers to enhance our yield.

Mathematical Modeling

The first of our modeling modalities was mathematical modeling. As part of our project, we constructed a model consisting of a set of differential equations and parameters to answer key questions and ascertain the feasibility of our project.

Phase 1: Modeling AMP and Encapsulin Expression and Encapsulation to Determine Feasibility of Encapsulation

The first phase of our modeling was building a model to capture the kinetics of antimicrobial peptide (AMP) and encapsulin expression, AMP encapsulation, and encapsulin self-assembly. To do so, we established a set of pseudo-reaction equations describing encapsulin encapsulation and self-assembly according to best-elucidated mechanisms [1]. In this model, the encapsulin monomer (E) and AMP (P) are transcribed and translated in tandem at rates proportional to their residue length. The encapsulin monomer and AMP then form a dimer as a consequence of the targeting peptide on the AMP [1]. The pairs then assemble into the final fully-assembled encapsulin complex (C).

Reactions:

Equations:

To determine parameter values, we relied on empirical values from the literature and deductions based on the following assumptions:

The AMP and encapsulin are transcribed and translated quickly by the cell
The AMP and encapsulin quickly reach steady state values
The AMP-encapsulin monomer dimers form at a slower rate than expression
The fully-assembled complex assembles as the encapsulin monomers are produced at sufficient quantities

Table 1: Kinetics Model Variables

Table 2: Kinetics Model Parameters

This model produced the following plots depicting each variable as a function of time over 500 minutes:

Figure 1: Predicted kinetics of AMP and encapsulin production and self-assembly

Here, we see the encapsulin monomer and AMPs quickly reach steady state levels. The AMP-encapsulin monomers then slowly rise in number before plateauing. Simultaneously, the fully-assembled encapsulin complex rises linearly and never reaches a steady state value. Based on these results, the most important factor in mitigating AMP toxicity would be the targeting peptide's efficiency. These results also suggest that AMP encapsulation is feasible, provided the AMP and encapsulin monomers quickly reach steady-state values [1].

Phase 2: Modeling Bacterial Growth Rate Models to Determine Effect of AMP Toxicity on Bacterial Growth

The next phase involved determining the effect of AMP expression on bacterial growth rates. One of the critical questions for our project surrounded the toxicity of the AMP on bacterial growth. Here, we sought to find that out mathematically. First, we determined parameters by matching the model to experimental data generated from our preliminary growth assay. There were three experimental groups of E. coli with transformed plasmids: negative control, which was a plasmid expressing only the encapsulin, experiment, a plasmid with the encapsulin and AMP, and positive control, which was a plasmid with only an AMPl. For mathematical modeling, we modeled the experiment and positive control groups.

We modeled growth rate as a function of several variables that differed depending on which experiment group was being modeled. First, growth rate is positively proportional to substrate concentration in accordance with Monod kinetics [2]. Second, growth rate is negatively proportional to inducer concentration [3] as inducers of heterologous gene expression lower bacterial growth by diverting resources toward gene expression. This is modeled using Michaelis-Menten kinetics.

Then, a third term was added to account for the negative impact on growth rate by unsequestered AMP that in its free form is bactericidal. For the experiment case, this AMP variable is modeled as in Phase 1 where as it's expressed, it's also sequestered by the encapsulin monomer. This results in the following model:

In the positive control case (solely AMP), the third term is modified to model AMP still being expressed but no longer being sequestered by the encapsulin monomer.

While some parameters were known, others were not, so the unknown parameters were estimated by finding parameters that matched the bacterial growth rate curves to experimental results from the in-lab growth assay. Here, the ratio of the two groups’ biomass at t = 400 minutes was used, resulting in the following graph:

Figure 2: Predicted E. coli biomass growth for experiment and positive control group. Positive control grows less (due to AMP toxicity) but experiment is able to attenuate this due to the encapsulin mitigating the AMP toxicity

This modeling showed that AMP toxicity does indeed impede bacterial growth, but that encapsulins are able to mitigate this toxicity, leading to better growth and therefore higher AMP yield. It also showed that inducers like IPTG lower growth rate by diverting cellular resources toward gene expression. This influenced our wet lab work in that we decided to lower our IPTG concentration to improve bacterial growth during our growth assay. The following tables describe the variables and parameters used in the growth rate models.

Table 3: Growth Rate Model Variables

Table 4: Growth Rate Model Parameters

Phase 3: Determination of Maximum AMP Polymer Size

With our model constructed and matched to experimental data, we then sought to model the effect of chaining multiple AMPs together in a single polymer to increase our yield. This would help to inform the next phase of in-lab experiments as we moved from AMP monomers to AMP polymers. To model this, we made the assumption that with more AMPs chained together, the peptide toxicity term would increase and the targeting peptide efficiency would decline. These effects are captured by the variable alpha, which represents the number of AMPs in an AMP polymer.

To test the effect of adding multiple AMPs in sequence, we raised the term and observed the resulting growth curves, as seen in Figure 3.

Figure 3: Effect of varying AMP polymer size on biomass growth rates. The experiment groups are represented by the solid lines and the positive control group are represented by the dotted lines.

We found the model to predict significant adverse effects on growth rate from raising the AMP count for both experiment and positive control group (Figure 3). Further, the difference between the groups diminished with increasing AMP polymer size. The model predicted that up to 4 AMPs could be chained together before net growth falls below zero. This information was valuable for determining an estimate of the maximum experimental yield we could reasonably expect.

Limitations: Mathematical Model Modification as a Prediction Tool

As with most models, one of the major limitations throughout all of our modeling is the applicability of our results. These limitations come from two main sources: our parameters and the simplifications made in our model. As many of our parameters are estimated from similar physical constants, our model will never be able to perfectly predict real world experimentation. While there are steps we could take to develop wet-lab experiments that would allow us to find more accurate parameter values, these steps would quickly exceed our team’s time and financial resources. Additionally, we are basing the majority of our modeling on mass-action kinetics, which does not encompass all of the complexity present in protein interactions. Our system also only simulates a handful of proteins when in reality a cell contains thousands of proteins, many of which interact. In reality, there is no way that we could feasibly integrate the complexity of our system's dynamics into a perfectly accurate model, even with unlimited resources.

Taking inspiration from statistician George E. P. Box's famous quote “All models are wrong, but some are useful,” we wanted to take steps to make our model more useful, even if it is wrong. We felt that this could be achieved by building a dynamic model that can give insight to the behavior of our encapsulin-AMP system without being as restricted by finding parameter values. This dynamic model would allow us to see how changing parameter values affects our system in real time, giving insight into how altering the characteristics of our AMP, Encapsulin, and E. coli would affect our overall process.

Differential Equations:

With this model, we have incorporated the basic dynamics of our encapsulin-AMP system, while also including degradation terms that we would not be able to measure in the lab. We are now able to dynamically modify the relative size of these parameters. To give an example of the model in action, we utilized the following arbitrary parameter values:

Using this model with these parameters, we were able to produce the following figure. This is shown below along with the resulting plot.

Figure 4: Dynamic Model implemented in MATLAB (a) and resulting plot (b)

This dynamic model allows us to see how theoretical changes in parameter sizing affects the condition of our systems (ie. the toxicity of our AMP of interest). While this cannot be used to directly predict the amount of AMP that we can produce with a given strain of E. Coli and AMP, it allows us to see the overarching behavior of our system and see what relative parameter sizes cause significant change in the system’s behavior.

Molecular Modeling

Background

The second modality of our modeling efforts was molecular modeling. Here, our goal was to elucidate more of the molecular mechanisms that underpin processes surrounding all stages of encapsulin-vector AMP delivery. In particular, we aimed to answer the following questions:

How feasible is the chosen AMP sequence within its solvent?
How might the encapsulin environment affect the secondary structure of the AMP?
How does the TEV-protease bind to different AMP polymer sizes?

Why is molecular modeling useful, when compared to wet-lab or mathematical modeling?

Molecular modeling allows us to visualize, experiment with, and analyze interactions between our targets on a single-molecule scale. This is nearly unachievable with any other form of modeling, and it provides us a more detailed picture of why molecules act the way they do.

Alongside the unique data that we avail this way, molecular modeling is much more cost-efficient.[4] Many of the software suites and programs that we may use are open-source (but invaluable nevertheless), making the only real cost computational power, which was already available to us through the University of Michigan’s high-powered computing systems.

Are there any previous studies that guide our methodology?

Most antimicrobial peptide studies revolve around their interaction with lipid bilayers,[5][6][7] and there are very few in silico encapsulin studies due to computational intensivity, so there was nothing specific to the molecules we were using that provided good background for how we could go about performing analyses. Therefore, we focused on using tools that were commonly used in studying protein-protein interactions.

Methods

Three main structures were used in this part of the study: the T4GALA encapsulin, the antimicrobial peptide (HBCM2), and the TEV protease. The T4GALA encapsulin and TEV protease structures were both available in the Protein Data Bank (IDs 7MH2 and 1LVM, respectively). Although there were multiple TEV protease structures, this one was chosen since its active site triad had been described, allowing for informed models with more accurate predictions.[8] The structure of the chosen AMP sequence had not yet been determined, so its structure had to be predicted (see below).

Structure prediction: AlphaFold and I-TASSER

Initially, the I-TASSER server was used to predict the structure of the AMP,[9][10][11] but AlphaFold was later implemented as well, due to its reported ability to predict nearly accurate protein structures.[12] Structures were made from the monomers composed of the AMP sequence appended to a TEV cleavage site, all attached to a terminal linker. Monomers were predicted using I-TASSER and AlphaFold, and polymers (under 7 monomers) were predicted solely using AlphaFold.

Molecular dynamics: GROMACS

Molecular dynamics simulations were used to determine the stability of the AMP, both isolated within solvent and when in close contact with the encapsulin wall.

All simulations were performed using GROMACS 2020.2, using the AMBER99SB-ILDN forcefield.[13] Box size was variable depending on the goal of each simulation, but generally either a 12Å or 30Å box was used. All systems were solvated with TIP3P water and balanced with sodium and chloride ions. The Verlet cutoff wscheme was used, and the short-range cutoffs for electrostatic and Van der Walls interactions were both set to 1Å. The step size was set to 2fs, and the chosen integrator for all simulations was the leap-frog integrator.

Following solvation, a general procedure was used – minimization, equilibration, and conventional molecular dynamics (cMD). Minimization was done using the steepest descent algorithm, and extra, constrained steps were sometimes added when systems featured multiple proteins within them. Equilibration was composed of two steps – NVT (mole, volume, and temperature-constrained) and NPT (mole, pressure, and temperature-constrained). Each step was run for 500ps, and temperature was set to 300K. The modified Berendsen thermostat and the Parrinello-Rahman barostat were used to condition the system. cMD had no special constraints.

Docking: HADDOCK and Prodigy

The HADDOCK web server was used for docking simulations between the AMP monomer and polymers and the TEV protease.[14][15] Chains of the structure to be featured in the interaction were chosen using experimental data. The TEV cleavage site (ENLYFQG motif) was used within the AMP,[19] and the active site of the TEV protease was used. The active site used was defined by parameters set by Shafee 2013: catalytic triad of the TEV protease are formed by residues His46, Asp81, and Cys151, and the active site is composed of all residues within 4Å from the catalytic triad.[8] HADDOCK outputted ten docking clusters, each with four conformations, for a total of forty structures from each AMP-TEV input.

Each of these structures was submitted into the Prodigy server to calculate binding affinities for later analysis.

Analysis

Molecular dynamics simulations were mainly analyzed using Root Mean Square Deviation (RMSD) and Root Mean Square Fluctuation (RMSF), which are common forms of quantifying structural change, and therefore stability.[17][18] Binding affinities were classified using ANOVA to determine statistical significance.

Results

Phase I: Stability of AMP

The first step to understanding the stability of the HBMC2 antimicrobial peptide was to run it in a 10ns conventional molecular dynamics simulation. On the most superficial level, when considering RMSD and RMSF, it seemed that the peptide was mostly stable in its conformation. The RMSD and RMSF belows always remained within 2nm throughout the course of the simulation, although there were periods of time where major structural events seemed to occur, as well as amino acids with slightly higher levels of fluctuation than others (see Figure 5). Without a comparison, however, this initial data is not very significant. However, it was used in the next step: monomer analysis.

Figure 5: Graphs of the root mean square fluctuation (RMSF) and root mean square deviation (RMSD) of the antimicrobial peptide. Given the low overall values (under 2nm), it seems that this peptide is relatively stable in solvent. The RMSD graph is interpolated by a factor of 10 to reduce noise.

The RMSF graph suggested that the areas of the protein with the highest comparative fluctuation were the N-terminus and C-terminus, which AlphaFold determined to be composed of mostly ambiguous secondary structure (neither alpha-helix nor beta-sheet). Since this area takes up more than half of the peptide, it is noteworthy, since peptide termini are generally the most flexible region of any form of amino acid chain.

Visualizations of Trial 1 (the simulation with the highest RMSD and RMSF) confirmed this observation, but they also demonstrated an unexpected flexibility within the alpha-helix chain that makes up the rest of the peptide. Hydrophobic interactions between the C-terminus and residues Thr20 and Thr21 lead to a bending of the alpha-helix chain. Trials 2 and 3 suggest the same (see Figures 2 and 3).

Although previous studies have suggested that the HBCM2 peptide binds well with E. coli O111:B4 lipopolysaccharide,[19] this result suggests a possible sensitivity to environmental conditions. Useful further studies might be to see (1) if such structural change is reversible, (2) how the peptide reacts to the encapsulin environment, and (3) which conformations of the antimicrobial peptide can stably bind to the LPS and inhibit E. coligrowth.

Figure 6: (left): normal state of the AMP. Note the alpha-helix chain on the left and the ambiguous secondary structure on the right. The distance between the tail end and the marked alpha-helix residues - Thr20 and Thr21 - prevents any change in the hydrogen bonding pattern for these residues. (right) bent state of the AMP. The hydrophobic regions of all four residues pictures are in close proximity, possibly since the solvent orientation of threonine residues is somewhat unfavorable.

Figure 7: (left) AlphaFold AMP bent conformation in Trial 2. (right) bent conformation in Trial 3.

Phase II: AMP-encapsulin interface

Next, the AMP’s structural stability was tested in the vicinity of an encapsulin hexamer. The AMP was like a hairpin in shape, and therefore two conformations were chosen to be tested: one with the termini facing outward (henceforth, upright), and the other with the termini facing inward (henceforth, inverted). Simulations on both systems were run for a total of 2ns, and they were analyzed using RMSD and RMSF. Both demonstrated decreased RMSD and RMSF (under 1nm) when compared to the system with a single AMP in solvent, although the lack of trials and the reduced number of frames in this simulation may have had a role to play in this (see Figure 8).

Figure 8: RMSD and RMSF graphs of the upright and inverted systems. Overall, systems where the AMP is in close proximity to the encapsulin wall seem to have reduced structural change over time. Note the high RMSF between residues 38 and 48.

Visualization also showed the same bending that was demonstrated in the previous section (see Figure 8). The RMSF graph suggests high fluctuation between residues 38 and 48. These amino acids are solvent-oriented throughout the duration of the simulation, suggesting further that residues closer to the encapsulin tend to have lower structural change (see Figure 10). The only other stretch of residues that is as solvent-oriented is the chain stretching from the N-terminus to the beginning of the alpha-helix chain. This, however, is presumably stabilized due to its interaction with the structurally ambiguous sections near the C-terminus and close proximity to the stable alpha-helix section.

Figure 9: Visualization of the upright encapsulin-AMP complex system. Bending at the same residue as within the single-AMP system (Thr20-Thr21) are visible here, too.

Figure 10: Residues 38-48 (pictures above in blue). This chain was shown to have the highest fluctuations within the simulation, and they comprise the most solvent-oriented section of the peptide aside from the N-terminus stretching from the alpha-helix chain.

Phase III: Docking of AMP with TEV-protease

Binding affinity values of each conformation between the AMP monomer and polymers and the TEV-protease were obtained using Prodigy and analyzed using an ANOVA. See Table 5 below for a summary of values:

Table 5: Binding Affinities (kcal/mol) for Clusters of AMP Polymers with TEV Protease

For each cluster of AMPs, four different conformations were given, each with its own affinity. These values have been averaged and displayed as a number representing the cluster as a whole. Ten clusters were generated for each system, but the cluster number outputted by HADDOCK is uncorrelated with binding affinity and nevertheless doesn't define a common characteristic between all systems. For that reason, availed affinity values are sorted from lowest energy to highest within each column.

Statistical analyses demonstrated that polymers made up of five and seven subunits have significantly better binding affinities (lower energy) than the others. A p-value under 0.0001 was shown to exist between the monomer and the heptamer, the trimer and the heptamer, the tetramer and the heptamer, and the trimer and the pentamer (see Figure 11). A possible future direction would be to examine the molecular mechanisms behind this, as well as to increase polymer size, in order to determine the optimal polymer size. However, this initial dataset suggests a trend in which increasing AMP polymer size corresponds to increased binding affinity. This suggests that at least from the standpoint of TEV protease efficacy, increasing AMP polymer size should be a feasible way to increase overall yield.

Figure 11: Box-and-whisker plots of binding affinity of AMP polymers of different lengths with TEV protease. Statistical significance is demonstrated between the binding affinities of all other polymer lengths and the pentamers and heptamer. However, p-value < 0.0001 is only shown in the following four comparisons: 1-7, 3-7, 4-7, 3-5.

References

Wiryaman, T., & Toor, N. (2022). Recent advances in the structural biology of encapsulin bacterial nanocompartments. Journal of structural biology: X, 6, 100062. https://doi.org/10.1016/j.yjsbx.2022.100062
Koch, A.L. (1998). The Monod Model and Its Alternatives. In: Koch, A.L., Robinson, J.A., Milliken, G.A. (eds) Mathematical Modeling in Microbial Ecology. Chapman & Hall Microbiology Series. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-4078-6_4
Demko, M., Chrást, L., Dvořák, P., Damborský, J., & Šafránek, D. (2019). Computational Modelling of Metabolic Burden and Substrate Toxicity in Escherichia coli Carrying a Synthetic Metabolic Pathway. Microorganisms, 7(11), 553. https://doi.org/10.3390/microorganisms7110553
Aminpour, Maral et al. “An Overview of Molecular Modeling for Drug Discovery with Specific Illustrative Examples of Applications.” Molecules (Basel, Switzerland) vol. 24,9 1693. 30 Apr. 2019, doi:10.3390/molecules24091693
Talandashti, Reza et al. “Molecular Insights into Pore Formation Mechanism, Membrane Perturbation, and Water Permeation by the Antimicrobial Peptide Pleurocidin: A Combined All-Atom and Coarse-Grained Molecular Dynamics Simulation Study.” The journal of physical chemistry. B vol. 125,26 (2021): 7163-7176. doi:10.1021/acs.jpcb.1c01954
Fiorentino, Francesco et al. “Dynamics of an LPS translocon induced by substrate and an antimicrobial peptide.” Nature chemical biology vol. 17,2 (2021): 187-195. doi:10.1038/s41589-020-00694-2
Aghazadeh, Hossein et al. “Interactions of GF-17 derived from LL-37 antimicrobial peptide with bacterial membranes: a molecular dynamics simulation study.” Journal of computer-aided molecular design vol. 34,12 (2020): 1261-1273. doi:10.1007/s10822-020-00348-4
Shafee, Thomas. “Evolvability of a Viral Protease: Experimental Evolution of Catalysis, Robustness and Specificity.” University of Cambridge, 2013.
W Zheng, C Zhang, Y Li, R Pearce, EW Bell, Y Zhang. Folding non-homology proteins by coupling deep-learning contact maps with I-TASSER assembly simulations. Cell Reports Methods, 1: 100014 (2021). AMA
J Yang, R Yan, A Roy, D Xu, J Poisson, Y Zhang. The I-TASSER Suite: Protein structure and function prediction. Nature Methods, 12: 7-8 (2015). AMA
J Yang, Y Zhang. I-TASSER server: new development for protein structure and function predictions. Nucleic Acids Research, 43: W174-W181 (2015). AMA
Jumper, John et al. “Highly accurate protein structure prediction with AlphaFold.” Nature vol. 596,7873 (2021): 583-589. doi:10.1038/s41586-021-03819-2
Van Der Spoel, David et al. “GROMACS: fast, flexible, and free.” Journal of computational chemistry vol. 26,16 (2005): 1701-18. doi:10.1002/jcc.20291
R.V. Honorato, P.I. Koukos, B. Jimenez-Garcia, A. Tsaregorodtsev, M. Verlato, A. Giachetti, A. Rosato & A.M.J.J. Bonvin (2021). "Structural biology in the clouds: The WeNMR-EOSC Ecosystem." Frontiers Mol. Biosci., 8, fmolb.2021.729513. APA
G.C.P van Zundert, J.P.G.L.M. Rodrigues, M. Trellet, C. Schmitz, P.L. Kastritis, E. Karaca, A.S.J. Melquiond, M. van Dijk, S.J. de Vries and A.M.J.J. Bonvin (2016). "The HADDOCK2.2 webserver: User-friendly integrative modeling of biomolecular complexes." J. Mol. Biol., 428, 720-725 (2015). APA
Raran-Kurussi, Sreejith et al. “Removal of Affinity Tags with TEV Protease.” Methods in molecular biology (Clifton, N.J.) vol. 1586 (2017): 221-230. doi:10.1007/978-1-4939-6887-9_14
Fu, Bai-Wen et al. “Stability is essential for insecticidal activity of Vip3Aa toxin against Spodoptera exigua.” AMB Express vol. 12,1 92. 14 Jul. 2022, doi:10.1186/s13568-022-01430-w
Saha, Joyanta Kumar, and Md Jahir Raihan. “The binding mechanism of ivermectin and levosalbutamol with spike protein of SARS-CoV-2.” Structural chemistry vol. 32,5 (2021): 1985-1992. doi:10.1007/s11224-021-01776-0
Scott, M G et al. “Biological properties of structurally related alpha-helical cationic antimicrobial peptides.” Infection and immunity vol. 67,4 (1999): 2005-9. doi:10.1128/IAI.67.4.2005-2009.1999
Talkad, V., Schneider, E., & Kennell, D. (1976). Evidence for variable rates of ribosome movement in Escherichia coli. Journal of molecular biology, 104(1), 299–303. https://doi.org/10.1016/0022-2836(76)90015-2
J. A. Jones, A. S. Cristie-David, M. P. Andreas, T. W. Giessen, Angew. Chem. Int. Ed. 2021, 60, 25034.
Lee, T. H., Carpenter, T. S., D'haeseleer, P., Savage, D. F., & Yung, M. C. (2020). Encapsulin carrier proteins for enhanced expression of antimicrobial peptides. Biotechnology and bioengineering, 117(3), 603–613. https://doi.org/10.1002/bit.27222

Model

CONTENTS