Modelling

  • Structure prediction of wildtype and mutant BMC complexes via AlphaFold
  • AlphaFold based structure prediction of BMCs forming complexes with enzymes via the SpyTag/Catcher and SnoopTag/Catcher system
  • First molecular dynamics simulation of bacterial microcompartment pore BMC-T1 in native and mutated state
  • ODE modelling and curve-fitting of OD measurements for bacterial growth with MATLAB as a collaboration with Cambridge

Introduction

While methods like NMR, X-ray crystallography and cryo-EM would give the most detailed insights on 3D structure and dynamics of biological macromolecules, they require expe rtise and are not straightforward to apply.
With the availability of previously determined experimental structures and the latest advances in neural network technologies such as AlphaFold [1], one can get reasonably accurate protein structure predictions at an atomic 3D level within minutes. This enables us to get snapshots of folded protein structures and focus strength to parts where we can prove and improve our experimental designs with e.g., molecular dynamics (MD) simulations that give feedback over stability and molecular distances by time up to microseconds. MD is the pinnacle in simulation but requires a lot of computing power and time to model bigger protein structures. By combining AlphaFold modelling and MD, we can obtain a data set adapted to our proteins of interest. We can then confirm and refine via simulations the nature and behaviour of it, allowing us to work faster in an iterative fashion.
Here we show that we can model the 3D structure of shell proteins derived from the bacterial microcompartment (BMC) T1 protein (PDB ID: 6n06) [2][3] with AlphaFold2. We can show the spatial fit of our pathway enzymes to the inner sites of the compartment. The pore size in these complexes, which is crucial for the mass transfer of the substrates and products, was modified by introducing insertion, replacement and deletion mutations. To verify adapted pore-loop size and dynamics we run molecular dynamics (MD) via NAMD software for the BMC T1 complex forming homotrimers.
There is currently a new field emerging in biology, aiming at the construction of nano-reactors, synthetic organelles and promising scaffolds for drug delivery [4][5]. Despite being popular, to our knowledge, this is the first modelling data generated on pore dynamics of BMC proteins, as pore dynamic simulations have previously only been generated for encapsulin. For our project, in which we use compartmentalisation as protective protein shells to enclose pathway enzymes and their intermediate products, we wanted to put a special focus on the possibilities of different pore sizes. We produce the rather large molecules Indigo and Indirubin. These molecules show, in the case of Indigo, lengths of 10.9 Å up to 11.9 Å, which in its largest conformation exceeds the width of our wild-type pore, with a width of approximately 9.8 Å (see Figure 4). While being a worst-case scenario, this could be a lever to improve our production capabilities and our system as a whole.
Furthermore, we looked at amino acid sequence conservation of the shell-forming proteins of BMC and encapsulins [3][6], aiming to find less conserved sites for the introduction of amber stop codons. These stop codons are required for the insertion of non-canonical amino acids, that can be used for crosslinking and staining. For this purpose, we used the ConSurf database [7] to identify the conserved amino acid sequences of our target proteins as well as the Molecular Graphics System PyMOL for molecular modelling and visualisation.
On top of that, we performed ODE fits of growth curves and fluorescence measurements to obtain more insight into patterns our data follows, as well as the data our collaboration partner iGEM Cambridge sent us. For further information please refer to our collaboration page.

Results

Pore size modelling

Wild-type BMC T1 proteins assemble into a homo-trimer (Figure 1B). Each T1 protein has two loop regions that are directed towards the center and form a pore. For the further use and application of BMCs, it is important to be able to adapt the pore size to one's own requirements in order to obtain an optimal result.
The first loop region consists of the amino acids Ser, Ser, Gly and the second of Ala, Ile, Gly, Ile, Ala, Gly, Lys. Consequently, these loop regions have to be changed in order to influence the pore size, either by deleting amino acids or by adding additional ones with the same properties. In order to reduce the pore size (Figure 1A, WT), amino acids were added. The first loop now consists of the amino acids Ser, Ser, Ser, Gly, Gly and the second loop of Ala, Ile, Gly, Ile, Gly, Ile, Ala, Gly, Lys. To enlarge the pore, amino acids were deleted starting from the wild-type sequence. Thus, the first loop (Figure 1C, Mut2) consists only of Gly and the second (Figure 1A, Mut1) of the amino acids Ala, Ile, Gly, Lys.

Different pore sizes of the WT and mutant BMC, consisting of T1 homo-trimer proteins.

Figure 1: Different pore sizes of the WT and mutant BMC, consisting of T1 homo-trimer proteins. A: Structure with reduced pore mutation (Mut1), amino acids were added in the loop region, compared to the wild-type. B: Wild-type (WT). C: Structure with enlarged pore mutation (Mut2), amino acids were deleted in the loop region. Protein structure generated in AlphaFold2 [3] and Figure created with PyMOL.


As AlphaFold [1] output is a single structure snapshot, it was particularly important to prove the stability of the pores, especially stability in the mutations, via simulations. In Figure 2 we show molecular dynamics simulation in NAMD for 30 ns for Mut1, WT and Mut2.

Figure 3 shows root mean squared fluctuations (RMSF) analysis for each residue index of the homo-trimer complexes. The mutations sites are framed by boxes (blue). Here the mutation 2 shows the highest fluctuations in the pore loops. End and beginning of proteins are highly fluctuating, because of flexible peptides chains and a C-terminal His tag. It can be observed that residues around index 86 (Figure 3, 1) and index 110 (Figure 3, 2) resemble long peptides situated outside the pore region between more stable α-helices and β-sheets. These peptide sites facing the cytoplasm show an increased fluctuation.

Figure 3: Root mean squared fluctuations (RMSF, Å) analysis of modified BMC T1 trimer-complex (BBa_K4229030) for 30 ns molecular dynamic simulation in NAMD.

Figure 3: Root mean squared fluctuations (RMSF, Å) analysis of modified BMC T1 trimer-complex (BBa_K4229030) for 30 ns molecular dynamic simulation in NAMD. Homo-trimer one goes from 1-220, homo-trimer two from 221-440 and homo-trimer three from 441-660 residue index. Wild-type is BMC T1 (PDB: 6n06 [3]). His tag is linked C-terminal via GGSGG. Mutation 1 has deletion mutation in loops that form the pore in the middle of the complex. Mutation 2 has insertion mutation at the same sites. Residue indices forming pore loops are framed with blue boxes. Spikes around residue index 86 (1) and index 110 (2) resemble long peptides situated outside the pore region between α-helices and β-sheets.

In Figure 4 we measured twice the atom distance between two opposing pore loops for all our variants. Average distances (in red) are for WT = 7,72 Å, Mut1 = 5.52 Å and Mut2 = 15.49 Å.

Figure 4: Atom distance analysis of opposing pore loops in BMC T1 (BBa_K4229030) wild-type (WT) and pore loop mutants (Mut1, Mut2).

Figure 4: Atom distance analysis of opposing pore loops in BMC T1 (BBa_K4229030) wild-type (WT) and pore loop mutants (Mut1, Mut2). A: WT pore and selected pore distances used for the analysis of pore dynamics in WT (B), Mut1 (C) and Mut2 (D) for 30 ns MD simulation. Average distance (in red) for WT = 7,72 Å, Mut1 = 5.52 Å and Mut2 = 15.49 Å.

BMC + enzyme complex

To get a better impression of the spatial structure, the T1 proteins, the H proteins, SnoopTag/Catcher, and the enzymes XiaI, TnaA were created using AlphaFold2 and aligned in PyMOL, using reference structures. Please see on our contribution page for details of how this was done. Since AlphaFold2 only provides rigid structures and the Spy/Snoop tags are only disordered chains, the SpyTags did not point in the right direction, so that the enzymes, after aligning, protruded into the shell. In order to still obtain a meaningful model, the exact distances of the spy tag/catcher construct were measured with the help of PyMOL and the enzyme XiaI (Figure 5, orange) was spatially shifted.

Figure 5: Model generated by AlphaFold2 of a section of the HO shell containing the Xia1 and TnaA enzymes of the indigo indirubin pathway bound by the Spy/SnoopTag - Spy/SnoopCatcher system.

Figure 5: Model generated by AlphaFold2 of a section of the HO shell containing the Xia1 and TnaA enzymes of the indigo indirubin pathway bound by the Spy/SnoopTag - Spy/SnoopCatcher system.A: Top view of the HO shell, the hexamers formed by the H protein are purple and the hexamer from the T1 protein is yellow. B: Bottom view of the HO shell, showing two enzymes of the indigo/indirubin pathway (Xia1 in orange and TnaA in purple). C: Side view of the model. Protein structure generated in AlphaFold2 [8] and Figure created with PyMOL.

Figure 6: BMC T1 homo-tetramer complex coloured by conservation state.

Figure 6: BMC T1 homo-tetramer complex coloured by conservation state.A: The redder the amino acid, the more conserved it is. The bluer the amino acid, the less conserved it is. Non-conserved amino acids were selected as sites for Amber stop codon mutation. Protein (PDB: 6n06 [3]) analysed with the ConSurf database [7] and Figure created with PyMOL.

Conclusion

We showed that AlphaFold structure prediction of modified BMC proteins hold up to their native counterpart and thus can be used for experimental mutation design.
Our results generated in AlphaFold suggests also, that the Snoop and Spy tags indeed have the spatial freedom to bind enzymes linked with catcher constructs. Furthermore, as shown in Figure 5 it is possible to catch multiple enzymes with the same BMC trimer-complex and reach five to six enzymes attachment to BMC T1 by optimizing flexible hinging of the tags. We showed that it is possible to engineer the pore size by implementing various mutations via AlphaFold (Figure 1).
We achieved 30 ns stable MD simulations, in which we were able to maintain and track artificial pores, bigger and smaller compared to the WT protein. The video in Figure 2 and RMSF data in Figure 3 further supports that we have no disruption of a stable complex over the entire simulation. The highest fluctuation in the pore loops was observed in mutation 2 with an enlarged pore size. If compared to the video in Figure 2 it is visible that Mutation 2 undergoes more fluctuations and pore loop flips. Figure 4 brings certainty by measuring atom distances in the pore loop, showing high equality of distance in WT (average distance = 7,72 Å) and Mut1 (average distance = 15.49 Å) and some change in the beginning of our simulation in Mut2 (average distance = 5.52 Å) with more equal readings after 10 ns of simulation.

Outlook

As shown above we can see that the pore size and its dynamics varies among constructs by time. Under production conditions, this would probably allow us to improve and specify substrate product exchange into and out of the fully assembled BMC. One could thus adapt the BMC shell to one's wishes and requirements and furthermore better protect the enzymes and reactants, for example. In adaptation to our project and its proposed implementation the next step is to test our simulated pore sizes in the wet lab. If our data can be confirmed, the BMCs can become a very powerful tool in manufacturing.
Multiple and longer simulation replicates of the full protein complex is required to get more accurate information on the dynamics of the system. However, this requires an access to extensive computing power we did not get a grasp on yet.
If microcompartments can gain a foothold in manufacturing and bioproduction, we may have already reached an important milestone for their applicability.

Methods

Amino acid (AA) sequences used for prediction were modified BMC T1 from BBa_K4229030 containing His tag C-terminal. For enzyme complex assembly we used BBa_K4229037, which contain added Snoop and Spy target sequences added at position AA96 and AA128. For Encapsulin we used BBa_K4229020, which has also a C-terminal His tag.
All AlphaFold predictions were done using the Google Collaboratory notebook Collabfold running AlphaFold2 [8]. Settings used were use_amber off, template_mode off, msa_mode MMseqs2 (UniRef+Environmental), pair_mode unpaired+paired, model_type auto, num_recycles 3 and DPI 200.
Alignment and verification of the results as well as protein model figures were created using the PyMOL Molecular Graphics System, Version 2.5.3, Schrödinger, LLC. For reference alignments, the PDB ID 6n06 [3] and PDB ID 4mli [9] for BMC T1 structure were used. Indigo molecular length was measured in the software Chemdraw from Perkin Elmer. For in-depth information on how we used AlphaFold2 and PyMOL please see our Tutorial located in the contribution section. With our tutorial, we hope to ease for future iGEM teams the start into the modelling of data and carefully describe how to obtain AlphaFold modelling results and conduct follow-up alignment and visualisation with PyMOL.
BMC T1 AlphaFold prediction variants wild-type, mutant 1 and mutant 2 were used for MD simulation in NAMD [10]. Preparation of our files was done in the Visual Molecular Dynamics suite VMD [11] by using the built-in automatic PSF builder with the default Charm topology files followed up by the solvation box builder with the option to use molecules dimensions and minimal distance of 2. Ionisation was done with the Autoionise plugin and NaCl using the default 0.15 mol/L. For our parameter files we used the par_all36m_prot [12] and a modified toppar_water_ions.mod [13][14]. Each system minimised with 500 steps and heated to 310 K. Constant pressure (1 atm) and temperature were maintained using Langevin piston dynamics with a damping coefficient of 1 ps. Periodic boundary volume were calculated (see supplementary data) and all molecules wrapped to central cell. All heated system were then simulated at constant box volume for 30 ns. Analysis of the simulations was done in VMD. We first visualised and then exported a video with the plugin Video Maker using tachyon rendering and VideoMach for concatenation (as seen in Figure 2). We then calculated the root mean square fluctuation (RMSF) values (Figure 3, see supplementary on Zenodo CLoud, reachable in our labbook) and measured distance via the VMD built-in measure tool. DCD Files were loaded using a stride of 200. As described in our Integrated Human Practice site, we engaged in a fight against fake science. In our initiative to push reproducibility and raw data availability all the topology, parameters, PDB, PSF and also some dcd trajectory files were uploaded to the Zenodo Cloud Dataverse (see in our labbook to get there).
ConSurf database [7] was used to locate highly conserved regions on BMC T1 + H and EncA. With visual inspection of the sequence in PyMOL we were able to locate promising non-conserved AAs that would be suitable for amber stop codon mutation via site-directed mutagenesis. We used this method for sites with which we wanted to crosslink. For more information see our ncAA results page. ConSurf results were inspected in PyMOL.

References


[1] J. Jumper et al., “Highly accurate protein structure prediction with AlphaFold,” Nature, vol. 596, no. 7873, pp. 583–589, 2021, doi: 10.1038/s41586-021-03819-2.
[2] H. M. Berman et al., “The Protein Data Bank,” Nucleic Acids Res., vol. 28, no. 1, pp. 235–242, Jan. 2000, doi: 10.1093/nar/28.1.235.
[3] PDB 6n06: Greber, B.J., Sutter, M., Kerfeld, C.A., “Cryo-EM structure of the HO BMC shell: BMC-T1 in the assembled shell”, 2019. Doi: 10.2210/pdb6n06/pdb
[4] L. S. R. Adamson et al., “Pore structure controls stability and molecular flux in engineered protein cages,” Sci. Adv., vol. 8, no. 5, pp. 1–13, 2022, doi: 10.1126/sciadv.abl7346.
[5] E. M. Williams, S. M. Jung, J. L. Coffman, and S. Lutz, “Pore Engineering for Enhanced Mass Transport in Encapsulin Nanocompartments,” ACS Synth. Biol., vol. 7, no. 11, pp. 2514–2517, 2018, doi: 10.1021/acssynbio.8b00295.
[6] PDB 4pt2: Fontana, J., Aksyuk, A.A., Steven, A.C., Hoiczyk, E., “Myxococcus xanthus encapsulin protein (EncA)”, 2014, doi: 10.2210/pdb4pt2/pdb
[7] H. Ashkenazy et al., “ConSurf 2016: an improved methodology to estimate and visualize evolutionary conservation in macromolecules,” Nucleic Acids Res., vol. 44, no. W1, pp. W344–W350, 2016, doi: 10.1093/NAR/GKW408.
[8] M. Mirdita, K. Schütze, Y. Moriwaki, L. Heo, S. Ovchinnikov, and M. Steinegger, “ColabFold: making protein folding accessible to all,” Nat. Methods, vol. 19, no. 6, pp. 679–682, 2022, doi: 10.1038/s41592-022-01488-1.
[9] PDB 4mli: Li L, Fierer JO, Rapoport TA, Howarth M Structural Analysis and Optimization of the Covalent Association between SpyCatcher and a Peptide Tag.”, J. Mol. Biol., 2013, doi: 10.2210/pdb4mli/pdb
[10] J. C. Phillips et al., “Scalable molecular dynamics on CPU and GPU architectures with NAMD,” J. Chem. Phys., vol. 153, no. 4, 2020, doi: 10.1063/5.0014475.
[11] Humphrey, W., Dalke, A. and Schulten, K., "VMD - Visual Molecular Dynamics", J. Molec. Graphics, 1996, vol. 14, pp. 33-38. http://www.ks.uiuc.edu/Research/vmd/
[12] J. Huang et al., “CHARMM36m: An improved force field for folded and intrinsically disordered proteins,” Nat. Methods, vol. 14, no. 1, pp. 71–73, 2016, doi: 10.1038/nmeth.4067.
[13] W. L. Jorgensen, J. Chandrasekhar, J. D. Madura, R. W. Impey, and M. L. Klein, “Comparison of simple potential functions for simulating liquid water,” J. Chem. Phys., vol. 79, no. 2, pp. 926–935, 1983, doi: 10.1063/1.445869.
[14] D. Beglov and B. Roux, “Finite representation of an infinite bulk system: Solvent boundary potential for computer simulations,” J. Chem. Phys., vol. 100, no. 12, pp. 9050–9063, 1994, doi: 10.1063/1.466711.