While methods like NMR, X-ray crystallography and cryo-EM would give the most detailed insights on 3D structure and dynamics of biological macromolecules, they require expe rtise and are not straightforward to apply.
With the availability of previously determined experimental structures and the latest advances in neural network technologies such as AlphaFold [1], one can get reasonably accurate protein structure predictions at an atomic 3D level within minutes. This enables us to get snapshots of folded protein structures and focus strength to parts where we can prove and improve our experimental designs with e.g., molecular dynamics (MD) simulations that give feedback over stability and molecular distances by time up to microseconds. MD is the pinnacle in simulation but requires a lot of computing power and time to model bigger protein structures. By combining AlphaFold modelling and MD, we can obtain a data set adapted to our proteins of interest. We can then confirm and refine via simulations the nature and behaviour of it, allowing us to work faster in an iterative fashion.
Here we show that we can model the 3D structure of shell proteins derived from the bacterial microcompartment (BMC) T1 protein (PDB ID: 6n06) [2][3] with AlphaFold2. We can show the spatial fit of our pathway enzymes to the inner sites of the compartment. The pore size in these complexes, which is crucial for the mass transfer of the substrates and products, was modified by introducing insertion, replacement and deletion mutations. To verify adapted pore-loop size and dynamics we run molecular dynamics (MD) via NAMD software for the BMC T1 complex forming homotrimers.
There is currently a new field emerging in biology, aiming at the construction of nano-reactors, synthetic organelles and promising scaffolds for drug delivery [4][5]. Despite being popular, to our knowledge, this is the first modelling data generated on pore dynamics of BMC proteins, as pore dynamic simulations have previously only been generated for encapsulin. For our project, in which we use compartmentalisation as protective protein shells to enclose pathway enzymes and their intermediate products, we wanted to put a special focus on the possibilities of different pore sizes. We produce the rather large molecules Indigo and Indirubin. These molecules show, in the case of Indigo, lengths of 10.9 Å up to 11.9 Å, which in its largest conformation exceeds the width of our wild-type pore, with a width of approximately 9.8 Å (see Figure 4). While being a worst-case scenario, this could be a lever to improve our production capabilities and our system as a whole.
Furthermore, we looked at amino acid sequence conservation of the shell-forming proteins of BMC and encapsulins [3][6], aiming to find less conserved sites for the introduction of amber stop codons. These stop codons are required for the insertion of non-canonical amino acids, that can be used for crosslinking and staining. For this purpose, we used the ConSurf database [7] to identify the conserved amino acid sequences of our target proteins as well as the Molecular Graphics System PyMOL for molecular modelling and visualisation.
On top of that, we performed ODE fits of growth curves and fluorescence measurements to obtain more insight into patterns our data follows, as well as the data our collaboration partner iGEM Cambridge sent us. For further information please refer to our collaboration page.
Wild-type BMC T1 proteins assemble into a homo-trimer (Figure 1B). Each T1 protein has two loop regions that are directed towards the center and form a pore. For the further use and application of BMCs, it is important to be able to adapt the pore size to one's own requirements in order to obtain an optimal result.
The first loop region consists of the amino acids Ser, Ser, Gly and the second of Ala, Ile, Gly, Ile, Ala, Gly, Lys. Consequently, these loop regions have to be changed in order to influence the pore size, either by deleting amino acids or by adding additional ones with the same properties. In order to reduce the pore size (Figure 1A, WT), amino acids were added. The first loop now consists of the amino acids Ser, Ser, Ser, Gly, Gly and the second loop of Ala, Ile, Gly, Ile, Gly, Ile, Ala, Gly, Lys. To enlarge the pore, amino acids were deleted starting from the wild-type sequence. Thus, the first loop (Figure 1C, Mut2) consists only of Gly and the second (Figure 1A, Mut1) of the amino acids Ala, Ile, Gly, Lys.
Figure 1: Different pore sizes of the WT and mutant BMC, consisting of T1 homo-trimer proteins. A: Structure with reduced pore mutation (Mut1), amino acids were added in the loop region, compared to the wild-type. B: Wild-type (WT). C: Structure with enlarged pore mutation (Mut2), amino acids were deleted in the loop region. Protein structure generated in AlphaFold2 [3] and Figure created with PyMOL.
Figure 3: Root mean squared fluctuations (RMSF, Å) analysis of modified BMC T1 trimer-complex (BBa_K4229030) for 30 ns molecular dynamic simulation in NAMD. Homo-trimer one goes from 1-220, homo-trimer two from 221-440 and homo-trimer three from 441-660 residue index. Wild-type is BMC T1 (PDB: 6n06 [3]). His tag is linked C-terminal via GGSGG. Mutation 1 has deletion mutation in loops that form the pore in the middle of the complex. Mutation 2 has insertion mutation at the same sites. Residue indices forming pore loops are framed with blue boxes. Spikes around residue index 86 (1) and index 110 (2) resemble long peptides situated outside the pore region between α-helices and β-sheets.
In Figure 4 we measured twice the atom distance between two opposing pore loops for all our variants. Average distances (in red) are for WT = 7,72 Å, Mut1 = 5.52 Å and Mut2 = 15.49 Å.
Figure 4: Atom distance analysis of opposing pore loops in BMC T1 (BBa_K4229030) wild-type (WT) and pore loop mutants (Mut1, Mut2). A: WT pore and selected pore distances used for the analysis of pore dynamics in WT (B), Mut1 (C) and Mut2 (D) for 30 ns MD simulation. Average distance (in red) for WT = 7,72 Å, Mut1 = 5.52 Å and Mut2 = 15.49 Å.
To get a better impression of the spatial structure, the T1 proteins, the H proteins, SnoopTag/Catcher, and the enzymes XiaI, TnaA were created using AlphaFold2 and aligned in PyMOL, using reference structures. Please see on our contribution page for details of how this was done. Since AlphaFold2 only provides rigid structures and the Spy/Snoop tags are only disordered chains, the SpyTags did not point in the right direction, so that the enzymes, after aligning, protruded into the shell. In order to still obtain a meaningful model, the exact distances of the spy tag/catcher construct were measured with the help of PyMOL and the enzyme XiaI (Figure 5, orange) was spatially shifted.
Figure 5: Model generated by AlphaFold2 of a section of the HO shell containing the Xia1 and TnaA enzymes of the indigo indirubin pathway bound by the Spy/SnoopTag - Spy/SnoopCatcher system.A: Top view of the HO shell, the hexamers formed by the H protein are purple and the hexamer from the T1 protein is yellow. B: Bottom view of the HO shell, showing two enzymes of the indigo/indirubin pathway (Xia1 in orange and TnaA in purple). C: Side view of the model. Protein structure generated in AlphaFold2 [8] and Figure created with PyMOL.
Figure 6: BMC T1 homo-tetramer complex coloured by conservation state.A: The redder the amino acid, the more conserved it is. The bluer the amino acid, the less conserved it is. Non-conserved amino acids were selected as sites for Amber stop codon mutation. Protein (PDB: 6n06 [3]) analysed with the ConSurf database [7] and Figure created with PyMOL.
Amino acid (AA) sequences used for prediction were modified BMC T1 from BBa_K4229030 containing His tag C-terminal. For enzyme complex assembly we used BBa_K4229037, which contain added Snoop and Spy target sequences added at position AA96 and AA128. For Encapsulin we used BBa_K4229020, which has also a C-terminal His tag.
All AlphaFold predictions were done using the Google Collaboratory notebook Collabfold running AlphaFold2 [8]. Settings used were use_amber off, template_mode off, msa_mode MMseqs2 (UniRef+Environmental), pair_mode unpaired+paired, model_type auto, num_recycles 3 and DPI 200.
Alignment and verification of the results as well as protein model figures were created using the PyMOL Molecular Graphics System, Version 2.5.3, Schrödinger, LLC. For reference alignments, the PDB ID 6n06 [3] and PDB ID 4mli [9] for BMC T1 structure were used. Indigo molecular length was measured in the software Chemdraw from Perkin Elmer. For in-depth information on how we used AlphaFold2 and PyMOL please see our Tutorial located in the contribution section. With our tutorial, we hope to ease for future iGEM teams the start into the modelling of data and carefully describe how to obtain AlphaFold modelling results and conduct follow-up alignment and visualisation with PyMOL.
BMC T1 AlphaFold prediction variants wild-type, mutant 1 and mutant 2 were used for MD simulation in NAMD [10]. Preparation of our files was done in the Visual Molecular Dynamics suite VMD [11] by using the built-in automatic PSF builder with the default Charm topology files followed up by the solvation box builder with the option to use molecules dimensions and minimal distance of 2. Ionisation was done with the Autoionise plugin and NaCl using the default 0.15 mol/L. For our parameter files we used the par_all36m_prot [12] and a modified toppar_water_ions.mod [13][14]. Each system minimised with 500 steps and heated to 310 K. Constant pressure (1 atm) and temperature were maintained using Langevin piston dynamics with a damping coefficient of 1 ps. Periodic boundary volume were calculated (see supplementary data) and all molecules wrapped to central cell. All heated system were then simulated at constant box volume for 30 ns. Analysis of the simulations was done in VMD. We first visualised and then exported a video with the plugin Video Maker using tachyon rendering and VideoMach for concatenation (as seen in Figure 2). We then calculated the root mean square fluctuation (RMSF) values (Figure 3, see supplementary on Zenodo CLoud, reachable in our labbook) and measured distance via the VMD built-in measure tool. DCD Files were loaded using a stride of 200. As described in our Integrated Human Practice site, we engaged in a fight against fake science. In our initiative to push reproducibility and raw data availability all the topology, parameters, PDB, PSF and also some dcd trajectory files were uploaded to the Zenodo Cloud Dataverse (see in our labbook to get there).
ConSurf database [7] was used to locate highly conserved regions on BMC T1 + H and EncA. With visual inspection of the sequence in PyMOL we were able to locate promising non-conserved AAs that would be suitable for amber stop codon mutation via site-directed mutagenesis. We used this method for sites with which we wanted to crosslink. For more information see our ncAA results page. ConSurf results were inspected in PyMOL.