Modelling


Introduction

A prominent part of our project was bioinformatic characterization of the proteins we used involved in heavy metal biosensing and bioremediation. Our in depth partnership with Team Munich allowed us to generate an automated pipeline for Transcription Factor (TF) docking, mutation and analysis. With many of the skills and techniques gained from this partnership, we were also able to independently dock various heavy metals onto the different metallothioneins (MTs) we were using as part of bioremediation and directed evolution. The goal of this was to generate novel structures of the MTs, and predict the heavy metal binding stoichiometry and affinity, so that we could use the optimal MT in our hydrogels. We also structurally and kinetically characterized the transcription factors we used, producing structures of the heavy metal docked form, and values for the free energy of binding.


Methods

Metallothionein structure generation

In bioremediation and directed evolution, we were mainly looking at six different MTs, from the species Mytilus edulis, Mytilus galloprovincialis, Callinectes sapidus, Danio rerio, Pseudomonas fluorescens and Saccharomyces cerevisiae. Structures for the C. sapidus MT in complex with Cd2+ [12], and the S. cerevisiae MT in and out of complex with Cd2+ exist on the PDB [13], however not for any of the other species. Given the strong conformational changes which occur with ligand binding, we decided to use AlphaFold to predict the structures of all our MTs. We generated 10 models for each MT, and selected the model which scored the highest in the Local Distance Difference Test (lDDT) to use for our docking simulations.

Transcription Factor structure generation

For biosensing, we were chiefly interested in four different TFs from Cupravidus metallidurans; ArsR, PbrR, MerR and a mutated form of MerR (BBa_K1724002, henceforth referred to as mut_merR), which mainly bind Arsenic, Lead, Mercury and Cadmium ions respectively. ArsR [14] and MerR [3] have been structurally characterized, but from different species. PbrR has an existing AlphaFold predicted PDB structure, and mut_merR has never been structurally characterized. We generated de novo structures for each of these using AlphaFold as well and chose to perform our docking simulations with both the existing TF structures not from C. metallidurans, and the de novo predicted AlphaFold structures.

PDB structures were already dimerized, as our TFs are found as dimers in vivo [5, 6, 20]. However, normal AlphaFold produces monomeric structures [7], so we needed to dimerize and energy minimize these structures. We chose to do this with the HSYMDOCK Server, choosing C2 symmetry for each TF [1,9,10,15,17,19,21]. Energy minimization was performed with the YASARA energy minimization server [8]. The resulting structures were extracted using YASARA view and used for the rest of the docking simulations.

Heavy metal docking simulations

We used the same approach to dock heavy metals to both our MTs and TFs. Heavy metal ions are coordinated by cysteines, so we decided to dock on every cysteine in each protein and used the free energy of binding to see which cysteines could be successfully docked by each heavy metal ion. Unlike normal docking simulations which use mainly van der Waals interactions to determine docking, we needed to slightly modify the approach because heavy metal ions form a dative covalent bond. We prepared ligand PDBs with AutoDock’s prepareCovalent.py [2], removed hydrogens and added gasteiger charges for the heavy metals, and then used MGLTools prepare_receptor4.py to prepare the ligand PDBQT [11], in accordance with the Covalent Docking protocol. We then prepared the protein PDBQT with MGLTools prepare_receptor4.py with the -A Hydrogens flag. We then generated flexible docking files with MGLTools prepare_flexreceptor4.py, and generated Grid Parameter Files and Docking Parameter Files (GPFs and DPFs) with MGLTools prepare_gpf4.py and prepare_dpf4.py, specifying the flexible receptor files [11]. We then used AutoGrid to calculate atomic affinities, and finally AutoDock 4.2 to preform docking [11] See Figure 1 for a diagramatic workflow. The docked coordinates of the heavy metal and corresponding gamma sulfur for the best docked model were extracted from the resulting docking logs. Heavy metals which could spontaneously bind (negative free energy of binding) were added back into the original structures to generate the final docked structures. Full annotated shell scripts are available on our GitLab software page.


Figure 1. Bioinformatic workflow for covalent docking of heavy metal ions

Results

Metallothioneins

The AlphaFold structures have indicated how the metallothioneins are normally shaped, and hence the conformational changes which occur with heavy metal binding. Metallothioneins are very rich in cysteines, however the individual chemical environment around the sulfur of each cysteine changes the binding potential. This binding potential is calculated by AutoDock as a function of Torsional Energy, Ligand-Fixed and Moving Receptor energy, van der Waals interactions, Electrostatic energy and unbound systems energy. We corrected this overall value by subtracting 0.60 kcal/mol, as this was a constant value added on by MGLTools prepare_flexreceptor4.py not being able to take heavy metal atom types. We generated a score for the free energy of binding per ion for each MT, a measure for the overall affinity of the MT for heavy metals. We chose to perform this docking simulation with Ag+, as it strongly binds metallothioneins [16], and was useful for interpreting the results of our directed evolution experiments and controls, which also used Ag+ (See Figure 2).

Figure 2: The structures of our metallothioneins in the unbound state (left) and Ag+ bound state (right). a) M. edulis b) M. galloprovincialis c) C. sapidus d) D. rerio e) P. fluorescens f) S. cerevisiae. Ag+ is represented by grey spheres.

Species origin of MT Total cysteines Number of Ag+ ions docked Total free energy of binding (kcal/mol) Free energy per binding ion (kcal/mol)
M. edulis 20 4 -0.83 -0.21
M. galloprovincialis 21 5 -0.85 -0.17
D. rerio 20 4 -0.58 -0.15
C. sapidus 18 5 -0.65 -0.13
P. fluorescens 9 6 -2.44 -0.41
S. cerevisiae 12 5 -1.87 -0.37

Table 1: AutoDock generated and corrected free energies of binding.

Interestingly, P. fluorescens has the highest number of Ag+ ions docked, highest total free energy of binding and highest free energy of binding per ion, regardless that it has the least number of total cysteines. P. fluorescens is also the only prokaryotic MT that was docked, further investigation into prokaryotic MTs might yield interesting results about their properties. Studies have shown already that Pseudomonas MTs have interesting properties, like preferential binding to cadmium over zinc [4]. The number of Ag+ ions docked is also consistent with literature values [12,13], which indicates accuracy of the proposed model. The total number of cysteines also does not appear to be correlated with the strength of binding nor the binding stoichiometry. Therefore, it may be beneficial to focus on chemical environment of each cysteine rather than number of cysteines when trying to improve MT binding affinity and stoichiometry to heavy metals.

Transcription Factors - Heavy Metals

Like with MTs, Heavy Metal regulated TFs also coordinate metal ions using cysteine. However, MTs are reasonably nonspecific in their Heavy Metal coordination, as that is what they have evolved to do, and hence can bind multiple. Dimeric Heavy Metal regulated TFs on the other hand, only ever coordinate two heavy metal ions, one for each symmetrical face. To determine which cysteine residues are normally involved in coordination, we used the existing structures of MerR and ArsR in complex with their heavy metal.

Figure 3: Structures of a) Bacillus megaterium MerR in complex with two Hg2+ ions (grey spheres, PDB: 4ua1). b) Cornyebacterium glutamicum ArsR in complex with two As3+ ions (purple spheres, PDB: 6j0e). Pictured are an image of the entire TF bound to two ions (left) and a closeup of the the ion coordinated by cysteines, shown in yellow (right).

The crystal structures reveal that MerR normally coordinates heavy metal ions using CYS79 of one chain, and CYS114 and CYS123 of the other chain for each heavy metal ion (Figure 3a), and ArsR coordinates with CYS34 and CYS37 of one chain, and CYS91 of the other chain (Figure 3b). We then found the free energies of heavy metal docking to these cysteines, for existing and AlphaFold predicted structures, and generated structures of docked TFs (Figure 4).

Transcription Factor Structure generated with Metal ion Free energy of binding (kcal/mol)
MerR PDB Hg2+ -6.40
ArsR PDB As3+ -4.38
ArsR PDB As5+ -3.82
PbrR PDB Pb2+ N/A
MerR AlphaFold (Link to sequence) Hg2+ -4.99
mut_MerR AlphaFold (Link to sequence) Cd2+ -3.93
ArsR AlphaFold (Link to sequence) As3+ -6.94
ArsR AlphaFold (Link to sequence) As5+ -4.24
PbrR AlphaFold (Link to sequence) Pb2+ N/A

Table 2: Free energies of binding for the TF structures from the PDB or generated with AlphaFold. Free energies of binding were calculated by totaling the free energies of successfully binding cysteines known to dock the heavy metal. N/A: No cysteines bound with negative free energy, indicating that binding would not be spontaneous.


Figure 4: Structures of a) PDB structure of MerR in complex with Hg2+, b) PDB structure of ArsR in complex with As3+, c) PDB structure of ArsR in complex with As5+, d) AlphaFold structure of MerR in complex with Hg2+, e) AlphaFold structure of mut_MerR in complex with Cd2+, f) AlphaFold structure of ArsR in complex with As3+, g) AlphaFold structure of ArsR in complex with As5+. Structures of PbrR with docked Pb2+ could not be generated.

The results indicate that using this method, heavy metals can be successfully docked into MerR, ArsR and mutated MerR. PbrR could not be successfully docked, as the model generated a positive value for the free energy of binding, indicating that the ligand cannot bind the structure, which is not true. This could arise from the assumption that PbrR and MerR use the same binding site because PbrR comes from the MerR family of transcriptional repressors, whereas in reality the binding site of PbrR is not known. The docking results also show that As3+ binds more strongly to ArsR than As5+, which is consistent with known results [18]. There exists variation between the AlphaFold predicted and PDB extracted structures, but this could also be due to the known TFs being from a different species than our AlphaFold predicted TFs.

Overall, these docking simulations have helped us gain deeper insight into the nature of both our biosensor and bioremediation device, showing which metallothioneins could be best to include in a hydrogel, and validating some assumptions about our biosensors. The failure of PbrR docking also could indicate it is not the best TF to be using for inducible constructs such as our biosensor. Importantly, this model has also further characterized mut_MerR, both in terms of structure and binding affinity to cadmium and has showed that this part could be useful for Cd2+ inducible constructs.


References