Modeling is an essential part of synthetic biology, helping us to design and understand the systems we use. It is a fundamental part of the iGEM competition too. The team did not have much experience in modeling, so it was even more exciting for us to learn and implement something new.
To understand our project better and present our project to sponsors, as well as at educational and Human Practice activities, we tried to create simple graphical models and visualizations by visual modeling.
From the very beginning of the project, we used visual models in our brainstorming sessions, but they were mainly hand-drawn diagrams or sketches. Later we also turned to digital techniques to produce detailed figures. The new figures were created using BioRender.
Of course, these images have subsequently been slightly modified, depending on what we want to say or show.
One of the first elements of our project and wet lab work was to design the vectors we would use in the project. For this, we used the pETARA vector. SnapGene, one of iGEM's sponsors, helped us with the design. These visual aids proved very useful as they made the engineering simpler and faster. They also helped us to reduce the possibility of making mistakes in cloning.
We also tried to create more professional visual models, thus we created a structural model of Cytolysin A. It was fascinating to see what the protein we wanted to use looks like in real life. It was also the right way to use this type of modeling to get a more in-depth look at the structure of our protein.
The Nanobody Display System, our tumor-detecting apparatus, is a key component of our project, whereas a Nanobody is expressed on the surface of the bacterium. With Nanobodies, we would like to achieve tumor-specific attachment of our engineered bacteria to cancer cells. To deepen our understanding of this process, we aimed to carry out in silico studies, providing valuable insights.
Our NanoBody coupled after the adhesion factor Intimin is a complex structure, only parts of which have been solved by NMR or crystallography. So our team decided to predict its structure using AlphaFold2 via ChimeraX4.1 for a complete picture and better understanding. We then docked the resulting structure with the structure of human EGFR from the experimental results to investigate the protein-level mechanism behind bacterial docking.
AlphaFold2 is available free of charge at ChimeraX1.4 via the Colab server by calling the Colabfold program. Once the prediction is complete, the best structure is immediately visible in Chimera.
Figure 7 shows the structure predicted by AlphaFold and its reliability. Regions that the program can determine with high confidence are shown in blue. The regions with low confidence in prediction are shown in increasing shades of red.
Figure 9 shows that a high agreement model was found for the intimin region in the bacterium, so its prediction is more confident. For the extracellular region, many models were found, but none of these models matched the template exactly. It is not surprising, as such a construct is not found in nature. Therefore, the program has put together a structure close to reality from the many different models.
Most of the structure is shown in blue, which was an encouraging sign for us, as this shows that the resulting structure is close to reality. The main uncertainty is seen in the intracellular region, which is probably due to its flexibility. However, the extracellular region relevant for docking is shown in cold colors, which means that the prediction is probably close to reality. And the regions in this region shown in warmer colors are the predicted flexible regions, whose flexibility is confirmed by the fact that they are harder to determine.
We also incorporated the docking element of the Nanobody system into our model. The interaction between the human EGFR antigen and the anti-EGFR NanoBody is a significant element of the diagnostic part of our project. We felt it necessary to verify this in silico. This was done by protein docking using the free-to-use (Cluspro).
For ease of calculation, we only examined the extracellular region accessible to EGFR. Several possible structures were obtained using (Cluspro). The model that the program considered the best is shown here:
In all the structures obtained, the EGFR antigen interacts with the Nanobody and not with other regions of the construct, which is a positive result for us. The docking also revealed the amino acids that form the link, which helped us understand the molecular background of the interaction better.
The BLADE Expression System is the other component of NanoBlade, which aims to carry out precise drug delivery in a light-inducible manner. Our bacteria can secrete Cytolysin A, a pore-forming cytotoxin, upon blue light induction, which makes our system highly controllable.
However, we need more insight into how our system works, as secreting such toxins raises many questions.
Our key interest was to estimate how much protein our system could produce, which would affect the number of bacteria that has to be introduced to achieve the therapeutic effect. Although light is perceived as a non-invasive tool, the light source produces significant heat, which could damage tissues. Therefore, it is also essential to determine how much light at what intensity is needed for efficient yet safe protein expression.
As a first step, we turned to the literature to see how systems like ours have been modeled so far. We wanted to build a model that includes the most essential elements of our system.
To do this, we have been greatly helped by the article by Benzinger and Khammash, which provided the foundations for our model. They modeled the light-dependent protein production of a similar one-component light-inducible system. [2, 5]
The first element of the model system is a transcription factor that dimerizes under the influence of light, binds to DNA, and initiates mRNA transcription. The mRNA binds to ribosomes and is transcribed to form Cytolysin A. For this basic model to be correct, we need to account for some additional elements. These include the fact that the mRNA and the protein that is produced have a certain degree of degradation. In addition, it is not negligible that the system has a basic degree of leakage. It must also be considered that the dimerized transcription factor may revert to a monomeric state before binding to DNA.
Of course, to model or simulate these, we needed to think about the reactions that make up our system. So, we wrote down all the reactions in our system one by one. For each step, we had to take into account constants (k) that determined the reaction rate. [8]
Then, using the prescribed reactions, the ordinary differential equations (ODEs) were determined. These equations describe the variation of concentrations with time:
To be able to simulate, we needed the constants in the equations, which were taken from Benzinger and Khammash. We could do this because they also modeled a single-component blue light-activated system in their paper. At some point, we have chosen the constants to better fit our theoretical ideas. [2, 9]
As we were looking for an easy-to-learn and simple program to create the simulation model, we chose MatLab, also one of the sponsors of iGEM. For the simulation, we first wrote the code and then built the simulation environment. To do this, we used the od15s function in MatLab and set the initial values.
Lines of code containing mathematical operations:
function dC = translation (t,C)
% Constants
TF_tot = 2000;
% Variable names
TF_on = C(1);
mRNA = C(2);
Protein = C(3);
% Rate constants
k1 = 0.0016399; % light-dependent activation rate [min-1 * (uW * cm-2)-1)]
k2 = 0.34393; % transcription factor reversion rate [min-1]
k3 = 0.24358; %basal transcription rate [mRNA * min-1]
k4 = 11.031; % maximal induced transcription rate [mRNA * min-1]
k5 = 1462.5; % TFon level required for achieving k4/2 [Molecules]
k6 = 0.042116; % mRNA degradation rate [min-1]
k7 = 1.4514; % translation rate [proteins * min-1 * mRNA-1]
k8 = 0.007; % protein degradation rate [min-1]
n = 2; % Hill-coefficient
I = 100; % Light intensity [%]
% Rate laws
r1 = I * k1 * (TF_tot - TF_on);
r2 = k2 * TF_on;
r3 = k3 + (k4 * TF_on^n) / (k5^n + TF_on^n);
r4 = k6 * mRNA;
r5 = k7 * mRNA;
r6 = k8 * Protein;
% Mass balance
dTF_on = r1 - r2;
dmRNA = r3 - r4;
dProtein = r5 - r6;
% Assign output variables
dC(1,:) = dTF_on;
dC(2,:) = dmRNA;
dC(3,:) = dProtein;
Lines of code containing the simulation environment:
C0 = [0, 0, 0];
tspan = [0:750];
[t, y] = ode15s(@name_of_the_mathematicaloperations_file, tspan, C0);
outp=[t y];
The simulation result was a series of data we could plot using Origin 2018.
Figure 12 shows that our model is a good representation of our theoretical assumptions, as the amount of protein produced varies with light intensity. Also, it can be seen that we have implemented the leakage in the model well since there is some protein production in the system, even at 0% light intensity.
Once we had the basic model, our goal was to adapt it as much as possible to our system and synchronize it with real measurement data. We, therefore, carried out experiments where we measured the amount of protein produced as a function of light intensity and time. For this, we used a plasmid construct with mCherry fluorescent protein after BLADE (J23101-BLADE-mCherry). (mCherry seemed an ideal model protein for us, as it has a similar size compared to ClyA). Thus, we could measure the effect of light intensity on protein production through the fluorescent protein. Although this was done using a model system for our model, it was not a problem because we were not interested in the protein in the first place, but in the whole production and the effect of light intensity on it. We used the results of the measurements to perform a global fit on the data in MatLab. It allowed us to match the initial constants given in the simulation to the system we produced. It made our model even more accurate and efficient.
With the new constant values, we now have a much more realistic model. With the new results, we ran the simulation again and checked how similar it was to the real data.
Although this graph is very similar to the previous one, you can see that the values along the y-axis are not the same.
We wanted to take our modeling further and get to know our system even better, so we have made some small improvements to our system. The first element was that we tried to simulate that Cytolisin A is not mainly used up inside the cell but released outside the cell. Therefore, the differential equation for the time variation of protein concentration was split into two parts and extended:
Of note is the k9 coefficient, which includes two coefficients. One part of it is the diffusion coefficient, which is related to the movement of the produced protein within the cell, while another part is the coefficient of cell membrane crossing.
The developed model is written and represented in the same way as before.
function dC = translation (t,C)
% Constants
TF_tot = 2000;
% Variable names
TF_on = C(1);
mRNA = C(2);
Proteinin = C(3);
Proteinout = C(4);
% Rate constants
k1 = 0.00469846355499977; % light-dependent activation rate [min-1 * (uW * cm-2)-1)]
k2 = 9.94175416491169e-05; % transcription factor reversion rate [min-1]
k3 = 0.707860050242641; %basal transcription rate [mRNA * min-1]
k4 = 17.5912739702708; % maximal induced transcription rate [mRNA * min-1]
k5 = 1113.48908841406; % TFon level required for achieving k4/2 [Molecules]
k6 = 0.00855774033800457; % mRNA degradation rate [min-1]
k7 = 2.01860595670445; % translation rate [proteins * min-1 * mRNA-1]
k8 = 0.00660717027639203; % protein degradation rate [min-1]
k9 = 0.24; % diffusion rate
k10 = 0.00660717027639203; % protein degradation rate [min-1]
n = 2; % Hill-coefficient
I = 1; % Light intensity [a.u.]
% Rate laws
r1 = I * k1 * (TF_tot - TF_on);
r2 = k2 * TF_on;
r3 = k3 + (k4 * TF_on^n) / (k5^n + TF_on^n);
r4 = k6 * mRNA;
r5 = k7 * mRNA;
r6 = k8 * Proteinin;
r7 = k9 * Proteinin;
r8= k10 * Proteinout;
% Mass balance
dTF_on = r1 - r2;
dmRNA = r3 - r4;
dProteinin = r5 - r6 -r7;
dProteinout = r7 - r8;
% Assign output variables
dC(1,:) = dTF_on;
dC(2,:) = dmRNA;
dC(3,:) = dProteinin;
dC(4,:) = dProteinout;
Simulation environment:
C0 = [0, 0, 0, 0,];
tspan = [0:750];
[t, y] = ode15s(name_of_the_mathematicaloperations_file, tspan, C0);
outp=[t y];
The result is still only a model of reality, but it is a very good and accurate approximation of them. Our model contains simple elements and steps that are easy for anyone to understand. Our final model has helped us to understand the system we have designed even better. It has also helped us to see how we can test and use the lighting system we have designed for the competition.
[1] Murase, K. Cytolysin A (ClyA): A Bacterial Virulence Factor with Potential Applications in Nanopore Technology, Vaccine Development, and Tumor Therapy. Toxins 2022, 14, 78. https://doi.org/10.3390/toxins14020078.
[2] Benzinger, D.; Khammash, M. Pulsatile Inputs Achieve Tunable Attenuation of Gene Expression Variability and Graded Multi-Gene Regulation. Nature Communications 2018, 9. https://doi.org/10.1038/s41467-018-05882-2.
[3] Smolen, P.; Baxter, D. A.; Byrne, J. H. Mathematical Modeling of Gene Networks. Neuron 2000, 26, 567–580. https://doi.org/10.1016/s0896-6273(00)81194-0.0.
[4] Yamada, M.; Nagasaki, S. C.; Ozawa, T.; Imayoshi, I. Light-Mediated Control of Gene Expression in Mammalian Cells. Neuroscience Research 2020, 152, 66–77. https://doi.org/10.1016/j.neures.2019.12.018.
[5] Ruess, J.; Parise, F.; Milias-Argeitis, A.; Khammash, M.; Lygeros, J. Iterative Experiment Design Guides the Characterization of a Light-Inducible Gene Expression Circuit. Proceedings of the National Academy of Sciences 2015, 112, 8148–8153. https://doi.org/10.1073/pnas.1423947112.
[6] Dresch, J. M.; Thompson, M. A.; Arnosti, D. N.; Chiu, C. Two-Layer Mathematical Modeling of Gene Expression: Incorporating DNA-Level Information and System Dynamics. SIAM Journal on Applied Mathematics 2013, 73, 804–826. https://doi.org/10.1137/120887588.
[7] Luciani, F.; Keşmir, C.; Mishto, M.; Or-Guil, M.; de Boer, R. J. A Mathematical Model of Protein Degradation by the Proteasome. Biophysical Journal 2005, 88, 2422–2432. https://doi.org/10.1529/biophysj.104.049221.
.[8] Ay, A.; Arnosti, D. N. Mathematical Modeling of Gene Expression: A Guide for the Perplexed Biologist. Critical Reviews in Biochemistry and Molecular Biology 2011, 46, 137–151. https://doi.org/10.3109/10409238.2011.556597.
[9] Jayaraman, P.; Yeoh, J. W.; Jayaraman, S.; Teh, A. Y.; Zhang, J.; Poh, C. L. Cell-Free Optogenetic Gene Expression System. ACS Synthetic Biology 2018, 7, 986–994. https://doi.org/10.1021/acssynbio.7b00422.
[10] Riglar, D. T.; Silver, P. A. Engineering Bacteria for Diagnostic and Therapeutic Applications. Nature Reviews Microbiology 2018, 16, 214–225. https://doi.org/10.1038/nrmicro.2017.172.
[11] https://2019.igem.org/Team:Oxford. Accessed October 8, 2022.
[12] https://2020.igem.org/Team:Leiden/Model. Accessed October 8, 2022.
[13] https://2020.igem.org/Team:Harvard/Model. Accessed October 8, 2022.
[14] Sathyanarayana, P.; Maurya, S.; Behera, A.; Ravichandran, M.; Visweswariah, S. S.; Ayappa, K. G.; Roy, R. Cholesterol Promotes Cytolysin A Activity by Stabilizing the Intermediates during Pore Formation. Proceedings of the National Academy of Sciences 2018, 115. https://doi.org/10.1073/pnas.1721228115.