Model | REC-CHENNAI

The crucial steps in our project are biofilm attachment, production of recombinant proteins and reactor design. To optimise these elements, we built mathematical and computational models to answer the following questions:

Is it possible to predict biofilm detachment from the biocarrier?
What helps in the efficient production of curli fibres?
How can we identify the important parameters for over-expression of the fusion protein?
Why should we assess the model confidence of the fusion protein, and how can it be done?
Will the active site of acid phosphatase differ between native aphA and aphA in fused form?

Module 1: Biofilm Detachment Model

Theory

Biofilm formation is the process in which microorganisms attach to and grow on a surface. Biocarriers provide the surface for biofilm attachment. Understanding the steps involved in biofilm formation is essential for deriving the model.

The stages of biofilm formation are as follows:

Attachment
Proliferation
Maturation
Detachment
Dispersion of planktonic cells

Detachment is one of the phases of the biofilm cycle. Because the biofilm is attached to the biocarrier, the detachment phase must not lead to run-off of the engineered bacteria. It is therefore critical to identify when the biofilm on the biocarriers needs replacement. The conditions that govern biofilm detachment are the stress parameter of the bacteria, biofilm strength, liquid shear and other properties of the bacteria.

Assumptions

The biofilm is considered to be homogeneous.
The biofilm is assumed to be isotropic, that is, its physical properties do not vary in magnitude with the direction of measurement.
The three-dimensional system is reduced to two dimensions by assuming that all events occur only in the xy plane.
Reattachment is not possible after detachment. However, after detachment, dispersion of planktonic cells can occur and reversion to planktonic growth is possible.

Derivation

C - Yield stress parameter of bacteria

M_C - Fluidity of the bacteria

N_C - Flow index of bacteria

^-1 - Power-law index of the model

L_A - Strain rate for Acc based on bacterial physical parameters

L₀ - Initial stress tensor based on density

a_x - attachment in X axis

a_y - attachment in Y axis

g_x - growth in X axis

g_y - growth in Y axis

d_x - detachment in X axis

d_y - detachment in Y axis

Results

Gro → Growth

Det → Detachment

Att → Attachment

Module 2: Curli Fibre Production

Theory

The csgBAC operon encodes csgA, csgB and csgC. csgA is the major structural subunit of curli protein and it is mainly responsible for biofilm formation. csgB is the minor structural subunit and it binds to the extracellular matrix and facilitates the polymerization for the production of curli proteins. csgC is required for the correct assembly of mature curli fimbriae. csgD is the positive transcriptional regulator of csgBAC operon. The transition from the planktonic to the multicellular state of the bacteria is controlled by csgD through regulation of curli genes. To activate csgD gene and produce curli proteins necessary for biofilm formation, upregulation of OmpR is required.

Flowchart

Assumptions

The concentration of gene is considered to be constant.
Transcription and translation processes are not hindered due to any disturbance.
RNA polymerase is available in adequate quantity.
Ribosomes are present in sufficient amounts to aid in translation.

Derivation

mRNA formed per unit time is given by:

d[mRNA]/dt = k₁[gene] - d₁[mRNA] (1)

Protein formed per unit time is given by:

d[protein]/dt = k₂[mRNA] - d₂[protein] (2)

Where,

k₁ is the transcription rate

d₁ is the mRNA degradation rate

k₂ is the translation rate

d₂ is the protein degradation rate
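Equations (1) and (2) can be explored numerically. The following is a minimal sketch, assuming a forward Euler integration and purely illustrative rate constants (the values of K1, D1, K2 and D2 below are hypothetical and are not taken from our parameter table). It compares the simulated end point with the analytical steady state obtained by setting both derivatives to zero.

```python
def simulate_expression(k1, d1, k2, d2, gene=1.0, t_end=20000.0, dt=0.1):
    """Integrate equations (1) and (2) with the forward Euler method.

    Returns the mRNA and protein levels at time t_end, starting from zero.
    """
    mrna, protein = 0.0, 0.0
    steps = round(t_end / dt)
    for _ in range(steps):
        dmrna = k1 * gene - d1 * mrna          # equation (1)
        dprotein = k2 * mrna - d2 * protein    # equation (2)
        mrna += dmrna * dt
        protein += dprotein * dt
    return mrna, protein


def steady_state(k1, d1, k2, d2, gene=1.0):
    """Analytical steady state: set both time derivatives to zero."""
    mrna = k1 * gene / d1
    protein = k2 * mrna / d2
    return mrna, protein


# Hypothetical rate constants in s^-1, chosen only for illustration.
K1, D1, K2, D2 = 0.02, 0.005, 0.05, 0.001

mrna_sim, protein_sim = simulate_expression(K1, D1, K2, D2)
mrna_ss, protein_ss = steady_state(K1, D1, K2, D2)

if __name__ == "__main__":
    print(f"simulated:    mRNA={mrna_sim:.4f}, protein={protein_sim:.4f}")
    print(f"steady state: mRNA={mrna_ss:.4f}, protein={protein_ss:.4f}")
```

With these assumed values the simulation runs long enough for both species to settle at the analytical steady state, which is a quick check that the equations were coded consistently.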

OmpR protein production:

d[mRNA OmpR]/dt = k₁[gene OmpR] - d₁[mRNA OmpR] (3)

d[protein OmpR]/dt = k₂[mRNA OmpR] - d₂[protein OmpR] (4)

Once OmpR protein has been produced, its upregulation activates csgD production:

d[mRNA csgD]/dt = k₃[gene csgD] - d₃[mRNA csgD] (5)

d[protein csgD]/dt = k₄[mRNA csgD] - d₄[protein csgD] (6)

csgD is a positive transcriptional regulator of the csgBAC operon and controls the transcription of the csgA, csgB and csgC genes.

d[mRNA csgA]/dt = k₅[gene csgA] - d₅[mRNA csgA] (7)

d[mRNA csgB]/dt = k₆[gene csgB] - d₆[mRNA csgB] (8)

d[mRNA csgC]/dt = k₇[gene csgC] - d₇[mRNA csgC] (9)

Parameters involved:

Parameter	Description	Value	Unit	Reference
k₁	Rate of transcription of OmpR	TBF	s⁻¹	-
k₂	Rate of translation of OmpR	TBF	s⁻¹	-
k₃	Rate of transcription of csgD	0.0214	s⁻¹	Proshkin et al., 2010
k₄	Rate of translation of csgD	TBF	s⁻¹	-
k₅	Rate of transcription of csgA	0.0921	s⁻¹	Proshkin et al., 2010
k₆	Rate of transcription of csgB	0.0214	s⁻¹	Proshkin et al., 2010
k₇	Rate of transcription of csgC	0.0214	s⁻¹	Proshkin et al., 2010

TBF: To be found
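As a rough illustration of how the tabulated transcription rates feed into equations (5) and (7) to (9), the sketch below computes the steady-state mRNA level k[gene]/d for each gene. The degradation rate is not in the table, so ASSUMED_MRNA_DEGRADATION is a hypothetical placeholder, and a single gene copy is assumed. If all transcripts are assumed to share the same degradation rate, the ratios between mRNA levels depend only on the transcription rates.

```python
# Transcription rates in s^-1, taken from the parameter table.
TRANSCRIPTION_RATES = {
    "csgD": 0.0214,  # k3
    "csgA": 0.0921,  # k5
    "csgB": 0.0214,  # k6
    "csgC": 0.0214,  # k7
}

# Hypothetical mRNA degradation rate in s^-1 (assumption, not a measured value).
ASSUMED_MRNA_DEGRADATION = 0.005


def steady_state_mrna(k, gene=1.0, d=ASSUMED_MRNA_DEGRADATION):
    """Steady state of d[mRNA]/dt = k[gene] - d[mRNA], i.e. k[gene]/d."""
    return k * gene / d


levels = {name: steady_state_mrna(k) for name, k in TRANSCRIPTION_RATES.items()}

# With equal degradation rates, this ratio reduces to k5 / k6 (about 4.3).
ratio_csga_to_csgb = levels["csgA"] / levels["csgB"]

if __name__ == "__main__":
    for name, level in levels.items():
        print(f"{name}: {level:.2f}")
    print(f"csgA : csgB = {ratio_csga_to_csgb:.2f}")
```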

Results

It is evident from the flowchart and equations above that the mRNA of csgA, csgB and csgC is produced in greater amounts when the transcriptional regulator, csgD protein, is formed effectively. An increase or decrease in csgD protein levels therefore controls the transcription of the csgA, csgB and csgC curli genes.

Module 3: Fusion Protein Expression

Theory

The gene is transcribed into mRNA, and the mRNA is translated into protein. Promoter strength, plasmid copy number, transcription factors and RNA polymerase are the important parameters for effective transcription. For effective translation, ribosomes, the strength of the ribosome binding site (RBS) and tRNA play a pivotal role.

Assumptions

Many components are involved in successful transcription, but this derivation considers only transcription factors when forming the equations.
Transcription and translation processes are not repressed by any means.

Derivation

A → Promoter(BBa_J23100/BBa_K896008)

B → Transcription factor

C → mRNA

D → Fusion protein(OmpA - aphA)

k_on is the rate constant for binding of A and B

k_off is the rate constant for unbinding of A and B

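d[AB]/dt = k_on[A][B] - k_off[AB] (1)

d[A]/dt = -k_on[A][B] + k_off[AB] (2)

d[B]/dt = -k_on[A][B] + k_off[AB] (3)

[AB] + [A] = C_n (4)

Where C_n is the plasmid copy number.

At steady state, d[AB]/dt = 0. Substituting [A] = C_n - [AB] from equation (4) into equation (1) gives:

[AB] = C_n k_on[B] / (k_on[B] + k_off)

Dividing the numerator and denominator by k_on, with k_d = k_off/k_on:

[AB] = C_n[B] / (k_d + [B]) (5)

d[C]/dt = k₁C_n[B] / (k_d + [B]) - d₁[C] (6)

d[D]/dt = k₂[C] - d₂[D] (7)

At steady state, d[C]/dt = 0, so:

C = k₁C_n[B] / (d₁(k_d + [B])) (8)

Setting d[D]/dt = 0 and replacing the binding term with a Hill function gives:

D = (k₂k₁C_n / (d₁d₂)) (β_o + (1 - β_o) [B]^n / (k_d + [B]^n)) (9)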

In the Hill function,

β_o is the basal expression level

n is the Hill coefficient

k_d is the apparent dissociation constant
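The final steady-state expression for D can be evaluated directly. The sketch below is a minimal illustration of that formula, with a Hill term of the form β_o + (1 - β_o)[B]^n / (k_d + [B]^n). The function names and all numerical values in the usage section are hypothetical and chosen only to show how the output responds to transcription factor concentration and plasmid copy number.

```python
def hill_activation(b, k_d, n, beta_o):
    """Hill term: beta_o + (1 - beta_o) * b^n / (k_d + b^n)."""
    bn = b ** n
    return beta_o + (1.0 - beta_o) * bn / (k_d + bn)


def steady_state_fusion_protein(b, k1, k2, d1, d2, c_n, k_d, n, beta_o):
    """Steady-state fusion protein level D for transcription factor level b."""
    prefactor = (k1 * k2 * c_n) / (d1 * d2)
    return prefactor * hill_activation(b, k_d, n, beta_o)


if __name__ == "__main__":
    # Hypothetical parameter values, for illustration only.
    params = dict(k1=0.02, k2=0.05, d1=0.005, d2=0.001,
                  c_n=10, k_d=1.0, n=2, beta_o=0.1)
    for b in (0.0, 0.5, 1.0, 2.0, 10.0):
        level = steady_state_fusion_protein(b, **params)
        print(f"[B]={b:5.1f} -> D={level:10.1f}")
```

In this form, D equals the basal fraction β_o of its maximum when no transcription factor is present, rises monotonically with [B], and scales linearly with the plasmid copy number C_n.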

Results

Transcription depends mainly on the promoter and the transcription factor, and it is one of the rate-determining steps of gene expression. Gene expression is controlled by various factors. The protein yield differs depending on the plasmid copy number and the promoter strength, so the expression of the fusion protein will differ between promoters according to their individual strengths.

Module 4: Structure Modelling

1. I-TASSER

A fusion protein is produced when two genes coding for different proteins are joined by a linker sequence and expressed as a single entity. In our project, the OmpA gene attached to a linker is joined with the aphA gene. The role of the OmpA-linker is to bring the aphA protein to the cell surface. For efficient fusion protein yield, the binding affinity should be strong. To check the model confidence of the fusion protein, we used the I-TASSER server, which provides a C-score for the predicted model. We analysed the C-score to assess the quality of the predicted fusion protein model, and it helped us evaluate the model that we initially built. We will work on improving the C-score of the fusion protein until it achieves a high confidence score. I-TASSER also reports a normalized Z-score for each threading alignment; a normalized Z-score greater than 1 indicates a good alignment. The fusion protein sequence received a normalized Z-score of 2.56, and the Rank 1 PDB hit is 1n8nA.

(yellow ribbon - OmpA-linker; blue ribbon - acid phosphatase)

2. AlphaFold

To check the model confidence further, we also used AlphaFold. The confidence scores of the functional protein domains in our structure, aphA and OmpA, range from confident to very confident (blue to dark blue), with slight distortion and lower confidence around the linker between the two major protein domains. (In the model below, the blue ribbon represents the OmpA-linker and the green ribbon represents aphA.)

The per-residue pLDDT scores also show high confidence for the two protein domains, with some distortions predicted due to the linker between them.
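Per-residue pLDDT values can be summarised by grouping them into the confidence bands commonly used for AlphaFold models. The sketch below assumes the usual cut-offs of 90, 70 and 50; the handling of values that fall exactly on a boundary is our own choice, and the example scores are hypothetical rather than taken from our model.

```python
from collections import Counter


def plddt_band(score):
    """Map a pLDDT score (0 to 100) to a confidence band."""
    if not 0.0 <= score <= 100.0:
        raise ValueError(f"pLDDT must lie between 0 and 100, got {score}")
    if score >= 90.0:
        return "very high"
    if score >= 70.0:
        return "confident"
    if score >= 50.0:
        return "low"
    return "very low"


def summarize_plddt(scores):
    """Count how many residues fall into each confidence band."""
    return Counter(plddt_band(s) for s in scores)


# Hypothetical per-residue scores, for illustration only.
example_scores = [95.2, 91.0, 88.4, 72.5, 61.3, 45.0]
summary = summarize_plddt(example_scores)

if __name__ == "__main__":
    for band in ("very high", "confident", "low", "very low"):
        print(f"{band:>9}: {summary[band]} residues")
```

A summary of this kind makes it easy to see whether low-confidence residues are confined to the linker or extend into the functional domains.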

Overall, considering the confidence scores, pLDDT and predicted aligned error (PAE), we concluded that our fusion protein is expected to function as intended, and we proceeded with construction.

Module 5: Aligning Native aphA and OmpA - aphA

Native acid phosphatase has an active site for substrate binding, and the conformation of this active site should not change when aphA is present in fused form. To check for any change in the active site, we used the 'align' function in PyMOL, which performs a sequence alignment followed by a structural superimposition. We aligned the OmpA-aphA model predicted by I-TASSER and the OmpA-aphA model predicted by AlphaFold Colab with native aphA. In both cases, the aphA domain of the fused OmpA-aphA superimposed closely on native aphA, indicating that the active site of aphA is not disturbed. The models therefore suggest that there is no difference between the active sites of native aphA and OmpA-aphA.

I-TASSER fusion protein prediction and native acid phosphatase aligned using PyMOL. (Pink ribbon - native aphA; Yellow ribbon - OmpA of fusion protein; Blue ribbon - aphA of fusion protein)

AlphaFold fusion protein prediction and native acid phosphatase aligned using PyMOL. (Blue ribbon - AlphaFold fusion protein; Pink ribbon - native acid phosphatase)
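How closely two superimposed structures agree is usually summarised as a root-mean-square deviation (RMSD) over matched atoms. The sketch below is not the PyMOL implementation; it only computes the RMSD for coordinates that are assumed to be already superimposed and paired atom by atom (for example, matched Cα atoms of native aphA and the aphA domain of the fusion model). The coordinates in the usage section are made up for illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) coordinates.

    Assumes the structures are already superimposed and that the i-th
    entry of each list refers to the same atom.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    # Hypothetical coordinates in angstroms, for illustration only.
    native = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.5, 0.0)]
    fused = [(0.1, 0.0, 0.0), (1.5, 0.1, 0.0), (3.0, 0.5, 0.1)]
    print(f"RMSD = {rmsd(native, fused):.3f} angstroms")
```

Restricting the comparison to the residues that line the active site would give a more direct measure of whether the site is preserved than an RMSD over the whole domain.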

References

Picioreanu, C., van Loosdrecht, M. C. M., & Heijnen, J. J. (2001). Two-dimensional model of biofilm detachment caused by internal stress from liquid flow. Biotechnology and Bioengineering, 72(2), 205-218. https://doi.org/10.1002/1097-0290(20000120)72:2%3C205::AID-BIT9%3E3.0.CO;2-L
Picioreanu, C., Van Loosdrecht, M. C., & Heijnen, J. J. (1998). Mathematical modeling of biofilm structure with a hybrid differential‐discrete cellular automaton approach. Biotechnology and bioengineering, 58(1), 101-116. https://doi.org/10.1002/(SICI)1097-0290(19980405)58:1%3C101::AID-BIT11%3E3.0.CO;2-M
Picioreanu, C., van Loosdrecht, M., & Heijnen, J. (1999). Multidimensional modeling of biofilm structure. Delft University of Technology, Faculty of Applied Sciences.
Rittman, B. E. (1982). The effect of shear stress on biofilm loss rate. Biotechnology and bioengineering, 24(2), 501-506. https://doi.org/10.1002/bit.260240219
Proshkin, S., Rahmouni, A. R., Mironov, A., & Nudler, E. (2010). Cooperation between translating ribosomes and RNA polymerase in transcription elongation. Science, 328(5977), 504-508. https://doi.org/10.1126/science.1184939
Ledezma-Tejeida, D., Altamirano-Pacheco, L., Fajardo, V., & Collado-Vides, J. (2019). Limits to a classic paradigm: most transcription factors in E. coli regulate genes involved in multiple biological processes. Nucleic acids research, 47(13), 6656-6667. https://doi.org/10.1101/479857
Kim, H., & Gelenbe, E. (2011). Stochastic gene expression modeling with hill function for switch-like gene responses. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 9(4), 973-979. https://doi.org/10.1109/TCBB.2011.153
Likhoshvai, V., & Ratushny, A. (2007). Generalized Hill function method for modeling molecular processes. Journal of bioinformatics and computational biology, 5(02b), 521-531. https://doi.org/10.1142/S0219720007002837
Yang, J., Yan, R., Roy, A., Xu, D., Poisson, J., & Zhang, Y. (2015). The I-TASSER Suite: protein structure and function prediction. Nature methods, 12(1), 7-8. https://doi.org/10.1038/nmeth.3213
Roy, A., Kucukural, A., & Zhang, Y. (2010). I-TASSER: a unified platform for automated protein structure and function prediction. Nature protocols, 5(4), 725-738. https://doi.org/10.1038/nprot.2010.5
Zhang, Y. (2008). I-TASSER server for protein 3D structure prediction. BMC bioinformatics, 9(1), 1-8. https://doi.org/10.1186/1471-2105-9-40
Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., ... & Hassabis, D. (2021). Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873), 583-589. https://doi.org/10.1038/s41586-021-03819-2
Mirdita, M., Schütze, K., Moriwaki, Y., Heo, L., Ovchinnikov, S., & Steinegger, M. (2022). ColabFold: making protein folding accessible to all. Nature Methods, 1-4. https://doi.org/10.1038/s41592-022-01488-1
Schrödinger, L., & DeLano, W. (2020). PyMOL. Retrieved from PyMOL