Mathematical Modeling

Transcription and Translation

Overview

Code review

Results

Introduction

By using mathematical formulations, real-world problems can be modeled and studied. The Availability of mathematical models facilitated many scientific investigations and the discovery of many disease pathogenesis mechanisms. (1,2)
The standard procedure for strong inference in mathematical modeling entails four primary phases:

I. Creating a variety of models for the phenomenon in the issue.

II. Comparing these models to any available experimental data and eliminating the uncorrelated ones.

III. Learn why unsuitable models couldn't account for the facts.

IV. Proposing experiments that could distinguish amongst the remaining hypotheses. (3)

Our mathematical modeling was mainly based on 3 main functions: the assumption of degradation, transcription, and translation steps of our main proteins, Tau and Amyloid beta (Aβ). Through these 3 functions, we can easily hypothesize and predict the change and stability of protein expression rate.

Outline

We aim to detect the transcription and translation rates of our parts as the prediction of each rate of them will help us to predict the possible amounts of both proteins and mRNA, besides helping in the determination of the stability of proteins and their efficiency after expression. The creation of such a model helped us greatly in achieving the main goal of estimating the amount of transcribed mRNA and expressed proteins. We consider this model a major asset for our team and the future iGEM teams.

Table.1: Parameters and processes

I. Transcription :

Outline:
This model is built to estimate the mRNA concentration due to the transcription of a recombinant gene downstream of the T7 promoter and lac operator under the induction of IPTG at different concentrations.
Assumptions and parameters:

The main points are:
I. IPTG enters the cell by passive diffusion only.
II. lacI only exists from the translation of the lacI gene in our plasmid.
III. The used copy number was 600 as pGS21a vector Ori is of high copy number.

Model explanation: The amount of lacI protein is estimated from the amount of lacI mRNA, which is dependent on its transcription rate and mRNA degradation rate in E. coli.
The processes of mRNA transcription and degradation are expressed in equations 1 and 2, respectively:

The mRNA is then translated to give lacI as in equation 3:

Then lacI dimerizes in a reversible process:

Upon dimerization, lacI can bind to lacO and stop the transcription process of our recombinant DNA. the amount of lacO is expressed as the copy number of our plasmid:

The amount of IPTG that enters the cell is diffused into and out of the cell at equal constant rates until the amount of extracellular IPTG equalizes the amount of intracellular IPTG as follows:

The intracellular IPTG can stop transcription repression due to lacI by two pathways. In the first pathway, IPTG binds to lacI dimer and prevents it from binding to lacO as follows:

In the second pathway, IPTG substitutes lacO in lacO/I complex and frees the lacO to initiate the transcription

The rates of degradation of lacI, lacI dimer, and lacI dimer bonded to IPTG are expressed and integrated into the model as follows:

Finally, only genes that have free lacO are transcribed to give mRNA. Their transcription depends on the promoter transcription rate, which can be calculated by elongation velocity divided by sequence length, copy number of genes with free lacO, and the leakage expression of the lacI/O complex.

The model was built on the following ODEs based on the previous equations.

II. Translation:

Outline
The model aims to estimate the translational efficiency of a given mRNA sequence by obtaining the probability of a given mRNA being bound to a ribosome directly proportional to the protein expression level.

Assumptions and parameters:
to calculate the TIR, as mentioned by (Dokyun et al., 2010), the model includes three sequential events in initiation:
I. The thermodynamic folding of all transcribed mRNAs is predicted from UNAFold, at which secondary structures and their energies are calculated.
II. Exposure of the mRNA's ribosome-docking site (RDS), a sequence of 30-nucleotide found upstream of the start codon where ribosome docking occurs.
III. The binding of a ribosome to the exposed RDS through the ribosome-recognizing sequence (RRS), a 10-nucleotide sequence that includes the Shine Dalgarno SD sequence and is complementary to the 3' end of the 16S rRNA, termed anti-RRS.
Model explanation:
The probability of a given secondary structure to exist is calculated using a partition function for the predicted conformations and their Gibbs free energies.

For the ribosome to bind to RDS of mRNA, folded mRNA must be unfolded. However, ribosomes cannot unwind base-paired mRNA structures during translation initiation, but the overall probability that all secondary structures in an RDS would be unfolded at any given moment (p_ex) can be calculated by the summation of the probability of a specific conformation to unfold (p_i) multiplying its probability to exist (p_(si)).

The probability of a specific secondary structure to unfold (p_i) is calculated from the nucleotide-unpairing probability (θ) in an RDS of conformation (S_i)

(θ_i,j) denotes the nucleotide-unpairing probability of the j-th nucleotide in an RDS of conformation (S_i), (ΔG_(ij))denotes the Gibbs free energy of a stack structure in a ribosome-docking site containing thej-th nucleotide, and L denotes the number of nucleotides in a given stack structure.

The probability of binding of ribosome to the mRNA (P_c) can be calculated from the following equation

Meanwhile α is calculated from the following equation:

The amount of target mRNA calculated from transcription model is multiplied by the probability of binding of ribosome to target mRNA to get the amount of active mRNA for translation and the protein then the protein is degraded due to its half-life and cell protein turn over as the following :

Finally the change in the concentration of protein with respect to time is written in the following ODE:

The code is written to import the required packages; math to get the exp() function for calculations, NumPy to create the time array, SciPy to solve the ordinary differential equations, os to access files from the operating system, and matplotlib to plot the results of the simulation.

Then assign the constants and ask for the inputs (DNA sequence length, protein sequence length, plasmid copy number -you can input different copy numbers-, IPTG concentration -you can input different IPTG concentrations-, mfold thermodynamics parameters text file location, RBS calculator hybridization energy, working temperature, time of simulation and time steps) for the differential equations.

The mRNA sequence is given to RBS calculator, and the chassis is selected; then we take the ΔG_mRNA-rRNA value to the code, then the RBS sequence is given to mfold for predicting RBS folding and its thermodynamic parameters, then the parameters are copied into a text file and gathered in one folder, the folder path is inputted to the code.

A for loop is designed to extract the thermodynamic parameters from the text files copied from mfold. Then six functions are defined to calculate thermodynamic energies and probabilities, followed by the ODE-solving function. The time array is assigned, followed by another loop to assign initial concentrations and plot outputs.

Code can be found in our team's GitHub and the programming club.

Before Running the code, we have put some expectations for the results and how the graph will be plotted.

According to the literature and our code assumption, the code will give a final graph that contains mRna and protein concentration in the same graph; the Protein line will be above the mRNA. The graph should start with an increase in the mRNA concentration normally until the formation of LacI. The mRNA will decrease according to the formation of LacI repressor; this decrease is encountered by the concentration of IPTG inside the cell; when the activity of lacI in repressing the transcription and the activity of IPTG come to equilibrium, the graph reaches the plateau, with the notation of if the length small the rate of transcription and translation increase.

Trim system (Snitch)

His Coh

Fig.1: This figure shows the behaviour and the concentration of mRNA and protein for His Coh (1mM ) according to our Mathematical Model.

His Doc

Fig.2: this figure shows the behaviour and the concentration of mRna and protein for His Doc (2Mm) according to our Mathematical Model.

GST doc

Fig.3: This figure shows the behaviour and the concentration of mRna and protein for GSt Doc with 2 different IPTG concentrations 1mM on the right, 2mM on the left (reflect too wet lab) according to our Mathematical Model.

GST Coh

Fig.4: this figure shows the behaviour and the concentration of mRna and protein for GSt Coh (1mM) according to our Mathematical Model.

His tau

Fig.5: this figure shows the behaviour and the concentration of mRna and protein for His tau (1Mm) according to our Mathematical Model.

His Trim-Linker-Docs

Fig.6: This figure shows the behaviour and the concentration of mRna and protein for His Tau-linker-Docs with 2 different IPTG concentration 1mM on the right, 2mM on the left (reflect too wet lab) according to our Mathematical Model.

His Trim

Fig.7: this figure shows the behaviour and the concentration of mRna and protein for His trim (1Mm) according to our Mathematical Model.

(Gst) Coh-linker-TD28rev

Fig.8: this figure shows the behaviour and the concentration of mRna and protein for Coh-linker-TD28rev (1Mm) according to our Mathematical Model.

(GST) Coh-linker-www

Fig.9: this figure shows the behaviour and the concentration of mRna and protein for Coh-linker-www (1Mm) according to our Mathematical Model.

UBE2N

Fig.10: this figure shows the behaviour and the concentration of mRna and protein for UBE2N (1Mm) according to our Mathematical Model.

UBE2V2

Fig.11: this figure shows the behaviour and the concentration of mRna and protein for UB2V2 (1Mm) according to our Mathematical Model.

UBE2W

Fig.12: this figure shows the behaviour and the concentration of mRna and protein for UBE2W (1Mm) according to our Mathematical Model.

UBQC

Fig.13: this figure shows the behaviour and the concentration of mRna and protein for UBQC (1Mm) according to our Mathematical Model.

Discussion

The Results shown above were the same as our Wet-Lab work, but there are some differences in some parts, as the increase in the IPTG concentration will lead to an increase in transcription and translation rate like in tau-linker-Docs and Gst-Docs. We used 2 tags for the Docs and Coh; this part of making sure which one of the tags would be preferable to use, and finally, the length of the part will affect the results as the low length will lead to the High transcription and Translation rate.

Conclusion

Our mathematical modeling aim was to predict effective Wet-Lab concentrations by Dry-Lab coding prediction methods for further accuracy and validation. TIR and Transcription rate mathematical codes were the heroes of our journey for validation and helping in the Wet-Lab experimental process.

HTRA1 Enzymatic activity

Overview

Code review

Results

Introduction

Why did we use the Michaelis-Menten kinetics theory?
Since our project has two main systems, The Snitch system, and the Plug-Sink systems, those two systems rely heavily on the activity of enzymes, mainly the ligases and proteases present in both systems. So, from this point, we searched for a kinetic model which can cover and calculate the activity of our enzyme on the substrate comprising the rate of action of the enzyme on the substrate, rates of association and dissociation of the enzyme with the substrate, and the optimum concentration of the substrate at which the reaction depends on.

Michaelis–Menten kinetics is an equation that tells or estimates the velocity of a certain reaction catalyzed by an enzyme. This theory just measures the rate of formation of products over time in a reversible reaction and formation between the enzyme and its substrate. The theory assumes that the rate of formation of products is directly proportional to the concentration of the Enzyme-Substrate complex [11].

Since the main concept of our equations is based on the equilibrium approximation derivation and mass conservation relationships, so our model divides into two main equations (A) and (B) as follows.

Equation (A) describes the equilibrium step depending on the equilibrium rate constants (K_eq). Equation (B) describes the rate of degradation of the enzyme to the substrate depending on the catalytic rate of degradation of the enzyme (cat).

For our system, there is the protease (HtrA1) which degrades the substrate (Tau/Aβ). Using the mass conservation relationship on the enzyme and applying the equilibrium approximation to our models:

After substituting both reactions 3 and 4, this will lead to:

After the rearrangement of the equation number 5, this will lead to a new rearranged format:

The second part of the equation which describes the rate of degradation of our protease (HtrA1) with the substrate (Tau/Aβ) depends on the K_cat (Catalytic rate of degradation) and the concentration of the Enzyme-Substrate complex:

After rearranging and substituting equation 6into equation 7

Further rearrangement:

So that the final equation we reach is:

The resulting curve is a non-linear one describing the actual rate of the enzyme.

So in order to make it linear to be able to calculate the slope, we used the lineweaver-Burk plot.

The code is built using MATLAB due to its feasibility and friendly interface, in addition to the presence of built-in existing apps and packages on MATLAB. The code starts by requesting three inputs: the dissociation rate constant (K_d), Gibbs free energy (ΔG), and catalytic rate of degradation (K_cat) in which all of the inputs will be stored in an array, respectively.

These input values are required for certain reasons:

I. The (K_d) is required as it will be used in the model starting from equation (5).

II. ΔG as it is required to calculate the Keqif it is not present according to the following equation: ΔG = -RT lnK_eq so that we can calculate the Ka according to the following equation K_eq =^k_d⁄_{k_a}.

Note: If the value of ΔG does not exist, it can be calculated from the PRODIGY web server

III. The third required input is the K_cat of the enzyme against the substrate.

The second part is taking another group of inputs and storing them into arrays as well, which are the initial concentration of the enzyme and the initial concentration of the substrates.

The third and fourth parts are about calculating the [ES] complex using equation (6), and the resultant concentration is multiplied by Kcatof the enzyme yielding v of the reaction.

Finally, the ¹⁄_v of the reaction is plotted (using a function named plot) against thethe ¹⁄_[S] to yield the linear relationship (Lineweaver-Burk Plot) [12].

Comment: As we can see from the above figures, the rate of HtrA1 activity kept increasing through different concentrations of tau till it began to reach the plateau stage nearly starting at a concentration of 0.8 µg/ml of substrate (Fig.1). This non-linear curve is then changed to a linear relationship known as the Line-Weaver Burk plot which is easier to interpret and extract the slope from, thus getting the K_Mof the enzyme.

Comment: As we can see from the above figures, the rate of HtrA1 activity kept increasing through different concentrations of amyloid beta till it began to reach the plateau stage nearly starting at a concentration of 4 µg/ml of substrate (Fig.1). This non-linear curve is then changed to a linear relationship known as the Line-Weaver Burk plot which is easier to interpret and extract the slope from, thus getting the K_Mof the enzyme.

Conclusion

As a conclusion, the model proved a point that is mentioned in papers, which is that the rate of degradation of tau by HtrA1 is higher than the rate of degradation of Aβ by HtrA1. This is proved via the docking results and the graphs of this model.

Trim Enzymatic activity

Overview

Code review

Results

Introduction

The ubiquitin-proteasome system (UPS) is eukaryotic cells' major intracellular protein quality control system [13]. We depend on the UPS pathway for the degradation of our target protein, microtubule-associated protein tau (MAPT), abbreviated as tau, which is the substrate in our model. This model simulates the degradation process of UPS, including all previously demonstrated steps using the ODE model.

The model was built based on previous work (Xu & Qu, 2012).

Outline

The model is built to determine the time and the initial concentration of enzymes needed for complete degradation of the substrate with previously known initial concentration or determining the amount of degraded substrate given a specific concentration of ubiquitin, E1, E2, E3, and 26S proteasome.

Assumptions and parameters:

The model assumes that there is no deubiquitinase (dub) enzyme that removes ubiquitin from the substrate, since this correlates well with the in vitro ubiquitination assay in our project. Therefore, the ubiquitins bonded to the substrate in any number (ub_nS) will not be dissociated. This assumption is based on the wet lab assay that does not utilize dub in the process. The model also assumes that our E3 ligase, which is truncated, tTrim21, will behave as the whole E3 from the paper since we cannot identify the truncation effect until the wet lab experiment. The model also neglects the protein half-life, assuming that proteins are persistent in the reaction medium. Finally, the model assumes that ATP concentration is in excess in the medium and will not interfere with the rate of the reactions or the concentrations of the intermediates.

All parameters obtained from (Xu & Qu, 2012).

Model explanation: The first step in the model is the activation of the ubiquitin (ub) by ubiquitin activating enzyme (E1) which is an ATP dependent step for the formation thioester bond between the cysteine in the active site of E1 and the carboxyl terminus of the ubiquitin.

The next step is the transfer of ub from E1 to ubiquitin conjugating enzyme (E2) which is also an ATP dependent step.

In parallel the ubiquitin ligase enzyme (E3) binds to the substrate molecule, which is going to be tagged and degraded, in a reversible binding.

Afterwards the E2-ub complex binds to the E3 enzyme carrying substrate, which is either free, monoubiquitinated or polyubiquitinated.

Then the ubiquitinated substrate may be separated from the E3 at any instant, the rate of separation k_5.n increases when ubiquitin chain (n) increases.

After separation from E3 the ubiquitinated substrate that carries at least 4 ubiquitins, which is the minimum number needed for degradation by 26S proteasome (26S), binds to the 26S proteasome and become one step away from degradation.

After binding, 26S proteasome degrades the substrate releasing the ubiquitins (n.ub) and peptides byproducts (P) of the substrate.

From the above equations we can conclude the rate laws of each reaction as the following:

From the rate laws we can write the ODE equations for each compound as follows:

This is the general model for ubiquitination, but our case is different in one step, as the binding of Substrate to our E3 ligase (tTrim21) is mediated by protac molecule. Our E3-protac system is based on the interaction between Dockerin S and Cohesin 2 binding partners. We wanted to use and test new approach for determination of rate constants that is based on the structural model of the proteins and prodigy package which analyses the protein complex contacts and calculates energy of binding and dissociation constant of the complex.

From these two parameters we can use the law of Gibbs Energy in Equilibria to calculate rate of association as the following:

I. If we assume that our reaction is at equilibrium. Therefore:

II. And since the reaction is in equilibrium. Therefore:

From these two equations and since we have ∆G and k_d calculated from prodigy, we can calculate ka and use it in our model as the following:

We used the experimental model of DocS and Coh2 (pdb id: 2CCL) and ran it on prodigy to calculate the ∆G and k_dwhich were -9.9 kcal.mol^-1 and 5.2 x 10^-8.

From these values and the equation above, the value of k_ais calculated to be 0.92. We used this value for the rate law of the following reaction.

Where A is the tTrim21-linker-DocS, B is GST-Coh2-linker-TBP, and C is their complex based on the interaction between DocS and Coh2. Then C is treated as E3 in the generalized model and the model goes on.

The rate laws of this reaction are:

These two laws are added to the generalized model to get the specialized model.

At the beginning, the required packages: NumPy, SciPy and matplotlib are imported, then the function that solve the ODEs is defined.

Inside the function, the rate constants are defined, then each compound is indexed to the concentrations list to which the initial concentration will be inputted, and results will be stored. After that, the rate laws are defined by the equations mentioned in the model explanation, then these equations are used to solve the ODEs.

The list of initial conditions and the time array is defined, then the outputs are stored in new list and each compound concentration array is taken from this list by the same index given inside the function.

This code follows a specific ubiquitination for our target protein, If you want to get to the generalized form of the code you can find it in PDF.1, want to know about the theory upon which we built our code you can get a quick brief from PDF.2

PDF.1

PDF.2

The model works efficiently in simulating the whole process of UPS, the change in the concentration of every compound with time is almost as expected. However, ub₈S26S, ub₇S26S, ub₈S and ub₇S complexes graphs show anomalous behaviour as the concentration becomes of negative value, this may be due to the high rate of degradation in reactions 7.7 and 7.8 and high rate of binding in reactions 6.7 and 6.8 where at the point of highest concentration the calculated decrease in concentration becomes higher than the concentration itself, we hope to optimize this problem in the future versions of the model concentration becomes higher than the concentration itself, we hope to optimize this problem in the future versions of the model.

We wanted to see how different concentrations of ubiquitin will affect the change in concentration of the substrate with time. It seems that as the concentration of initial ubiquitin increases the rate of degradation of the substrate increases.

Even though there is no initial concentration of ubiquitin in the blue curve the substrate concentration shows the same initial decline as other curves, so we conclude that this decline is not due to the ubiquitination, but due to the formation of E₃S complex, the graph showing the change in concentration of substrate and E₃S complex at zero initial concentration of ubiquitin demonstrate this.

Another observation is that ub₁S, ub₂S and ub₃S are formed and remained since in the original model of (Xu & Qu, 2012). These complexes are consumed due to the effect of deubiquitinase enzyme while it is absent in our experiment which means that they will not be consumed or go through further processing in this model.

In the specialized model the results were almost the same, since the rate of association of TLD and GCLTBP is very high. So, it will take almost 1 sec to reach the maximum concentration of TLD-GCLTBP complex.

Due to the high calculated association rate, the TLD-GCTBP complex is formed while TLD and GCTBP is consumed quickly in almost 1 sec when the initial concentration of both is set to 100 nM