L O A D I N G . . .

Model

Gradient Marker

Background

Our hardware needs to establish chemokine concentration gradients that allow sperm to move in a directional manner along this gradient. In the preliminary idea, in order to make the liquid fully mixed and the concentration in the pipeline uniform, we set up a multi-curved mixing channel to achieve the purpose. However, in the case of limited chip size, we need to further optimize the volume of this part. Therefore, we modeled this part and discussed the formation mechanism of concentration gradient by means of fluid motion coupled with solute diffusion.

Method

Hypothesis

  1. The channel in the chip has little variation in the thickness direction, so the dimension of the thickness direction is ignored and the plane model of the chip is established.
  2. Consider the flow and concentration under steady-state conditions, that is, the flow and concentration in the channel do not change with time.
  3. The fluid property is incompressible, and the volume force is ignored.

Fomula

Navier-stokes equations for steady-state incompressible conditions:

fomula1

Where ρ is the fluid density, u is the velocity vector, p is the pressure, I is the identity matrix, K represents the viscous term in the equation, and μ is the dynamic viscosity coefficient. In the calculation, either u or p (note: in this model, the flow velocity is positively related to pressure, so pressure is usually used in the calculation instead of velocity) will be given as boundary conditions, and ρ and μ will depend on the type of fluid. This equation is the basis for numerical calculation of the flow model.

The above model explains the influence of velocity (convection) on the formation of concentration gradient. In the vertical direction of velocity, there is also a diffusion phenomenon caused by the solute itself. The influence of this part on concentration needs to be explained by Fick's second law of diffusion:

fomula2

Where D is the diffusion coefficient, c is the concentration, t is the time, and x is the distance. In addition, we can also calculate the diffusion coefficient by linear regression of another form of this equation(C is a constant):

fomula3

Based on the above equations, we will use COMSOL Multiphysics 5.4 to simulate the model. The solvent is water, so ρ and μ value of the water parameters( ρ = 1 g/cm3, μ = 1 * 10-3 Pa * s). The pipe diameter is d = 0.1 mm

Results

As can be seen from the following figures, under steady-state flow, the faster the flow velocity in the channel (or the larger the pressure difference between the inlet and outlet), the weaker the diffusion ability of the solute, and the easier the concentration gradient is to form. Therefore, our model is not limited to a certain solute, and any solute can find a suitable inlet and outlet pressure difference through our model, thus forming a concentration gradient. Our model therefore has good generality. At the same time, the simulation results are very similar to the experimental results, which further verifies the reliability and stability of our model.

The gradient marker.
Fig 1. The results of gradient marker.

Conclusion

In practical application of the gradient marker model to construct the concentration gradient, theoretically we can adjust the pressure difference at the inlet and outlet to form a stable concentration gradient for solutes with different diffusion coefficients. In addition, convection is the main factor affecting the formation of concentration gradient in our model, and the solution is completely mixed by convection almost as soon as it enters the mixing channel. In order to increase diffusion and reduce convection, according to the reference[1], curve transition can be used when solutions of different concentrations meet and enter. The direction of flow is no longer completely opposite, thus reducing convection.

Response Process

Background

In order to assist in verifying the feasibility of the current expression system, we established a simplified semi-quantitative model based on the reaction equations involved in the process. The model will be fed specific parameters to verify that the output is as expected. After verifying the accuracy of the model, we will explore the influence of changes in input substances on results in the process of expression on the basis of this model.

Method

Hypothesis

  1. The binding reaction and substance diffusion do not consume time, and the new substance reaches equilibrium in the bacteria instantly after it is produced.
  2. The protein yield per unit time is proportional to the plasmid concentration, and the amount of degradation is proportional to its own concentration.
  3. Due to the large number of binding equations involved in the system, there are a large number of binding constants to be determined. To simplify the model, we used the resistance equation to describe the reaction in combination with the equation so that the concentration of the product can be positively related to the concentration of the reactant, and the concentration of the product does not exceed any concentration of the reactant.
  4. We do not care about the absolute quantity of each substance, only about its relative change after a change in the input. So we're going to omit units for all of our numbers.

Fomula

In the whole response process, three types of reactions are mainly involved: intersubstance binding (such as protein-protein binding, protein-plasmid binding), substance production/degradation, and enzymatic reaction. So these reactions are described by the following equation.

Combination equation

As mentioned in the hypothesis, in order to simplify the model, we replaced the original binding equation (1) with the resistance equation (2).

fomula4

Generation/degradation equation

Equation (3) is the discrete form of the generation/degradation equation, and Equation (4) is the corresponding differential form. Where C stands for protein. When A represents plasmid/the same protein, and Kb > 0/Kb is < 0, this equation determines the production/degradation of the protein per unit time.

fomula5

Enzymatic equation

Equation (5) is the discrete form of the enzymatic equation, and equation (6) is the corresponding differential form. Where C represents the product, A represents the reaction complex (which may be formed by combining multiple reactants), and E represents the enzyme.

fomula6

Model of response process

Following figure is a schematic diagram of our system. The serial numbers of each reaction in the system correspond to the tables.


model of response process.
Fig 2. The model of response process.

Table 1. Combination equations.
a b c
1 Sp10 Antibody Sp10-Antibody
2 Sp10-Antibody Affibody1-PmrB Sp10-Antibody-affibody1-PmrB(SAAP)
3 EGFR Affibody2-nisin EGFR-Affibody2-nisin(EAN)
4 EAN NisK EAN-NisK (EANN)
5 PmrA-Pi P1 PmrA-Pi-P1
6 NisR-Pi P2 NisR-Pi-P2
7 IPTG P1 IPTG-P1
8 IPTG P2 IPTG-P2
9 Cro P3 Cro-P3
10 cI P3 cI-P3


Table 2. Generation equations.
a c Kb(>0)
11 IPTG-P1 PmrA Kpmra
12 IPTG-P1 Affibody1-PmrB Kaffpb
13 IPTG-P2 NisK Knisk
14 IPTG-P2 NisR Knisr
15 PmrA-Pi-P1 Cro Kcro
16 NisR-Pi-P2 Ser Intergrase Kser
17 P3 without Cro cI Kci
18 P3 without cI and P3’ RFP Krfp
19 P3’ without cI GFP Kgfp


Table 3. Enzymatic equations.
a c e kcat Km
20 PmrA PmrA-Pi SAAP kcatsaap Kmsaap
21 NisR NisR-Pi EANN kcateann Kmeann
22 P3 P3’ Ser Intergrase kcatinter Kminter


Table 4. Degradation equation.
a/c Kb(<0)
PmrA Kipmra
Affibody1-PmrB Kiaffpb
NisK Kinisk
NisR Kinisr
PmrA-Pi Kipmrap
NisR-Pi Kinisrp
Cro Kicro
Ser Intergrase Kiser
cI Kici
RFP Kirfp
GFP Kigfp
SP10 Kisp10
Antibody Kianti
EGFR Kiegfr
Affibody2-nisin Kiaffnis

On the basis of the above equations, we will use the software MATLAB R2021b to simulate the model.

Results

Here are the results of testing the model with specific inputs (for example, changes in EGFR). When the concentration of EGFR decreased, the time for the expression of GFP to reach saturation was prolonged, and RFP was also expressed in the early stage. In the absence of EGFR, only RFP was expressed.


Expression changes of GFP and RFP under different concentrations of EGFR
Fig 3. Expression changes of GFP and RFP under different concentrations of EGFR

In addition, we also predict the results of SP10 and IPTG after the input quantity changes. Through different modes of action, both substances lead to down-regulation of GFP expression after their input is reduced. Therefore, before our products are put into use, it is necessary to check whether the input amount of IPTG in different batches is consistent, so as to avoid false negative results.


Expression changes of GFP and RFP under different concentrations of Sp10.
Fig 4. Expression changes of GFP and RFP under different concentrations of Sp10.



Expression changes of GFP and RFP under different concentrations of IPTG.
Fig 5. Expression changes of GFP and RFP under different concentrations of IPTG.

Conclusion

We developed a semi-quantitative expression model by reducing the response process to a combination of chemical equations. This model is in good agreement with our experimental expectations, and can predict some experimental phenomena with strong stability and reliability. (such as RFP expression in the early stage when EGFR concentration is low). Although this model is only applicable to the current project, the idea of building a simplified mathematical model can be applied to most projects.

Promoter Transformer

Background

Promoter is one of the core elements required by synthetic biology. It is very important to construct promoters with different strengths. This task can be accomplished through experimental methods, but the experimental methods are time-consuming and labor-intensive, making it difficult to perform large-scale screening. In the software part, we design the Promoter_Transformer to predict promoter strength from sequences. Here, we will give a more detailed explanation of the model building itself. This model is based on the Transformer structure, extracts sequence information through the Encoder in the Transformer, and then predicts the promoter strength through the fully connected layer.

Methods

Based on the Transformer's structure, we construct a model that can be used to predict promoter strength, called Promoter_Transformer. The overall structure of Promoter_Transformer can be seen in Fig. 1.

The overall structure of Promoter_Transformer
Fig 6. The overall structure of Promoter_Transformer

The data we use for training are promoter sequences and their corresponding promoter strengths. Now, we will introduce the running process of the model step by step.

  1. For a promoter sequence, first, we should make it a sentence through k-mer segment.
    For example, if we have a promoter sequence AATCGGT, the corresponding sentence after 3-mer segment is AAT ATC TCG CGG GGT. In the sentence, each 3-mer is regarded as a word.
  2. After getting the promoter sentences, we put them into the embedding layer.
    Specifically, for a promoter sentence AAT ATC TCG CGG GGT, each word will first have a one-hot encoding. For instance, AAT is [1,0,0,0,0], ATC is [0,1,0,0,0], TCG is [0,0,1,0,0]... However, when the word number is large, the one-hot vectors will become sparse, which is not suitable for information extracting. Hence, we should transform the one-hot vectors to hidden vectors. For a one-hot vector (x1, x2, ..., xm), we want to use it to form a hidden embedding vector (y1, y2, ..., yn). Then

    fomula7

    Where wij is the weight from xi to yj.
    Note that the weights in this layer will be changed through the training process by gradient descent method. After getting the gradient of w∇, then w will be changed to w-γ·∇, where γ is the learning rate.

    fomula8

  3. When we get the hidden embedding vectors, we can combine the embedding vector with some other information.
    For example, we can add the positional encoding as follows.

    fomula9

    We can also use other methods to extract more information, for instance, the traditional natural vector as follows.

    The traditional natural vector method
    Fig 7. The traditional natural vector method

  4. Then the vectors will be input to the transformer encoder layer.
    In each encoder, the data first passes through a Multi-Head Attention layer and then through a Feed Forward layer. An attention function can be described as mapping a query and a set of key-value pairs to an output, where the query, keys, values, and output are all vectors. The mechanism of the special attention "Scaled Dot-Product Attention" used in the Transformer model can be seen in Fig. 2.

    fomula10

    Additionally, there is a residual connection around each of the two sub-layers, followed by layer normalization.
    The mechanism of the special attention ;Scaled Dot-Product Attention; used in Transformer
    Fig 8. The mechanism of the special attention "Scaled Dot-Product Attention" used in Transformer

  5. Then the vectors will go through the flatten layer.
    The transformer encoder layer will return two-dimensional vectors, it is not suitable for the model to get a predicted number. Therefore, we can use the flatten layer.
    For a two-dimensional vector (a matrix) [[1,2,3],[4,5,6],[7,8,9]], it will be flattened to [1,2,3,4,5,6,7,8,9].
  6. Finally, the vectors can go through the linear layer and get the predicted strength.
    The dropout method is used here. It means that the flattened vector will be randomly set to zero. When we set p = 0.5, every element in the vector has a fifty percent chance to become 0. This is a good way to prevent overfitting, which can help improve the performance of your model. Finally, for a flattened vector (x1, x2, ..., xm), we want to use it to form a final predicted strength y. Then

    fomula11

    Where wi is the weight from xi to y.
Table 5. Important parameters
Parameters value
Embedding dimension 200
Transformer d_model 200
Transformer n_head 10
Transformer num_layers 12
Dropout p value 0.5
Optimizer SGD
Learning rate 0.0001

The loss function: Mean Square Error

fomula12

Results

During model building, we use the Pearson correlation coefficient to assess the correlation between predicted and actual results. We use the same dataset as Wang et al. and perform similar splits. Wang et al. mainly use the Convolutional Neural Network (CNN) model for prediction, and the result reaches 0.25. In our experiments, we also try multiple model architectures and obtain correlations around 0.14 in the Recurrent Neural Network (RNN) model architecture and around 0.22 in the Long Short-Term Memory (LSTM) model architecture. Ultimately, we refine the Transformer-based framework to adapt it to our biological problem, and finally achieve a correlation above 0.32. This shows that our prediction results have reached an advanced level in this evaluation standard.

Conclusion

In general, here we introduce the model itself of Promoter_Transformer and introduce the process of data running in the model. This model also achieve very good performance in predicting promoter strength.

Reference

[1] Tirella, A., Marano, M., Vozzi, F., & Ahluwalia, A. (2008). A microfluidic gradient maker for toxicity testing of bupivacaine and lidocaine. Toxicology in Vitro, 22(8), 1957-1964. doi:10.1016/j.tiv.2008.09.016