Gradient Marker
Background
Our hardware needs to establish chemokine concentration gradients that allow sperm to move in a directional manner along this gradient. In the preliminary idea, in order to make the liquid fully mixed and the concentration in the pipeline uniform, we set up a multi-curved mixing channel to achieve the purpose. However, in the case of limited chip size, we need to further optimize the volume of this part. Therefore, we modeled this part and discussed the formation mechanism of concentration gradient by means of fluid motion coupled with solute diffusion.
Method
Hypothesis
- The channel in the chip has little variation in the thickness direction, so the dimension of the thickness direction is ignored and the plane model of the chip is established.
- Consider the flow and concentration under steady-state conditions, that is, the flow and concentration in the channel do not change with time.
- The fluid property is incompressible, and the volume force is ignored.
Fomula
Navier-stokes equations for steady-state incompressible conditions:
Where ρ is the fluid density, u is the velocity vector, p is the pressure, I is the identity matrix, K represents the viscous term in the equation, and μ is the dynamic viscosity coefficient. In the calculation, either u or p (note: in this model, the flow velocity is positively related to pressure, so pressure is usually used in the calculation instead of velocity) will be given as boundary conditions, and ρ and μ will depend on the type of fluid. This equation is the basis for numerical calculation of the flow model.
The above model explains the influence of velocity (convection) on the formation of concentration gradient. In the vertical direction of velocity, there is also a diffusion phenomenon caused by the solute itself. The influence of this part on concentration needs to be explained by Fick's second law of diffusion:
Where D is the diffusion coefficient, c is the concentration, t is the time, and x is the distance. In addition, we can also calculate the diffusion coefficient by linear regression of another form of this equation(C is a constant):
Based on the above equations, we will use COMSOL Multiphysics 5.4 to simulate the model. The solvent is water, so ρ and μ value of the water parameters( ρ = 1 g/cm3, μ = 1 * 10-3 Pa * s). The pipe diameter is d = 0.1 mm
Results
As can be seen from the following figures, under steady-state flow, the faster the flow velocity in the channel (or the larger the pressure difference between the inlet and outlet), the weaker the diffusion ability of the solute, and the easier the concentration gradient is to form. Therefore, our model is not limited to a certain solute, and any solute can find a suitable inlet and outlet pressure difference through our model, thus forming a concentration gradient. Our model therefore has good generality. At the same time, the simulation results are very similar to the experimental results, which further verifies the reliability and stability of our model.
Conclusion
In practical application of the gradient marker model to construct the concentration gradient, theoretically we can adjust the pressure difference at the inlet and outlet to form a stable concentration gradient for solutes with different diffusion coefficients. In addition, convection is the main factor affecting the formation of concentration gradient in our model, and the solution is completely mixed by convection almost as soon as it enters the mixing channel. In order to increase diffusion and reduce convection, according to the reference[1], curve transition can be used when solutions of different concentrations meet and enter. The direction of flow is no longer completely opposite, thus reducing convection.
Response Process
Background
In order to assist in verifying the feasibility of the current expression system, we established a simplified semi-quantitative model based on the reaction equations involved in the process. The model will be fed specific parameters to verify that the output is as expected. After verifying the accuracy of the model, we will explore the influence of changes in input substances on results in the process of expression on the basis of this model.
Method
Hypothesis
- The binding reaction and substance diffusion do not consume time, and the new substance reaches equilibrium in the bacteria instantly after it is produced.
- The protein yield per unit time is proportional to the plasmid concentration, and the amount of degradation is proportional to its own concentration.
- Due to the large number of binding equations involved in the system, there are a large number of binding constants to be determined. To simplify the model, we used the resistance equation to describe the reaction in combination with the equation so that the concentration of the product can be positively related to the concentration of the reactant, and the concentration of the product does not exceed any concentration of the reactant.
- We do not care about the absolute quantity of each substance, only about its relative change after a change in the input. So we're going to omit units for all of our numbers.
Fomula
In the whole response process, three types of reactions are mainly involved: intersubstance binding (such as protein-protein binding, protein-plasmid binding), substance production/degradation, and enzymatic reaction. So these reactions are described by the following equation.
Combination equation
As mentioned in the hypothesis, in order to simplify the model, we replaced the original binding equation (1) with the resistance equation (2).
Generation/degradation equation
Equation (3) is the discrete form of the generation/degradation equation, and Equation (4) is the corresponding differential form. Where C stands for protein. When A represents plasmid/the same protein, and Kb > 0/Kb is < 0, this equation determines the production/degradation of the protein per unit time.
Enzymatic equation
Equation (5) is the discrete form of the enzymatic equation, and equation (6) is the corresponding differential form. Where C represents the product, A represents the reaction complex (which may be formed by combining multiple reactants), and E represents the enzyme.
Model of response process
Following figure is a schematic diagram of our system. The serial numbers of each reaction in the system correspond to the tables.
a | b | c | |
---|---|---|---|
1 | Sp10 | Antibody | Sp10-Antibody |
2 | Sp10-Antibody | Affibody1-PmrB | Sp10-Antibody-affibody1-PmrB(SAAP) |
3 | EGFR | Affibody2-nisin | EGFR-Affibody2-nisin(EAN) |
4 | EAN | NisK | EAN-NisK (EANN) |
5 | PmrA-Pi | P1 | PmrA-Pi-P1 |
6 | NisR-Pi | P2 | NisR-Pi-P2 |
7 | IPTG | P1 | IPTG-P1 |
8 | IPTG | P2 | IPTG-P2 |
9 | Cro | P3 | Cro-P3 |
10 | cI | P3 | cI-P3 |
a | c | Kb(>0) | |
---|---|---|---|
11 | IPTG-P1 | PmrA | Kpmra |
12 | IPTG-P1 | Affibody1-PmrB | Kaffpb |
13 | IPTG-P2 | NisK | Knisk |
14 | IPTG-P2 | NisR | Knisr |
15 | PmrA-Pi-P1 | Cro | Kcro |
16 | NisR-Pi-P2 | Ser Intergrase | Kser |
17 | P3 without Cro | cI | Kci |
18 | P3 without cI and P3’ | RFP | Krfp |
19 | P3’ without cI | GFP | Kgfp |
a | c | e | kcat | Km | |
---|---|---|---|---|---|
20 | PmrA | PmrA-Pi | SAAP | kcatsaap | Kmsaap |
21 | NisR | NisR-Pi | EANN | kcateann | Kmeann |
22 | P3 | P3’ | Ser Intergrase | kcatinter | Kminter |
a/c | Kb(<0) |
---|---|
PmrA | Kipmra |
Affibody1-PmrB | Kiaffpb |
NisK | Kinisk |
NisR | Kinisr |
PmrA-Pi | Kipmrap |
NisR-Pi | Kinisrp |
Cro | Kicro |
Ser Intergrase | Kiser |
cI | Kici |
RFP | Kirfp |
GFP | Kigfp |
SP10 | Kisp10 |
Antibody | Kianti |
EGFR | Kiegfr |
Affibody2-nisin | Kiaffnis |
On the basis of the above equations, we will use the software MATLAB R2021b to simulate the model.
Results
Here are the results of testing the model with specific inputs (for example, changes in EGFR). When the concentration of EGFR decreased, the time for the expression of GFP to reach saturation was prolonged, and RFP was also expressed in the early stage. In the absence of EGFR, only RFP was expressed.
In addition, we also predict the results of SP10 and IPTG after the input quantity changes. Through different modes of action, both substances lead to down-regulation of GFP expression after their input is reduced. Therefore, before our products are put into use, it is necessary to check whether the input amount of IPTG in different batches is consistent, so as to avoid false negative results.
Conclusion
We developed a semi-quantitative expression model by reducing the response process to a combination of chemical equations. This model is in good agreement with our experimental expectations, and can predict some experimental phenomena with strong stability and reliability. (such as RFP expression in the early stage when EGFR concentration is low). Although this model is only applicable to the current project, the idea of building a simplified mathematical model can be applied to most projects.
Promoter Transformer
Background
Promoter is one of the core elements required by synthetic biology. It is very important to construct promoters with different strengths. This task can be accomplished through experimental methods, but the experimental methods are time-consuming and labor-intensive, making it difficult to perform large-scale screening. In the software part, we design the Promoter_Transformer to predict promoter strength from sequences. Here, we will give a more detailed explanation of the model building itself. This model is based on the Transformer structure, extracts sequence information through the Encoder in the Transformer, and then predicts the promoter strength through the fully connected layer.
Methods
Based on the Transformer's structure, we construct a model that can be used to predict promoter strength, called Promoter_Transformer. The overall structure of Promoter_Transformer can be seen in Fig. 1.
The data we use for training are promoter sequences and their corresponding promoter strengths. Now, we will introduce the running process of the model step by step.
- For a promoter sequence, first, we should make it a sentence through k-mer segment.
For example, if we have a promoter sequence AATCGGT, the corresponding sentence after 3-mer segment is AAT ATC TCG CGG GGT. In the sentence, each 3-mer is regarded as a word. - After getting the promoter sentences, we put them into the embedding layer.
Specifically, for a promoter sentence AAT ATC TCG CGG GGT, each word will first have a one-hot encoding. For instance, AAT is [1,0,0,0,0], ATC is [0,1,0,0,0], TCG is [0,0,1,0,0]... However, when the word number is large, the one-hot vectors will become sparse, which is not suitable for information extracting. Hence, we should transform the one-hot vectors to hidden vectors. For a one-hot vector (x1, x2, ..., xm), we want to use it to form a hidden embedding vector (y1, y2, ..., yn). Then
Note that the weights in this layer will be changed through the training process by gradient descent method. After getting the gradient of w∇, then w will be changed to w-γ·∇, where γ is the learning rate.
- When we get the hidden embedding vectors, we can combine the embedding vector with some other information.
For example, we can add the positional encoding as follows.
Fig 7. The traditional natural vector method
-
Then the vectors will be input to the transformer encoder layer.
In each encoder, the data first passes through a Multi-Head Attention layer and then through a Feed Forward layer. An attention function can be described as mapping a query and a set of key-value pairs to an output, where the query, keys, values, and output are all vectors. The mechanism of the special attention "Scaled Dot-Product Attention" used in the Transformer model can be seen in Fig. 2.
Fig 8. The mechanism of the special attention "Scaled Dot-Product Attention" used in Transformer
-
Then the vectors will go through the flatten layer.
The transformer encoder layer will return two-dimensional vectors, it is not suitable for the model to get a predicted number. Therefore, we can use the flatten layer.
For a two-dimensional vector (a matrix) [[1,2,3],[4,5,6],[7,8,9]], it will be flattened to [1,2,3,4,5,6,7,8,9]. -
Finally, the vectors can go through the linear layer and get the predicted strength.
The dropout method is used here. It means that the flattened vector will be randomly set to zero. When we set p = 0.5, every element in the vector has a fifty percent chance to become 0. This is a good way to prevent overfitting, which can help improve the performance of your model. Finally, for a flattened vector (x1, x2, ..., xm), we want to use it to form a final predicted strength y. Then
Parameters | value |
---|---|
Embedding dimension | 200 |
Transformer d_model | 200 |
Transformer n_head | 10 |
Transformer num_layers | 12 |
Dropout p value | 0.5 |
Optimizer | SGD |
Learning rate | 0.0001 |
The loss function: Mean Square Error
Results
During model building, we use the Pearson correlation coefficient to assess the correlation between predicted and actual results. We use the same dataset as Wang et al. and perform similar splits. Wang et al. mainly use the Convolutional Neural Network (CNN) model for prediction, and the result reaches 0.25. In our experiments, we also try multiple model architectures and obtain correlations around 0.14 in the Recurrent Neural Network (RNN) model architecture and around 0.22 in the Long Short-Term Memory (LSTM) model architecture. Ultimately, we refine the Transformer-based framework to adapt it to our biological problem, and finally achieve a correlation above 0.32. This shows that our prediction results have reached an advanced level in this evaluation standard.
Conclusion
In general, here we introduce the model itself of Promoter_Transformer and introduce the process of data running in the model. This model also achieve very good performance in predicting promoter strength.
Reference
[1] Tirella, A., Marano, M., Vozzi, F., & Ahluwalia, A. (2008). A microfluidic gradient maker for toxicity testing of bupivacaine and lidocaine. Toxicology in Vitro, 22(8), 1957-1964. doi:10.1016/j.tiv.2008.09.016