Background
In the 2022 BIT project, we decided to identify the pain type of patients as neuropathic pain (Np) or nociceptive pain (No) by detecting the concentration of two miRNAs in pain patients and combining the classification model of machine learning. In the whole project, how to select appropriate markers from a large number of miRNAs for classification is a problem that needs to be solved first. In addition, in the whole miRNA detection process, because the concentration of miRNAs is extremely low, in addition to the signal amplification method (HCR), miRNAs also need to be processed by CRISPR to improve the specificity of detection, The CRISPR process applied to miRNAs also lacks a suitable model to describe the reaction process and guide experimental optimization. Because the field of pain & miRNA is innovative, there is almost no research on the correlation between pain and miRNA, so we need to make more use of existing models to develop our own model, and use it to guide our experimental assessment model.
At the beginning of the project, we decided to select appropriate miRNAs as the detection biomarker based on the literature related to pain and miRNA, so we further optimized the miRNA screening model from the literature screening process. After selecting miRNAs, we need to design the optimal HCR primer and optimize CRISPR experimental conditions. We used NUPACK nucleic acid meter analysis software to optimize the sequence and concentration ratio of the primer chain designed in the HCR reaction. After specific values are substituted into the software, the optimal values of each parameter are obtained by controlling variables. When we use experiments for comparison, if the experiment results agree well, we can additionally consider using this model to guide the optimization of experimental parameters, and supplement some extreme situation data that cannot be achieved by actual experiments through the model. The modeling work focuses on describing the CRISPR process and establishing an evaluation model by using differential equations and the associated enzyme kinetics model. Based on the existing Michaelis equation evaluation model in the literature and combining differential equations to describe the CRISPR process, the CRISPR differential equation model can provide optimization guidance for the experiment, and the reaction rate that affects the model can be obtained by solving the equation, According to the variables that affect the reaction rate, a series of optimal experimental guidance can be derived. In addition, the modeling also established a classification model, which can judge the pain type of patients through the miRNA concentration input.
miRNA Model
——Screening and Classification
Screening
Since literature and works that directly figure out the relationship between miRNAs and pain are relatively avant-garde, and there’s a lack of data and experiments to support our extensive miRNA screening, that is, we cannot screen pain-related miRNAs from all known miRNA populations, but based on the work of Dayer et al[1], We decided to inherit from their completed screening process and develop last two steps of our own to obtain the best two types of miRNAs for us, and apply them to the pain classification model. In the classification model, a machine learning classification algorithm is used. The concentrations of two miRNAs are put into the model respectively, and results are output to determine the type of pain of patients. Finally, we use our self-designed algorithm to give the final type of pain: No/Np;
Dayer et al. targeted 184 miRNAs with low CQ numbers and characterized the potential detection limit of various miRNAs with the CQ value in the PCR process. After screening these miRNAs, 12 kinds of miRNAs that can classify pain were obtained. Based on these 12 miRNAs and 119 detectable miRNAs in the blood[2], we used a variety of machine learning algorithms to verify and screen. The accuracy of classification using the binary regression model in the original text can reach 70%.
Taking different concentrations of each miRNA as classification thresholds into the ROC model to calculate its classification sensitivity and specificity, we can get the ROC curve varying with the threshold concentration, and the area under the ROC curve (AUC) can characterize the ability of miRNA as a classification marker.
The AUC value range is 0 ≤ AUC ≤ 1. When the AUC is greater than 0.5, the closer A is to 1, the higher the diagnostic accuracy; When AUC=0.5, the diagnosis is completely ineffective; AUC<0.5 is not in line with the actual situation. Generally, 0.5<AUC ≤ 0.7 indicates a low diagnostic value, 0.7<AUC ≤ 0.9 indicates a medium diagnostic value, and AUC>0.9 indicates a high diagnostic value.
Finally, based on our screening model, we determined hsa-miR-98-5p and hsa-miR-7d-5p as biomarkers. The following pain classification models based on hsa-miR-98-5p and hsa-miR-7d-5p are constructed through the decision tree:
Classification
It can be seen from the literature that since the classification accuracy of the binary regression model can only reach 70%, we have considered other classification models. Because the decision tree can classify under the premise of unknown data characteristics, we have chosen the decision tree. Therefore, combined with the data provided by the literature, we have reconsidered the number of input signals of the model, and separately considered single, two, and three miRNA input models for classification, It was comprehensively judged that the optimal miRNA was used as a pain classification marker. The classification results showed that the classification effect of the multi-miRNA input model was not significantly better than that of single miRNA input. In addition, considering that hsa-miR-98-5p and hsa-miR-7d-5p were ranked in the top two in terms of importance during screening, and the single miRNA input model had a high uncertainty, we chose to input the two miRNAs separately to get the results. The final classification result is obtained by combining the reliability (importance) of the two markers.
The basic principles of the decision tree: by establishing an attribute tree, each branch of the decision tree can represent a feature of the data. The decision tree model increases the classification basis by continuously adding branches. When the decision tree has enough branches, the training data can be completely classified. When dealing with unknown data, the trained decision tree model will judge the attribute division of input data from the initial data node, and finally divide it into leaf nodes and get the classification results.
Parameter name | Value |
---|---|
Training time | 0.003s |
Data segmentation | 0.7 |
Data shuffle | Yes |
Cross validation | No |
Node splitting evaluation criteria | gini |
Feature division point selection standard | best |
The maximum feature scale considered in the partition | None |
Minimum number of samples split by internal node | 2 |
Minimum sample number of leaf node | 1 |
Minimum weight of samples in leaf node | 0 |
Maximum number of leaf nodes | 3 |
Maximum depth of tree | 3 |
Threshold value of node division impurity | 0 |
Table 1. Decision tree model obtained by training literature data. The advantage of the decision tree model is that it does not need to predict any characteristics of the data, but summarizes the characteristics of the data through the training model. As the classification basis of the model, the trained decision tree model will judge the characteristics of the input data and give the classification result .
After the construction of the decision tree model, we need to evaluate the model. For supervised machine learning, sensitivity and specificity are classic indicators for evaluating the classification criteria, while for the decision tree model, we can show the classification effect of the model in the form of a confusion matrix heatmap (Figure 5); In addition, by comparing the classification performance of training set and test set, combined with the model parameters obtained (the deepest is only 3 layers), it can be seen that about 70% of the accuracy rate of the model implementation is mainly due to insufficient data volume, resulting in insufficient feature acquisition
Accuracy | recall | precision | F1 | |
---|---|---|---|---|
Training set | 0.743 | 0.743 | 0.758 | 0.724 |
Test set | 0.767 | 0.767 | 0.768 | 0.759 |
Table 2. The above table shows the classification evaluation indicators of the training set and test set, and measures the classification effect of the decision tree on training and test data through quantitative indicators; a. Accuracy: predict the proportion of correct samples in the total samples. The greater the accuracy, the better; b. Recall rate: among the results that are actually positive samples, the predicted proportion is positive samples. The larger the recall rate, the better;c. Accuracy rate: the predicted results are positive samples, and the actual results are positive samples. The greater the accuracy rate, the better; d. F1: The harmonic average of the accuracy rate and the recall rate. The accuracy rate and the recall rate affect each other. Although both are high, it is an expected ideal situation. However, in practice, the accuracy rate is often high and the recall rate is low, or the recall rate is low but the accuracy rate is high. If you need to consider both, you can use the F1 indicator.
Results and Discussions
Through the construction of a screening and classification model, we found the most suitable miRNAs for pain classification. Our project took two miRNAs (hsa-miR-98-5p and hsa-miR-7d-5p) obtained from the screening model as biomarkers. The two types of miRNAs will be used as the input of the HCR+CRISPR reaction system, with the goal of blood miRNA detection. We collect and process blood in the detection system equipped with hardware, and use the HCR+CRISPR system to detect the fluorescence intensity of two miRNAs, convert the fluorescence intensity into the concentration value of miRNAs, and put it into the classification model to judge the patient's pain type.
NUPACK Model
——Design and Analyze
Purpose
In order to ensure the smooth progress of the HCR reaction, we used the NUPACK webpage to guide the sequence design of the probe before the experiment, so as to reduce the number of trials and the experiment cost and promote the experiment.
Design
NUPACK is a nucleic acid analysis and design software designed and developed by the University of California, which mainly has two functions: analysis and design. It can be run locally by downloading the source code, or online on the NUPACK website. The NUPACK webpage has a powerful nucleic acid sequence analysis function: after presetting the nucleic acid type and reaction temperature of the analyte, by placing the base sequence of each reactant (such as the probe, miRNA), and setting the concentration of each reactant and the maximum number of holding chains, press the analysis button, and you can get the type of all reaction products under the optimal thermodynamic conditions. Click on each product to observe the sequence structure, minimum free energy, base pairing heat map, etc.
In the HCR experiment, the base type, stem-loop length, base mismatch, and other factors of the H1 sequence will all affect the degree of reaction and binding. Therefore, before the formal experiment, we put the different H1 sequences designed into the NUPACK webpage for simulation to guide the following experiment.
Guide the experiment
The structure of the H1 sequence is one of the important factors affecting the HCR experiment, so we use the NUPACK web page to optimize and improve the H1 sequence, which is embodied in three aspects: H1 base sequence, probe base mismatch, and H1 stem-loop length.
[Base sequence]
Since the H1 probe reaction effect was poor at the beginning of the experiment, we optimized the base sequence of H1 and evaluated the optimization results with NUPACK.
H1 |
---|
Original H1:AGTCTAGGAAACTGCGTGGGTTAAAACAATACAACTTACTACCTCATTAACCCA |
Optimized H1:ATCAGACTGAAAGATGTTGACAAAGTAACAATACAACTTACTACCTCAACTTTGTCAAC |
It can be seen from the NUPACK web simulation that the system with optimized base sequence has stronger reaction amplification and higher concentration of target product. At the same time, there are fewer false positives compared to the old system. We applied the optimized H1 sequence to subsequent HCR experiments and designed experiments to verify the correctness of this selection.
From (2)(3) and (5)(6), it can be seen that the corresponding bands of the optimized H1 sequence are darker in color and longer in length, so the reaction is easier to carry out and the degree of reaction is higher. Due to the experimental operation and the low content of false-positive products, the reduction of false positives in the new system is not significantly reflected in the left panel, but we can still conclude that the optimized H1 is more conducive to the HCR reaction, which proves the correctness of the NUPACK-guided experiment.
[Probe base pair mismatch]
In our references, the probes used in HCR experiments are designed with base mismatches, so we also designed base mismatches for each probe sequence, and used NUPACK to simulate the reaction after each probe mismatch to see if the base mismatch is conducive to HCR experiments. The base mismatch sequence is designed as follows:
H1 base mismatch sequence |
---|
Original H1:ATCAGACGAAATGATGTTGACAAAGTAACAATACAACTTACTACCTCAACTTTGTCAAC |
[1]ATCAGACGAAATGATGTTGACAAAGTAACAATACAACTTACTACCTCAACTTTGTCAAG |
[2]ATCAGACGAAATGATGTTGACAAAGTAACAATACAACTTACTACCTCAACTTTGTCAAA |
[3]ATCAGACGAAATGATGTTGACAAAGTAACAATACAACTTACTACCTCAACTTTGTCAAT |
H2 base mismatch sequence |
---|
Original H2: ACTTTGTCAACATCTTTCAGTCTGATCAGTGTATCAGACTGAAAGATGTTGA |
[1]ACTTTGTCAACATCATTTCGTCTGATCAGTGTATCAGACGAAATGATGTTGT |
[2]ACTTTGTCAACATCATTTCGTCTGATCAGTGTATCAGACGAAATGATGTTGG |
[3]ACTTTGTCAACATCATTTCGTCTGATCAGTGTATCAGACGAAATGATGTTGC |
H3 base mismatch sequence |
---|
Original H3: ATCAGACTGAAAGATGTTGACAAAGTTCAACATCTTTCAGTCTGATACACTG |
[1]TTCAGACGAAATGATGTTGACAAAGTTCAACATCATTTCGTCTGATACACTG |
[2]GTCAGACGAAATGATGTTGACAAAGTTCAACATCATTTCGTCTGATACACTG |
[3]CTCAGACGAAATGATGTTGACAAAGTTCAACATCATTTCGTCTGATACACTG |
Due to the excessive mismatch of each probe, we hereby excerpt some of the NUPACK simulation results (H1[1], H2[2], H3[1]) for demonstration:
Based on the above NUPACK modeling results, we can conclude that base mismatch is not conducive to the HCR reaction, so in the biological experiments, we do not use base mismatch probes, which greatly reduces the number and cost of the experiments, effectively guiding the experiment.
[Stem-loop length]
In the previous preliminary modeling of NUPACK, we optimized the base sequence of H1, but the stem-loop length of H1 is also an important factor affecting the HCR experiment, so we continue to use NUPACK to build a model to optimize the stem-loop length of H1.
We designed H1 with different stem-loop lengths, namely:
H1 with different lengths |
---|
H1-98-7:ATCAGACTGAAAGATGTTGACAAAGTAACAATACAACTTACTACCTCAACTTTGT |
H1-98-9:ATCAGACTGAAAGATGTTGACAAAGTAACAATACAACTTACTACCTCAACTTTGTCA |
H1-98-11:ATCAGACTGAAAGATGTTGACAAAGTAACAATACAACTTACTACCTCAACTTTGTCAAC |
H1-98-13:ATCAGACTGAAAGATGTTGACAAAGTAACAATACAACTTACTACCTCAACTTTGTCAACA |
H1-98-15:ATCAGACTGAAAGATGTTGACAAAGTAACAATACAACTTACTACCTCAACTTTGTCAACA |
Put the above sequence on the NUPACK webpage for simulation, and the simulation result is as follows:
From figure 9 we can observe that when the stem-loop length is 11, the target product concentration is the highest and the system amplification is the strongest. The left column shows the concentration of the target product in the simulated system of each H1 stem-loop length. It can be observed that when the stem-loop length is 11, the target product concentration is the highest and the system amplification is the strongest. The right column shows the concentration of false-positive products in the simulated system of each H1 stem-loop length. It can be observed that when the stem-loop length is 11, the concentration of false-positive products is the lowest. Therefore, from the NUPACK simulation, we can preliminarily conclude that when the length of the H1 probe stem-loop is 11, the experiment system has the best reaction effect.
After reaching this conclusion in the NUPACK simulation, we began to conduct biological experiments.
Tips: NC = Negtive Control
From the bands, we can see that the slot under the H1-11 system has the lowest brightness and the longest overall band, which proves that the H1-11 system has the most complete reaction and the strongest amplification, which is consistent with the NUPACK modeling results.
Results and Discussions
The above is the probe optimization design modeling work we completed using NUPACK. Through NUPACK modeling work, we can conclude that the HCR reaction is best facilitated by using an H1 probe with a stem-loop length of 11 and without base mismatching of the probe.
Overall, due to the enormous impact of H1 probes on HCR experiments, NUPACK played a pivotal role in probe design and experiment guidance.
CRISPR Model
——Evaluation and Optimization
Introduction
Under the additional condition that the reporter gene concentration in CRISPR experiments is generally much lower than the Michaelis-Menten constant, the classical Michaelis-Menten equation is modified to obtain the analytical solution of the reaction rate as a function of time. Based on the analytical solution, we proposed three evaluation indexes to evaluate the self-consistency and correctness of CRISPR experiments and proposed a method to optimize the experimental conditions. In the end, we propose other applications of the model.
Notations
Symbols | Implications |
---|---|
E | Enzyme complexes activated by target molecules |
S | Substrate concentration |
C | Intermediate reactant |
P | Cleaved reporter molecule |
Positive reaction coefficient | |
Reverse reaction coefficient | |
Catalytic constant of enzyme | |
Amount of enzyme activated by target molecule | |
Initial amount of substrate | |
t | A quantity that depends on time |
Duration of linear increase of fluorescence with time | |
Theoretical maximum value of reaction rate | |
the Michaelis-Menten constant of enzyme |
Model building
Model Derivation
In the CRISPR system adopted in this project, crRNA can specifically recognize the amplified sequence of target miRNAs and then activate the side-chain cleavage activity of the Cas12a protein. The Cas12a protein can specifically recognize and cleave fluorescent probes, thus realizing specific signal detection and transduction.
Hypothesis
- [H1] The initial enzyme concentration
- [H2] According to the quasi-steady-state hypothesis, the concentration of the intermediate almost does not change.
- [H3] In the classical Michaelis-Menten dynamical system
- [H4] In most CRISPR experiments [S]<<
Building
The trans-cleavage process can be modeled as above.
Where E represents the enzyme complex activated by the target molecule, S represents the uncleaved reporter gene, C represents the intermediate reactant, and P represents the cleaved reporter gene.
To establish a system of ordinary differential equations:
Initial value condition:
Based on the conditions of conservation of enzyme and substrate, the following constraints hold:
From assumption H2, we can write:
Combined with hypothesis H3, equations (4), (9), and (11) are simultaneously solved, and the product generation rate is controlled by the Michaelis-Menten equation:
By combining hypothesis H4 and Equation (10), the equation is improved:
Solving this differential equation, we can get:
Where
Obviously, τ decreases with increasing
Parameter calculation
In Equation (14), there are three parameters,
For the same enzyme, multiple sets of fluorescence curves were obtained by changing the concentration of the reporter gene. The initial linear part of
Equivalently, equation (12) is transformed into:
For the concentration and initial velocity of multiple reporter genes, the formula was used to carry out the least square method to fit, that is, to obtain the
Results and Discussions
Experiment Evaluation
Three indicators are proposed to evaluate the self-consistency and accuracy of experimental data:
a) The amount of product produced during the initial linear portion of
b) According to the Mie equation, the maximum rate of product generation is reached when [
c)
Rules 1 and 2 are required for all data, and an additional rule 3 is required for the reaction whose
Experiment improvement
According to Equation (14), the higher the value of
Experimental prediction
In Equation (14), t/τ is the horizontal axis, and
Since normalized variables were used in both horizontal and vertical axes, the curve (y=1-exp (-x)) would be satisfied for the product changes of any reaction. Therefore, the reaction time of each experiment could be predicted. For example, when t=τ, almost 63% of the reporter genes were involved in the reaction.
Conclusion
In order to solve the problem of markers for detection items, we established a screening model. Based on 119 miRNAs found in blood by predecessors and 12 of them clinically found to be associated with pain, we used a variety of validation models to prove that hsa-miR-98-5p and hsa-miR-7d-5p are our ideal pain markers. From the logic of marker detection classification, we also need to develop a classification model to transform the miRNA concentration values obtained from experiments. According to the current research status of miRNA&pain, we decided to use the decision tree model to train classification, and found that more clinical research investment is needed to promote data support.
In order to optimize our detection process, NUPACK should be used to guide the optimization of HCR reaction and mathematical model should be used to guide CRISPR optimization in biological part design. First of all, we used NUPACK to design the probe H1, H2, H3 base sequence in HCR reaction, and repeatedly used NUPACK to guide and optimize the structure and length of HCR process. For CRISPR process, we hope to provide a criterion for the success of the experiment with the help of mathematical models, develop a differential equation model from the algebraic equation model and form evaluation parameters, in which the process also provides us with reference values for key coefficients such as substrate concentration and enzyme concentration.
Dayer, Camille Florine, et al. "Differences in the miRNA signatures of chronic musculoskeletal pain patients from neuropathic or nociceptive origins." PloS one 14.7 (2019): e0219311.
Blondal, Thorarinn, et al. "Assessing sample and miRNA profile quality in serum and plasma or other biofluids." Methods 59.1 (2013): S1-S6.
Qin Liao, Zhifeng Hao, and Zhihong Chen. Data Mining and Mathematical Modeling. Guo fang gong ye chu ban she, 2010.
Nouri, Reza, et al. "Figure of Merit for CRISPR-Based Nucleic Acid-Sensing Systems: Improvement Strategies and Performance Comparison." ACS sensors 7.3 (2022): 900-911.
Nelson, David L., Albert L. Lehninger, and Michael M. Cox. Lehninger principles of biochemistry. Macmillan, 2008.
Ramachandran, Ashwin, and Juan G. Santiago. "CRISPR enzyme kinetics for molecular diagnostics." Analytical Chemistry 93.20 (2021): 7456-7464.
Chen, William W., Mario Niepel, and Peter K. Sorger. "Classic and contemporary approaches to modeling biochemical reactions." Genes & development 24.17