Our diagnostic tool is based on the SHERLOCK (1) system, which itself relies on the Cas13a enzymatic reaction. Cas13a is an enzyme that possesses both a specific and unspecific nuclease activity (2). In other words, Cas13a cleaves target RNA sequences that are complementary to a guide RNA, and, other non-specific and off-target RNA sequences. We are using that property of Cas13a coupled with a variety of RNA probes in order to detect the presence of certain pathogens in Thau’s lagoon water.
Our project was split in two different tests as explained in other sections:
Our diagnostic is intended to be easy to use and rapid, making it qualitative. However in order to characterize our system more deeply we relied on modeling approaches. We thus developed a kinetic reaction model based on ordinary differential equations (ODEs) that can inform on the molecular details of the process and reaction timescales.
The SHERLOCK detection system can bring qualitative information on the presence or absence of a pathogen. The fluorescence-based test allows access to kinetics of the reaction. With our model we hope to exploit this information to bring deeper understanding on the reaction and develop a more quantitative aspect to the detection:
What we measure in the SHERLOCK assay is a fluorescence signal corresponding to the specific trans-cleavage activity of Cas13 protein mediated by crRNA after reaction with target RNA. We establish ordinary differential equations (ODEs) to describe the evolution of the fluorescence over time. We inform the model with our experimental data to infer the kinetic parameters of the molecular reactions involved.
For our system the system is described in fig with the guide-RNA-Cas complex (GC), the uncut target sequence (Tu), the guide-RNA-Cas-target complex (GCT), the inactive probe (Pi), the active probe (Pa) and the cut target sequence (Tc).
Note that we follow the sequential model developed by E.A. Nalefski et al. for the reaction mechanism of Cas13a (3). The enzyme first binds its guide (GC). We neglect the formation of the Cas-Guide complex, assuming that it occurs faster than the other reactions.The “active” complex binds then to the target RNA with a rate constant k1, forming the Guide-Cas-Target complex (GCT); we make the hypothesis that the inverse reaction (GCT → GC + Tu) with rate constant k2 can be neglected. The GCT association initiates the collateral activity (probe cleavage and activation) with rate constant k3, which creates the cutted probes (Pa). Meanwhile, the enzyme also cuts the target at a much lower rate k4 (3) creating cut targets (Tc).
To summarize, these are the main hypotheses made to develop and solve our mathematical model:
These first assumptions could be relaxed and more parameters could be included. However, for the sake of
simplicity, we opted for a simplified approach to avoid overfitting.
In the definition of the system of differential equations we assume a mass action interaction, typical of well-mixed systems.
With these prescriptions, we wrote a set of Ordinary Differential Equations (ODEs) describing the evolution of the molecular complexes in time (see below). For instance, the concentration of active probes Pa (proportional to the fluorescence signal) increases over time proportional to the product between the concentrations of GCT and Pi, with a rate constant k3.
We solved the system of ODEs numerically by means of scipy solve_ivp function (5). We used the method DOP853 as described in the book by E. Hairer et al. Solving Ordinary Differential Equations I: Nonstiff Problems (5). To do so, we need to implement a variety of parameters and initial concentrations of the various species. We set the various initial concentrations using our wet-lab protocol. The parameters initial values are set using the literature on SHERLOCK on Cas13a reaction:
Name | Value | Deduction |
---|---|---|
[GC0] | 0.4µM | Wet-lab protocol (1) |
[GCT0] | 0µM | Wet-lab protocol (1) |
[Pa0] | 0µM | Wet-lab protocol (1) |
[Pi0] | 4.6µM | Wet-lab protocol (1) |
[Tc0] | 0µM | Wet-lab protocol (1) |
[Tu0] | 0.001µM | Wet-lab protocol (1) |
k1 | 1 M-1min-1 | Using iGEM munich 2017 model |
k2 | 0.15 M-1min-1 | Deduced from experimental data |
k3 | 0.01 min-1 | Article by (3) |
We then fitted the parameters with the data obtained in our plate reader experiments, following the procedure below.
We implemented the equation in a python script available here. We define the system of ODEs with initial guesses for the parameters. Then an ODE solver is called from the package scipy (5). In order to fit the data we defined a least-square error function and used a minimization algorithm from scipy (5) to obtain the least error on the fit. We will use the script to fit the experimental data gathered from the plate reader. The parameters will then be estimated from the fit.
The first thing we want to look at after fitting the data is the solution of the system. In figure 1 we plotted the evolution of the concentration of the different species in the model over time. This was used as a verification experiment for the fitting.
The model was used to fit the wet-lab data. The first sequence we tried to model was the synthetic sequence published by Kellner et al. (1), used by us as a Positive Control. We used these experiments in order to validate our model and begin inferring parameters. Results of the fit for the synthetic sequence can be found in figure 3. To provides a measure of how well the model predicts the observed outcomes, we computed the coefficient of determination R2 between the model and the data. This measure relates to the part of the variance of the data explained by the model. An R2=1 indicates that the predictions perfectly fit the data.
We can see that the model seems to predict the experimental data. With this first result we decided to move to modeling all our experimental data. As explained in our results section, we tested a variety of sequences corresponding to pathogens or oyster genes. Our model depends on three parameters, two rate constants and one rate. Using our fitting and these parameters we wanted to study the mechanistic of our reaction. To do so we first looked at mutated sequences. We have created constructions of a target (ToxR) with an increasing number of mutations (from 1 to 10) to test for specificity. Using our model we wanted to study the impact of mutation on the kinetic parameters.
We fitted individual experimental replicates and computed an average, then plotted the distribution of each parameter. In order to represent the data, we introduce a new metric, the Hamming distance (6). This distance relates to the difference between the guide sequence and the target which corresponds to the numbers of mutated nucleotides. The results can be found in figure 4.
We can see in panel A that as the Hamming distance increases there is a decrease in the binding reaction rate whereas as shown in panel B and C the other constants seem to stay around the same value. This result suggests that the binding step is the limiting step in the Shell’lock reaction and that regardless of the sequence composition the kinetic properties are conserved. Following these results, we wanted to try to look at different sequences at 0 Hamming distance.
To try different sequences at 0 hamming distance, we used the pathogenic targets we created. We followed the same procedure and fitted all the data. Results can be found in figure 5.
When the Hamming distance is equal to 0, meaning a perfect match between the guide RNA and the target sequence, we observe that the binding rate constant k1 is approximately of the order 0.1 M-1 min-1 while the unspecific nuclease rate constant k3 is more variable (0.1-10 M-1 min-1).The specific nuclease rate k4 is 0.01 min-1. We chose to represent the median line of the data to account for inaccurate fitting of the data that were still kept in the analysis. An example of inaccurate fitting can be seen in panel C for example as outliers in the parameters. This first set of data seems to suggest that, with a perfect guide-target match (Hamming distance equal to zero), the binding rate constant k1 is roughly of the same order of magnitude for all the sequences tested. However, as we showed previously, this parameter strongly depends on the Hamming distance from the designed cognate sequence. Across the sequences tested we remark a larger variability of the probe nuclease rate constant k3, of at least two orders of magnitude, and a substantially constant target nuclease rate constant k4 .
After our analysis on the rate constants and as a verification experiment we wanted to convert rate constants to reaction rate in order to confront our model with literature values. In order to convert rate constant into reaction rate we need to multiply the extracted value from the fit by the concentration on which it depends. Our two reaction rate being k1and k3, which respectively depend on : [Tuc] and [GCT]. The resulting plot for the various sequences tested can be found in figure 6.
We can see on panel A the distribution of the binding rate of Cas13a. We see in panel B the unspecific nuclease rate ranges from 0.4 to over 1 sec-1. Nalefski et al. (3) showed that for a probe similar to ours (5 Uridines) the unspecific nuclease reaction rate was in the order of 3.3 sec-1. Our analysis shows similar order of magnitude for the sequences we tried. Noting that the approach of Nalefski et al. consisted in steady state measurement and Michaelis Menten approximations.
In this section we showed the new model we developed to study the enzymatic reaction of SHERLOCK detection. Our
model is based on kinetic modeling and agrees with experimental data. Informed by data, we showed that we can
use this model to infer rate constants to study more in depth the molecular mechanisms of the reactions
involved.
We first quantified how an imperfect match between guide and targets limits the efficiency of the reaction (figure 4). In particular, only a few mismatches impact the guide/target binding affinity influencing the entire process upstream and providing specificity to the nuclease reaction. . The design and test of more mutant sequences will be necessary to support this conclusion and could inform how different mutations (of nucleotide-type, position, or number) influence the process.
Furthermore, we showed conserved trends in kinetic parameters for different guide target couples (figure 5). We could infer the kinetic constants for these designed sequences, and the rates of production and degradation of the molecular species (figure 6) are consistent with catalytic rates published in the literature
(3).
Further experiments would need to be carried out in order to confirm the trend in the data. We may want to relax some of the model hypotheses and consider the backward rate k2, in particular to study the role of mismatches in the sequence (as we expect this parameter to become more and more relevant as the Hamming distance increases). More refined fitting methods could also be considered to provide error bars and reliability scores of the fit.
Finally, our long term vision contemplates the development of strategies to contain the spreading of an infection during the production process. This has not been done for obvious time constraints, but we are thinking of developing ecological/epidemiological modeling based on the typical compartmental SIR model. We would like to build a spatial ecological model that, informed by time series of on-field data obtained with Shell’lock at different locations, could provide containment strategies of isolation of small areas of the farm, without the need to lockdown the entire production line. The model could also inform about the best timeline for testing.
1.M. J. Kellner, J. Koob, J. S. Gootenberg, O. O. Abudayyeh, et F. Zhang, « SHERLOCK: Nucleic acid detection with CRISPR nucleases », Nat Protoc, vol. 14, no 10, p. 2986‑3012, oct. 2019, doi: 10.1038/s41596-019-0210-2.
2. S. Shmakov et al., « Discovery and Functional Characterization of Diverse Class 2 CRISPR-Cas Systems », Molecular Cell, vol. 60, no 3, p. 385‑397, nov. 2015, doi: 10.1016/j.molcel.2015.10.008.
3. E. A. Nalefski et al., « Kinetic analysis of Cas12a and Cas13a RNA-Guided nucleases for development of improved CRISPR-Based diagnostics », iScience, vol. 24, no 9, p. 102996, sept. 2021, doi: 10.1016/j.isci.2021.102996.
4. V. Mekler, L. Minakhin, E. Semenova, K. Kuznedelov, et K. Severinov, « Kinetics of the CRISPR-Cas9 effector complex assembly and the role of 3′-terminal segment of guide RNA », Nucleic Acids Research, vol. 44, no 6, p. 2837‑2845, avr. 2016, doi: 10.1093/nar/gkw138.
5. P. Virtanen et al., « SciPy 1.0: fundamental algorithms for scientific computing in Python », Nat Methods, vol. 17, no 3, Art. no 3, mars 2020, doi: 10.1038/s41592-019-0686-2.
6. R. W. Hamming, « Error detecting and error correcting codes », The Bell System Technical Journal, vol. 29, no 2, p. 147‑160, avr. 1950, doi: 10.1002/j.1538-7305.1950.tb00463.x.