Our diagnostic tool is based on the SHERLOCK (1) system, which itself relies on the Cas13a enzymatic reaction. Cas13a is an enzyme that possesses both a specific and unspecific nuclease activity (2). In other words, Cas13a cleaves target RNA sequences that are complementary to a guide RNA, and, other non-specific and off-target RNA sequences. We are using that property of Cas13a coupled with a variety of RNA probes in order to detect the presence of certain pathogens in Thau’s lagoon water.

Our project was split in two different tests as explained in other sections:

  • An in-vitro test used for the characterisation of the reaction. In this, we use a fluorescent probe linked by RNA to a quencher (see more details in the project section).
  • A field test directly usable by oyster farmers. This test relies on the use of antibodies and RNA probes linked to molecules recognized by antibodies in a lateral flow assay.

Our diagnostic is intended to be easy to use and rapid, making it qualitative. However in order to characterize our system more deeply we relied on modeling approaches. We thus developed a kinetic reaction model based on ordinary differential equations (ODEs) that can inform on the molecular details of the process and reaction timescales.

Informations brought by the model


The SHERLOCK detection system can bring qualitative information on the presence or absence of a pathogen. The fluorescence-based test allows access to kinetics of the reaction. With our model we hope to exploit this information to bring deeper understanding on the reaction and develop a more quantitative aspect to the detection:

  • Our kinetic model of the reaction is based on the law of mass action. This allows us to relate chemical reaction rate to the concentration of the various species present in the system. We will use this model in order to compare various reactions with notably different target sequences. Because we are measuring fluorescence the quantification cannot be extended to the limits of the reactions i.e. only conditions tested in the same experiments can be compared. Using kinetic parameters (mainly the catalytic rate of Cas13a) we will no longer be limited by bulk fluorescence measurement and will have a universal comparison quantity.
  • In the future we would like to develop a compartmental spatial SIR (Susceptible, Infectious, Recovered) model to bring an ecological dimension complementary to the test. The model will be used to describe the spread of the infection if ever the pathogen is detected by the test. Modeling the spread of an infection will allow to propose potential localized and precise containment solutions to the famers i.e where should the test be performed to get the best results and where to try to block the spread. Moreover such an approach can lead us to estimate suitable sampling intervals for the most efficient detection.

Kinetic model


What we measure in the SHERLOCK assay is a fluorescence signal corresponding to the specific trans-cleavage activity of Cas13 protein mediated by crRNA after reaction with target RNA. We establish ordinary differential equations (ODEs) to describe the evolution of the fluorescence over time. We inform the model with our experimental data to infer the kinetic parameters of the molecular reactions involved.

System of equations for our system

For our system the system is described in fig with the guide-RNA-Cas complex (GC), the uncut target sequence (Tu), the guide-RNA-Cas-target complex (GCT), the inactive probe (Pi), the active probe (Pa) and the cut target sequence (Tc).

Figure 1: Schematic of the reactions modeled with our kinetic model.

Note that we follow the sequential model developed by E.A. Nalefski et al. for the reaction mechanism of Cas13a (3). The enzyme first binds its guide (GC). We neglect the formation of the Cas-Guide complex, assuming that it occurs faster than the other reactions.The “active” complex binds then to the target RNA with a rate constant k1, forming the Guide-Cas-Target complex (GCT); we make the hypothesis that the inverse reaction (GCT → GC + Tu) with rate constant k2 can be neglected. The GCT association initiates the collateral activity (probe cleavage and activation) with rate constant k3, which creates the cutted probes (Pa). Meanwhile, the enzyme also cuts the target at a much lower rate k4 (3) creating cut targets (Tc).



To summarize, these are the main hypotheses made to develop and solve our mathematical model:

  • We neglect the binding and unbinding between the Cas (C) and the guide (G). We assume that the reaction is specific enough to be fast(4)
  • We neglect the backward step GCT → GC + Tu as the guide-target are cognate sequences and the dissociation rate is believed to be small. For this reason we fix k2=0 and this parameter will no longer be discussed.

These first assumptions could be relaxed and more parameters could be included. However, for the sake of simplicity, we opted for a simplified approach to avoid overfitting.
In the definition of the system of differential equations we assume a mass action interaction, typical of well-mixed systems.

Resulting equations

With these prescriptions, we wrote a set of Ordinary Differential Equations (ODEs) describing the evolution of the molecular complexes in time (see below). For instance, the concentration of active probes Pa (proportional to the fluorescence signal) increases over time proportional to the product between the concentrations of GCT and Pi, with a rate constant k3.

Initial parameters

We solved the system of ODEs numerically by means of scipy solve_ivp function (5). We used the method DOP853 as described in the book by E. Hairer et al. Solving Ordinary Differential Equations I: Nonstiff Problems (5). To do so, we need to implement a variety of parameters and initial concentrations of the various species. We set the various initial concentrations using our wet-lab protocol. The parameters initial values are set using the literature on SHERLOCK on Cas13a reaction:

Name Value Deduction
[GC0] 0.4µM Wet-lab protocol (1)
[GCT0] 0µM Wet-lab protocol (1)
[Pa0] 0µM Wet-lab protocol (1)
[Pi0] 4.6µM Wet-lab protocol (1)
[Tc0] 0µM Wet-lab protocol (1)
[Tu0] 0.001µM Wet-lab protocol (1)
k1 1 M-1min-1 Using iGEM munich 2017 model
k2 0.15 M-1min-1 Deduced from experimental data
k3 0.01 min-1 Article by (3)
Table 1: list of initial parameters used in the model

We then fitted the parameters with the data obtained in our plate reader experiments, following the procedure below.

Implementation

We implemented the equation in a python script available here. We define the system of ODEs with initial guesses for the parameters. Then an ODE solver is called from the package scipy (5). In order to fit the data we defined a least-square error function and used a minimization algorithm from scipy (5) to obtain the least error on the fit. We will use the script to fit the experimental data gathered from the plate reader. The parameters will then be estimated from the fit.

Results


Verification of the results

The first thing we want to look at after fitting the data is the solution of the system. In figure 1 we plotted the evolution of the concentration of the different species in the model over time. This was used as a verification experiment for the fitting.

Fig 2: Evolution of the concentration of all the species in the model. Parameters : see table 1

Proof of concept

The model was used to fit the wet-lab data. The first sequence we tried to model was the synthetic sequence published by Kellner et al. (1), used by us as a Positive Control. We used these experiments in order to validate our model and begin inferring parameters. Results of the fit for the synthetic sequence can be found in figure 3. To provides a measure of how well the model predicts the observed outcomes, we computed the coefficient of determination R2 between the model and the data. This measure relates to the part of the variance of the data explained by the model. An R2=1 indicates that the predictions perfectly fit the data.

Figure 3: The kinetic model agrees with experimental Shell’lock data. In orange is an average curve for triplicates of synthetic sequence(1). In blue the curve predicted by the model for the evolution of probe concentration.

We can see that the model seems to predict the experimental data. With this first result we decided to move to modeling all our experimental data. As explained in our results section, we tested a variety of sequences corresponding to pathogens or oyster genes. Our model depends on three parameters, two rate constants and one rate. Using our fitting and these parameters we wanted to study the mechanistic of our reaction. To do so we first looked at mutated sequences. We have created constructions of a target (ToxR) with an increasing number of mutations (from 1 to 10) to test for specificity. Using our model we wanted to study the impact of mutation on the kinetic parameters.

Looking at kinetic parameters for various mutated sequences

We fitted individual experimental replicates and computed an average, then plotted the distribution of each parameter. In order to represent the data, we introduce a new metric, the Hamming distance (6). This distance relates to the difference between the guide sequence and the target which corresponds to the numbers of mutated nucleotides. The results can be found in figure 4.

Figure 4:Kinetic parameters for the ToxR mutants as a function of the Hamming distance from the cognate sequence.
  • A. Binding rate constant k1 as a function of the Hamming distance. The experimental points are represented in gray circles, the average in black circles; averages are connected by dashed blue lines.
  • B. Unspecific nuclease reaction rate k3 as in panel A.
  • C. Specific nuclease rate k4 as in panel A..

We can see in panel A that as the Hamming distance increases there is a decrease in the binding reaction rate whereas as shown in panel B and C the other constants seem to stay around the same value. This result suggests that the binding step is the limiting step in the Shell’lock reaction and that regardless of the sequence composition the kinetic properties are conserved. Following these results, we wanted to try to look at different sequences at 0 Hamming distance.

Looking at the distribution of kinetic parameters for multiple sequences

To try different sequences at 0 hamming distance, we used the pathogenic targets we created. We followed the same procedure and fitted all the data. Results can be found in figure 5.

Figure 5: Estimated kinetic parameters for pathogenic sequences.
  • A. Target binding rate constant k1 for the different sequences tested. The experimental points are represented in gray circles, the averages in red crosses and the median of all the data points is represented as a dashed orange line.
  • B. Unspecific nuclease reaction rate constant k3 as in panel A.
  • C. Specific nuclease rate k4as in panel A.

When the Hamming distance is equal to 0, meaning a perfect match between the guide RNA and the target sequence, we observe that the binding rate constant k1 is approximately of the order 0.1 M-1 min-1 while the unspecific nuclease rate constant k3 is more variable (0.1-10 M-1 min-1).The specific nuclease rate k4 is 0.01 min-1. We chose to represent the median line of the data to account for inaccurate fitting of the data that were still kept in the analysis. An example of inaccurate fitting can be seen in panel C for example as outliers in the parameters. This first set of data seems to suggest that, with a perfect guide-target match (Hamming distance equal to zero), the binding rate constant k1 is roughly of the same order of magnitude for all the sequences tested. However, as we showed previously, this parameter strongly depends on the Hamming distance from the designed cognate sequence. Across the sequences tested we remark a larger variability of the probe nuclease rate constant k3, of at least two orders of magnitude, and a substantially constant target nuclease rate constant k4 .

Extracting rates from rate constants and verification of literature values

After our analysis on the rate constants and as a verification experiment we wanted to convert rate constants to reaction rate in order to confront our model with literature values. In order to convert rate constant into reaction rate we need to multiply the extracted value from the fit by the concentration on which it depends. Our two reaction rate being k1and k3, which respectively depend on : [Tuc] and [GCT]. The resulting plot for the various sequences tested can be found in figure 6.

Figure 6: Estimated reaction rates for pathogenic sequences
  • A. Target binding reaction rate k1[Tuc] (sec-1) for the different sequences tested. The name of the sequence tested is indicated on the legend of the plot.
  • B. Unspecific nuclease reaction rate k3[GCT] (sec-1) for the different sequences tested.

We can see on panel A the distribution of the binding rate of Cas13a. We see in panel B the unspecific nuclease rate ranges from 0.4 to over 1 sec-1. Nalefski et al. (3) showed that for a probe similar to ours (5 Uridines) the unspecific nuclease reaction rate was in the order of 3.3 sec-1. Our analysis shows similar order of magnitude for the sequences we tried. Noting that the approach of Nalefski et al. consisted in steady state measurement and Michaelis Menten approximations.

Conclusion


In this section we showed the new model we developed to study the enzymatic reaction of SHERLOCK detection. Our model is based on kinetic modeling and agrees with experimental data. Informed by data, we showed that we can use this model to infer rate constants to study more in depth the molecular mechanisms of the reactions involved.

We first quantified how an imperfect match between guide and targets limits the efficiency of the reaction (figure 4). In particular, only a few mismatches impact the guide/target binding affinity influencing the entire process upstream and providing specificity to the nuclease reaction. . The design and test of more mutant sequences will be necessary to support this conclusion and could inform how different mutations (of nucleotide-type, position, or number) influence the process.

Furthermore, we showed conserved trends in kinetic parameters for different guide target couples (figure 5). We could infer the kinetic constants for these designed sequences, and the rates of production and degradation of the molecular species (figure 6) are consistent with catalytic rates published in the literature (3).

Further experiments would need to be carried out in order to confirm the trend in the data. We may want to relax some of the model hypotheses and consider the backward rate k2, in particular to study the role of mismatches in the sequence (as we expect this parameter to become more and more relevant as the Hamming distance increases). More refined fitting methods could also be considered to provide error bars and reliability scores of the fit.

Finally, our long term vision contemplates the development of strategies to contain the spreading of an infection during the production process. This has not been done for obvious time constraints, but we are thinking of developing ecological/epidemiological modeling based on the typical compartmental SIR model. We would like to build a spatial ecological model that, informed by time series of on-field data obtained with Shell’lock at different locations, could provide containment strategies of isolation of small areas of the farm, without the need to lockdown the entire production line. The model could also inform about the best timeline for testing.

References

1.M. J. Kellner, J. Koob, J. S. Gootenberg, O. O. Abudayyeh, et F. Zhang, « SHERLOCK: Nucleic acid detection with CRISPR nucleases », Nat Protoc, vol. 14, no 10, p. 2986‑3012, oct. 2019, doi: 10.1038/s41596-019-0210-2.
2. S. Shmakov et al., « Discovery and Functional Characterization of Diverse Class 2 CRISPR-Cas Systems », Molecular Cell, vol. 60, no 3, p. 385‑397, nov. 2015, doi: 10.1016/j.molcel.2015.10.008.
3. E. A. Nalefski et al., « Kinetic analysis of Cas12a and Cas13a RNA-Guided nucleases for development of improved CRISPR-Based diagnostics », iScience, vol. 24, no 9, p. 102996, sept. 2021, doi: 10.1016/j.isci.2021.102996.
4. V. Mekler, L. Minakhin, E. Semenova, K. Kuznedelov, et K. Severinov, « Kinetics of the CRISPR-Cas9 effector complex assembly and the role of 3′-terminal segment of guide RNA », Nucleic Acids Research, vol. 44, no 6, p. 2837‑2845, avr. 2016, doi: 10.1093/nar/gkw138.
5. P. Virtanen et al., « SciPy 1.0: fundamental algorithms for scientific computing in Python », Nat Methods, vol. 17, no 3, Art. no 3, mars 2020, doi: 10.1038/s41592-019-0686-2.
6. R. W. Hamming, « Error detecting and error correcting codes », The Bell System Technical Journal, vol. 29, no 2, p. 147‑160, avr. 1950, doi: 10.1002/j.1538-7305.1950.tb00463.x.