CropFold relies on toehold switch-based sensing for the detection of plant pathogens. We designed novel toehold switches to control the expression of a reporter protein in our cell-free systems. A toehold switch changes its secondary structure in the presence of a trigger sequence, which in our project is a conserved sequence in the target pathogen’s genome. Because our toehold design algorithms produced multiple candidate toehold switches, we needed a model to narrow the selection down to the most promising designs.



Aims


Toehold switches rely on their secondary structure for their function. In their inactive form, the ribosome binding site (RBS) and start codon are sequestered in a stable stem-loop. When the trigger binds to the toehold switch, the lower part of the stem-loop unwinds, leaving only a weak stem intact. This weak stem opens when a ribosome binds to the RBS, initiating translation (Green et al, 2017). The toehold switch must be designed to fold into this predetermined structure. In our modeling, we estimate how far each design deviates from the optimal structure (Fig. 1) to gauge its performance in silico.

Two toehold structures. Binding sites, RBS and AUG are indicated with different colors in the figure.

Figure 1.
The optimal structure for the toehold sensors. (A) The optimal structure for the A-series toehold. (B) The optimal structure for the B-series toehold. The structures are from Pardee et al (2016).


We developed this model for the analysis of A- and B-series toehold switches from Pardee et al (2016), which are designed to reduce the translational leakage compared to older toehold switch structures. Using this model, one can quickly identify the most promising toehold designs and test their performance in the lab.



Results


We used the best-performing three-parameter fit from Ma et al (2018) to rank our designs. The fit has a fairly strong correlation with measured toehold switch performance (R² = 0.57). It assigns each toehold switch a score from three parameters that quantify its deviation from the ideal structure. The fit was originally developed for selecting B-series toeholds, but we extended it to A-series toeholds by assigning the parameters according to their definitions.

Based on the results provided by our model, we selected the three highest-ranking A- and B-series toeholds designed to detect sequences from the barley yellow dwarf virus genome for our lab tests. When we selected these toehold switches, our design and modeling algorithms were not yet optimized, so the designs we picked were not optimal. We also modified our algorithms to use a 21-nt linker sequence instead of the 30-nt linker sequence present in our original toehold switches. We learned a lot during our partnership with TrigGate and adjusted our algorithms accordingly. We later assigned updated scores to our original toehold designs and to the alternative designs. Because these scores come from the updated version of the model, they ignore the last 9 nucleotides of the linker sequence. The scores are displayed in Table 1.


Table 1. The scores of our toehold switches and their alternative designs, provided by TrigGate.
Toehold identifier    Score       Toehold identifier    Score
ABOA A70              16.40528    ABOA B39              13.18878
ABOA A92              15.39586    ABOA B46              14.08091
ABOA A95              15.39047    ABOA B69              11.98966
TrigGate A70          6.19759     TrigGate B39          24.0725
TrigGate A92          -1.00635    TrigGate B46          28.13376
TrigGate A95          0.201508    TrigGate B69          18.89491

We sought to test the model’s predictive capability in our lab by comparing the measured performance of these toehold switches with their scores. Our data showed a correlation between the score assigned by our model and the ON/OFF ratio, that is, the ratio of endpoint signal production in the presence and absence of the trigger. However, we did not collect enough experimental data to draw credible conclusions. Therefore, we can only rely on the model’s previously demonstrated viability for predicting the performance of B-series toeholds (Ma et al, 2018). We still suggest that the model is viable for A-series toeholds as well. The correlation between the ON/OFF ratio and the score is shown in Figure 2.

A graph containing a linear line.

Figure 2.
Correlation of predicted and measured performance of toehold switches. The graph shows the ON/OFF ratio, measured from two independent reactions, over the score predicted by our model.

This model was also used to assign scores to each of the toehold switches we created for our toehold switch library. After we updated our design algorithm, we scored our newly produced toehold switches for detection of the barley yellow dwarf virus. The best A-series toehold switches scored 21.8 and 21.4, and the best B-series toehold switches scored 23.1 and 22.3. These results suggest that our updated design algorithm improved on the first iteration. Scores for all toehold switches in our library are presented in Table 2, and they are also visible on their pages in the registry (parts BBa_K4207002-BBa_K4207060).

Table 2. The scores of the rest of the toehold switches in our library. These toehold switches were created with the newest iteration of the design algorithm and the score was assigned with our model.
Toehold identifier    Score     Toehold identifier    Score
WDV A1                27.65     WDV B1                26.62
WDV A2                27.54     WDV B2                25.56
CGMMV A1              26.94     CGMMV B1              22.45
CGMMV A2              26.36     CGMMV B2              21.43
TBRFV A1              33.03     TBRFV B1              27.52
TBRFV A2              32.54     TBRFV B2              27.12
PMMV A1               31.18     PMMV B1               26.59
PMMV A2               30.83     PMMV B2               25.50
PPV A1                29.40     PPV B1                29.46
PPV A2                29.02     PPV B2                28.62
SDV A1                29.00     SDV B1                32.43
SDV A2                29.91     SDV B2                30.23
SPLCV A1              31.02     SPLCV B1              23.02
SPLCV A2              30.41     SPLCV B2              22.67
TCV A1                21.21     TCV B1                21.72
PMV A1                21.09     TCV B2                20.87
PMV A2                20.81     PMV B1                24.38
TMV A1                30.79     PMV B2                24.36
TMV A2                29.47     TMV B1                27.90
PVY A1                33.15     TMV B2                26.00
PVY A2                32.72     PVY B1                31.44
BYDV A1               21.77     PVY B2                28.64
BYDV A2               21.41     BYDV B1               23.10
                                BYDV B2               22.28


Methods


We evaluated our toehold designs using a three-parameter fit from Ma et al (2018). The parameters and their explanations are listed below:

Parameter            Explanation
d_{sensor}           Refers to the number of incorrectly oriented nucleotides in the whole inactive structure
d_{active sensor}    Refers to the number of incorrectly oriented nucleotides in the active structure, starting from the first nucleotide downstream of the binding site
d_{binding site}     Refers to the number of incorrectly oriented nucleotides in the first 25 nucleotides of the binding site; this is the part of the binding site not forming the base of the stem-loop

This three-parameter fit was originally developed for B-series toeholds, but we adapted it to work for A-series toeholds as well. The regions of the toehold switches corresponding to each value are highlighted in Figure 3.


Two toehold structures. d_{sensor}, d_{active sensor} and d_{binding site} are indicated with different colors in the figure.
Figure 3. The regions of the toehold switches used in estimating their folding. The subsequences are highlighted (A) in the optimal A-series structure and (B) in the optimal B-series structure. In both structures, the teal color highlights the subsequence used for calculating d_{binding site} and the lilac color highlights the subsequence used for calculating d_{active sensor}. d_{sensor} is evaluated from the structure of the whole sensor.


All of these parameters are calculated with NUPACK 4.0.0.27. Each is calculated for its region using the defect function, which evaluates the normalized complex ensemble defect of the sequence of interest. The function uses a pair probability matrix to estimate the average fraction of incorrectly paired nucleotides in the design, relative to the defined target structure (Dirks et al, 2007; Fornace et al, 2020).
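The idea behind the normalized ensemble defect can be illustrated with a minimal Python sketch. This is not NUPACK's implementation and it is not part of our program. It assumes a pair probability matrix in which entry [i][j] is the probability that nucleotides i and j are paired and the diagonal entry [i][i] is the probability that nucleotide i is unpaired. The function names and the four-nucleotide toy matrix are hypothetical and serve only as an illustration.

```python
def partners_from_dot_parens(structure):
    """Map each position to its partner; unpaired positions map to themselves."""
    partner = list(range(len(structure)))
    stack = []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            j = stack.pop()
            partner[i], partner[j] = j, i
        elif symbol != ".":
            raise ValueError(f"unsupported symbol {symbol!r}")
    if stack:
        raise ValueError("unbalanced structure")
    return partner


def normalized_ensemble_defect(pair_probs, structure):
    """Average fraction of nucleotides NOT in their target pairing state."""
    partner = partners_from_dot_parens(structure)
    n = len(structure)
    # Probability that each nucleotide is in the state the target structure demands.
    expected_correct = sum(pair_probs[i][partner[i]] for i in range(n))
    return 1.0 - expected_correct / n


# Hypothetical toy input: target structure "(..)" and made-up probabilities.
toy_structure = "(..)"
toy_probs = [
    [0.2, 0.0, 0.0, 0.8],
    [0.0, 0.9, 0.1, 0.0],
    [0.0, 0.1, 0.9, 0.0],
    [0.8, 0.0, 0.0, 0.2],
]
toy_defect = normalized_ensemble_defect(toy_probs, toy_structure)
```

For this toy input the defect is about 0.15, meaning that on average 15% of the nucleotides are not in the pairing state required by the target structure. A value of 0 would mean the sequence always folds exactly into the target structure.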

Each toehold was assigned a score from the calculated parameters using the following formula by Ma et al (2018):

Score = 54.3 - 71 d_{sensor} - 49.1 d_{active sensor} - 22.6 d_{binding site}

Deviations from the optimal structure hinder the toehold switch’s performance, which is why the coefficients of all three parameters are negative. With this model, the toeholds that fold into a secondary structure most closely resembling the optimal structure receive the highest scores and should be considered the best candidates.
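The formula above can be written as a short Python function. This is a minimal sketch rather than the code in our program. It assumes that the three inputs are normalized defects between 0 and 1, and the example defect values are hypothetical.

```python
def toehold_score(d_sensor, d_active_sensor, d_binding_site):
    """Score from the three-parameter fit of Ma et al (2018).

    Inputs are assumed to be normalized defects in the range 0 to 1.
    """
    return 54.3 - 71.0 * d_sensor - 49.1 * d_active_sensor - 22.6 * d_binding_site


# A design that folds perfectly into the target structure gets the maximum score.
perfect_score = toehold_score(0.0, 0.0, 0.0)

# Hypothetical defect values, chosen only to show the arithmetic:
# 54.3 - 71 * 0.2 - 49.1 * 0.1 - 22.6 * 0.1 = 54.3 - 14.2 - 4.91 - 2.26
example_score = toehold_score(0.2, 0.1, 0.1)
```

The maximum possible score is 54.3, and the hypothetical example evaluates to about 32.93. Because d_{sensor} has the largest coefficient, misfolding of the whole inactive structure lowers the score the most.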



Assumptions and restrictions


Although this model is useful for narrowing down the list of potential toehold switches to test in the lab, it can’t predict the best designs with absolute certainty. The three-parameter fit used in our model has an R² value of 0.57, so it explains only about 57% of the variation in performance.

Because we analyze only the sensor sequence, the model assumes no interactions with other molecules present in the reaction. We used the ‘defect’ function of the NUPACK Python module, which estimates the normalized complex ensemble defect and uses the complex analysis functionality of the module rather than the tube analysis functionality (Dirks et al, 2007; Fornace et al, 2020). We also ignore the impact of the reporter protein coding sequence on the folding of the toehold switch.

This means that the model doesn’t take into account concentration and crosstalk effects between the different strand species present in the reaction. Therefore, the model may not estimate the viability of the toehold switches correctly if the reaction contains other molecules that significantly affect the folding of the ssRNA sequence.

The score assigned for each toehold is based on its deviation from the ideal structure, which can vary based on the desired properties of the toehold switch. For instance, A-series and B-series toehold switches have different properties, as B-series switches exhibit lower translational leakage but A-series switches produce more signal when activated. This stems from the fact that B-series switches don’t contain the stabilizing refolding domain in the active structure found in the A-series switches and so the equilibrium favors the inactive state more in B-series toehold switches compared to A-series. Because different ideal structures cause different properties, the scores between toeholds of different design principles can’t be reliably compared.

Despite these limitations, the model is a useful tool for narrowing down potential toehold switches, but a few top candidates should still be tested in the lab to determine the best toehold design.



Using the model


This model can be used in NUPACK to evaluate the performance of toeholds designed with any algorithm. We have integrated it into our design algorithm so we can readily see the best of our designs.

To use the model, first install NUPACK 4.0.0.27 (see our NUPACK guide) and then download the desired version of the model from our GitLab page. We provide ready-made modeling programs for the A- and B-series toeholds from Pardee et al (2016), and the code is easily modifiable for other target structures. Next, prepare a CSV file containing the designed sequences in the following format: identifier, sensor sequence, trigger sequence. The program does not need the identifier or the trigger sequence to run, but they are useful for downstream data handling because all the necessary information stays in one place. The size of the input file is limited only by the power of the user’s computer. In the program, change the CSV file name (currently sequences.csv) to match the input file, and change the model settings to match the conditions in which the toehold switches will operate.

Now the user is ready to run the program by executing all cells. The program outputs an Excel file in which the sequences are ranked by score in descending order.
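The input format and the ranking step can be sketched in Python as follows. This is a simplified illustration and not the code from our GitLab page. The function name rank_designs, the placeholder sequences, and the placeholder scores are hypothetical, the scoring function is passed in as an argument instead of calling NUPACK, and the ranking is returned as a list instead of being written to an Excel file.

```python
import csv
import io


def rank_designs(csv_text, score_fn):
    """Parse 'identifier, sensor sequence, trigger sequence' rows and rank by score."""
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    ranked = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        identifier = row[0].strip()
        sensor = row[1].strip()
        # The trigger is optional for scoring; it is kept for downstream handling.
        trigger = row[2].strip() if len(row) > 2 else ""
        ranked.append({
            "identifier": identifier,
            "sensor": sensor,
            "trigger": trigger,
            "score": score_fn(sensor),
        })
    # Highest score first, matching the descending order of the program's output.
    ranked.sort(key=lambda entry: entry["score"], reverse=True)
    return ranked


# Placeholder input and scores for illustration only; these are not real designs.
demo_csv = """design_A, GGGAUACC, GGUAUCCC
design_B, GGGCUACC, GGUAGCCC
"""
placeholder_scores = {"GGGAUACC": 12.5, "GGGCUACC": 20.0}
ranking = rank_designs(demo_csv, lambda sensor: placeholder_scores[sensor])
```

In this sketch design_B is listed first because its placeholder score is higher. In real use, the scoring function would compute the three defect parameters for each sensor sequence and combine them with the score formula from the Methods section.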



References


  • Dirks, R. M., Bois, J. S., Schaeffer, J. M., Winfree, E., & Pierce, N. A. (2007). Thermodynamic analysis of interacting nucleic acid strands. SIAM Review, 49, 65–88.

  • Green, A. A., Kim, J., Ma, D., Silver, P. A., Collins, J. J., & Yin, P. (2017). Complex cellular logic computation using ribocomputing devices. Nature, 548(7665), 117–121. https://doi.org/10.1038/nature23271

  • Green, A., Silver, P., Collins, J., & Yin, P. (2014). Toehold Switches: De-Novo-Designed Regulators of Gene Expression. Cell, 159(4), 925–939. https://doi.org/10.1016/j.cell.2014.10.002

  • Fornace, M. E., Porubsky, N. J., & Pierce, N. A. (2020). A Unified Dynamic Programming Framework for the Analysis of Interacting Nucleic Acid Strands: Enhanced Models, Scalability, and Speed. ACS Synthetic Biology, 9(10), 2665–2678. https://doi.org/10.1021/acssynbio.9b00523

  • Ma, D., Shen, L., Wu, K., Diehnelt, C. W., & Green, A. A. (2018). Low-cost detection of norovirus using paper-based cell-free systems and synbody-based viral enrichment. Synthetic Biology, 3(1). https://doi.org/10.1093/synbio/ysy018

  • Pardee, K., Green, A. A., Takahashi, M. K., Braff, D., Lambert, G., Lee, J. W., Ferrante, T., Ma, D., Donghia, N., Fan, M., Daringer, N. M., Bosch, I., Dudley, D. M., O’Connor, D. H., Gehrke, L., & Collins, J. J. (2016). Rapid, Low-Cost Detection of Zika Virus Using Programmable Biomolecular Components. Cell, 165(5), 1255–1266. https://doi.org/10.1016/j.cell.2016.04.059