CropFold relies on toehold switch-based sensing for the detection of plant pathogens. We designed novel toehold switches to control the expression of a reporter protein in our cell-free systems. A toehold switch changes its secondary structure in the presence of a trigger sequence, which in our project is a conserved sequence in the target pathogen’s genome. Because our toehold design algorithms produced multiple candidate toehold switches, we needed a model to narrow the selection down to the most promising designs.

Aims

Toehold switches rely on their secondary structure for their function. In the inactive form, the ribosome binding site (RBS) and start codon are sequestered in a stable stem-loop. When the trigger binds to the toehold switch, the lower part of the stem-loop unwinds, leaving only a weak stem intact. A ribosome binding to the RBS can open this weak stem and initiate translation (Green et al, 2017). The toehold switch must therefore be designed to fold into this predetermined structure. In our modeling, we estimate how much each design deviates from the optimal structure (Fig. 1) to gauge its performance in silico.

Two toehold structures. Binding sites, RBS and AUG are indicated with different colors in the figure.

Figure 1. The optimal structure for the toehold sensors. (A) The optimal structure for the A-series toehold. (B) The optimal structure for the B-series toehold. The structures are from Pardee et al (2016).

We developed this model for the analysis of A- and B-series toehold switches from Pardee et al (2016), which are designed to reduce the translational leakage compared to older toehold switch structures. Using this model, one can quickly identify the most promising toehold designs and test their performance in the lab.

Results

We decided to rank our designs with the best-performing three-parameter fit from Ma et al (2018). The scores from this fit have a fairly strong correlation with measured toehold performance (R² = 0.57). The fit assigns each toehold switch a score based on three parameters, each of which measures how much the switch deviates from its ideal structure. The fit was originally designed for selecting B-series toeholds, but we extended it to A-series toeholds by assigning the parameters according to their definitions.

Based on the results provided by our model, we selected the three highest-ranking A-series and the three highest-ranking B-series toeholds designed to detect sequences from the barley yellow dwarf virus genome for our lab tests. When we chose these toehold switches, our design and modeling algorithms were not yet optimized, so the designs we picked were not optimal. We learned a lot during our partnership with TrigGate and adjusted our algorithms accordingly. We also modified our algorithms to use the 21-nt linker sequence instead of the 30-nt linker sequence present in our original toehold switches. We later assigned updated scores to our original toehold designs and to the alternative designs. Because the updated model assumes the 21-nt linker, these scores ignore the last 9 nucleotides of the 30-nt linker sequence. The scores are displayed in Table 1.

**Table 1.** The scores of our original toehold switches (ABOA) and of the alternative designs provided by TrigGate.
Toehold identifier	Score	Toehold identifier	Score
ABOA A70	16.40528	ABOA B39	13.18878
ABOA A92	15.39586	ABOA B46	14.08091
ABOA A95	15.39047	ABOA B69	11.98966
TrigGate A70	6.19759	TrigGate B39	24.0725
TrigGate A92	-1.00635	TrigGate B46	28.13376
TrigGate A95	0.201508	TrigGate B69	18.89491

We sought to test the model’s predictive capability in our lab by comparing the measured performance of these toehold switches with their scores. We observed an apparent correlation between the score assigned by our model and the ON/OFF ratio, that is, the ratio of endpoint signal production in the presence and absence of the trigger. However, we did not collect enough experimental data to draw reliable conclusions. Therefore, we can only rely on the model’s viability for predicting the performance of B-series toeholds, as demonstrated previously (Ma et al, 2018). We still suggest that the model is viable for A-series toeholds as well, although this remains to be confirmed. The correlation between the ON/OFF ratio and the score is shown in Figure 2.

Figure 2. Correlation of predicted and measured performance of toehold switches. The graph shows the ON/OFF ratio, measured from two independent reactions, over the score predicted by our model.

This model was also used to assign a score to each of the toehold switches in our toehold switch library. After we updated our design algorithm, we scored the newly produced toehold switches for detecting the barley yellow dwarf virus. The best A-series toehold switches scored 21.8 and 21.4, and the best B-series toehold switches scored 23.1 and 22.3 (BYDV rows in Table 2). These results suggest that the design algorithm improved from its first iteration. Scores for all toehold switches in our library are presented below, and they are also visible on their pages in the registry (parts BBa_K4207002-BBa_K4207060).

**Table 2.** The scores of the rest of the toehold switches in our library. These toehold switches were created with the newest iteration of the design algorithm and the score was assigned with our model.
Toehold identifier	Score	Toehold identifier	Score
WDV A1	27.65	WDV B1	26.62
WDV A2	27.54	WDV B2	25.56
CGMMV A1	26.94	CGMMV B1	22.45
CGMMV A2	26.36	CGMMV B2	21.43
TBRFV A1	33.03	TBRFV B1	27.52
TBRFV A2	32.54	TBRFV B2	27.12
PMMV A1	31.18	PMMV B1	26.59
PMMV A2	30.83	PMMV B2	25.50
PPV A1	29.40	PPV B1	29.46
PPV A2	29.02	PPV B2	28.62
SDV A1	29.00	SDV B1	32.43
SDV A2	29.91	SDV B2	30.23
SPLCV A1	31.02	SPLCV B1	23.02
SPLCV A2	30.41	SPLCV B2	22.67
TCV A1	21.21	TCV B1	21.72
PMV A1	21.09	TCV B2	20.87
PMV A2	20.81	PMV B1	24.38
TMV A1	30.79	PMV B2	24.36
TMV A2	29.47	TMV B1	27.90
PVY A1	33.15	TMV B2	26.00
PVY A2	32.72	PVY B1	31.44
BYDV A1	21.77	PVY B2	28.64
BYDV A2	21.41	BYDV B1	23.10
		BYDV B2	22.28

Methods

We evaluated our toehold designs with a three-parameter fit from Ma et al (2018). The parameters and their explanations are listed below:

Parameter	Explanation
d_sensor	Refers to the number of incorrectly oriented nucleotides in the whole inactive structure
d_{active sensor}	Refers to the number of incorrectly oriented nucleotides in the active structure, starting from the first nucleotide downstream of the binding site
d_{binding site}	Refers to the number of incorrectly oriented nucleotides in the first 25 nucleotides of the binding site; this is the part of the binding site not forming the base of the stem-loop

This three-parameter fit was originally developed for B-series toeholds, but we adapted it to work for A-series toeholds as well. The regions of the toehold switches corresponding to each parameter are highlighted in Figure 3.

Two toehold structures. d_sensor, d_{active sensor} and d_{binding site} are indicated with different colors in the figure.

Figure 3. The regions of the toehold switches used in estimating their folding. The subsequences are highlighted (A) in the optimal A-series structure and (B) in the optimal B-series structure. In both structures, the teal color highlights the subsequence used for calculating d_{binding site} and the lilac color highlights the subsequence used for calculating d_{active sensor}. The d_sensor is evaluated from the structure of the whole sensor.

All of these parameters are calculated with NUPACK 4.0.0.27. For each region, we use the defect function, which evaluates the normalized complex ensemble defect for the sequence of interest. It uses the pair probability matrix to estimate the average number of incorrectly paired nucleotides in the design relative to the defined structure, normalized by the sequence length (Dirks et al, 2007; Fornace et al, 2020).
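The calculation behind the normalized ensemble defect can be illustrated with a minimal sketch. It does not call NUPACK. Instead, it assumes that a pair probability matrix is already available as a nested list, with the probability of each nucleotide being unpaired stored on the diagonal. The function names `structure_matrix` and `normalized_defect` are hypothetical and are not part of our program or of NUPACK.

```python
def structure_matrix(structure):
    """Convert dot-parens notation into a 0/1 target matrix S.

    S[i][j] = 1 if nucleotides i and j are paired in the target structure,
    S[i][i] = 1 if nucleotide i is unpaired in the target structure.
    """
    n = len(structure)
    S = [[0.0] * n for _ in range(n)]
    stack = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            j = stack.pop()
            S[i][j] = S[j][i] = 1.0
        elif ch == ".":
            S[i][i] = 1.0
        else:
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unbalanced structure")
    return S


def normalized_defect(pair_probs, structure):
    """Average fraction of nucleotides not in their target pairing state.

    pair_probs is assumed to be an n x n matrix with the unpaired
    probabilities on the diagonal.
    """
    n = len(structure)
    S = structure_matrix(structure)
    # Expected number of nucleotides that match the target structure.
    correct = sum(pair_probs[i][j] * S[i][j]
                  for i in range(n) for j in range(n))
    return (n - correct) / n


# Hypothetical target: a small hairpin with two base pairs.
target = "((...))"
# An ensemble that always adopts the target structure has zero defect.
perfect = structure_matrix(target)
print(normalized_defect(perfect, target))
```

A value of 0 means that the design folds into the target structure in the whole ensemble, and a value of 1 means that no nucleotide is ever in its intended pairing state.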

Each toehold was assigned a score from the calculated parameters using the following formula by Ma et al (2018):

Score = 54.3 - 71 d_sensor - 49.1 d_{active sensor} - 22.6 d_{binding site}

Deviations from the optimal structure hinder the toehold switches’ performance, which is why all three parameters carry negative coefficients. With this model, the toeholds that fold into a secondary structure most closely resembling the optimal structure receive the highest scores and should be considered the best candidates.
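As a minimal sketch, the scoring formula above can be written as a small Python function. The function name `toehold_score` and the defect values in the example are hypothetical and serve only as illustration.

```python
def toehold_score(d_sensor, d_active_sensor, d_binding_site):
    """Score from the three-parameter fit of Ma et al (2018).

    Each argument is a normalized ensemble defect between 0 and 1.
    """
    return (54.3
            - 71.0 * d_sensor
            - 49.1 * d_active_sensor
            - 22.6 * d_binding_site)


# Hypothetical defect values, not taken from any real design.
example = toehold_score(d_sensor=0.1, d_active_sensor=0.2, d_binding_site=0.3)
print(round(example, 2))
```

A design with no defects in any region would reach the maximum score of 54.3, and every deviation lowers the score in proportion to its coefficient.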

Assumptions and restrictions

Although this model is useful for narrowing down the list of potential toehold switches to test in the lab, it can’t predict the best designs with absolute certainty. The three-parameter fit used in our model has an R² value of 0.57, so it explains only about 57 % of the variation in performance.
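For readers unfamiliar with R², the following sketch shows how the coefficient of determination is commonly computed from measured and predicted values. The function name `r_squared` and the numbers in the example are hypothetical. They are not our data or the data of Ma et al (2018).

```python
def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(measured) / len(measured)
    # Variation left unexplained by the predictions.
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    # Total variation of the measurements around their mean.
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1 - ss_res / ss_tot


# Hypothetical values for illustration only.
measured = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 3.0]
print(r_squared(measured, predicted))
```

An R² of 1 means that the predictions account for all of the variation in the measurements, while an R² of 0 means that they do no better than the mean of the measurements.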

Because we analyze only the sensor sequence, the model assumes no interactions with other molecules present in the reaction. We used the ‘defect’ function of the NUPACK Python module, which estimates the normalized complex ensemble defect and relies on the complex analysis functionalities of the module rather than the tube analysis functionalities (Dirks et al, 2007; Fornace et al, 2020). We also ignore the impact of the reporter protein coding sequence on the folding of the toehold switch.

This means that the model doesn’t take into account all the concentration and crosstalk effects between different strand species present in the reaction. Therefore, the model may not estimate the viabilities of the toehold switches correctly if the reaction contains other molecules that significantly impact the folding of the ssRNA sequence.

The score assigned for each toehold is based on its deviation from the ideal structure, which can vary based on the desired properties of the toehold switch. For instance, A-series and B-series toehold switches have different properties, as B-series switches exhibit lower translational leakage but A-series switches produce more signal when activated. This stems from the fact that B-series switches don’t contain the stabilizing refolding domain in the active structure found in the A-series switches and so the equilibrium favors the inactive state more in B-series toehold switches compared to A-series. Because different ideal structures cause different properties, the scores between toeholds of different design principles can’t be reliably compared.

Despite these limitations, the model is a useful tool for narrowing down potential toehold switches, but a few top candidates should still be tested in the lab to determine the best toehold design.

Using the model

This model can be used in NUPACK to evaluate the performance of toeholds designed with any algorithm. We have integrated it into our design algorithm so we can readily see the best of our designs.

To use the model, one first has to install NUPACK 4.0.0.27 (see our NUPACK guide) and then download the desired version of the model from our GitLab page. We created ready-made modeling programs for the A- and B-series toeholds from Pardee et al (2016), but the code is easy to modify for different target structures. The user then needs a CSV file containing the designed sequences in the following format: identifier, sensor sequence, trigger sequence. The identifier and trigger sequence are not required for the program to work, but they are useful for downstream data handling because they keep all the necessary information in one place. The size of the input file is limited only by the power of the user’s computer. Next, the user should change the CSV file name in the program (currently sequences.csv) to match their own file name. The user should also change the NUPACK model parameters in the program to match the conditions in which their toehold switches would operate.

Now the user is ready to run the program by executing all cells. The program outputs an Excel file in which the sequences are ranked by their score in descending order.
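The overall workflow of reading the input file, scoring each sensor and ranking the results can be sketched as follows. This is a simplified illustration and not the program from our GitLab page. It reads CSV text from a string, returns a sorted list instead of writing an Excel file, and takes the scoring function as an argument. The name `rank_designs`, the identifiers, the sequences and the stand-in scorer are all hypothetical.

```python
import csv
import io


def rank_designs(csv_text, score_fn):
    """Return (identifier, sensor, trigger, score) tuples, best score first.

    csv_text holds rows in the format: identifier, sensor sequence,
    trigger sequence. score_fn maps a sensor sequence to a score.
    """
    rows = []
    for record in csv.reader(io.StringIO(csv_text)):
        if not record:
            continue  # skip empty lines
        identifier, sensor, trigger = (field.strip() for field in record[:3])
        rows.append((identifier, sensor, trigger, score_fn(sensor)))
    # Rank in descending order of score.
    return sorted(rows, key=lambda row: row[3], reverse=True)


# Hypothetical input and a stand-in scorer. In the real program the
# score would come from the NUPACK defects and the formula of Ma et al (2018).
example_csv = "design_1,AAAA,UUUU\ndesign_2,GGGGGG,CCCC\n"
for row in rank_designs(example_csv, lambda sensor: float(len(sensor))):
    print(row)
```

Passing the scorer as an argument keeps the ranking logic independent of the target structure, so the same loop could serve both A- and B-series designs.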

References

Dirks, R. M., Bois, J. S., Schaeffer, J. M., Winfree, E., & Pierce, N. A. (2007). Thermodynamic analysis of interacting nucleic acid strands. SIAM Review, 49, 65–88.

Fornace, M. E., Porubsky, N. J., & Pierce, N. A. (2020). A Unified Dynamic Programming Framework for the Analysis of Interacting Nucleic Acid Strands: Enhanced Models, Scalability, and Speed. ACS Synthetic Biology, 9(10), 2665–2678. https://doi.org/10.1021/acssynbio.9b00523

Green, A. A., Kim, J., Ma, D., Silver, P. A., Collins, J. J., & Yin, P. (2017). Complex cellular logic computation using ribocomputing devices. Nature, 548(7665), 117–121. https://doi.org/10.1038/nature23271

Green, A., Silver, P., Collins, J., & Yin, P. (2014). Toehold Switches: De-Novo-Designed Regulators of Gene Expression. Cell, 159(4), 925–939. https://doi.org/10.1016/j.cell.2014.10.002

Ma, D., Shen, L., Wu, K., Diehnelt, C. W., & Green, A. A. (2018). Low-cost detection of norovirus using paper-based cell-free systems and synbody-based viral enrichment. Synthetic Biology, 3(1). https://doi.org/10.1093/synbio/ysy018

Pardee, K., Green, A. A., Takahashi, M. K., Braff, D., Lambert, G., Lee, J. W., Ferrante, T., Ma, D., Donghia, N., Fan, M., Daringer, N. M., Bosch, I., Dudley, D. M., O’Connor, D. H., Gehrke, L., & Collins, J. J. (2016). Rapid, Low-Cost Detection of Zika Virus Using Programmable Biomolecular Components. Cell, 165(5), 1255–1266. https://doi.org/10.1016/j.cell.2016.04.059