CropFold relies on toehold switch-based sensing for the detection of plant pathogens. We designed novel toehold switches to control the expression of a reporter protein in our cell-free systems. A toehold switch changes its secondary structure in the presence of a trigger sequence, which in our project is a conserved sequence in the target pathogen’s genome. Because our toehold design algorithms produced multiple possible toehold switches, we needed a model to narrow the selection down to the most promising designs.
Toehold switches rely on their secondary structure for their function. In the inactive form, the ribosome binding site (RBS) and start codon are sequestered in a stable stem-loop. When the trigger binds to the toehold switch, the lower part of the stem-loop unwinds, leaving only a weak stem intact. A ribosome binding to the RBS can open this weak stem and initiate translation (Green et al, 2017). The toehold switch must therefore be designed to fold into this predetermined structure. In our modeling, we estimate how far each switch deviates from the optimal structure (Fig. 1) to gauge its performance in silico.
We developed this model for the analysis of A- and B-series toehold switches from Pardee et al (2016), which are designed to reduce the translational leakage compared to older toehold switch structures. Using this model, one can quickly identify the most promising toehold designs and test their performance in the lab.
We decided to use the best-performing three-parameter fit from Ma et al (2018) to rank our designs. The fit correlates moderately well with measured toehold performance (R² = 0.57). It assigns each toehold switch a score computed from three parameters, each of which measures a deviation from the ideal structure. The fit was originally designed to select B-series toeholds, but we extended it to A-series toeholds by assigning the parameters according to their definitions.
Based on the results provided by our model, we selected the three highest-ranking A-series and the three highest-ranking B-series toeholds designed to detect sequences from the barley yellow dwarf virus genome for our lab tests. When we chose these toehold switches, our design and modeling algorithms were not yet optimized, so the designs we picked were not optimal. We learned a lot during our partnership with TrigGate and adjusted our algorithms accordingly. In particular, we modified our algorithms to use the 21-nt linker sequence instead of the 30-nt linker sequence present in our original toehold switches. We later assigned updated scores to our original toehold designs and to alternative designs. Because the updated model uses the 21-nt linker, these scores ignore the last 9 nucleotides of the original 30-nt linker. The scores are displayed in Table 1.
|Toehold identifier||Score||Toehold identifier||Score|
|ABOA A70||16.40528||ABOA B39||13.18878|
|ABOA A92||15.39586||ABOA B46||14.08091|
|ABOA A95||15.39047||ABOA B69||11.98966|
|TrigGate A70||6.19759||TrigGate B39||24.0725|
|TrigGate A92||-1.00635||TrigGate B46||28.13376|
|TrigGate A95||0.201508||TrigGate B69||18.89491|
We sought to test the model’s predictive capability in our lab by comparing the measured performance of these toehold switches with their scores. We observed an apparent correlation between the score assigned by our model and the ON/OFF ratio (the ratio of endpoint signal production in the presence and absence of the trigger), shown in Figure 2. However, we did not collect enough experimental data to draw firm conclusions. We can therefore only rely on the model’s viability for predicting the performance of B-series toeholds, as demonstrated previously (Ma et al, 2018). We nevertheless expect the model to be applicable to A-series toeholds as well.
This model was also used to assign scores to each of the toehold switches we created for our toehold switch library. After we updated our design algorithm, we scored our newly produced toehold switches for detection of the barley yellow dwarf virus. The best A-series toehold switches scored 21.8 and 21.4, and the best B-series toeholds scored 23.1 and 22.3. These scores are higher than those of our original ABOA designs in Table 1, which suggests that the updated design algorithm improves on the first iteration. Scores for all toehold switches in our library are presented below, and they are also visible on their pages in the registry (parts BBa_K4207002-BBa_K4207060).
|Toehold identifier||Score||Toehold identifier||Score|
|WDV A1||27.65||WDV B1||26.62|
|WDV A2||27.54||WDV B2||25.56|
|CGMMV A1||26.94||CGMMV B1||22.45|
|CGMMV A2||26.36||CGMMV B2||21.43|
|TBRFV A1||33.03||TBRFV B1||27.52|
|TBRFV A2||32.54||TBRFV B2||27.12|
|PMMV A1||31.18||PMMV B1||26.59|
|PMMV A2||30.83||PMMV B2||25.50|
|PPV A1||29.40||PPV B1||29.46|
|PPV A2||29.02||PPV B2||28.62|
|SDV A1||29.00||SDV B1||32.43|
|SDV A2||29.91||SDV B2||30.23|
|SPLCV A1||31.02||SPLCV B1||23.02|
|SPLCV A2||30.41||SPLCV B2||22.67|
|TCV A1||21.21||TCV B1||21.72|
|PMV A1||21.09||TCV B2||20.87|
|PMV A2||20.81||PMV B1||24.38|
|TMV A1||30.79||PMV B2||24.36|
|TMV A2||29.47||TMV B1||27.90|
|PVY A1||33.15||TMV B2||26.00|
|PVY A2||32.72||PVY B1||31.44|
|BYDV A1||21.77||PVY B2||28.64|
|BYDV A2||21.41||BYDV B1||23.10|
We evaluated our toehold designs using a three-parameter fit from Ma et al (2018). The parameters and their definitions are listed below:
|d_sensor||Refers to the number of incorrectly oriented nucleotides in the whole inactive structure|
|d_active_sensor||Refers to the number of incorrectly oriented nucleotides in the active structure, starting from the first nucleotide downstream of the binding site|
|d_binding_site||Refers to the number of incorrectly oriented nucleotides in the first 25 nucleotides of the binding site; this is the part of the binding site not forming the base of the stem-loop|
This three-parameter fit was originally developed for B-series toeholds, but we adapted it to work for A-series toeholds as well. The regions of the toehold switches corresponding to each value are highlighted in Figure 3.
All of these parameters are calculated with NUPACK. Each is calculated for its region using the defect function, which evaluates the normalized complex ensemble defect for the sequence of interest. The function uses a pairwise probability matrix to estimate the average number of incorrectly paired nucleotides in the design relative to the defined structure, expressed as a fraction of the sequence length (Dirks et al, 2007; Fornace et al, 2020).
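To make the definition of the normalized ensemble defect concrete, here is a minimal pure-Python sketch. It is not NUPACK's implementation and does not call NUPACK; the function names `pair_map` and `normalized_defect` are hypothetical, and the pair probability matrix is assumed to follow the convention that the diagonal entry `prob[i][i]` holds the probability that nucleotide `i` is unpaired.

```python
def pair_map(structure):
    """Return {i: j} for every base pair in a dot-parens structure string."""
    stack, pairs = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            j = stack.pop()
            pairs[i], pairs[j] = j, i
        elif ch != ".":
            raise ValueError("unexpected character %r" % ch)
    if stack:
        raise ValueError("unbalanced '(' in structure")
    return pairs


def normalized_defect(prob, structure):
    """Fraction of nucleotides expected to be incorrectly paired.

    Assumption: prob[i][j] is the equilibrium probability that i pairs
    with j, and prob[i][i] is the probability that i is unpaired.
    """
    n = len(structure)
    pairs = pair_map(structure)
    correct = 0.0
    for i in range(n):
        target = pairs.get(i, i)  # unpaired targets map to the diagonal
        correct += prob[i][target]
    return 1.0 - correct / n


# Toy 5-nt hairpin target: nucleotides 0 and 4 paired, 1 to 3 unpaired.
TARGET = "(...)"
# A fully unfolded ensemble: every nucleotide unpaired with probability 1.
UNFOLDED = [[1.0 if i == j else 0.0 for j in range(5)] for i in range(5)]
print(normalized_defect(UNFOLDED, TARGET))
```

In the toy case, the three loop nucleotides match the target and the two stem nucleotides do not, so two of five positions are wrong. A defect of 0 would mean the ensemble matches the target structure exactly.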
Each toehold was assigned a score calculated from these parameters using the formula of Ma et al (2018).
Deviations from the optimal structure hinder a toehold switch’s performance, which is why each parameter enters the formula with a negative sign. Under this model, the toeholds that fold into a secondary structure most closely resembling the optimal structure receive the highest scores and should be considered the best candidates.
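The general shape of such a score, a constant minus a weighted sum of the three defect parameters, can be sketched as follows. The weights and intercept here are hypothetical placeholders chosen only for illustration; they are not the fitted coefficients from Ma et al (2018), and the function name `toehold_score` is likewise an assumption.

```python
# HYPOTHETICAL placeholder values, NOT the fitted coefficients of Ma et al (2018).
PLACEHOLDER_WEIGHTS = {"sensor": 1.0, "active_sensor": 1.0, "binding_site": 1.0}
PLACEHOLDER_INTERCEPT = 0.0


def toehold_score(d_sensor, d_active_sensor, d_binding_site,
                  weights=PLACEHOLDER_WEIGHTS, intercept=PLACEHOLDER_INTERCEPT):
    """Linear score in which every defect lowers the result."""
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative so defects are penalized")
    return (intercept
            - weights["sensor"] * d_sensor
            - weights["active_sensor"] * d_active_sensor
            - weights["binding_site"] * d_binding_site)


# A design with smaller defects scores higher than one with larger defects.
print(toehold_score(0.05, 0.10, 0.02) > toehold_score(0.20, 0.30, 0.10))
```

Because every weight is non-negative, increasing any single defect can only lower the score, which matches the ranking logic described above.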
Although this model is useful for narrowing down the list of potential toehold switches to test in the lab, it can’t predict the best designs with certainty. The three-parameter fit used in our model has an R² value of 0.57, so it explains only about 57% of the variation in performance.
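For readers who want to check how well scores track their own measurements, the following sketch computes R² for a simple linear relationship between two lists, for example model scores and measured ON/OFF ratios. For a straight-line fit with an intercept, R² equals the squared Pearson correlation, which is what is computed here. The function name `r_squared` and the numbers in the example are illustrative assumptions, not data from our experiments.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when one variable is constant")
    return (sxy * sxy) / (sxx * syy)


# Made-up illustrative numbers, NOT measurements from our lab.
example_scores = [10.0, 15.0, 20.0, 25.0]
example_on_off = [1.5, 2.0, 4.0, 3.5]
print(round(r_squared(example_scores, example_on_off), 2))
```

A value near 1 means the score explains most of the variation in the measurements, while a value near 0 means it explains almost none.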
Because we analyze only the sensor sequence, the model assumes that the sensor does not interact with other molecules present in the reaction. We used the ‘defect’ function of the NUPACK Python module, which estimates the normalized complex ensemble defect and relies on the module’s complex analysis functionality rather than its tube analysis functionality (Dirks et al, 2007; Fornace et al, 2020). We also ignore the impact of the reporter protein coding sequence on the folding of the toehold switch.
This means that the model doesn’t take into account concentration effects or crosstalk between the different strand species present in the reaction. The model may therefore misjudge the viability of a toehold switch if the reaction contains other molecules that significantly affect the folding of the ssRNA sequence.
The score assigned for each toehold is based on its deviation from the ideal structure, which can vary based on the desired properties of the toehold switch. For instance, A-series and B-series toehold switches have different properties, as B-series switches exhibit lower translational leakage but A-series switches produce more signal when activated. This stems from the fact that B-series switches don’t contain the stabilizing refolding domain in the active structure found in the A-series switches and so the equilibrium favors the inactive state more in B-series toehold switches compared to A-series. Because different ideal structures cause different properties, the scores between toeholds of different design principles can’t be reliably compared.
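Since scores are only comparable within one design series, a ranking step should group the toeholds by series first. The sketch below does this for a few entries taken from our library table; the helper name `rank_within_series` is hypothetical, and it assumes identifiers of the form "PPV A1", where the first letter of the last token gives the series.

```python
from collections import defaultdict


def rank_within_series(scores):
    """Group identifiers by series letter and sort each group by score."""
    groups = defaultdict(list)
    for identifier, score in scores.items():
        series = identifier.split()[-1][0]  # "PPV A1" -> "A"
        groups[series].append((identifier, score))
    return {
        series: sorted(items, key=lambda item: item[1], reverse=True)
        for series, items in groups.items()
    }


# Scores copied from the library table for the PPV toeholds.
ppv_scores = {"PPV A1": 29.40, "PPV A2": 29.02, "PPV B1": 29.46, "PPV B2": 28.62}
for series, ranked in sorted(rank_within_series(ppv_scores).items()):
    print(series, ranked)
```

Although PPV B1 has the highest number overall, the sketch deliberately never compares it with the A-series entries; it only reports the best candidate per series.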
Despite these limitations, the model is a useful tool for narrowing down potential toehold switches, but a few top candidates should still be tested in the lab to determine the best toehold design.
This model can be used in NUPACK to evaluate the performance of toeholds designed with any algorithm. We have integrated it into our design algorithm so we can readily see the best of our designs.
To use the model, one first has to install NUPACK on their computer (see our NUPACK guide) and then download the desired version of the model from our GitLab page. We created ready-made modeling programs for the A- and B-series toeholds from Pardee et al (2016), but the code is easily modifiable for different target structures. Next, the user needs a CSV file containing their designed sequences in the following format: identifier, sensor sequence, trigger sequence. The identifier and trigger sequence are not required for the program to run, but they are useful for downstream data handling because they keep all the necessary information in the same place. The size of the input file is limited only by the power of the user’s computer. The user should then change the CSV file name in the program (currently sequences.csv) to match their own file name. The user should also adjust the model settings in the program to match the conditions their toehold switches would operate in.
Now the user is ready to run the program by executing all cells. The program outputs an Excel file in which the sequences are ranked by their score in descending order.
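The input and output handling described above can be sketched with the Python standard library alone. This is not the code from our GitLab page: the function names are hypothetical, the scoring function is passed in as an argument (here a toy stand-in, where the real program would use the NUPACK-based score), the example sequences are made up, and the ranked result is written as CSV text rather than an Excel file to avoid extra dependencies.

```python
import csv
import io

FIELDS = ["identifier", "sensor", "trigger", "score"]


def rank_sequences(csv_text, score_fn):
    """Parse 'identifier, sensor, trigger' rows and sort by descending score."""
    ranked = []
    for record in csv.reader(io.StringIO(csv_text)):
        if len(record) < 3:
            continue  # skip blank or incomplete lines
        identifier, sensor, trigger = (field.strip() for field in record[:3])
        ranked.append({
            "identifier": identifier,
            "sensor": sensor,
            "trigger": trigger,
            "score": score_fn(sensor),
        })
    ranked.sort(key=lambda row: row["score"], reverse=True)
    return ranked


def to_csv(rows):
    """Serialize ranked rows to CSV text with a header line."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


def toy_score(sensor):
    """Stand-in score for illustration only; NOT the toehold model."""
    return -sensor.count("U")


# Made-up example input in the expected three-column format.
EXAMPLE_CSV = "s1, GGGAUU, AAUCCC\ns2, GGGACC, GGUCCC\n"
print(to_csv(rank_sequences(EXAMPLE_CSV, toy_score)))
```

Passing the scoring function as a parameter keeps the file handling independent of the model, so the same ranking code could be reused for A-series, B-series, or other target structures.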