After testing all 19 sequences in two batches of 8 and 11, we had all the data ready for analysis. As a first step, we ranked the sequences in decreasing order of their average fluorescence per optical density (F/OD) at the 24-hour mark (late stationary phase). We then checked whether this ranking correlated with the expression values predicted by the Salis Lab’s RBS Calculator (v.2.1.1). To our surprise, the Spearman's rank correlation coefficient was 0.77 for the first batch of 8 UTRs, 0.39 for the second batch of 10 UTRs, and barely 0.15 for all sequences taken together. One possible reason is that the RBS Calculator was designed for readings taken in the log phase. Xiao et al. (2020) encountered the same issue and offered a similar explanation.
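For readers who want to reproduce this kind of comparison, the rank correlation can be sketched in a few lines of Python. This is a minimal illustration and not the script we used: the function names and the numbers in the usage note are hypothetical, and tied values are given the average of their ranks.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long lists with at least 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one list is constant")
    return cov / (sx * sy)
```

Calling `spearman(predicted, measured)` with one list of calculator predictions and one list of measured F/OD values (in the same sequence order) returns a value between -1 and 1. With the made-up lists `[1200, 300, 4500, 800, 150]` and `[9.1, 3.2, 7.5, 8.8, 1.0]`, the result is 0.7.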
Similarly, we checked whether the expression values correlated between the two strains: DH5alpha, a cloning strain, and BL21(DE3), an expression strain. The Spearman’s rank correlation coefficient was 0.87. Such a high value indicates that the overall expression trend remains the same across the two strains.
Next, we tried to see if there was any correlation between the %GC content of the sequence and the F/OD values in the DH5alpha strain. However, no particular trend was observed.
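The %GC value used in such a comparison is simply the share of G and C bases in the sequence. A minimal sketch, with a hypothetical function name and assuming sequences are given as plain strings of DNA or RNA letters:

```python
def gc_percent(sequence):
    """Percentage of bases in the sequence that are G or C (case-insensitive)."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    gc_count = sum(base in "GC" for base in seq)
    return 100.0 * gc_count / len(seq)
```

The resulting percentages can then be compared against the F/OD values with any correlation measure, for example a rank correlation.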
What surprised us the most was that the cspF UTR, only nine nucleotides long, significantly increased expression relative to our control in the DH5alpha strain and showed above-average performance in the BL21(DE3) strain. This prompted a closer look at the last nine nucleotides of all our sequences. We found that the syn2E, syn2H and syn2D UTRs, which gave F/OD values very similar to those of cspF in the DH5alpha strain, had the very same last nine nucleotides!
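A check of this kind is easy to automate. The sketch below groups UTRs by their last nine nucleotides and counts mismatches between two tails. The function names are our own, and the sequences in the usage note are invented placeholders, not our actual UTRs.

```python
from collections import defaultdict


def group_by_suffix(utrs, k=9):
    """Map each k-nucleotide tail to the names of the UTRs that end with it."""
    groups = defaultdict(list)
    for name, seq in utrs.items():
        if len(seq) < k:
            raise ValueError(f"{name} is shorter than {k} nucleotides")
        groups[seq[-k:].upper()].append(name)
    return dict(groups)


def hamming(a, b):
    """Number of positions at which two equally long sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))
```

For the placeholder input `{"utrA": "GGATTACAGGAGGTAAC", "utrB": "CCCGGAGGTAAC", "utrC": "AAAGGAGCTAAC"}`, `group_by_suffix` puts `utrA` and `utrB` under the shared tail `GGAGGTAAC`, while `utrC` ends in `GGAGCTAAC`, which is one mismatch away according to `hamming`.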
Furthermore, U9 and U10, which had similar F/OD values in both strains (albeit on the lower side compared to the control), also shared the same last nine nucleotides, whereas U13, which differed at just one of those nine positions, gave a drastically lower F/OD value (less than 50% of that of U9 and U10 in both strains). The overall length of the UTR region totalled 38 nucleotides (including the constant RBS and two scars, which together are 29 nucleotides long), which was very close to the cut-off value of 35 suggested by Salis et al. (2009). We ran a simple simulation of our own using the first batch of 6 UTRs: we varied the pre- and post-start codon cut-off values in the OSTIR calculator from their default value of 35 and identified the tuple that gave the lowest Sum of Squared Errors (SSE) when compared with the experimental data.
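The cut-off scan described above amounts to a small grid search. The sketch below shows the idea under two assumptions that are ours and not part of the original analysis: `predict` is a hypothetical callable that wraps the calculator and returns one predicted expression value for a sequence and a (pre, post) cut-off pair, and the predicted and measured values are already on a comparable scale. It does not call OSTIR itself.

```python
from itertools import product


def sse(predicted, measured):
    """Sum of squared errors between two equally long lists of numbers."""
    if len(predicted) != len(measured):
        raise ValueError("lists must have the same length")
    return sum((p - m) ** 2 for p, m in zip(predicted, measured))


def scan_cutoffs(sequences, measured, predict, pre_range, post_range):
    """Try every (pre, post) pair and return (lowest_sse, pre, post).

    `predict(sequence, pre, post)` is a user-supplied stand-in for the
    expression calculator; it is an assumption of this sketch.
    """
    best = None
    for pre, post in product(pre_range, post_range):
        predicted = [predict(seq, pre, post) for seq in sequences]
        err = sse(predicted, measured)
        if best is None or err < best[0]:
            best = (err, pre, post)
    if best is None:
        raise ValueError("cut-off ranges must not be empty")
    return best
```

A call such as `scan_cutoffs(seqs, fod_values, predict, range(30, 41), range(30, 41))` evaluates all 121 pairs between 30 and 40. Ties are resolved in favour of the first pair encountered.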
Again, we found that the optimal value of the pre-start codon cut-off was 37. Our RBS and the two scars are 29 nucleotides long, which leaves 37 - 29 = 8 nucleotides of the UTR that we had added upstream. In addition, a paper by Dvir et al. (2013) uses only up to ten nucleotides of the 5’ UTR for expression in yeast.
We were convinced that a variable region of just about nine nucleotides was enough to impact expression levels significantly. This formed the basis for our next Design phase: short UTRs seemed promising.