According to Na et al. (2013) [1], the presence of a consensus scaffold sequence on bacterial sRNAs for recruiting the Hfq protein facilitates the hybridization between the sRNA and the target mRNA as well as mRNA degradation. Focusing on the vicinity of the TIR (Translation Initiation Region), there is a direct correlation between the binding free energy and the gene silencing efficiency, where lower free energies correspond to higher gene repression.
In E. coli, most sRNAs that base-pair with mRNAs depend on the chaperone protein Hfq. Most characterized sRNAs block translation by binding directly to the ribosome binding site (RBS) in the 5’-UTR of target mRNAs, which prevents 30S ribosomal subunit binding and translation initiation (Fig. 2C). However, when the sRNA binds to a translation-inhibitory sequence in the 5’-UTR, the RBS can become available, allowing translation initiation (Fig. 2D) [7].
In other cases, the RNA-binding protein Hfq has been shown to be involved in the recruitment of RNase E and in sRNA-mRNA decay; however, this mechanism remains unclear. According to Lalaouna et al. (2013) [7], one proposed pathway is that mRNAs become more sensitive to RNase E attack after base-pairing with sRNAs, because they lose the protection conferred by translating ribosomes (Fig. 2B). Another proposed pathway is that recruitment of RNase E to the target mRNA triggers the formation of an sRNA/Hfq/RNase E complex that favors RNase E degradation (Fig. 2A).
It is important to note that when the target sites are located deep within the coding sequence (CDS) of the mRNA, the sRNA/Hfq/mRNA/RNase E complex is more likely to execute a “degradation-only” mechanism (Wagner et al., 2015) (Figure 3) [8].
TIR region sRNA targeting
Yoo et al. (2013) [9] propose a simple design whose only criterion is that the sRNA binds within the TIR, the region where the ribosome binds, which extends from the SD sequence to 30 nt downstream. It is worth mentioning that this proposal also takes into account the binding energy between the sRNA and the mRNA in order to alter translation efficiency: a binding energy of -30 to -40 kcal/mol is considered. It also considers a length of 20 to 30 nt, since the greater the length, the greater the possibility of off-target repression.
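The length and energy windows described above can be written as a simple filter. The following is a minimal sketch, assuming the binding free energy has already been computed with an external tool; the function name `meets_tir_criteria`, its parameters, and the example candidates are hypothetical and are not taken from Yoo et al.

```python
def meets_tir_criteria(srna_length_nt, binding_energy_kcal_mol,
                       min_len=20, max_len=30,
                       min_energy=-40.0, max_energy=-30.0):
    """Return True if a candidate lies inside the length and energy windows.

    The default windows are the 20 to 30 nt length and -30 to -40 kcal/mol
    binding energy mentioned in the text; everything else is illustrative.
    """
    length_ok = min_len <= srna_length_nt <= max_len
    energy_ok = min_energy <= binding_energy_kcal_mol <= max_energy
    return length_ok and energy_ok


# Hypothetical candidates: (length in nt, binding energy in kcal/mol)
candidates = [(24, -35.2), (24, -22.0), (45, -38.0)]
selected = [c for c in candidates if meets_tir_criteria(*c)]
```

Whether the bounds should be inclusive is a design choice that the source does not specify; the sketch treats them as inclusive.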
sRNA design considering Hfq recruiting and mRNA TIR targeting
Zhu et al. (2021) [10] designed a synthetic sRNA system based on the MicC scaffold and the chaperone Hfq to control gene expression in Methylorubrum extorquens. The criteria they used for designing the asRNA were length, location, and binding free energy. Their paper also cites Na et al. (2013) [2], who report that a 24-nucleotide asRNA targeting the translation initiation region (TIR) of the target mRNA shows high suppression activity (>90%). Their sRNA was designed accordingly. The online service DINAMelt was used to calculate the binding free energy between the asRNA and its target mRNA.
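To illustrate the length and location criteria, the sketch below derives a 24-nt antisense binding region as the reverse complement of a window on the target mRNA. It is an assumption-laden example: the function `antisense_rna`, the choice of placing the window at the start codon, and the mRNA fragment are all hypothetical, and neither the MicC scaffold nor the binding free energy calculation is included.

```python
# Translation table for RNA base complementarity
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def antisense_rna(mrna, start, length=24):
    """Return the RNA antisense (reverse complement) to mrna[start:start+length]."""
    # Accept DNA or RNA input by converting T to U
    target = mrna.upper().replace("T", "U")[start:start + length]
    if len(target) < length:
        raise ValueError("target window runs past the end of the mRNA")
    # Complement each base, then reverse to obtain the 5'->3' antisense strand
    return target.translate(COMPLEMENT)[::-1]


# Hypothetical mRNA fragment; index 0 is assumed to be the A of the AUG start codon
mrna = "AUGGCUAGCAAGGGCGAGGAGCUGUUCACC"
binding_region = antisense_rna(mrna, start=0)
```

In a full design the binding region would be fused to the scaffold sequence, and its binding free energy with the target would then be evaluated with an external tool such as the one named above.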
Our solution: rnatrix
A neural-network-based Python program for designing optimal sRNA sequences and predicting their downregulation or upregulation behavior (previously referred to on some pages as sRNA designer)
Figure 4. rnatrix logo
As we have pointed out before, there are currently no software tools for designing sRNAs that target a specific gene. Current protocols for sRNA design mainly take the targeting of the TIR as the main criterion for a given sRNA, and the thermodynamic properties of the sRNA:mRNA base-pairing are then calculated using external tools (without feeding back into the earlier sRNA selection).
On the other hand, the existing software tools for asRNA design are focused on siRNAs.
In order to overcome these obstacles, our team created rnatrix: a Python-based pipeline that creates a dataset of sRNAs for a given mRNA and then selects the best sRNA options considering:
1. The optimal structural and sequence features for a given sRNA, so that it has "true sRNA" characteristics (see model 1)
2. Optimal sRNA:mRNA hybridization efficiency, considering the accessibility of the mRNA target region and the self-folding energies of the sRNA and the target mRNA
3. The highest probability of performing an upregulation or downregulation role in the cell. This feature is calculated using a neural-network-based model, previously trained on a database collected by our team
STEP 1. SCORING
sRNA Thermodynamic scoring based on models 1 and 2
Figure 5. Algorithm for calculating the structure of local minima and its respective self-folding energies
To calculate the structures of the local minima, RNAsubopt first generates a list of suboptimal structures in dot-bracket notation, together with the folding energy of each structure; in this step only the sequence and the energy range above the MFE must be specified. Then barriers takes the suboptimal structures given by RNAsubopt as input and calculates the structures of all local minima (also in dot-bracket notation), together with the folding energy of each local minimum. It is important to note that one must specify how many local minima the program may compute. Because in the present project the terms ABEscore and ALEscore only work with the average features of the listed local minima, we observed no appreciable effect when the maximum number of computed local minima was changed with respect to the default parameter indicated by Gruber et al. (2008) (max. 50) [13].
On the other hand, the number of suboptimal structures calculated by RNAsubopt grows exponentially with the sequence length and with the energy range above the MFE, and the computing time increases accordingly. Our team therefore conducted a series of tests to identify the energy range in which the RNAsubopt calculations remain reliable for computing ABEscore and ALEscore while keeping the computing time to a minimum; as a result, we defined 5 kcal/mol as an adequate energy range. The algorithm for calculating local minima structures and folding energies is shown in Figure 5.
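The two-stage calculation described above can be sketched as a pair of command-line calls chained from Python. This is a minimal sketch, not the rnatrix implementation: the helper names are hypothetical, the flags (`-e` and `-s` for RNAsubopt, `-G RNA` and `--max` for barriers) are written as we understand them from the tools' documentation and should be checked against the installed versions, and the parser simply assumes that each reported structure appears on a line as a dot-bracket string followed by its energy.

```python
import re
import subprocess


def subopt_command(energy_range_kcal=5.0):
    # -e: energy range above the MFE; -s: sort structures by energy,
    # since barriers expects an energy-sorted list as input
    return ["RNAsubopt", "-e", str(energy_range_kcal), "-s"]


def barriers_command(max_minima=50):
    # -G RNA: treat the input as RNA secondary structures
    # --max: upper bound on the number of local minima to compute
    return ["barriers", "-G", "RNA", f"--max={max_minima}"]


# Assumed line shape: "<dot-bracket structure> <energy> ..."
LINE = re.compile(r"([().]+)\s+(-?\d+\.\d+)")


def parse_minima(text):
    """Extract (structure, energy) pairs from lines that contain both."""
    pairs = []
    for line in text.splitlines():
        match = LINE.search(line)
        if match:
            pairs.append((match.group(1), float(match.group(2))))
    return pairs


def local_minima(sequence, energy_range_kcal=5.0, max_minima=50):
    """Run RNAsubopt, pipe its output into barriers, and parse the minima.

    Requires both programs to be installed and available on PATH.
    """
    sub = subprocess.run(subopt_command(energy_range_kcal),
                         input=sequence + "\n",
                         capture_output=True, text=True, check=True)
    bar = subprocess.run(barriers_command(max_minima),
                         input=sub.stdout,
                         capture_output=True, text=True, check=True)
    return parse_minima(bar.stdout)
```

A call such as `local_minima(seq)` would then return the list from which average features of the local minima can be derived. Note that barriers may also write auxiliary files to the working directory, so running it inside a temporary directory can be a reasonable precaution.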
Final scoring and data delivery to user
The final score is the result of adding both scores with equal weighting. Since they measure different parameters, we consider this operation to be non-redundant. Once the pipeline ends, a .csv file is given to the user containing the dataset of all created sRNA:mRNA pairs with every calculated parameter; the last columns hold the normalized scores 1 and 2, followed by the final score.
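The equal-weight sum and the CSV export can be sketched as follows. This is only an illustration under stated assumptions: the source does not say how the scores are normalized, so min-max normalization is used here as a stand-in, and the column names (`srna_id`, `score1`, `score2`, `norm_score1`, `norm_score2`, `final_score`) and example values are hypothetical.

```python
import csv
import io


def min_max(values):
    """Scale values to [0, 1]; min-max is an assumed normalization."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def add_final_scores(rows, key1="score1", key2="score2"):
    """Append normalized scores and their equal-weight sum to each row."""
    norm1 = min_max([r[key1] for r in rows])
    norm2 = min_max([r[key2] for r in rows])
    scored = []
    for row, a, b in zip(rows, norm1, norm2):
        scored.append({**row, "norm_score1": a, "norm_score2": b,
                       "final_score": a + b})
    return scored


def to_csv(rows):
    """Serialize rows to CSV text, keeping the score columns last."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sRNA:mRNA pairs with raw scores from models 1 and 2
rows = [
    {"srna_id": "s1", "score1": 2.0, "score2": 10.0},
    {"srna_id": "s2", "score1": 4.0, "score2": 30.0},
    {"srna_id": "s3", "score1": 3.0, "score2": 20.0},
]
scored = add_final_scores(rows)
csv_text = to_csv(scored)
```

Normalizing before summing keeps one score from dominating the other simply because it is expressed on a larger numeric scale.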
Final sRNA:mRNA best pairs selection
Although sRNAs are generated along the whole length of each analyzed gene, our team decided to consider only those that target a region within 100 nt downstream of the start of the gene. This is because there is not enough information to predict sRNA behavior at sites far downstream of the start codon. In addition, after analyzing the databases used for training the neural network, we found that the majority of the sRNAs that pair within the CDS target sites near the gene start.
It is important to note that the majority of the registered sRNA parts were selected according to these criteria, because the next step (neural-network-based prediction) was incorporated after we had ordered the DNA parts for synthesis.
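The 100-nt selection window can be expressed as a short filter over the scored candidates. The sketch below is hypothetical: the field names `target_start` and `target_end`, the 0-based coordinates measured from the first nucleotide of the start codon with an exclusive end, and the requirement that the whole target site fall inside the window are all assumptions made for illustration.

```python
def within_start_window(candidates, window_nt=100):
    """Keep candidates whose target site lies entirely in the first window_nt.

    Coordinates are assumed to be 0-based, relative to the start codon,
    with target_end exclusive.
    """
    return [c for c in candidates
            if c["target_start"] >= 0 and c["target_end"] <= window_nt]


# Hypothetical candidates with target-site coordinates on the gene
candidates = [
    {"srna_id": "s1", "target_start": 0, "target_end": 24},
    {"srna_id": "s2", "target_start": 90, "target_end": 114},
    {"srna_id": "s3", "target_start": 350, "target_end": 374},
]
kept = within_start_window(candidates)
```

A looser rule that accepts sites only partially overlapping the window would be an equally plausible reading of the criterion.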
STEP 2. REGULATION BEHAVIOR PREDICTION
Neural-network based prediction of the upregulation or downregulation role of the scored sRNAs