Software | Tec-Monterrey

Overview of synthetic sRNA design

Standardized protocols for sRNA design

Since the mechanisms underlying sRNA gene regulation vary with the regulated gene, the sRNA sequence and the help of chaperone proteins, among other factors, one of the main challenges in sRNA design is finding patterns that accurately characterize this behavior.

For the baseline design of the sRNA constructs, our team followed the guidelines of the iGEM Paris-Bettencourt 2013 project [1], which, based on Na et al. (2013) [2], state that the sRNA must base-pair perfectly within the first 24 nucleotides of the target mRNA. Downstream of the target-binding sequence sits an Hfq-binding domain, taken from the MicC sRNA without its binding site for the OmpC mRNA, which recruits the chaperone protein Hfq.

However, these criteria consider neither the structural and thermodynamic profile of the sRNA nor its hybridization efficiency.


Figure 1. Basic synthetic sRNA design described by Na and collaborators

Current tools for antisense RNA design

Short overview

RNAxs: siRNA design considering target accessibility

Tafer et al. (2008) [3] analyzed how the structure of a target RNA affects RNA interference (RNAi) through the accessibility of the target site, and subsequently developed a tool called RNAxs to help select highly efficient siRNAs. The program RNAplfold was used to slide a window along the mRNA and evaluate the probability that each stretch is unpaired in thermodynamic equilibrium, which indicates whether the site is accessible. Although the mechanisms underlying siRNA effects differ from those of sRNAs (siRNAs are exclusive to eukaryotic cells), Vazquez-Anderson et al. (2017) [4] also take target accessibility into account for asRNA hybridization.
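To make the accessibility idea concrete, the sketch below assumes the probability that each window is unpaired has already been computed (for example with RNAplfold) and is passed in as a plain list; the probabilities, the `min_prob` cutoff and the function names are hypothetical. It converts each probability into an opening energy with the standard relation ΔG_open = -RT ln(P_unpaired) and keeps the most accessible windows. This is an illustration of the concept, not RNAxs's actual scoring.

```python
import math

R_KCAL = 0.0019872   # gas constant, kcal/(mol*K)
T_37C = 310.15       # 37 °C in Kelvin


def opening_energy(p_unpaired: float, temp_k: float = T_37C) -> float:
    """Energy (kcal/mol) needed to open a stretch: dG_open = -RT ln(P_unpaired)."""
    if not 0.0 < p_unpaired <= 1.0:
        raise ValueError("p_unpaired must be in (0, 1]")
    return -R_KCAL * temp_k * math.log(p_unpaired)


def accessible_windows(p_unpaired_by_start, min_prob=0.1):
    """Return (start, P_unpaired, dG_open) for windows with P_unpaired >= min_prob,
    most accessible first. min_prob is a hypothetical cutoff."""
    hits = [
        (start, p, opening_energy(p))
        for start, p in enumerate(p_unpaired_by_start)
        if p >= min_prob
    ]
    return sorted(hits, key=lambda h: h[1], reverse=True)


# Toy probabilities for windows starting at positions 0..4 (not real RNAplfold output)
probs = [0.02, 0.35, 0.60, 0.08, 0.15]
for start, p, dg in accessible_windows(probs):
    print(f"window at {start}: P_unpaired={p:.2f}, dG_open={dg:.2f} kcal/mol")
```

A low opening energy means little structure has to be melted before the antisense strand can bind, which is why higher unpaired probability is treated as better.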

MysiRNA Design center

On the other hand, Mysara et al. (2011) [5] developed a tool for siRNA design that considers, among other features, targeting of conserved sequences, avoidance of SNPs and off-targets, and target accessibility.

E-RNAi

E-RNAi filters candidate siRNAs by specificity and efficiency, considering the sequence and structural properties of each siRNA [6].

Additional considerations on sRNA gene regulation and Hfq interaction

According to Na et al. (2013) [2], the presence of a consensus scaffold sequence on bacterial sRNAs, which recruits the Hfq protein, facilitates both the hybridization between the sRNA and the target mRNA and mRNA degradation. In the vicinity of the TIR (translation initiation region), binding free energy correlates with gene-silencing efficiency: lower (more negative) free energies correspond to stronger gene repression.

In E. coli, most sRNAs that bind mRNAs depend on the chaperone protein Hfq. Most characterized sRNAs block translation by base-pairing directly with the ribosome binding site (RBS) in the 5’-UTR of target mRNAs, preventing binding of the 30S ribosomal subunit and translation initiation (Fig. 2C). However, when an sRNA binds to an inhibitory sequence in the 5’-UTR that would otherwise hide the RBS, the RBS can become available, allowing translation initiation (Fig. 2D) [7].

In other cases, the RNA-binding protein Hfq has been shown to be involved in recruiting RNase E and in sRNA-mRNA decay; however, this mechanism remains unclear. According to Lalaouna et al. (2013) [7], one pathway suggests that mRNAs become more sensitive to RNase E attack after base-pairing with sRNAs, because they lose the protection conferred by translating ribosomes (Fig. 2B). Another pathway proposes that recruitment of RNase E to the target mRNA triggers the formation of an sRNA/Hfq/RNase E complex that favors RNase E degradation (Fig. 2A).

When the target sites are located deep within the coding sequence (CDS) of the mRNA, the sRNA/Hfq/mRNA/RNase E complex is more likely to act through a “degradation-only” mechanism (Wagner & Romby, 2015) (Figure 3) [8].


Figure 2. Main mechanisms for sRNA/mRNA/Hfq interaction


Figure 3. Degradation-only mechanism mediated by the sRNA/mRNA/Hfq/RNase E complex downstream of the gene start

TIR region sRNA targeting

Yoo et al. (2013) [9] propose a simple design whose main criterion is that the sRNA binds within the TIR, the region where the ribosome binds, which extends from the Shine-Dalgarno (SD) sequence to the following 30 nt. The proposal also takes into account the sRNA:mRNA binding energy, which is used to tune translation efficiency, and recommends a binding energy of -30 to -40 kcal/mol. It further recommends an sRNA length of 20 to 30 nt, since the longer the sRNA, the greater the chance of off-target repression.
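The three rules summarized above (TIR location, 20-30 nt length, -30 to -40 kcal/mol binding energy) can be expressed as a simple candidate filter. This is a minimal sketch under stated assumptions: the `Candidate` fields and function name are hypothetical, the TIR is taken as the 30 nt starting at the SD sequence, and "binds within the TIR" is approximated as "the binding site overlaps the TIR". Binding energies are assumed to be computed elsewhere.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    start: int          # 0-based start of the binding site on the mRNA
    length: int         # sRNA binding-region length (nt)
    binding_dg: float   # sRNA:mRNA binding free energy (kcal/mol)


def meets_yoo_criteria(c: Candidate, sd_start: int) -> bool:
    """Check location, length and binding-energy rules for one candidate.

    Assumption: TIR = [sd_start, sd_start + 30) and a candidate counts as
    TIR-binding if its site overlaps that interval.
    """
    tir_start, tir_end = sd_start, sd_start + 30
    inside_tir = c.start < tir_end and c.start + c.length > tir_start
    good_length = 20 <= c.length <= 30
    good_energy = -40.0 <= c.binding_dg <= -30.0
    return inside_tir and good_length and good_energy


# Toy candidates with made-up values
candidates = [
    Candidate(start=5, length=24, binding_dg=-35.2),    # passes all rules
    Candidate(start=80, length=24, binding_dg=-36.0),   # outside the TIR
    Candidate(start=10, length=35, binding_dg=-38.0),   # too long
    Candidate(start=12, length=22, binding_dg=-25.0),   # binding too weak
]
selected = [c for c in candidates if meets_yoo_criteria(c, sd_start=0)]
print(selected)
```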

sRNA design considering Hfq recruitment and mRNA TIR targeting

Zhu et al. (2021) [10] designed a synthetic sRNA system based on the MicC scaffold and the chaperone Hfq to control gene expression in Methylorubrum extorquens. They designed the asRNA using three criteria: length, location and binding free energy. Their paper also cites Na et al. (2013) [2], who report that a 24-nt asRNA targeting the translation initiation region (TIR) of the target mRNA shows high repression activity (>90%), and their sRNA was designed accordingly. The online service DINAMelt was used to calculate the binding free energy between the asRNA and its target mRNA.

Our solution: rnatrix

A neural network-based Python program for designing optimal sRNA sequences and predicting their downregulation or upregulation behavior (referred to on some pages by its former name, sRNA Designer)

Figure 4. rnatrix logo

As pointed out above, there are currently no software tools for designing sRNAs that target a specific gene. Current sRNA design protocols mainly use targeting of the TIR as the selection criterion for a given sRNA; the thermodynamic properties of the sRNA:mRNA base pairing are then calculated with external tools, without feeding back into the earlier sRNA selection.

Meanwhile, existing software tools for asRNA design focus on siRNAs.

To overcome these obstacles, our team created rnatrix: a Python-based pipeline that creates a dataset of candidate sRNAs for a given mRNA and then selects the best sRNA options considering:

1. Optimal structural and sequence features, i.e., how closely each candidate displays "true sRNA" characteristics (see Model I)

2. Optimal sRNA:mRNA hybridization efficiency, considering the accessibility of the mRNA target region and the self-folding energies of the sRNA and the target mRNA

3. The probability of exerting an upregulation or downregulation role in the cell, calculated with a neural-network-based model previously trained on a database compiled by our team

STEP 1. SCORING

sRNA thermodynamic scoring based on Models I and II

Transcription to mRNA and sRNA dataset creation

Since every sRNA must bind a 24-nt mRNA region (the length can be changed by the user), a 24-nt sliding window is moved over the identified conserved DNA sequences, and for each window the transcribed mRNA segment and its complementary binding sRNA are generated. An Hfq-binding domain is then attached to the 3’ end of every sRNA for further analysis.
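The dataset-creation step can be sketched as follows, assuming the input is a coding-strand DNA sequence written 5'->3'. Each 24-nt window of the transcript yields one target segment, whose reverse complement becomes the sRNA binding region, followed by the Hfq-binding domain at the 3' end. The function names and the `HFQ_DOMAIN` value are hypothetical: the real MicC-derived scaffold sequence is not given here, so a placeholder string stands in for it.

```python
# Hypothetical placeholder: substitute the actual MicC-derived Hfq-binding
# scaffold sequence used in the constructs (not reproduced here).
HFQ_DOMAIN = "<HFQ_SCAFFOLD>"

_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def transcribe(dna: str) -> str:
    """Coding-strand DNA (5'->3') to mRNA: replace T with U."""
    return dna.upper().replace("T", "U")


def antisense(rna: str) -> str:
    """Reverse complement of an RNA segment, written 5'->3'."""
    return rna.translate(_RNA_COMPLEMENT)[::-1]


def build_dataset(dna: str, hfq_domain: str = HFQ_DOMAIN, window: int = 24):
    """Slide a `window`-nt window along the transcript, one sRNA:mRNA pair per position."""
    mrna = transcribe(dna)
    rows = []
    for start in range(len(mrna) - window + 1):
        target = mrna[start:start + window]
        rows.append({
            "position": start,                       # 0-based offset on the mRNA
            "mrna_target": target,
            "srna": antisense(target) + hfq_domain,  # Hfq domain on the 3' end
        })
    return rows


toy_dna = "ATGGCTAGCAAAGGAGAAGAACTTTTCACTGG"  # made-up 32-nt example
dataset = build_dataset(toy_dna)
print(len(dataset), dataset[0]["srna"])
```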

sRNA:mRNA scoring

The scoring methods from Models I and II are then applied to all the created sRNA:mRNA pairs.

Sequence and structural information-based sRNA scoring (score 1)

The score from Model I is used to assess how closely each created sequence resembles a “true sRNA”, since its original purpose was to distinguish sRNA sequences from non-sRNA ones in the PresRAT server for the identification of bacterial sRNA sequences, developed by Kumar and collaborators (Kumar et al., 2021) [11].

Sequence score and Uracil load scoring (score 1)

These parameters are calculated directly from the sRNA sequence alone, using the formulas explained in the Model section.

Local minima profiling (energy landscape) and RNA suboptimal structures (score 1)

As stated before, both the ABE and ALE scores require information on the local minima of the RNA. To calculate the number of local minima of a given RNA sequence, together with their free energies, we used the programs RNAsubopt and barriers (Lorenz et al., 2011) [12]. Some RNA molecules form metastable structures, represented as local minima on an energy landscape. The barriers algorithm identifies all the local minima, and the energy barriers separating them, within the energy landscape of a given RNA sequence (Gruber et al., 2008) [13]. Chen & Burke-Aguero (2015) [14] give a concise explanation of how the two programs are used together: RNAsubopt computes all the conformations that a given RNA can adopt within a defined energy range [kcal/mol] above the minimum free energy (MFE) of the sequence, and barriers takes all the suboptimal structures from RNAsubopt and finds all the local minima and the saddle points between them. Each structure given by RNAsubopt is one of the following:

A local minimum
A saddle point connecting at least two local minima, or
A structure within the basin of one local minimum
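To illustrate what "local minimum" means here, the sketch below takes a list of (dot-bracket, energy) pairs, such as RNAsubopt output, and keeps the structures for which no listed neighbor has lower energy, where neighbors differ by exactly one base pair (insertion or deletion). This is a deliberately simplified check written for illustration: it is not the barriers flooding algorithm, it does not compute saddle points or basins, and it treats degenerate (equal-energy) neighbors naively. The toy structures and energies are made up; `max_minima=50` mirrors the default mentioned in the text.

```python
def base_pairs(db: str) -> frozenset:
    """Parse a dot-bracket string into a set of (i, j) base pairs."""
    stack, pairs = [], set()
    for i, ch in enumerate(db):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.add((stack.pop(), i))
    return frozenset(pairs)


def local_minima(structures, max_minima=50):
    """Return (structure, energy) pairs that are local minima among `structures`.

    Neighbors differ by exactly one base pair; a structure is kept if no listed
    neighbor has strictly lower energy. Result is sorted by energy.
    """
    parsed = [(db, e, base_pairs(db)) for db, e in structures]
    minima = []
    for db, e, bp in parsed:
        has_lower_neighbor = any(
            len(bp ^ bp2) == 1 and e2 < e for _, e2, bp2 in parsed
        )
        if not has_lower_neighbor:
            minima.append((db, e))
    minima.sort(key=lambda m: m[1])
    return minima[:max_minima]


# Toy suboptimal structures (made-up energies in kcal/mol)
subopt = [
    ("((.....))..", -2.0),
    ("(.......)..", -0.5),
    (".(.....)...", -0.3),
    ("...........", 0.0),
    ("......(...)", -0.4),
]
print(local_minima(subopt))
```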


Figure 5. Algorithm for calculating the structures of local minima and their respective self-folding energies

To calculate the structures of the local minima, RNAsubopt first generates a list of suboptimal structures in dot-bracket notation, together with the folding energy of each structure; in this step, only the sequence and the energy range above the MFE must be specified. barriers then takes the suboptimal structures from RNAsubopt as input and calculates the structures of all local minima (also in dot-bracket notation) with the folding energy of each one. The maximum number of local minima to compute must also be specified. Because in this project the ABE and ALE scores only use the average features of the listed local minima, we observed no substantial effect when the maximum number of computed local minima was changed from the default value indicated by Gruber et al. (2008) (max. 50) [13].

The number of suboptimal structures calculated by RNAsubopt grows exponentially with sequence length and with the energy range above the MFE, and computing time grows with it. Our team therefore ran a series of tests to identify the energy range at which RNAsubopt output remains reliable for calculating the ABE and ALE scores while keeping computing time to a minimum; as a result, we defined 5 kcal/mol as an adequate energy range. The algorithm for calculating local-minimum structures and folding energies is shown in Figure 5.
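The RNAsubopt-to-barriers step with the parameters above (5 kcal/mol energy range, at most 50 minima) could be driven from Python roughly as below. This is a sketch, not the team's code: it assumes both programs are installed on the PATH, that RNAsubopt reads the sequence from stdin, that `-e` sets the energy band and `-s` sorts the output by energy (barriers expects sorted input), and that barriers accepts `--max` for the number of minima. These flag names are our understanding of the tools' documentation; check `RNAsubopt --help` and `barriers --help` for your installed versions. The helper names are hypothetical.

```python
import shutil
import subprocess


def subopt_cmd(energy_range: float = 5.0) -> list:
    # -e: energy band above the MFE (kcal/mol); -s: sort structures by energy
    return ["RNAsubopt", "-e", str(energy_range), "-s"]


def barriers_cmd(max_minima: int = 50) -> list:
    # --max: report only the lowest `max_minima` local minima
    return ["barriers", f"--max={max_minima}"]


def run_landscape(seq: str, energy_range: float = 5.0, max_minima: int = 50) -> str:
    """Pipe RNAsubopt output into barriers and return barriers' raw text output."""
    for tool in ("RNAsubopt", "barriers"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")
    subopt = subprocess.run(subopt_cmd(energy_range), input=seq + "\n",
                            capture_output=True, text=True, check=True)
    result = subprocess.run(barriers_cmd(max_minima), input=subopt.stdout,
                            capture_output=True, text=True, check=True)
    return result.stdout
```

With both tools installed, `run_landscape("GGGAAACCCAAAGGGUUUCCC")` would return the barriers listing of local minima as text, which still has to be parsed before averaging minima features.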

Once all four individual scores (ABE, ALE, sequence and U-rich) are calculated, a normalizing function is applied to the entire dataset.
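The normalizing function itself is not specified here; a common choice is min-max scaling of each score column across the whole dataset, sketched below as an assumption. The `higher_is_better` flag is hypothetical and handles scores where lower raw values are preferable (for example, more negative energies), so that 1.0 always means "best" after scaling.

```python
def min_max(values, higher_is_better=True):
    """Rescale one score column to [0, 1] across the whole dataset (assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column carries no ranking information; 0.0 is an arbitrary choice
        return [0.0 for _ in values]
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]


abe_scores = [2.0, 4.0, 6.0]   # toy raw values
print(min_max(abe_scores))                           # higher raw value = better
print(min_max(abe_scores, higher_is_better=False))   # lower raw value = better
```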

Hybridization efficiency scoring (Model II)

In this step, the free energy of base pairing between the sRNA and its mRNA target region (ΔG_asT) and the free energy of local folding of the target mRNA region, extended by one nucleotide in the 5’ and 3’ directions when available (ΔG_tF), are calculated with RNAfold from the ViennaRNA suite. Finally, an accessibility factor θ is calculated with the NUPACK package. All of these parameters are used to calculate score 2, which estimates the hybridization efficiency of each generated sRNA; as with score 1, these values are normalized so that both scores can be added into a final score on a 0-2 scale. This scoring is taken from Model II (Vazquez-Anderson et al., 2017) [4].
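The "plus one nucleotide on each side when available" rule for the ΔG_tF region amounts to clipping a slice at the transcript ends, as in this small sketch. The helper names are hypothetical. The optional folding call assumes the ViennaRNA Python bindings (`import RNA`), whose `RNA.fold` returns the MFE structure and its energy; if the bindings are not installed the helper simply returns `None`.

```python
def target_with_flanks(mrna: str, start: int, length: int = 24, flank: int = 1) -> str:
    """Target region plus `flank` nt on each side, clipped at the transcript ends."""
    lo = max(0, start - flank)
    hi = min(len(mrna), start + length + flank)
    return mrna[lo:hi]


def local_folding_energy(mrna: str, start: int, length: int = 24):
    """MFE (kcal/mol) of the flanked target region, or None without ViennaRNA."""
    try:
        import RNA  # ViennaRNA Python bindings (optional dependency)
    except ImportError:
        return None
    _structure, mfe = RNA.fold(target_with_flanks(mrna, start, length))
    return mfe


toy_mrna = "ACGU" * 10   # made-up 40-nt transcript
print(target_with_flanks(toy_mrna, 0))    # no 5' flank available at position 0
print(target_with_flanks(toy_mrna, 5))    # one extra nt on each side
```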


Figure 7. Scoring algorithm. NOTE: although the diagram shows the input file in .fasta format, the user MUST provide a .txt file WITHOUT any FASTA header.

Final scoring and data delivery to user

The final score is the sum of both scores, weighted equally. Since the two scores measure different parameters, we consider this combination non-redundant. When the pipeline finishes, the user receives a .csv file with the dataset of all created sRNA:mRNA pairs and every calculated parameter; the last columns contain the normalized scores 1 and 2, followed by the final score.
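The equal-weight combination and the column layout described above can be sketched with the standard `csv` module. Column names and values are hypothetical; the only logic taken from the text is that the final score is the unweighted sum of the two normalized scores (so it lies on a 0-2 scale) and that the score columns come last.

```python
import csv
import io

SCORE_COLUMNS = ["score1_norm", "score2_norm", "final_score"]


def add_final_scores(rows):
    """Equal-weight sum of two normalized scores (each 0-1), giving a 0-2 final score."""
    for row in rows:
        row["final_score"] = row["score1_norm"] + row["score2_norm"]
    return rows


def write_results(rows, handle):
    """Write all pairs as CSV, with parameter columns first and score columns last."""
    params = [k for k in rows[0] if k not in SCORE_COLUMNS]
    writer = csv.DictWriter(handle, fieldnames=params + SCORE_COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


# Toy rows with hypothetical column names and values
rows = [
    {"position": 0, "srna": "GCAU", "score1_norm": 0.5, "score2_norm": 0.25},
    {"position": 1, "srna": "UGCA", "score1_norm": 1.0, "score2_norm": 0.75},
]
buffer = io.StringIO()
write_results(add_final_scores(rows), buffer)
print(buffer.getvalue())
```

When writing to a real file instead of a `StringIO`, open it with `newline=""` so the `csv` module controls line endings.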

Final sRNA:mRNA best pairs selection

Although sRNAs are created along the entire analyzed gene, our team decided to consider only those that bind within the first 100 nt downstream of the gene start. This is because there is not enough information to predict sRNA behavior at sites far downstream of the start codon. In addition, after analyzing the databases used for neural network training, we found that most sRNAs pair with CDS target sites near the gene start.

Note that most of the registered sRNA parts were selected according to these criteria alone, since the next step (neural-network-based prediction) was incorporated after we had ordered the DNA parts for synthesis.
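The 100-nt restriction followed by ranking on the final score can be sketched as a filter and a sort. Two details are assumptions: a pair counts as "within 100 nt" when its 0-based binding-site start position is below 100, and the number of pairs returned (`top_n`) is a hypothetical parameter. Row keys are also hypothetical.

```python
def select_best(rows, window_nt=100, top_n=5):
    """Keep pairs whose binding site starts within the first `window_nt` nt
    downstream of the gene start, then return the `top_n` best final scores."""
    near_start = [r for r in rows if 0 <= r["position"] < window_nt]
    return sorted(near_start, key=lambda r: r["final_score"], reverse=True)[:top_n]


rows = [
    {"position": 10, "final_score": 1.2},
    {"position": 150, "final_score": 1.9},   # outside the 100-nt window: dropped
    {"position": 40, "final_score": 1.5},
    {"position": 99, "final_score": 0.3},
]
best = select_best(rows, top_n=2)
print([r["position"] for r in best])
```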

STEP 2. REGULATION BEHAVIOR PREDICTION

Neural-network based prediction of the upregulation or downregulation role of the scored sRNAs

Once the dataset with all possible sRNA:mRNA pairs has been created and the scoring criteria have been applied to all pairs (and the scores re-normalized, this time to a 0-1 scale), our software loads the previously trained neural-network model (see Model) to calculate the probability that each sRNA exerts an upregulation or downregulation role.

Note that only one probability is calculated per sRNA: depending on the sRNA, the probability of upregulating or of downregulating the desired gene. For the final sRNA selection, the information in Table 1 is passed to a function (see Model).


Table 1. Final features considered for best sRNA selection. The predicted role is 0 for downregulation and 1 for upregulation. The sRNA string is not fed into the neural network model as text; it is shown in this table for illustration purposes only.

Important notes:

Although the neural network uses all of the parameters calculated while scoring the sRNA:mRNA pairs, it does not use any of the final scores; instead, it uses their individual components without any normalization. In addition, some of the parameters used to predict regulation behavior are not calculated during the scoring process (see Model).

Finally, the user receives the best sRNA:mRNA target pairs (ranked by the harmonic mean function), together with their hybridization positions on the entire mRNA transcript, in a .csv file called “results.csv”.
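The exact inputs of the harmonic mean function are described in the Model section and Table 1, not here. As a hedged illustration only, the sketch assumes two inputs per candidate, the re-normalized final score and the predicted probability of the desired role (both in 0-1), and ranks candidates by their harmonic mean using `statistics.harmonic_mean`. The key names are hypothetical.

```python
from statistics import harmonic_mean


def rank_by_harmonic_mean(candidates):
    """candidates: dicts with 'score' (re-normalized final score, 0-1) and
    'p_role' (predicted probability of the desired role, 0-1). Assumed inputs."""
    for c in candidates:
        c["hm"] = harmonic_mean([c["score"], c["p_role"]])
    return sorted(candidates, key=lambda c: c["hm"], reverse=True)


candidates = [
    {"id": "A", "score": 0.9, "p_role": 0.1},   # strong score, unlikely role
    {"id": "B", "score": 0.6, "p_role": 0.6},   # balanced
    {"id": "C", "score": 0.7, "p_role": 0.5},
]
ranked = rank_by_harmonic_mean(candidates)
print([c["id"] for c in ranked])
```

Unlike an arithmetic mean, the harmonic mean is pulled down by the smaller of its inputs, so a candidate must do reasonably well on both criteria to rank highly; candidate A above illustrates this.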


Figure 8. Final rnatrix algorithm, with scoring and sRNA role prediction based on a neural network model. IMPORTANT NOTE: the "lambda function" shown in the diagram is the harmonic mean function.

Note that only the sRNA parts (without promoter and terminator) BBa_K4506052 to BBa_K4506056 were designed with the neural network model incorporated into our software (see Results).

References

[1] iGEM Paris Bettencourt (2013). https://2013.igem.org/Team:Paris_Bettencourt
[2] Na D, Yoo SM, Chung H, Park H, Park JH, Lee SY. Metabolic engineering of Escherichia coli using synthetic small regulatory RNAs. Nat Biotechnol. 2013 Feb;31(2):170-4. doi: 10.1038/nbt.2461. Epub 2013 Jan 20. PMID: 23334451.
[3] Tafer H, Ameres SL, Obernosterer G, Gebeshuber CA, Schroeder R, Martinez J, Hofacker IL. The impact of target site accessibility on the design of effective siRNAs. Nat Biotechnol. 2008 May;26(5):578-83. doi: 10.1038/nbt1404. Epub 2008 Apr 27. PMID: 18438400.
[4] Vazquez-Anderson J, Mihailovic MK, Baldridge KC, Reyes KG, Haning K, Cho SH, Amador P, Powell WB, Contreras LM. Optimization of a novel biophysical model using large scale in vivo antisense hybridization data displays improved prediction capabilities of structurally accessible RNA regions. Nucleic Acids Res. 2017 May 19;45(9):5523-5538. doi: 10.1093/nar/gkx115. PMID: 28334800; PMCID: PMC5435917.
[5] Mysara M, Garibaldi JM, Elhefnawi M. MysiRNA-designer: a workflow for efficient siRNA design. PLoS One. 2011;6(10):e25642. doi: 10.1371/journal.pone.0025642. Epub 2011 Oct 26. Erratum in: PLoS One. 2015;10(3):e0119062. PMID: 22046244; PMCID: PMC3202522.
[6] Arziman Z, Horn T, Boutros M. E-RNAi: a web application to design optimized RNAi constructs. Nucleic Acids Res. 2005 Jul 1;33(Web Server issue):W582-8. doi: 10.1093/nar/gki468. PMID: 15980541; PMCID: PMC1160229.
[7] Lalaouna D, Simoneau-Roy M, Lafontaine D, Massé E. Regulatory RNAs and target mRNA decay in prokaryotes. Biochimica et Biophysica Acta (BBA) - Gene Regulatory Mechanisms. 2013;1829(6-7):742-747. doi: 10.1016/j.bbagrm.2013.02.013.
[8] Wagner EGH, Romby P. Small RNAs in Bacteria and Archaea. Advances in Genetics. 2015:133-208. doi: 10.1016/bs.adgen.2015.05.001.
[9] Yoo SM, Na D, Lee SY. Design and use of synthetic regulatory small RNAs to control gene expression in Escherichia coli. Nat Protoc. 2013 Sep;8(9):1694-707. doi: 10.1038/nprot.2013.105. Epub 2013 Aug 8. PMID: 23928502.
[10] Zhu LP, Song SZ, Yang S. Gene repression using synthetic small regulatory RNA in Methylorubrum extorquens. Journal of Applied Microbiology. 2021;131(6):2861-2875. doi: 10.1111/jam.15159.
[11] Kumar K, Chakraborty A, Chakrabarti S. PresRAT: a server for identification of bacterial small-RNA sequences and their targets with probable binding region. RNA Biology. 2021;18(8):1152-1159. doi: 10.1080/15476286.2020.1836455.
[12] Lorenz R, Bernhart SH, Höner Zu Siederdissen C, Tafer H, Flamm C, Stadler PF, Hofacker IL. ViennaRNA Package 2.0. Algorithms Mol Biol. 2011 Nov 24;6:26. doi: 10.1186/1748-7188-6-26. PMID: 22115189; PMCID: PMC3319429.
[13] Gruber AR, Lorenz R, Bernhart SH, Neuböck R, Hofacker IL. The Vienna RNA websuite. Nucleic Acids Res. 2008 Jul 1;36(Web Server issue):W70-4. doi: 10.1093/nar/gkn188. Epub 2008 Apr 19. PMID: 18424795; PMCID: PMC2447809.
[14] Chen S, Burke-Aguero DH. Computational Methods for Understanding Riboswitches (Vol. 553). 1st ed. Academic Press; 2015.