Many products of synthetic biology have become essential to our daily lives. Most of these products are either proteins themselves or are manufactured with the help of enzymes. Protein production is an expensive process, and that expense is reflected in the cost of syn-bio products. Hence the need for better control over protein production.
A typical gene comprises a promoter, a ribosome binding site (RBS), a coding sequence, untranslated regions (UTRs) and a transcription terminator. UTRs are the regions of the mRNA that flank the coding sequence and are not translated into a polypeptide chain.
The 5’ UTR or the leader sequence is upstream from the coding sequence, and the 3’ UTR or the trailer sequence is downstream from the coding sequence. Most naturally occurring 5’ UTRs are around 100-200 nucleotides long, and are retained after mRNA splicing in eukaryotes.
The mRNA can fold into secondary structures such as stem-loops, pseudoknots, and other motifs. These structures influence transcription, the frequency of translation initiation, the choice of the ORF that is translated, elongation speed, mRNA half-life and the folding of nascent proteins. 5’ UTRs are therefore regulators of translation: they affect not only the rate of translation initiation [1] but also the half-life of the mRNA [2].
In most cases, the rate-limiting step of translation in bacteria is translation initiation. The initiation complex can form only when the ribosome binding site (RBS) is accessible to the 30S ribosomal subunit. A 5’ UTR may contain a sequence complementary to the RBS (the “antiRBS”), and if the distance between the two allows it, they can base-pair with each other and mask the RBS. At higher temperatures, this secondary structure can open up, which unmasks the RBS for binding to the 30S ribosomal subunit and activates translation initiation. Extending this concept, we can also create conformational switches between alternative mRNA secondary structures. Such switches can be induced by trans-acting factors such as sRNAs and proteins, or even by ribosomes that are translating a leader peptide. Masking the RBS inactivates translation initiation, and unmasking it activates translation initiation.[1]
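The RBS/antiRBS idea can be illustrated with a short Python sketch that searches a 5’ UTR for a stretch that is perfectly complementary to part of the RBS. This is only a sequence-level screen under stated assumptions: it considers Watson-Crick pairs only (no G-U wobble), ignores folding energetics and the distance between the two elements, and the function names, the `min_len` threshold and the example sequences are hypothetical.

```python
def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence (DNA 'T' is read as 'U')."""
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G"}
    rna = rna.upper().replace("T", "U")
    return "".join(pairs[base] for base in reversed(rna))


def find_anti_rbs(utr: str, rbs: str, min_len: int = 4):
    """Return the longest stretch of `utr` that can pair with part of `rbs`.

    Only perfect Watson-Crick complementarity is considered; this is a
    simplification, since real RNA helices tolerate G-U pairs and bulges.
    Returns None when no complementary stretch of at least `min_len` exists.
    """
    utr = utr.upper().replace("T", "U")
    rbs = rbs.upper().replace("T", "U")
    # Try the longest RBS segments first so the first hit is the longest one.
    for length in range(len(rbs), min_len - 1, -1):
        for start in range(len(rbs) - length + 1):
            segment = rbs[start:start + length]
            candidate = reverse_complement(segment)
            position = utr.find(candidate)
            if position != -1:
                return {
                    "rbs_segment": segment,
                    "anti_rbs": candidate,
                    "utr_position": position,
                }
    return None


# Hypothetical example: a UTR carrying a full antiRBS against AGGAGG.
hit = find_anti_rbs("AAACCUCCUAAAA", "AGGAGG")
print(hit)
```

A hit from such a screen only flags a candidate; whether the RBS is actually masked depends on the folded structure, which is what tools such as NUPACK or RNAfold estimate.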
Another factor that can limit overall gene expression is the decay of the mRNA, in other words its half-life. Viegas et al. concluded this from a series of experiments in which the transcription factors were fixed and different 5’ UTRs were combined with the same ribosome binding site. mRNAs are degraded by endonucleases and 3’ → 5’ exonucleases. A translating ribosome protects the mRNA from ribonucleases by acting as a steric barrier, and ribosomes that bind to cognate Shine-Dalgarno sequences protect the mRNA as well. It has also been shown that mutations that reduce translation initiation efficiency can accelerate mRNA decay. In their experiments, Viegas et al. inserted a suite of stabilizing sequences into the 5’ UTR of an mRNA encoding the GFP reporter to increase protein production.[2] Thus, we realized that 5’ UTRs make it possible to regulate gene expression not only at the level of translation initiation frequency but also at the level of mRNA stability.
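To see why mRNA half-life matters for protein yield, consider a deliberately simple first-order model, sketched below in Python. It assumes constant transcription, exponential mRNA decay, and protein output proportional to the product of mRNA level and initiation rate. The model ignores the coupling between translation and decay described above, and all function names and numbers are illustrative assumptions rather than values from Viegas et al.

```python
import math


def decay_constant(half_life_min: float) -> float:
    """First-order decay constant k = ln(2) / half-life."""
    if half_life_min <= 0:
        raise ValueError("half-life must be positive")
    return math.log(2) / half_life_min


def steady_state_mrna(transcription_rate: float, half_life_min: float) -> float:
    """Steady-state mRNA level for d[mRNA]/dt = rate - k * [mRNA]."""
    return transcription_rate / decay_constant(half_life_min)


def relative_protein_output(transcription_rate: float,
                            half_life_min: float,
                            initiation_rate: float) -> float:
    """Protein synthesis rate, assumed proportional to mRNA level times
    the translation initiation rate (arbitrary units)."""
    return initiation_rate * steady_state_mrna(transcription_rate, half_life_min)


# Illustrative comparison: same promoter and RBS, but a stabilizing 5' UTR
# that doubles the mRNA half-life from 2 min to 4 min.
baseline = relative_protein_output(1.0, 2.0, 1.0)
stabilized = relative_protein_output(1.0, 4.0, 1.0)
print(stabilized / baseline)
```

In this simplified picture, doubling the half-life doubles the steady-state mRNA level and therefore the protein synthesis rate, even with transcription and initiation unchanged.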
Two notable papers have characterized the effect of the 5’ UTR on expression levels. In Dvir et al., the researchers studied the impact of 5’ UTRs on the expression levels of a fluorescent reporter protein in yeast. The 5’ UTRs they tested differed only in the 10 base pairs preceding the translational start site of the reporter gene.[4] In Balzee et al. in 2020, the researchers combined two 5’ UTRs (one with a high translational rate per transcript and the other with higher transcription but a lower translational rate per copy of a transcript) to obtain an optimized dual 5’ UTR that achieves the goals of both component UTR sequences. The researchers demonstrated that the dual 5’ UTR had a synergistic effect compared to a single 5’ UTR.[5]
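No experiment is complete without proper analysis of the data gathered. For direction on this, a paper published by Salis et al. in 2009 proved extremely useful. It describes the RBS Calculator, software that uses a free-energy model of ribosome binding to predict the translation initiation rate (TIR) at a particular start codon. The expression for ΔGtot as described by the paper is:
ΔGtot = ΔGmRNA:rRNA + ΔGstart + ΔGspacing − ΔGstandby − ΔGmRNA
Here ΔGmRNA:rRNA is the energy released when the 16S rRNA hybridizes to the mRNA, ΔGstart is the energy of start codon pairing with the initiator tRNA, ΔGspacing is the penalty for non-optimal spacing between the SD sequence and the start codon, ΔGstandby is the work needed to unfold structure that sequesters the standby site, and ΔGmRNA is the folding energy of the mRNA subsequence.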
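As a rough illustration of how ΔGtot maps to a predicted TIR, the Python sketch below sums the five free-energy terms reported by Salis et al. and converts the result to a relative rate using the relation TIR ∝ exp(−β·ΔGtot). The value β = 0.45 mol/kcal is the fitted value we recall from that paper and should be treated as an assumption; the function names and the input energies are hypothetical, and the real RBS Calculator computes each term from the sequence rather than taking it as input.

```python
import math

# Apparent Boltzmann factor in mol/kcal (assumed, as recalled from Salis et al. 2009).
BETA = 0.45


def delta_g_total(dg_mrna_rrna: float, dg_start: float, dg_spacing: float,
                  dg_standby: float, dg_mrna: float) -> float:
    """dG_tot = dG_mRNA:rRNA + dG_start + dG_spacing - dG_standby - dG_mRNA
    (all energies in kcal/mol)."""
    return dg_mrna_rrna + dg_start + dg_spacing - dg_standby - dg_mrna


def relative_tir(dg_total: float, beta: float = BETA) -> float:
    """Relative translation initiation rate, proportional to exp(-beta * dG_tot).
    The proportionality constant is omitted, so only ratios are meaningful."""
    return math.exp(-beta * dg_total)


# Illustrative energies only (kcal/mol), not taken from any real sequence.
dg_weak = delta_g_total(-5.0, -1.2, 0.5, 0.0, -6.0)
dg_strong = delta_g_total(-9.0, -1.2, 0.0, 0.0, -6.0)
print(relative_tir(dg_strong) / relative_tir(dg_weak))
```

Because the rate depends exponentially on ΔGtot, a difference of a few kcal/mol between two 5’ UTR designs translates into a several-fold difference in predicted initiation rate.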
Since its inception in 2009, the RBS Calculator has undergone several improvements, and the latest version (v.2.1.1) is hosted on a web server with several other useful tools such as the RBS Library Calculator. It now also considers the ΔGstacking term which quantifies G-quadruplex structures or pseudoknots. A detailed description of all the terms of the current model is available in the RBS Calculator documentation.
One of the main contributors to ΔGtot is the free energy of the RNA secondary structure. To predict these values, the RBS Calculator uses the NUPACK suite of algorithms with the Mfold RNA energy parameters. We have also used NUPACK and RNAfold to visualize the secondary structures of our RNA sequences and visually correlate them with experimentally observed expression values. Along with the minimum free energy (MFE) structure, the web server also predicts the centroid structure and gives the base-pairing probabilities for each nucleotide.
We have extensively used the RBS Calculator and its open-source spin-off from the Barrick Lab: OSTIR.
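Structure predictions from tools such as NUPACK and RNAfold are commonly reported in dot-bracket notation, where matching parentheses mark paired bases and dots mark unpaired ones. The Python sketch below shows one way to ask how much of a region of interest, for example the RBS, is sequestered in a predicted structure. It does not call either tool; the structure string, the window coordinates and the function names are hypothetical, and pseudoknot brackets such as `[` and `]` are not handled.

```python
def pair_table(structure: str):
    """Map each position of a dot-bracket string to its pairing partner
    (0-based index), or None if the base is unpaired."""
    partner = [None] * len(structure)
    open_positions = []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            open_positions.append(i)
        elif symbol == ")":
            if not open_positions:
                raise ValueError("unbalanced ')' at position %d" % i)
            j = open_positions.pop()
            partner[i], partner[j] = j, i
        elif symbol != ".":
            raise ValueError("unsupported symbol %r" % symbol)
    if open_positions:
        raise ValueError("unbalanced '(' in structure")
    return partner


def paired_fraction(structure: str, start: int, end: int) -> float:
    """Fraction of bases in the half-open window [start, end) that are paired."""
    window = pair_table(structure)[start:end]
    if not window:
        raise ValueError("empty window")
    return sum(p is not None for p in window) / len(window)


# Hypothetical 14-nt structure: a 4-bp hairpin followed by a 3-nt free tail.
example = "((((...))))..."
print(paired_fraction(example, 7, 11))   # region inside the stem
print(paired_fraction(example, 11, 14))  # free 3' tail
```

A window with a high paired fraction is a candidate for a masked RBS, which can then be compared with the measured expression of that construct.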
The iGEM Part Registry contains BioBrick constructs of promoters, ribosome binding sites, protein-coding sequences, terminators, and plasmid backbones, but not of untranslated regions. Online tools are available to predict translation initiation rates. To build on the data already available, we are making a modest attempt to characterize and optimize 5’ UTR sequences.
Our chassis is E. coli, in two strains: DH5α, a cloning strain, and BL21(DE3), an expression strain. Our markers are fluorescent and chromophoric proteins. The 5’ UTR is inserted upstream of the SD and coding sequences. Marker expression is quantified as fluorescence per unit OD at 6 h, 12 h and 24 h.
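The fluorescence-per-OD normalization can be sketched in Python as follows. Blank subtraction is a common plate-reader convention that we assume here; the function name, the blank values and all readings are invented for illustration and are not our measured data.

```python
def fluorescence_per_od(fluorescence: float, od600: float,
                        blank_fluorescence: float = 0.0,
                        blank_od: float = 0.0) -> float:
    """Blank-corrected fluorescence divided by blank-corrected OD600."""
    corrected_od = od600 - blank_od
    if corrected_od <= 0:
        raise ValueError("blank-corrected OD must be positive")
    return (fluorescence - blank_fluorescence) / corrected_od


# Invented readings for one construct: time (h) -> (fluorescence, OD600).
readings = {6: (1200.0, 0.35), 12: (5400.0, 0.90), 24: (9800.0, 1.40)}

normalized = {
    hours: fluorescence_per_od(fluo, od, blank_fluorescence=200.0, blank_od=0.05)
    for hours, (fluo, od) in readings.items()
}

for hours in sorted(normalized):
    print(hours, "h:", round(normalized[hours], 1))
```

Dividing by OD corrects for differences in cell density between cultures, so constructs can be compared at each time point even when the strains grow at different rates.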
We then correlate these values with OSTIR predictions, structural predictions of NUPACK, and a neural network developed by us to construct a set of UTR sequences which would be tested in the wet lab. Finally, these sequences could be helpful in further understanding gene expression and potentially fine-tuning biotechnology processes.
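One simple way to carry out such a correlation is a Pearson coefficient on log-transformed values, since predicted initiation rates and fluorescence readings typically span orders of magnitude. The Python sketch below is a minimal illustration under that assumption. The function names and the numbers are hypothetical, and it is not a description of our actual analysis pipeline or neural network.

```python
import math


def pearson(x, y) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def log_log_correlation(predicted, measured) -> float:
    """Pearson correlation of log10(predicted) against log10(measured).
    All values must be strictly positive."""
    return pearson([math.log10(p) for p in predicted],
                   [math.log10(m) for m in measured])


# Invented values: predicted TIR (arbitrary units) vs. fluorescence per OD.
predicted_tir = [120.0, 950.0, 4300.0, 15000.0]
measured_fl_per_od = [300.0, 1100.0, 5200.0, 9000.0]
print(round(log_log_correlation(predicted_tir, measured_fl_per_od), 3))
```

A rank-based measure such as Spearman correlation would be a reasonable alternative when the relationship is monotonic but not linear on the log scale.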
We have made each 5’ UTR that we have worked on available as a BioBricked sequence for future iGEMmers so that they can optimize the translation initiation further and hence potentially the gene expression.
Midway through 2021, Darsh Vithlani, a member of our team, was engaged in the purification of anti-snake venom serum. Towards the end of 2021, he also visited a company that carried out downstream processing of vaccines. Darsh observed that the purification processes took a long time, and that the concentration and quality of the desired product often did not meet expectations. In 2022, Darsh was working in a lab that studied the interactions of proteins with DNA as a substrate. The lab had to express the protein, purify it, and only then proceed to the crucial biochemical assays needed to answer its research questions. Most of the experimental time went into protein expression and purification, which consumed considerable resources and effort, and the silver, Coomassie, and Bradford stains often gave unsatisfactory results. Darsh discussed these problems with Dr. Shamlan Reshamwala, who suggested that, as a contribution to the Registry of Standard Biological Parts, we could investigate the effect of the 5' UTR in further fine-tuning gene expression and, for the first time, make it available to the community as a BioBrick. We looked at the success of the Anderson Promoter Library and the Rackham RBS Library and decided to devote this year's iGEM project to our 5' UTR Library.