LIBRARIES

On this page we summarize the mutant and oligonucleotide libraries we used to increase the binding affinity between BlcR and the blc operator.

Site-Directed Mutagenesis Library

In module 2 we aimed to increase the binding affinity between the transcription factor BlcR and its specific DNA binding sequence, the blc operator. To do this, we modified the DNA binding site on the protein using a site-directed mutagenesis (SDM) strategy. To construct the SDM library, we rationally designed amino acid substitutions in BlcR.

Making rational amino acid substitutions in BlcR

We tried to modify BlcR by making single amino acid substitutions in the DNA binding domain. First, we had to identify the DNA binding domain. After reading the literature and studying the structure in PyMOL, we found a 3-stranded winged helix-turn-helix (HTH) domain at the N-terminus (Figure 1) [1][2][3].

Figure 1. BlcR winged Helix-turn-Helix crystal structure on PyMOL.

Other transcription factors of the IclR family also use N-terminal HTH domains to bind DNA [4], which indicated that the HTH is the DNA binding domain of BlcR. The sequence of this domain is shown in Figure 2.

Figure 2. Amino acid sequence of the helix turn helix (HTH) domain of BlcR. Amino acids from position 28 to 72.

Single amino acid substitutions

We searched for our protein in the PFAM database, a database of protein families that uses hidden Markov models (HMMs) to build multiple sequence alignments and annotate the members of each family. There we found more than 32,000 sequences for the HTH domain of the IclR family. The publication "High-Throughput Affinity Measurements of Transcription Factor and DNA Mutations Reveal Affinity and Specificity Determinants" [5] gave us a solid foundation to work from. In that study, single amino acid substitutions were chosen based on an HMM: each amino acid in the helix-loop-helix domain of Pho4 was changed into alanine, valine, or an amino acid found at the matching position in the protein's orthologs. A substantial change in binding affinity was found in 56% of the 205 mutants that expressed well, and at least 10 of these substitutions improved binding affinity.

We used the HMM of the IclR family's HTH domains found on PFAM to design rational mutations (Figure 3). We wrote a guide on how to use PFAM and PyMOL to make these rational mutations [TU Delft, see contribution section].

Figure 3. PFAM image of the hidden Markov model (HMM) of the helix-turn-helix sequence.
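As a rough illustration of how the most prevalent residues per position can be read out of a family alignment, the sketch below counts residues column by column. The five aligned fragments are made-up toy data, not real IclR sequences, and the function name `top_residues` is our own. A PFAM HMM reports emission probabilities rather than raw counts, so this is only a simplified stand-in for reading the logo in Figure 3.

```python
from collections import Counter

# Toy alignment (hypothetical fragments, NOT real IclR family sequences).
# "-" marks an alignment gap.
TOY_ALIGNMENT = ["LDLVA", "LELVA", "IDLIA", "LDMVG", "VD-VA"]


def top_residues(alignment, column, n=3, exclude=""):
    """Return up to n of the most frequent residues in one alignment column.

    Gaps are ignored, residues listed in `exclude` (for example the
    wild-type residue) are skipped, and ties are broken alphabetically.
    """
    counts = Counter(seq[column] for seq in alignment if seq[column] != "-")
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [aa for aa, _ in ranked if aa not in exclude][:n]


# Most common residues in the first column, skipping a wild-type "L".
print(top_residues(TOY_ALIGNMENT, 0, exclude="L"))
```

Excluding the wild-type residue keeps the output limited to actual substitutions.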

We first listed the potential mutations for each amino acid (AA) in the HTH domain. For every position these consisted of alanine, valine, and around three of the most prevalent AAs among the orthologs in the HMM. Because this list exceeded our screening capacity, we narrowed it down using the data from the paper described above [5]. First, we decided to skip the loop domain and the residues after position 73, as they are the least involved in DNA binding. Second, because positively charged amino acids close to the DNA binding site improve affinity [5], we included substitutions that turn negatively charged amino acids (D/E) into positively charged ones (K/R/H), and did the same for the amino acids in the recognition helix. Third, we considered all alterations in and around the recognition helix, which serves as the primary DNA recognition element. Lastly, because the substitutions outside the DNA-contacting domain that improved binding affinity in the paper were all A/V mutations, we only included A/V mutations for our non-DNA-contacting domain. The full library of mutations can be found in Table 1 below.
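The selection rules above can be sketched as a small candidate generator. This is a minimal illustration under stated assumptions, not the procedure we actually ran. The fragment `SAHGLLAVMTELD` is residues 61 to 73 as they appear in Table 1. The ortholog residues passed for position 62 are illustrative input, and the function names are hypothetical.

```python
NEGATIVE = "DE"   # negatively charged residues
POSITIVE = "KRH"  # positively charged replacements


def candidate_substitutions(residue, ortholog_residues=()):
    """List replacement residues for one position.

    Always proposes alanine and valine, then any ortholog residues supplied
    by the caller, then K/R/H when the wild-type residue is D or E.
    The wild-type residue itself and duplicates are dropped.
    """
    candidates = ["A", "V", *ortholog_residues]
    if residue in NEGATIVE:
        candidates.extend(POSITIVE)
    unique = []
    for aa in candidates:
        if aa != residue and aa not in unique:
            unique.append(aa)
    return unique


def mutation_labels(fragment, first_position, orthologs=None):
    """Return labels such as 'A62V' for every candidate in a fragment."""
    orthologs = orthologs or {}
    labels = []
    for offset, residue in enumerate(fragment):
        position = first_position + offset
        for new in candidate_substitutions(residue, orthologs.get(position, ())):
            labels.append(f"{residue}{position}{new}")
    return labels


# Residues 61-73 as listed in Table 1; ortholog input for 62 is illustrative.
labels = mutation_labels("SAHGLLAVMTELD", 61, orthologs={62: "IKT"})
print(labels[:6])
```

A generator like this over-produces candidates by design. The manual filtering described above (skipping loops, limiting non-contacting positions to A/V) would then be applied on top.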


Table 1. Library Site-Directed Mutagenesis (SDM)
Residue Structure Mutation Codon Substitution Forward primer Reverse primer Label Part registry
36 Helix 0 L36A CTG GCG ATCGCGGACTTGGTTGCGGGC ACGGACCGCACGGCG AA x
36 L36V CTG GTG ATCGTGGACTTGGTTGCGGGC AB x
37 D37A GAC GCG ATCCTGGCGTTGGTTGCGGGC TCTCC AC x
37 D37K GAC AAA ATCCTGAAATTGGTTGCGGGC TCTCC AD x
37 D37R GAC CGC ATCCTGCGCTTGGTTGCGGGC TCTCC AE BBa_K4361300
37 D37V GAC GTG ATCCTGGTGTTGGTTGCGGGC TCTCC AF BBa_K4361301
38 L38A TTG GCG ATCCTGGACGCGGTTGCGGGC TCTCC AG x
38 L38V TTG GTG ATCCTGGACGTGGTTGCGGGC TCTCC AH BBa_K4361319
39 V39A GTT GCG ATCCTGGACTTGGCGGCGGGC TCTCC AI x
40 A40V GCG GTG TTGTGGGCTCTCCGCGTGA CCAAGTCCAGGATACGGACCG AJ BBa_K4361302
41 Loop 1 G41A GGC GCG TTGCGGCGTCTCCGCGTGACC TTACG AK x
41 G41V GGC GTG TTGCGGTGTCTCCGCGTGACC TTACG AL x
47 T47A ACG GCG CTTGCGGCGGCAGAGTTGACG GTCACGCGGAGAGCCC AM x
47 T47G ACG GGC CTTGGCGCGGCAGAGTTGACG AN x
47 T47K ACG AAA CTTAAAGCGGCAGAGTTGACG AO x
47 T47R ACG CGC CTTCGCGCGGCAGAGTTGACG AP x
47 T47S ACG AGC CTTAGCGCGGCAGAGTTGACG AQ x
47 T47V ACG GTG CTTGTGGCGGCAGAGTTGACG AR x
48 Helix 1 A48V GCG GTG CTTACGGTGGCAGAGTTGACGC AS x
49 A49V GCA GTG CTTACGGCGGTGGAGTTGACG CGTTTCC - -
50 E50A GAG GCG AGCGTTGACGCGTTTCCTGGA CTTAC GCCGCCGTAAGGTCAC - -
50 E50D GAG GAT AGATTTGACGCGTTTCCTGG ACTTAC - -
50 E50V GAG GTG AGTGTTGACGCGTTTCCTGGA CTTAC - -
51 L51A TTG GCG AGAGGCGACGCGTTTCCTGGA CTTACC - -
51 L51V TTG GTG AGAGGTGACGCGTTTCCTGGA CTTACC - -
52 T52A ACG GCG AGAGTTGGCGCGTTTCCTGGA CTTAC - -
52 T52V ACG GTG AGAGTTGGTGCGTTTCCTGGA CTTAC - -
53 R53A CGT GCG AGAGTTGACGGCGTTCCTGG ACTTACCGAAAAGCAGT - -
53 R53V CGT GTG AGAGTTGACGGTGTTCCTGG ACTTACCGAAAAGCAGT - -
54 F54A TTC GCG AGAGTTGACGCGTGCGCTGGA CTTACCGAAAAGCA - -
54 F54V TTC GTG AGAGTTGACGCGTGTGCTGGA CTTACCGAAAAGCA - -
55 Loop 2 L55A CTG GCG AGAGTTGACGCGTTTCGCGGA CTTACCGAAAAG - -
55 L55V CTG GTG AGAGTTGACGCGTTTCGTGGA CTTACCGAAAAG - -
61 S61A AGT GCG GCGCGGCGCACGGCCTG TTTTCGGTAAGTCCAGGA AACGC L x
61 S61K AGT AAA GCAAAGCGCACGGCCTG M x
61 S61T AGT ACC GCACCGCGCACGGCCTG N x
61 S61V AGT GTG GCGTGGCGCACGGCCTG O BBa_K4361303
62 Helix 2 A62V GCG GTG GCAGTGTGCACGGCCTG P BBa_K4361304
62 A62I GCG ATT GCAGTATTCACGGCCTG Q BBa_K4361305
62 A62K GCG AAA GCAGTAAACACGGCCTG R BBa_K4361306
62 A62T GCG ACC GCAGTACCCACGGCCTG S BBa_K4361307
63 H63A CAC GCG GCAGTGCGGCGGGCCTGCTC GCGg T -
63 H63V CAC GTG GCAGTGCGGTGGGCCTGCTC GCG U BBa_K4361308
63 H63Y CAC TAT GCAGTGCGTATGGCCTGCTC GCG V BBa_K4361309
63 H63R CAC CGC GCAGTGCGCGCGGCCTGCTC GCG GCACTGCTTTTCGGTAAGT CCAG - -
64 G64R GGC CGC GCACCGCCTGCTCGCGGTGA - -
64 G64V GGC GTG GCACGTGCTGCTCGCGGTGA - -
64 G64A GGC GCG GCACGCGCTGCTCGCGGTGA - -
64 G64E GGC GAA GCACGAACTGCTCGCGGTGA - -
65 L65A CTG GCG GCACGGCGCGCTCGCGGTGA TG XX x
65 L65Y CTG TAT GCACGGCTATCTCGCGGTGA TG - -
65 L65H CTG CAT GCACGGCCATCTCGCGGTGA TG - -
65 L65V CTG GTG GCACGGCGTGCTCGCGGTGA TG CGTGCGCACTGCTTTTCG A x
65 L65M CTG ATG GCACGGCATGCTCGCGGTGA TG B x
66 L66V CTC GTG GCACGGCGTGCTCGCGGTGA TG AP BBa_K4361310
66 L66A CTC GCG GCACGGCGCGCTCGCGGTGA TG C BBa_K4361311
66 L66C CTC TCG GCACGGCTCGCTCGCGGTGA TG D x
66 L66I CTC ATT GCACGGCATTCTCGCGGTGA TG E BBa_K4361312
66 L66K CTC AAA GCACGGCAAACTCGCGGTGA TG F x
67 A67Q GCG CAG GCCTGCTCCAGGTGATGACCG AACTGGAC AS BBa_K4361313
67 A67V GCG GTG GCCTGCTCGTGGTGATGACCG AACTGGAC G BBa_K4361314
67 A67H GCG CAT GCCTGCTCCATGTGATGACCG AACTGGAC H BBa_K4361315
68 V68T GTG ACC GCCTGCTCACCGTGATGACCG AACTGGAC AR BBa_K4361316
68 V68A GTG GCG GCCTGCTCGCGGTGATGACCG AACTGGAC I x
68 V68K GTG AAA GCCTGCTCAAAGTGATGACCG AACTGGAC J BBa_K4361317
68 V68S GTG AGC GCCTGCTCAGCGTGATGACCG AACTGGAC K BBa_K4361318
69 M69L ATG CTG TGCTGACCGAACTGGACCTGT TGG CCGCGAGCAGGCC - -
69 M69A ATG GCG TGGCGACCGAACTGGACCTGT TGG - -
69 M69K ATG AAA TGAAAACCGAACTGGACCTG TTGG - -
69 M69V ATG GTG TGGTGACCGAACTGGACCTG TTGG - -
69 M69F ATG TTT TGTTTACCGAACTGGACCTG TTGG - -
70 T70V ACC GTG TGATGGTGGAACTGGACCTGT TGGCGC - -
70 T70A ACC GCG TGATGGCGGAACTGGACCTGT TGGCGC - -
70 T70E ACC GAA TGATGGAAGAACTGGACCTG TTGGCGC - -
70 T70I ACC ATT TGATGATTGAACTGGACCTGT TGGCGC - -
70 T70K ACC AAA TGATGAAAGAACTGGACCTGT TGGCGC - -
71 E71H GAA CAT TGATGACCCATCTGGACCTGT TGGCGCG - -
71 E71K GAA AAA TGATGACCAAACTGGACCTGT TGGCGCG - -
71 E71A GAA GCG TGATGACCGCGCTGGACCTGT TGGCGCG - -
71 E71V GAA GTG TGATGACCGTGCTGGACCTGT TGGCGCG - -
72 Loop 3 L72A CTG GCG TGATGACCGAAGCGGACCTGT TGGCGC - -
72 L72E CTG GAA TGATGACCGAAGAAGACCTGT TGGCGC - -
72 L72H CTG CAT TGATGACCGAACATGACCTGT TGGCGC - -
72 L72V CTG GTG TGATGACCGAAGTGGACCTG TTGGCGC - -
73 D73A GAC GCG CGAACTGGCGCTGTTGGCG CGTTCCG GTCATCACCGCGAGCAGG - -
73 D73G GAC GGC CGAACTGGGCCTGTTGGCGCG TTCCG - -
73 D73V GAC GTG CGAACTGGTGCTGTTGGCGCG TTCCG - -
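Each row of Table 1 pairs a mutation label with a codon substitution, and the two can be checked against each other mechanically. The sketch below is one possible consistency check, not the tool we used for primer design. The wild-type stretch `ATCCTGGACTTGGTTGCGGGC` for residues 35 to 41 is reconstructed from the primers and codon columns of Table 1, which is an assumption, and `apply_substitution` is a hypothetical helper name.

```python
from itertools import product

# Standard genetic code, built in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def apply_substitution(wild_type, first_position, label, new_codon):
    """Swap one codon in an in-frame wild-type stretch.

    `label` is a mutation such as 'D37K'. `first_position` is the residue
    number encoded by the first codon of `wild_type`. Raises ValueError if
    the wild-type codon or the new codon does not match the label.
    """
    old_aa, position, new_aa = label[0], int(label[1:-1]), label[-1]
    start = 3 * (position - first_position)
    old_codon = wild_type[start:start + 3]
    if CODON_TABLE.get(old_codon) != old_aa:
        raise ValueError(f"{label}: wild-type codon {old_codon!r} is not {old_aa}")
    if CODON_TABLE.get(new_codon) != new_aa:
        raise ValueError(f"{label}: new codon {new_codon!r} is not {new_aa}")
    return wild_type[:start] + new_codon + wild_type[start + 3:]


# Residues 35-41, reconstructed from Table 1 (assumption).
WILD_TYPE_35_41 = "ATCCTGGACTTGGTTGCGGGC"
print(apply_substitution(WILD_TYPE_35_41, 35, "D37K", "AAA"))
```

For D37K the result can be compared directly with the forward primer listed in the table. A label whose codon does not encode the stated amino acid raises an error instead of passing silently.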


Oligonucleotide Library

In module 3 the goal was to increase the binding affinity between the DNA strand and the transcription factor by engineering the blc operator sequence. A set of blc operator variants was designed in Benchling based on the wild-type sequence originally described by Pan et al. [3].

Sequence modifications

The original sequence (designated BBa_K4361001) contains two different ‘inverted repeat pairs’, short DNA sequences whose ends are complementary to their own inverse sequence, separated by a 3-nucleotide spacer. According to the literature, each inverted repeat pair is able to bind a BlcR dimer, with the spacing between pairs allowing for tetramerization of the protein (Figure 4) [3].

Figure 4. 51 bp Blc operator sequence with inverted repeats displayed in pink and purple, and dark and light blue.
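How closely the two arms of an inverted repeat mirror each other can be counted with a reverse complement. The sketch below is illustrative. The arm boundaries (8-nucleotide arms around the central g for the first repeat, 5-nucleotide arms around ttgaact for the second) are our reading of the sequences in Table 2, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement, preserving upper/lower case."""
    return seq.translate(COMPLEMENT)[::-1]


def arm_mismatches(left_arm, right_arm):
    """Count positions where right_arm differs from a perfect inverted repeat.

    A perfect inverted repeat has right_arm == reverse_complement(left_arm).
    """
    expected = reverse_complement(left_arm).upper()
    if len(expected) != len(right_arm):
        raise ValueError("arms must have the same length")
    return sum(a != b for a, b in zip(expected, right_arm.upper()))


# Arms of the first repeat in the wild-type operator (assumed boundaries).
print(arm_mismatches("ACTCTAAT", "ATTCAAGT"))  # 2
# Arms of the second repeat (assumed boundaries).
print(arm_mismatches("ATTAG", "CTAAT"))        # 0
```

Under these assumed boundaries the first repeat is imperfect by two positions. This is consistent with the two "perfect" variants of that repeat in Table 2, each of which rewrites one arm to match the other.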

First, the wild-type sequence was scrambled using the GenScript Sequence Scramble tool [6] to create an oligo containing the same nucleotides in a randomized order. Since this oligo does not contain either of the inverted repeat pairs, BlcR is unlikely to bind it, so it can be used as a negative control against which the other sequences are compared. This oligo was designated BBa_K4361000.
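The idea behind a scrambled control can be illustrated with a plain shuffle. The sketch below is not the GenScript tool's algorithm, which we did not inspect. It simply permutes the bases and confirms that composition is preserved. The two sequences are the wild-type and scrambled oligos from Table 2, and the function names are hypothetical.

```python
import random
from collections import Counter

WILD_TYPE = "CCATAGTTCACTCTAATgATTCAAGTTCAATTAGttgaactCTAATGCGGG"
SCRAMBLED_K4361000 = "ATACTGCAATGTACTTAATAGATCACCGCTGCTTGCACTAGTGGTTATAAT"


def scramble(seq, seed=None):
    """Return the bases of seq in random order (uppercased)."""
    rng = random.Random(seed)  # a seed makes the shuffle reproducible
    bases = list(seq.upper())
    rng.shuffle(bases)
    return "".join(bases)


def same_composition(a, b):
    """True if both sequences contain the same number of each base."""
    return Counter(a.upper()) == Counter(b.upper())


control = scramble(WILD_TYPE, seed=1)
print(same_composition(WILD_TYPE, control))             # True
print(same_composition(WILD_TYPE, SCRAMBLED_K4361000))  # True
```

A random shuffle can recreate a short repeat by chance, so a scrambled control should still be inspected for leftover inverted repeats before use.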

For BBa_K4361002 through BBa_K4361013, the oligos remained largely the same as the wildtype sequence, with only the sequences at the positions of the inverted repeat pairs being changed. The original inverted repeat pairs were combined in different amounts, orders, and orientations, and a number of new pairs containing small variations were created to further diversify the oligo set. These oligos all have the same length as the wildtype sequence, 51 nucleotides.

The next group of oligos covers BBa_K4361014 to BBa_K4361019. These sequences are designed to bind a different number of BlcR proteins by varying the spacer between inverted repeat pairs as well as increasing the number of inverted repeats. The ability of BlcR to effectively bind these DNA molecules should change significantly when compared to the oligos of the original length. A number of oligos have been designed to induce BlcR multimerization beyond tetramerization, as further described in their respective part pages.
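Several of these variants can be reproduced by joining a few building blocks. The sketch below is a reconstruction for illustration, not the Benchling workflow we used. The block boundaries (`FLANK_5`, `IR1`, `SPACER`, `IR2`, `FLANK_3`) are inferred from the sequences in Table 2, reading "flip" as the reverse complement is likewise inferred, and the function names are hypothetical.

```python
# Building blocks inferred from the 51 bp wild-type operator (assumption).
FLANK_5 = "CCATAGTTC"
IR1 = "ACTCTAATgATTCAAGT"
SPACER = "TCA"
IR2 = "ATTAGttgaactCTAAT"
FLANK_3 = "GCGGG"

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def flip(seq):
    """Reverse complement of a repeat, preserving case."""
    return seq.translate(COMPLEMENT)[::-1]


def build_operator(repeats, spacer=SPACER):
    """Join inverted repeats with a spacer and add the two flanks."""
    return FLANK_5 + spacer.join(repeats) + FLANK_3


wild_type = build_operator([IR1, IR2])                    # 51 nt
flipped = build_operator([flip(IR1), IR2])                # IR1 flip + IR2
triple = build_operator([IR1, IR2, IR1])                  # 71 nt
linker = build_operator([IR1, IR2], spacer="TCA" + "GCGGG")  # 56 nt

for name, seq in [("wild type", wild_type), ("IR1 flip", flipped),
                  ("IR1+IR2+IR1", triple), ("5 bp linker", linker)]:
    print(f"{name}: {len(seq)} nt {seq}")
```

Keeping the repeats and spacers as separate strings makes it easy to compare an assembled variant against the ordered oligo before synthesis.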

BBa_K4361020 was designed for a BlcR-Cro repressor hybrid protein. This protein has the same GHB-binding and tetramerization domain as wild-type BlcR, but now contains the DNA-binding domain of the Cro repressor protein. Cro repressor binds two specific DNA sequences itself, which replace the inverted repeat pairs of the original oligo.

Finally, through our partnership with DTU, the DTU team looked into possible variants of the blc operator. By finding sequences originating from different strains of Agrobacterium tumefaciens, they were able to derive both a consensus sequence and a variant operator found in a different strain of the bacterium than our wild-type oligo. The differences between these sequences and our own allow BlcR’s DNA-binding properties to be analyzed in further detail.


Table 2. Engineered Blc operator sequences
Number Sequence Description Part registry
0 ATACTGCAATGTACTTAATAGATCACCGCTGCTTGCACTAGTGGTTATAAT Scrambled sequence (51 bp) BBa_K4361000
1 CCATAGTTCACTCTAATgATTCAAGTTCAATTAGttgaactCTAATGCGGG Original (51 bp) BBa_K4361001
2 CCATAGTTCACTCTAATgATTCAAGTTCAACTCTAATgATTCAAGTGCGGG IR1 repeated (51 bp) BBa_K4361002
3 CCATAGTTCATTAGttgaactCTAATTCAATTAGttgaactCTAATGCGGG IR2 repeated (51 bp) BBa_K4361003
4 CCATAGTTCACTCTAATgATTAGAGTTCAATTAGttgaactCTAATGCGGG IR1 perfect RV 1 + IR2 (51 bp) BBa_K4361004
5 CCATAGTTCACTTGAATgATTCAAGTTCAATTAGttgaactCTAATGCGGG IR1 perfect RV 2 + IR2 (51 bp) BBa_K4361005
6 CCATAGTTCATTAGttgaactCTAATTCAACTCTAATgATTCAAGTGCGGG IR2 + IR1 (51 bp) BBa_K4361006
7 CCATAGTTCACTCTttgaactCAAGTTCAATTAGttgaactCTAATGCGGG IR1 outer 5 + IR2 (51 bp) BBa_K4361007
8 CCATAGTTCACTCTttgaactAGAGTTCAATTAGttgaactCTAATGCGGG IR1 perfect RV 1 outer 5 + IR2 (51 bp) BBa_K4361008
9 CCATAGTTCACTTGttgaactCAAGTTCAATTAGttgaactCTAATGCGGG IR1 perfect RV 2 outer 5 + IR2 (51 bp) BBa_K4361009
10 CCATAGTTCACTTGAATcATTAGAGTTCAATTAGttgaactCTAATGCGGG IR1 flip + IR2 (51 bp) BBa_K4361010
11 CCATAGTTCACTCTAATgATTCAAGTTCAATTAGagttcaaCTAATGCGGG IR1 + IR2 flip (51 bp) BBa_K4361011
12 CCATAGTTCACTTGAATcATTAGAGTTCAATTAGagttcaaCTAATGCGGG IR1 flip + IR2 flip (51 bp) BBa_K4361012
13 CCATAGTTCATTAGagttcaaCTAATTCAACTTGAATcATTAGAGTGCGGG IR2 flip + IR1 flip (51 bp) BBa_K4361013
14 CCATAGTTCACTCTAATgATTCAAGTTCAGCGGGATTAGttgaactCTAATGCGGG IR1 + 5 bp linker + IR2 (56 bp) BBa_K4361014
15 CCATAGTTCACTCTAATgATTCAAGTTCAATTAGttgaactCTAATTCAACTCTAATgATTCAAGTGCGGG IR1 + IR2 + IR1 (71 bp) BBa_K4361015
16 CCATAGTTCACTCTAATgATTCAAGTTCAATTAGttgaactCTAATTCAACTCTAATgATTCAAGTTCAATTAGttgaactCTAATGCGGG IR1 + IR2 + IR1 + IR2 (91 bp) BBa_K4361016
17 CCATAGTTCACTCTAATgATTCAAGTTCAATTAGttgaactCTAATTCAGCGGGTCAGCGGGACTCTAATgATTCAAGTTCAATTAGttgaactCTAATGCGGG IR1 + IR2 + 15 bp linker + IR1 + IR2 (106 bp) BBa_K4361017
18 CCATAGTTCACTCTAATgATTCAAGTTCAATTAGttgaactCTAATTCAACTCTAATgATTCAAGTTCAATTAGttgaactCTAATTCAACTCTAATgATTCAAGTTCAATTAGttgaactCTAATGCGGG (IR1 + IR2) x3 (131 bp) BBa_K4361018
19 CCATAGTTCACTCTAATgATTCAAGTTCAATTAGttgaactCTAATTCAGCGGGTCAGCGGGACTCTAATgATTCAAGTTCAATTAGttgaactCTAATTCAGCGGGTCAGCGGGACTCTAATgATTCAAGTTCAATTAGttgaactCTAATGCGGG (IR1 + IR2) x3 with 15 bp linker (161 bp) BBa_K4361019
20 CCATAGTTCTATCACCgcgGGTGATATCATATCACCcgcGGTGATAGCGGG Cro repressor variant (51 bp) BBa_K4361020
21 CCCGCACCATAGTTCACTCTAATGATTCAAGTTCAATTAGTTGAACTCTAATGCGGG Consensus Sequence found by DTU BBa_K4361021
22 CCCGCACTATAGTTCAGCTAATTGAACTTGAATCATTAGAGTGAACTAT Strain variant found by DTU BBa_K4361022


References

  1. Pan, Y., Wang, Y., Fuqua, C. and Chen, L. (2013). In vivo analysis of DNA binding and ligand interaction of BlcR, an IclR-type repressor from Agrobacterium tumefaciens. Microbiology, [online] 159(Pt 4), pp.814–822. doi:10.1099/mic.0.065680-0
  2. parts.igem.org. (n.d.). Part:BBa K1758377 - parts.igem.org. [online] Retrieved 8 June 2022. Available at: http://parts.igem.org/Part:BBa_K1758377
  3. Pan, Y., Fiscus, V., Meng, W., Zheng, Z., Zhang, L.-H., Fuqua, C. and Chen, L. (2011). The Agrobacterium tumefaciens Transcription Factor BlcR Is Regulated via Oligomerization. The Journal of Biological Chemistry, [online] 286(23), pp.20431–20440. doi:10.1074/jbc.M110.196154
  4. Molina-Henares, A.J., Krell, T., Eugenia Guazzaroni, M., Segura, A. and Ramos, J.L. (2006). Members of the IclR family of bacterial transcriptional regulators function as activators and/or repressors. FEMS Microbiology Reviews, 30(2), pp.157–186. doi:10.1111/j.1574-6976.2005.00008.
  5. Aditham, A.K., Markin, C.J., Mokhtari, D.A., DelRosso, N. and Fordyce, P.M. (2021). High-Throughput Affinity Measurements of Transcription Factor and DNA Mutations Reveal Affinity and Specificity Determinants. Cell Systems, [online] 12(2), pp.112-127.e11. doi:10.1016/j.cels.2020.11.012
  6. GenScript tool. Available at: https://www.genscript.com/tools/create-scrambled-sequence