Synthetic biology (SynBio) is an approach to developing biological systems that strives to incorporate engineering principles, such as modularity, standardisation and abstraction, into its practices. Over the years, this engineering framework has propelled the field forward, streamlining the manipulation of microbial systems while allowing technologies to mature or reach the market. [1]
One of the most widely adopted engineering techniques is the Design-Build-Test-Learn (DBTL) cycle, an iterative process of system design coupled with rounds of optimisation that allows for fast yet effective biological engineering.
In Sporadicate, we strive to adopt and closely follow engineering principles across all stages of our idea development pipeline. During our project, we found ourselves engineering a spore-based system, a research area outside the primary focus of our host lab. Here, the engineering design cycle proved to be a pivotal component of our project development, allowing us to navigate and solve the numerous issues we faced.
During the initial design phase, we relied on a variety of computational tools (AlphaFold, molecular dynamics simulations, and many more!) to inform the construction of a chitinase display strategy and our take on a self-digesting circuit for biocontainment. We not only tested (and troubleshot) these components, but also learnt about their function. Finally, we discovered strengths and weaknesses of the Subtitoolkit (STK) and the MoClo collection of parts we used across our project to engineer Bacillus.
With Sporadicate, computational and laboratory work were closely linked in continuous engineering cycles. To showcase how the design-build-test-learn cycle was adopted across our project, we have chosen two examples: one from the wet lab and one from the dry lab.
One of the key components of Sporadicate’s biofungicidal activity is spores displaying chitinase enzymes. These modified spores are capable of breaking down the cell wall of pathogenic fungi, releasing chitin monomers as a consequence.
In order to successfully display enzymes (or any protein) on the coat of B. subtilis spores, a C-terminal fusion with native “anchor” proteins must be established. We focused on the anchor protein CotG and the chitinase enzyme ChiS from B. pumilus.
Once the sequences and display proteins were selected, two designs were considered for gene synthesis by IDT: the coding sequence of a direct CotG-ChiS fusion and that of a CotG-Linker-ChiS fusion. However, this initial approach proved unviable due to the size and manufacturing complexity of the desired fusion proteins, which would have used up a significant proportion of our free IDT bases. Moreover, it lacked modularity, which was important for characterising ChiS activity with different anchor proteins or linkers. Hence, we settled on ordering ChiS as two linear fragments with custom overhangs carrying type IIS restriction sites for an ordered assembly. We decided to amplify the linkers from the genome via PCR.
Given our intent to try out several linkers to understand whether they could increase chitinase activity, Sporadicate’s cloning strategy involved amplifying the anchor protein CotG via a high-fidelity PCR (Phusion). Notably, the linker sequence was to be introduced at the 3’ end of the CotG CDS via carefully designed primer overhangs. We used Benchling to build and verify our primers in silico. Having ordered the designed primers, we ran amplification reactions for CotG and CotG-Linker.
The PCR products were run on an agarose gel; while a band for the CotG-Linker was present, the CotG linear fragment failed to amplify.
The results from this experiment led us to suppose that the primer pair used to amplify CotG had a melting temperature which differed from the value predicted by Benchling. Before proceeding further, we tried a touchdown PCR to take annealing temperature differences into account. The results, again, were negative. We then noticed that our CotG primer had a strong tendency to form dimers. It was time to try something else…
Our primer design relied on a common forward oligo for both CotG and CotG-Linker, and on reverse sequences which targeted the 3’ end of the CotG anchor and differed only in the presence of a terminal overhang. In order to mitigate the tendency of our CotG primer pair to form dimers, we tested five different oligonucleotide concentrations (2 µM, 4 µM, 6 µM, 8 µM and 10 µM). Furthermore, we decided to take into account differences in melting temperature and used the autodelta function of our PCR machine, which lowered the annealing temperature by one degree after each cycle.
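The per-cycle temperature decrement described above can be sketched in a few lines of Python. This is only an illustration: the function name, the starting temperature, the number of cycles and the lower temperature limit are all hypothetical values chosen for the example, not the settings we used on the thermocycler.

```python
def touchdown_schedule(start_temp_c, delta_c, n_cycles, floor_temp_c):
    """Return the annealing temperature (deg C) applied in each cycle.

    The temperature drops by `delta_c` after every cycle and is never
    allowed to fall below `floor_temp_c`.
    """
    temps = []
    for cycle in range(n_cycles):
        temp = start_temp_c - delta_c * cycle
        temps.append(max(temp, floor_temp_c))
    return temps


# Hypothetical settings: start at 68 C, drop 1 C per cycle, stop at 58 C.
schedule = touchdown_schedule(start_temp_c=68.0, delta_c=1.0,
                              n_cycles=12, floor_temp_c=58.0)
print(schedule)
```

Starting above the predicted melting temperature and stepping down means that early cycles favour specific primer binding, while later cycles still allow amplification if the true melting temperature is lower than predicted.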
We ran a gel with the CotG-Linker as a control. As you can see in Figure 4, CotG was successfully amplified. We gel-purified the correct bands and decided to try out a Level 0 assembly with the chitinase fragments.
We transformed the Golden Gate reaction into E. coli TOP10 cells and plated them onto LB Cm. In order to assess the outcome of the assembly, colonies were screened by colony PCR, after which successful clones were sent for Sanger sequencing.
The transformation reaction resulted in poor colony number and viability. Out of the limited number of positive clones, none passed our sequencing screen, so all were excluded from further assemblies. Interestingly, most of the mutations we identified arose in the ChiS coding region. After a couple of other attempts with poor transformation efficiency, we hypothesized that the chitinase linear fragments were the likely cause of the assembly issues, as the CotG sequences appeared unmutated in sequencing rounds.
Given the negative results, it was time to rethink the design and assembly strategy. Our instructor Mo suggested that we try to store the chitinase linear fragments (ChiS_part1 and ChiS_part2) in a “storage” vector. This would serve two purposes: first, it would allow us to cheaply source DNA in the lab by growing a bacterial culture transformed with the storage plasmids. Secondly, we thought this could solve some of the issues associated with linear fragments, as circular DNA is more stable than linear DNA.
Given that our host lab works with Yarrowia lipolytica, we decided to select YTK001, an entry-level vector of the YTK Golden Gate toolkit, as our storage system. To that end, we designed new primer pairs for each chitinase part, adding overhangs for BsmBI to facilitate assembly into YTK001. Notably, every new design was first created and tested in silico (Benchling) before proceeding to laboratory work.
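One in silico check that is useful before any Golden Gate assembly is confirming that a part contains no unintended type IIS recognition sites. The sketch below is not our Benchling workflow; it is a minimal stand-alone illustration that scans both strands of a sequence for the BsaI (GGTCTC) and BsmBI (CGTCTC) recognition sequences. The function names and the toy sequence are assumptions made for the example.

```python
BSAI = "GGTCTC"   # BsaI recognition sequence
BSMBI = "CGTCTC"  # BsmBI recognition sequence

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def find_sites(seq, site):
    """Return sorted (0-based start, strand) hits for `site` on both strands.

    "+" hits match the site as written; "-" hits match its reverse
    complement, i.e. the site sits on the opposite strand.
    """
    seq = seq.upper()
    hits = []
    for strand, motif in (("+", site), ("-", reverse_complement(site))):
        start = seq.find(motif)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(motif, start + 1)
    return sorted(hits)


# Toy sequence (not a real part) containing one BsaI and two BsmBI sites.
toy = "AACGTCTCATTTTGGTCTCAAGAGACGTT"
print("BsaI :", find_sites(toy, BSAI))
print("BsmBI:", find_sites(toy, BSMBI))
```

Sites placed deliberately in primer overhangs are expected hits; any hit inside the coding sequence itself would need to be removed, for example by a silent mutation, before the part can be used with that enzyme.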
We performed another high-fidelity PCR to amplify the chitinase part 1 and part 2 fragments. Both samples showed intense DNA bands. We extracted the samples from the gel and set up two assemblies into YTK001, in what was effectively a sub-zero cloning of each chitinase fragment. We transformed the constructs into E. coli.
The transformation plates showed numerous colonies. We performed a verification PCR and sequenced the positive clones.
The sequencing results showed successful storage of each part in its respective YTK vector. This led us to understand that the STK toolkit was not tailored for the simultaneous assembly of many linear fragments in a one-pot reaction, and that our initial design was not optimised for a swift assembly.
Indeed, in Cycle 2 we documented the presence of multiple mutations in the Chitinase region of Level 0 STK clones. Nevertheless, in Cycle 3 we successfully stored each Chitinase fragment into a destination plasmid without reporting any mutations. Hence, we hypothesize that these changes arose at the moment of assembly.
After having successfully secured ChiS part 1 and ChiS part 2 into the storage YTK plasmid, it was now time to proceed to the Level 0 assembly of our protein fusions. At this stage, we planned on testing again the STK toolkit capabilities in a one pot reaction, hoping to see improvements due to the novel chitinase storage system. Importantly, every assembly had been performed in silico during the design stage of cycle 3 to assess the efficacy of our storage system.
A Golden Gate reaction was performed overnight with both the CotG and the CotG-Linker anchor sequences. The assembled products were again transformed into E. coli cells and selected on LB Cm.
Compared to our initial attempt, the transformation plates displayed numerous colonies, with a few fluorescing. We performed colony PCR and restriction digestion with BsaI. We obtained successful clones and sent them for sequencing.
This engineering cycle allowed us to look back at our initial design choices and understand their limitations; in particular, we will avoid linear fragments in STK Golden Gate reactions, as they are prone to instability. The engineering principles we adopted across this learning process allowed us to pinpoint and understand the limitations of the toolkit we worked with, generating useful feedback for its implementation and use by future iGEM teams and researchers.
Computational tools have proven invaluable for fast, efficient and high throughput microbial engineering.
With Sporadicate, we strive to engineer spores with germination capabilities orthogonal to natural signalling pathways. In order to achieve this, strong computational components were coupled and intrinsically linked to our laboratory work. As we progressed through our project, we came to the realisation that each subcomponent (wet and dry lab) experienced DBTL improvement cycles.
Having introduced a wet lab example of engineering success, we now discuss the narrowing of the mutation search space for the germinant receptor GerA. We decided to include this example to further elevate computational modelling’s role as a companion rather than an accessory to the wet lab project.
GerA is a multimeric complex which consists of 1000 residues, spread across three protein domains (GerAA, GerAB, GerAC). Notably, this receptor naturally recognises L-alanine (the germinant) and, in response, activates the germination process. We were interested in altering GerA’s specificity towards N-acetylglucosamine (NAG), i.e. the monomer of chitin. Our goal was to identify the key amino acids in germinant recognition, and hence in the initiation of germination. Indeed, exploring massive mutational spaces would be both impractical and inefficient; for reference, the entire GerA mutational space spans 20^1000 possible combinations.
In order to repurpose GerA, we restricted the analysis to the binding region to avoid getting lost in the combinatorial space. Hence, our computational team aimed to identify which of the three GerA subunits was involved in binding. To that end, we combined L-alanine affinity measures with findings from the literature (pointing to the GerAA or GerAB subunits).
We created models of the complexes from the whole Ger receptor family of proteins using AlphaFold-Multimer and analysed them using ligand binding tools.
The affinity calculations identified binding pockets in the B subunit, which were compared with the literature.
This preliminary analysis allowed us to reduce the mutational space to the GerAB subunit of the receptor. The combinations to explore decreased from 20^1220 to 20^365 (the size of the AB subunit).
Having pinpointed the subunit containing the putative binding pocket for L-alanine in the GerA complex, we tried to identify residues of interest within this region to further restrict the combinatorial space and gain an understanding of the germination cascade. Indeed, exploring 20^365 mutants still represented a massive scale problem.
Key amino acids were pinpointed by combining MSA, structural alignments and NAG binding affinities (AutoDock). Furthermore, CaverDock was used to understand the function of residues in the opening to the binding pocket.
A variety of parameters were taken into account for amino acid scoring. These included residue conservation across receptor sequences, binding affinities and TM-align RMSD scores.
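As an illustration of the conservation component of such a score, one common approach is to compute the Shannon entropy of each column of a multiple sequence alignment. The sketch below assumes this approach; it is not necessarily the exact measure used in our pipeline, and the function names and the four-sequence toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_conservation(column):
    """Score one alignment column: 1.0 = fully conserved, lower = more variable.

    Uses 1 - H / log2(20), where H is the Shannon entropy of the residue
    frequencies in the column. Gap characters ("-") are ignored.
    """
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return 1.0 - entropy / math.log2(20)


def msa_conservation(aligned_seqs):
    """Return one conservation score per column of an aligned sequence list."""
    return [column_conservation(col) for col in zip(*aligned_seqs)]


# Toy alignment (not real Ger receptor sequences).
toy_msa = ["MKAL", "MKGL", "MRAL", "M-SL"]
scores = msa_conservation(toy_msa)
for position, score in enumerate(scores, start=1):
    print(position, round(score, 3))
```

Highly conserved positions near the binding pocket are candidates for functional importance, whereas variable positions may tolerate substitution; in practice this score would be weighed together with the docking and structural alignment results.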
After this engineering learning stage, we were able to select residues of interest for increased NAG affinity. The combinatorial space shrank from 20^365 to 20^20.
Having reduced the number of amino acids we wanted to mutate, it was now time to lay out the work for the wet lab experiments. We considered ordering a combinatorial variant library from Twist Bioscience. Notably, their service offers tailorable variance, from around 10^5 to 10^8 different mutants, as well as amino acid bias. Hence, a size of 20^20 was still too large. Our next objective was to bring the number of variants closer to this scale by focusing on substitutions expected to increase binding affinity for NAG.
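To put these numbers side by side, the short calculation below expresses each search space as a power of ten and asks how many positions could be fully randomised (all 20 amino acids at each position) within a given library size. The function names are ours for this example, and full randomisation is a simplifying assumption; restricting the amino acids allowed at each position lets more positions fit in the same library.

```python
import math

AMINO_ACIDS = 20


def log10_space(n_positions, alphabet=AMINO_ACIDS):
    """Base-10 logarithm of alphabet**n_positions (avoids huge integers)."""
    return n_positions * math.log10(alphabet)


def max_randomised_positions(library_size, alphabet=AMINO_ACIDS):
    """Largest n such that alphabet**n still fits within library_size."""
    n = 0
    while alphabet ** (n + 1) <= library_size:
        n += 1
    return n


for n in (1220, 365, 20):
    print(f"20^{n} is about 10^{log10_space(n):.0f}")

for size in (10**5, 10**8):
    print(f"library of {size:.0e}: {max_randomised_positions(size)} positions")
```

With all 20 amino acids allowed, a library of 10^8 variants covers only six positions (20^6 = 6.4 x 10^7), which is why the number of positions and the substitutions allowed at each had to be narrowed further.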
After discussing with Luke Yates at Imperial College London, we constrained the combinatorial space to sequences thought to increase NAG binding affinity. This was done by considering small, low-hindrance residues instead of bulkier amino acids and by favouring hydrophobic residues in the binding pocket. An inspection of channel residues was also performed to identify bottlenecks in NAG’s route towards the binding pocket.
As a consequence, we identified bottleneck residues corresponding to peaks in the energy profile and limited their mutations to alanine or to smaller versions of the original residue. Binding site residues were mutated to polar and hydrophobic residues such as asparagine, tyrosine and tryptophan, which are thought to favour the interaction with NAG.
At the end of this engineering cycle, our team shortlisted amino acid residues thought to impact NAG binding.
Across the previous three engineering design cycles, we managed to limit the regions to mutate in the GerA receptor. Nevertheless, the final search space still contained an extremely high number of possibilities. Moreover, we realised that our approach was time-consuming: manually generating the structures of the mutants and then individually assessing their binding affinity using AutoDock Vina takes a long time.
Thus we developed a pipeline that efficiently traverses the combinatorial space of mutants to screen for the highest in silico binding affinities.
We used parallel computing and took subsections of the possible mutants to speed up runtime. A scoring function similar to that of AutoDock Vina was used.
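The idea of splitting the mutant space into subsections and scoring them in parallel can be sketched as follows. This is a simplified stand-in rather than our actual pipeline: the residue positions, the allowed substitutions and the scoring function are all hypothetical placeholders, and the placeholder score is not a docking energy. A thread pool is used here for portability; for CPU-bound docking runs, a process pool or a cluster scheduler would be the more likely choice.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice, product

# Hypothetical shortlist: residue position -> allowed substitutions.
ALLOWED = {
    101: "AG",    # e.g. a bottleneck residue limited to small side chains
    205: "NYW",   # e.g. binding-site residues
    310: "NYW",
}


def enumerate_mutants(allowed):
    """Lazily yield every combination as a tuple of (position, amino acid)."""
    positions = sorted(allowed)
    for combo in product(*(allowed[p] for p in positions)):
        yield tuple(zip(positions, combo))


def chunks(iterable, size):
    """Split an iterable into lists of at most `size` items."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def placeholder_score(mutant):
    """Stand-in for a docking score (lower is better); NOT a real energy."""
    weights = {"A": -0.1, "G": -0.2, "N": -0.5, "Y": -0.7, "W": -0.9}
    return sum(weights[aa] for _, aa in mutant)


def score_batch(batch):
    """Score one subsection of the mutant space."""
    return [(placeholder_score(mutant), mutant) for mutant in batch]


def screen(allowed, batch_size=4, workers=4, top_n=3):
    """Score all mutants batch by batch and return the top_n best scoring."""
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        batches = chunks(enumerate_mutants(allowed), batch_size)
        for scored in pool.map(score_batch, batches):
            results.extend(scored)
    return sorted(results)[:top_n]


top = screen(ALLOWED)
for score, mutant in top:
    print(round(score, 2), mutant)
```

Because each batch is scored independently, the work scales with the number of workers, and replacing `placeholder_score` with a call to a real docking tool would not change the surrounding structure.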
Decoys were used to verify the pipeline by comparing their scores with those of known ligands.
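One common way to quantify such a comparison is the area under the ROC curve (AUC): the probability that a randomly chosen known ligand receives a better score than a randomly chosen decoy. The sketch below assumes this metric, which may differ from the check we actually ran, and the scores are invented for illustration only.

```python
def roc_auc(ligand_scores, decoy_scores):
    """Fraction of (ligand, decoy) pairs in which the ligand scores better.

    Lower scores are treated as better (as with docking energies);
    ties count as half a win. 1.0 = perfect separation, 0.5 = random.
    """
    wins = 0.0
    for ligand in ligand_scores:
        for decoy in decoy_scores:
            if ligand < decoy:
                wins += 1.0
            elif ligand == decoy:
                wins += 0.5
    return wins / (len(ligand_scores) * len(decoy_scores))


# Invented example scores (kcal/mol-like units), not real results.
ligands = [-7.2, -6.8, -5.1]
decoys = [-6.9, -5.0, -4.4, -4.1]
auc = roc_auc(ligands, decoys)
print(round(auc, 3))
```

An AUC close to 1 would suggest that the scoring function separates true binders from decoys, while a value near 0.5 would indicate that the ranking is no better than chance.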
We managed to create a framework for re-engineering proteins towards new ligand specificity by optimising for affinity. While the top-scoring mutants from the framework should not be directly screened in the lab, the results may provide generalised knowledge about optimal binding motifs and can aid rational design and
[1] Petzold, C., Chan, L., Nhan, M. and Adams, P., 2022. Analytics for Metabolic Engineering.