Models

Overview


To inform the wet-lab and support our overall project, our dry-lab team focused their efforts on four subprojects:

  1. To confirm the rationale behind our gene circuit designs, we performed RNA-seq analysis to determine differentially expressed genes and differentially regulated pathways in heat and drought-stressed T. aestivum wheat.

  2. To better understand our genetically engineered system, we developed mathematical models for predicting heat-inducible gene expression and enzyme kinetics for our reactions of interest.

  3. To increase the stability of our enzymes under heat stress, we performed preliminary molecular modelling using PyMOL and PyRosetta to identify active site mutations that would lead to optimal enzyme thermodynamic stability.

  4. To aid the proposed implementation of our project in the field, we designed and tested a portable Arduino-based fluorometer device to detect heat stress in the wheat through fluorescence activation of our engineered gene circuits.

Dry-Lab Overview


Bioinformatics Analysis


Transcriptome Analysis of Heat and Drought-Stressed Wheat:

In order to better understand how abiotic stress conditions affect T. aestivum wheat, we analyzed RNA-seq data of 1-week old wheat seedling leaves subjected to drought stress (20% PEG-6000), heat stress (40°C), and their combination before (0h, control) and after stress treatments (1h or 6h).

We discovered and downloaded the publicly available RNA-seq dataset in the form of a raw count matrix from http://www.wheat-expression.com/, a meta-analysis platform that integrates transcriptomics data from many T. aestivum wheat studies and allows you to compare the expression levels of specific genes across different conditions[1]. For more details and guidance on how to use the Wheat Expression Browser tool, we referred to their GitHub tutorial.

User Interface of Wheat Expression Browser
User Interface of Wheat Expression Browser


Example Use of the Wheat Expression Browser Tool
Example Use Case: Comparing the Average Expression Levels of the SBPase Gene Homoeologs Across Different Sample Conditions

Comparative Analysis:

To better understand and compare the different molecular effects of heat vs. drought stress in wheat, we performed a comparative analysis of the transcriptome profiles of the wheat seedling leaves under drought stress, heat stress, and combined stress conditions using a normalized (TPM - Transcripts Per Million) expression count matrix.

2D PCA Plot of Samples
2D Principal Component Analysis (PCA) with the Proportion of Variance Explained by Each PC

3D PCA Plot of Samples
3D Principal Component Analysis (PCA) with the Proportion of Variance Explained by Each PC

In the dimensionality reduction plots above, the more similar datapoints (ie. samples) are in terms of their gene expression profiles, the closer they are located in space. From these PCA plots, we observe very high similiarity between the biological replicates as expected, and also note these four distinct groups of sample conditions clustered together based on similarity: heat and combined stress (1h treatment), heat and combined stress (6h treatment), control (untreated) and drought stress (1h treatment), and drought stress (6h treatment).

These results suggest that the mRNA (ie. gene expression) profiles of drought-stressed wheat differ significantly from that of heat or combined stress, and they show complex relationships depending on the length of stress treatment (as seen by the distinctly separate 1h vs. 6h drought stress samples).

Differential Gene Expression Analysis:

Next, we performed statistical testing between the gene expression counts of different sample conditions in order to determine significant differentially expressed genes. As intense heat and drought conditions often coincide in the real world, as was pointed out by several agricultural experts and farmers we spoke to for our Integrated Human Practices work, we mainly focused on comparing the combined (heat and drought), long-term (6h treatment) stress condition to the untreated control condition.

Using the edgeR software package, we identified 2079 differentially expressed genes (890 upregulated and 1189 downregulated) in wheat seedling leaves in the long-term combined stress condition (HD6h) compared to the control (using extremely stringent filtering conditions of fold change ≥ 32 and false discovery rate (FDR)-adjusted P < 0.001).


Screenshot of an interactive volcano plot where each datapoint represents a gene. On hover, the gene name is displayed.

As an example and sanity check, the highly upregulated gene from the plot above was searched and revealed to be a heat shock chaperone protein.

Gene Ontology (GO) and Plant Reactome Analysis:

To functionally characterize the differentially expressed genes and understand their effect on the wheat plant in a biological context, we used the biomaRt software package to look at the top (ie. most frequent) Gene Ontology terms associated with the upregulated or downregulated genes, as well as look at their Plant Reactome reactions and pathways.


The top GO Molecular Functions, Biological Processes, and Cellular Components associated with the upregulated genes in the HD6h condition. We observe many terms related to heat shock and chaperone proteins (ex: "response to heat", "protein folding", "unfolded protein binding"), regulation of gene expression (ex: "regulation of transcription, DNA-templated", "DNA-binding transcription factor activity"), and osmolarity (ex: "response to salt stress").


The top GO Molecular Functions, Biological Processes, and Cellular Components associated with the downregulated genes in the HD6h condition. For example, we observe many terms related to metabolic processes such as photosynthesis (ex: "photosynthesis, light-harvesting", "photosystems I and II", "chloroplast thylakoid membrane", "carbohydrate metabolic process").


Some examples of the top (ie. most frequent) Plant Reactome reactions and pathways amongst the differentially expressed genes in the HD6h condition include "1-aminocyclopropane-1-carboxylate synthase" (18 upregulated genes), "Ethylene biosynthesis from methionine" (38 upregulated genes), "Primary root development" (106 downregulated genes), and different hormone signalling mechanisms (ex: jasmonic acid, abscisic acid). These results demonstrate the overlap between the differentially expressed genes in this analysis, and the genes involved with the reactions and processes of interest in our project.

For example, one of our genetic circuits' will express the enzyme ACC deaminase, which breaks down the precursor to the ethylene stress molecule (ACC ie. 1-aminocyclopropane-1-carboxylate). Ethylene can prove harmful to the plant in high concentrations, leading to negative effects such as stunted plant growth and root development.

Comparing Different Abiotic Stress Conditions:

To expand on the comparative analysis performed earlier, we plotted Venn diagrams to see the overlap of up- or downregulated genes in response to the three abiotic stress conditions: HD6h, H6h, D6h, in order to see just how distinct or similar the molecular effect of heat vs. drought stress is on the wheat plant.

Number of overlapping differentially expressed genes between the D6h, H6h, and HD6h stressed sample conditions when compared to an untreated control as a baseline. Like the comparative analysis performed earlier, we observe very high overlap between the heat and combined stress samples, with the drought stress having a significantly higher number of unique differentially expressed genes.

Conclusion:

A future step in our bioinformatics analysis will be to perform gene set enrichment analysis (GSEA) to see if specific, user-defined sets of genes are differentially expressed in the HD6h compared to the untreated control. Within the context of our project, we would like to investigate the AP2/ERF superfamily genes (as they are downstream gene targets of ethylene signalling), the genes encoding alpha-amylase and starch synthase (to better understand the starch content and nutritional value of stressed wheat), and the genes encoding Calvin Cycle enzymes (like SBPase), amongst other possibilities.

For our complete analysis, results, and scripts, please refer to our centralized GitHub repository.



Mathematical Modelling


Overview:

The purpose of mathematical modelling was to gain insight into the mechanics of our gene circuits, and identify the most critical parameters affecting the production of our target enzymes (ACC deaminase, SBPase, and choline monooxygenase) and their respective metabolic cycles (ethylene regulation, the Calvin Cycle, and glycine betaine synthesis). The team formulated a system of ordinary differential equations (ODEs) modelling the genetic circuit using Mass-Action & Michaelis-Menten Kinetics, as well as conducted a sensitivity analysis through MATLAB SimBiology.

Modelling Transcription Factor Activity:

SimBiology Model of the Promoter-Transcription Factor System
MATLAB SimBiology Model of the Promoter-Transcription Factor System, Based on Hill Kinetics

All three of our genetic constructs are built using the same heat-inducible TaHsp70d promoter which binds to the TaHsfA2b transcription factor. During our literature review, we discovered the work of Rieger and colleagues[2] who modeled the Hsp70 promoter and transcription factor activation pathway, providing us with most of the assumptions used in our own mathematical models. A visual from their paper summarizes the pathway efficiently and is presented below:

Hsp70 Promoter and Transcription Factor Pathway
Annotated steps of heat shock protein (HSP) expression and regulation. X:Y denotes X bound to Y. Solid lines indicate mass flow or chemical reactions, and dashed lines indicate regulatory interactions. Heat shock, or temperature (T), enters the model through switching the stress-dependent kinase (S) from its inactive to active form (S*) (1). The stress kinase is inactivated by dephosphorylation back to its inactive form (2). The transcription factor (HSF) binds to the promoter site (HSE) (3), where it is bound by the active stress kinase (4) and is phosphorylated to its active form (P:HSF:HSE) (5), that induces transcription (6) and translation (7). HSP binds to the active form, repressing transcription (8). The inactive form is subject to binding (9) dephosphorylation (10) by the inactivating phosphatase (I). HSP also binds HSF on the HSE, before it is phosphorylated (11), or off the DNA HSP binds and sequesters HSF in solution (HSF:HSP) (12, 13). The mRNA is assumed to be stabilized by S* (14), but mRNA and HSP still turn over via first-order decay (15, 16).

Using assumptions stated in the paper, we generated a system of ODEs modelling the production of the promoter-transcription factor complex as a function of transcription factor concentrations, in a typical Hill-Kinetics format. The assembly is a first-order process[2], yielding the following equation:

Equation 1
[PTF] is the promoter-transcription factor complex concentration, Cn is the plasmid copy number, Kd is the dissociation constant, and [THA2] is the transcription factor concentration.

This model was simulated in MATLAB Simbiology and the resulting expression profile is presented below:

Expression Profile of the Promoter-Transcription Factor System
Expression profile of the system’s promoter, transcription factor, and promoter-transcription factor complex. The promoter and transcription factor plotlines are overlaid.

Our attempt at finding a Kd value for the transcription factor through literature review was unsuccessful. It is also quite difficult and expensive to determine through wet-lab techniques, and the team was unable to find software with sufficient accuracy to predict the dissociation constant values. Hence, the Kd value from a homologous protein was used in our model instead. This choice is justified by the relatively low dependence of the [PTF] on the Kd value, as visualized by a SimBiology Global Variance Analysis simulation, presented below:

Global Variance Analysis of the Transcription Factor Model
Global Variance Analysis of the Transcription Factor Model

Global Variance Analysis runs a set of simulations where each parameter is varied simultaneously within predefined parameter spaces, and the resulting variances in a specified output are calculated and stored in a matrix. It then calculates Sobol Indices, a set of values describing the contributions of each parameter to the specified output. Specifically, the first-order indices measure the direct fractional contribution of one parameter to the specified output, while the total order indices take into account higher order indices as well (like interaction between different parameters)[3]. The color gradient of each bar represents the clustering of results from each simulation.

The sensitivity simulation presented above computed sobol indices over 1000 models, with uniform variation of each parameter within 50% of the user-defined value. It is apparent that the Kd value does not have a significant direct effect on the output promoter-transcription factor complex concentration. As all our constructs use the same promoter and transcription factor in the same conditions, this assumption and its corresponding equation can be used in later derivations throughout the project.

Modelling SBPase Production And Effect on the Calvin Cycle:

SimBiology Model of SBPase Expression, Based on Hill and Michaelis-Menten Kinetics

Our first comprehensive model looked at the expression and production of the SBPase enzyme. Using guidelines from the University of Washington[4], the team generated a standard Michaelis-Menten model. We worked with the Briggs-Haldane derivation, and inherited its three most critical assumptions:

  1. Pseudo-Steady State Approximation: States that after an initial burst phase, enzyme-substrate concentrations will reach a pseudo-steady state at low concentrations, due to differences in reaction rates.

  2. Rapid Equilibrium Approximation: Stems from the original Michaelis-Menten assumption that enzyme-substrate binding and dissociation is faster than product formation.

  3. Free Ligand Approximation: States that the concentrations of the substrate in our ODE can be approximated as the total concentration of the substrate present in the system. This assumption holds as long as the SBPase concentration is well below the KM value of the system.

This leads to the following equation:

Equation 2

In this equation, [SBP] technically only represents the free sedoheptulose 1,7 substrate in the system, but by using the Briggs-Haldane free ligand approximation, the [SBP] can be approximated as the total [SBP] present in the system, as long as the total enzyme concentration is well below the Michaelis-Menten constant. Simulating this model on SimBiology yielded the expression profile below:


To better understand the impact of individual parameters on this model, we conducted parameter sweeps, and the results are displayed below:

Left - Global Variance Analysis of All Reaction Rates Involved in the Model. Right - Global Variance Analysis of Species Not Directly Interacting with Sedoheptulose 7-Phosphate (SB7).

The left plot reveals that none of the forward reaction rates have a significant direct or indirect contribution to the accumulation of SB7 over a run time of 20 minutes (time taken for the model to reach steady-state). Only the degradation rate of the molecule itself had a strong influence, which was expected. The right plot reveals some more interesting insights into the model, highlighting SBPase mRNA and the transcription factor to have strong indirect/compounding influences on SB7 production (indicated by the strong clustering of total order sobol indices at value 1). The promoter has a moderate indirect influence on the SB7 concentrations. These values are justified once we realize that each of these species is directly responsible for SBPase production, which is proportional to SB7 production.

As a proof of concept, the team attempted to connect our SBPase production pathway to a validated, reviewed model of the Calvin Cycle from a scientific publication[5]. This model was downloaded as an .SBML file and opened in SimBiology, to connect it with our synthetic SBP regulation model, as presented below:

SBPase Production and Calvin Cycle Pathway

We also conducted global variance analysis to visualize the potential effects the synthetic pathway would have on the starch production of the imported Calvin-Benson Cycle, and presented it below:

Global Sensitivity Analysis Demonstrating the Effect of Transcription Factor, Promoter, and SBPase concentrations on Starch Production

Modeling the ACCD-Ethylene Pathway:

Once again, we reuse the transcription factor (Hill Kinetics) portion of the derivation to generate the same equation for the promoter-transcription factor complex. Furthermore, since we continue to use the Briggs-Haldane derivation of Michaelis-Menten kinetics, we obtain the same expression for the Vmax of our new enzyme system, with different species:

Equation 3

Things change as we attempt to model the relationship with ethylene (also named ethene)and its precursor, ACC, in an attempt to link ethylene regulation to promoter activity. ACC participates in a Ping-Pong-Bi-Bi reaction with the ACC oxidase enzyme[6] to produce ethylene according to the following reaction scheme:

Ethylene (Ethene) Reaction Scheme

Modeling the reaction scheme above, we realize that this is a three substrate reaction, and hence will be a variation of the Michaelis-Menten kinetics used so far[7]. Presented below is a variation of the Briggs-Haldane derivation with multi-substrate enzyme kinetics. It has been assumed that the ACC oxidase enzyme is plentiful, the substrates are non-inhibitory, initial reaction kinetics are being observed (no product buildup), and reactant concentrations are initially very high:

Equation 4

KL-asc, KACC, KO2 are the second-order rate constants, or specificity constants. The apparent second-order rate constants of each of the competing substrates dictates the partitioning between competing reactions, and is the rationale behind the name specificity constant. ACCO is the initial ACC oxidase enzyme concentration. Lastly, K0 is the first-order rate constant of the ACC oxidase enzyme, displaying zero-order kinetics at high substrate concentration. The first-order rate constant is a measure of the catalytic potential of the ACC oxidase enzyme, and is called the catalytic constant. Knet refers to the combined rate constant in the event of all substrates binding together (traditionally written as KL-asc-ACC-O2).

It must be noted that the reaction velocity V3 in this reaction is not the maximum velocity from traditional Michaelis-Menten models, but rather called the limiting rate of the reaction.

The rationale behind the 0 subscript is similar to that in previous Michaelis-Menten models: the total enzyme concentration is typically considered the enzyme concentration when time t = 0 (ie. at the instant of mixing). In this time frame (initial phase of the reaction), it can be assumed that the product concentrations are not substantial, causing the product concentration terms from the above equation to disappear. This leads to the following simplified expression:

Equation 5

Knet is a term accounting for the possibility of all three substrates successfully docking with the ACC oxidase enzyme at the same time. As the likelihood of this happening is negligible, this term too can be removed to simplify the equation. It is also convention to replace the other enzyme constants with their reciprocals with the same subscripts and superscripts - which are called Dalziel Coefficients. This leads to the following further simplified expression:

Equation 6

As MATLAB SimBiology did not have a pre-built multi-substrate kinetics function, it was manually encoded as an "Undefined reaction". Modeling this system in SimBiology yields the following expression profile:

Expression Profile of the ACCD System
Dimensionless Expression Profile of Molecules Against Time

Following this, the team conducted sensitivity analysis of the expression system, measuring the sensitivity of ethylene concentration against varying reaction rates and species. This produced the following Sobol plots:

Sensitivity Analysis of Ethylene
Left - Global Variance Analysis of Different Reaction Rates Against Ethylene Sensitivity. Right - Global Variance Analysis of Different Species Against Ethylene Sensitivity.

Once again, we see that none of the reaction rates have a very strong direct effect on ethylene production, except the Michaelis-Menten constants for ACC deaminase and ACC oxidase. It is reasonable for the KMb value (Michaelis-Menten constant for ACC oxidase) to score higher as it is the governing reaction rate of ethylene production. Among individual species, it is justified that the highest scoring species are ACC (the precursor molecule to ethylene) and ACC oxidase (enzyme responsible for ethene production).

Conclusion:

Through our mathematical modelling work, the team was able to gain meaningful insight into our gene circuits' enzyme kinetics and identify the most important parameters affecting each of the simulated models. Furthermore, the team demonstrated the SBPase circuit’s viability by connecting it to the published .SBML model of the Calvin Benson Cycle as a proof of concept.

Our work can be furthered by an exploration of the effects of ACC deaminase production on the Yang Cycle, which is analogous to the work conducted with the Calvin-Benson Cycle. However, the team was unable to find a working model of the Yang Cycle from literature, and did not have the time to calculate parameters required to create a model from scratch.

Mathematical analysis can also be applied to further characterize the operating range of our system, by measuring changes to enzyme expression (using fluorescence output as a proxy) as a function of temperature. Although UBC iGEM attempted to do so with their current experimental data, we were not able to collect a sufficient number of tested datapoints over a large temperature range due to time restrictions. An incomplete quadratic function modelling the temperature inducibility of the promoter has been included below. Analysis was completed on Logger Pro.

A poorly fitting quadratic function. Additional temperatures need to be tested to increase resolution of the data.

The Gaussian function is predicting the ideal domain needed to fit a parabolic function. Future experiments should collect temperature readings between 20°C and 50°C.

For full equation derivations, MATLAB Simbiology files, and further details, please refer to this central GitHub repository where we have uploaded all of our mathematical modelling work and documentation to allow for exploration and reproducibility.



Molecular Modelling


The team conducted preliminary molecular modeling in an attempt to improve enzyme thermodynamic stability. Our overall project was motivated by the desire to create a heat-inducible, genetically-modified wheat system that would be resilient to climate change and heat waves. Although the gene circuits express enzymes that would degrade plant stress molecules (such as ethylene) and upregulate productive cycles (like the Calvin Cycle) - these enzymes themselves would still degrade (misfold) once certain temperatures are exceeded, and thus they require significant biological upkeep.

Hence, we conducted protein optimization simulations to improve the thermodynamic stability of our enzymes of interest, which in theory optimize their protein structures through favorable amino acid mutations that increase their tolerance to elevated temperatures. We accomplished this by visualizing enzymes and identifying their active sites on PyMOL, conducting point mutation scans on PyRosetta, and identifying optimal mutations using an R script.

Visualizing Molecules and Identifying Active Sites:

The workflow begins with the visualization of the three enzymes and the TaHsfA2b transcription factor on PyMOL. As choline monooxygenase and the TaHsfA2b transcription factor did not have crystallography structures in the Protein Data Bank (PDB), the team had to rely on AlphaFold predictions to visualize these structures. AlphaFold is a program that is able to predict the 3D structure of a protein from its amino acid sequence with high accuracy through homology modeling and simulations run to minimize the free energy of different protein conformations.

In the crystallized structures for ACC deaminase and SBPase, active sites were identified by selecting 3D regions within 3.5 Angstroms of the enzyme ligands and cofactors, and visualizing polar contacts and water molecules within these regions. The amino acid residues corresponding to these selections were stored as target residues, and later used in mutational analysis. These active sites are presented below:

ACCD Enzyme Active Site
PyMOL visualization showing the active site of ACC deaminase, water molecules (red), ACC (cyan), and the imidazole cofactor (yellow).

In AlphaFold predicted structures (TaHsfA2b and CMO), homologous proteins were identified and their sequences were aligned to find highly conserved regions - these residues were stored as active site candidates. Further inspection in PyMOL was used to finalize successful binding site candidates. Examples of accepted and rejected active site candidates for choline monooxygenase are presented below:

Rejected CMO Candidate Active Site
Accepted CMO Candidate Active Site
Top - The accepted candidate active site structure for choline monooxygenase. In the accepted conformation, the protein adopts an intrinsically disordered pattern (IDR) with more residues available for binding with a substrate. IDRs are flexible and allow proteins to bind and dissociate rapidly from substrates. Bottom - One of the rejected candidate active site structures for choline monooxygenase. The rejected structure has many more steric clashes between amino acid residues, and between residues and the peptide backbone.

Quantifying the Effects of Mutations Using PyRosetta, CUPSAT, and R Scripts:

Having identified target amino acid residues for a mutational scan, the team used PyRosetta’s Point Mutation protocol to compute all possible mutations to every amino acid present on the target residues, before using PyRosetta’s scoring function to rank them based on their predicted ΔΔG scores - finding us the effect of each mutation on the proteins' thermodynamic stability. Following this, a simulation was run on CUPSAT, a web tool that can predict the effects of point mutations to protein structural stability by calculating changes to Ramachandran angles[8-10].

Then, using a custom R script, we filtered results to only include thermally stabilizing mutations that had favorable Ramachandran angles. Following this, we visualized mutational profiles showing the stabilizing effect of said mutations at given target residues. These plots are presented below:


Finally, the most thermodynamically stabilizing mutations were identified for each protein and the top results are presented in the table below:

Protein of Interest Protein Chain Wild Type Amino Acid Residue ID Mutant Amino Acid Predicted -ΔΔG (kcal/mol)
ACC deaminase C LEU 322 LYS 9.09
choline monooxygenase A CYS 162 LYS 6.84
SBPase A GLU 376 LEU 6.28
TaHsfA2b A GLU 112 ILE 5.78

Analyzing Mutation Impacts on Protein-Ligand Docking Using Chimera and HDock:

Our goal was to compute protein-ligand docking scores of all mutated protein structures to find the optimal mutation at the active site that would:

  • increase protein stability at elevated temperatures
  • increase binding affinity to ligands at elevated temperatures
We will be working on completing the second objective above, by creating PDB structures for PyRosetta's predicted active site mutations, and calculating their docking scores on HDock.

However, this workflow was realized to be computationally expensive and time-consuming, so our team was only able to compute docking energies for interactions between the TaHsp70d promoter and TaHsfA2b transcription factor. A PDB structure of the TaHsp70d promoter sequence was created in PyMOL using a script provided from the Supercomputing Facility for Bioinformatics and Computational Biology[11].

PDB structures of the mutated transcription factor TaHsfA2b proteins were created manually on UCSF Chimera using the ‘Rotamer’ function, which would predict the most probable conformation of the target mutation. This process was repeated 10 times to create 10 mutated PDB structures (ie. models) of the transcription factor.

Lastly, these structures were docked using Hdock, a web tool specializing in calculating Protein-DNA docking scores. A more negative docking score means a more possible binding model, but the score should not be treated as the absolute true binding affinity of two molecules because it has not been calibrated to the experimental data. A table of corresponding docking scores is presented below:

Hdock Results
Protein-DNA Docking Score Results From Hdock

From the table above, we observe that Model 2 has the highest-ranked docking score as well as highest confidence score, while Model 6 has the lowest-ranked docking score. Both observations also make logical sense when observing their visualizations in PyMOL.

Model 6
Model 6 - Barely any contact between the DNA promoter sequence (TaHsp70d) and the transcription factor protein (TaHsfA2b).

Model 2
Model 2 - Significant contact between the DNA promoter sequence (TaHsp70d) and the transcription factor protein (TaHsfA2b).

Model Number Protein Chain Wild Type Amino Acid Residue ID Mutant Amino Acid Predicted -ΔΔG (kcal/mol)
Model 2 A TRP 111 ILE 5.12
Model 6 A ASP 109 HIS 2.34

Protein-DNA and protein-protein docking simulations are very computationally expensive and time-consuming, and were the reason we were unable to apply this analysis to our three enzymes of interest. In the future, we plan on leveraging high-performance computing power to complete this portion of our analysis.

Another future direction would be to consider generating and testing combinations of multiple mutated amino acids within the active site, as we only considered single point mutations at a time in our current mutational scan and protein optimization process, even though realistically multiple interacting mutations would be required in order to achieve the optimal thermodynamic stability.

For the full details, files, and custom scripts used in our molecular modelling work, please refer to this central GitHub repository where we have uploaded complete documentation of the project to allow for exploration and reproducibility.



Hardware


Concept Design:

The accompanying hardware device to our genetic constructs is a portable Arduino-based fluorometer that detects the GFP (or other fluorescent protein) activity out in the field. The device would recreate plate reader environments on a leaf of the wheat plant to check construct activity. This process would require three components:

  • light excitation at the desired wavelength
  • light emission detection
  • low-light conditions and filtering to maintain a high signal-to-noise ratio

Overall Hardware Device Concept
First Concept Sketch of the Hardware Device

Our first conceptual sketch above highlights the overall device structure and the main components. The initial design featured a polycarbonate film that would help filter out wavelengths from the UV LED we originally planned to use. However, once we determined a different excitation source and filter combination, the polycarbonate film was unnecessary and the idea discarded.

Device Principle of Operation
High-level Illustration of Device Principle of Operation


Conceptual Circuit Design
Concept Sketch of Circuit Design

The internal view of the concept above outlines the rough compartmentalization and location of circuitry components. The original and ideal final design will be battery-powered for portability. However, the current prototype will connect to a laptop via USB cable for simplicity and ease of frequent testing.

Plant Leaf Fitting

The simple sketch above demonstrates how we want the device to fit with a wheat plant leaf and it gives a general idea of how large the device will be.

Circuit Design:

Our team’s solution was designed and engineered with the applicability, versatility, and accessibility in mind, creating a fluorescence-detecting circuit using low-cost and easily manufacturable parts. The electrical circuit itself took inspiration from common lab equipment found on most bench tops. The leading idea was to create an in-situ fluorometer to help quantify the level of heat-induced gene expression from the plant tissue sample.

Fluorometric Detection of Heat Stress Response:

The first step in our process was determining a suitable system for macroscopic viewing of GFP in plant tissue. Identifying GFP expression in plants is particularly challenging due to the photometric response of chlorophyll in plant cells; namely, chlorophyll in plant cells is excited by blue light, which is the excitation wavelength for the GFP we used, and emits red light! This creates interference and noise in the signal we hope to acquire from the GFP light emission. Resolving this issue would require filtering either the excitation light or the emission light prior to photodetection. In our use case, we filtered prior to detecting the emission light since the excitation light will excite both GFP and chlorophyll.

Excitation and Emission Spectra of GFP

Our goal was to excite GFP at a wavelength of 485 nm (λEx) and detect the emission wavelength at 510 nm (λEm). However, it was also desirable to be able to adjust the specific excitation wavelength for detecting different fluorescent proteins (ex: RFP, BFP, YFP which are use in some of our plasmids) as well as general troubleshooting, which is especially important for something as sensitive as photometry.

In order to actually detect the fluorescence of GFP, we had to select the appropriate photodetection device. There are many photodetectors available on the market, however, for our specific use case, we required one with the ability to distinguish wavelengths. We had the following choices for photodetectors: photoresistors, phototransistors, and photodiodes.


While not the most obvious choice, the most appropriate photodetector was the photodiode due to its ability to induce current as a result of photon energy (which is a function of its wavelength). Photoresistors were not chosen since they are not very sensitive or accurate in terms of their change in resistance due to photon incidence. There are many other advantages to using photodiodes in this specific use case, such as: being able to run them in forward/reverse bias, easily encoding the information into digital data, and detecting weak electromagnetic signals. However, the main idea was that depending on the magnitude of current, we could infer something about the photonic energy delivered to it.

Hence, we have devised a general system to detect heat stress response by leveraging GFP as a proxy and quantifying the level of heat stress response as a function of fluorescence. Our overall block diagram would look like this:

High-level Block Diagram of Electrical Circuit and Interface with Hardware

Optical Physics and Photography:

One major challenge that we encountered when developing a circuit and system for detecting fluorescence in plant tissues was ensuring that the emission wavelengths were that of GFP and not of other plant organelles.

The first step was determining a device to excite the GFP, and inadvertently the chlorophyll as well. We had decided that a standard LED was the most appropriate choice in components, however we figured that using a RGB LED would allow for increased flexibility and make the device more robust. An RGB LED would allow us to configure the LED excitation wavelength to any desired value, hence making the device more robust. We purchased a common anode RGB LED and configured it to emit light at the exact wavelength desired, meaning a color value of rgb(0, 234, 255).

The next step was deciding what type of optical filter would be best for filtering all wavelengths outside of the desired emission wavelength. Our team's first attempt at this was using a photography light filter! This choice was made in the spirit of keeping our component cost to be relatively low cost and accessible. We decided to test both the RGB LED and the filter apparatus in order to assess and iterate on our design. We used GFP-transformed DH5α E. coli to test the excitation of GFP, as an initial proof of concept before testing on plant tissue expressing fluorescent proteins. We used three cell cultures: RFP-expressing cultures, GFP-expressing cultures, and no fluorescence expressing cell cultures. This was meant to test GFP excitation as well as RFP excitation which is another fluorescent protein that our hardware would aim to detect, given that it is expressed with one of our plasmids. Furthermore, we tested the photography light filters with a laboratory grade bandpass filter meant for viewing fluorescence.


The results of this testing demonstrated that the photography light filters were too weak to isolate the desired emission wavelength and were unsatisfactory for the device design. Upon reiteration, our team decided to change directions towards using a more sophisticated bandpass filter lens designed for fluorescence detection. We justified this design choice since lower cost components were failing to meet our product requirements. We were unable to physically test with that lens since lead times were too long. Given the inability to transform whole plants with GFP, we were also unable to test that the excitation was from GFP and not other plant organelles, since we only tested the filter on E. coli bacteria. However, through this approach we were still able to obtain a first look on how well our RGB LED in combination with the filter was able to obtain our specific emission wavelength.

We concluded that a more sophisticated and better performing light filter lens was required. This is due to the fact that the emitted light was not being adequately detected, which suggests that a stronger filter is required.

Signal Acquisition and On-Board Processing:

The signal acquisition portion of the circuit is primarily done by the photodiode, however, we can break this entire acquisition process into smaller parts to help guide our circuit design process. The photodiode diode results in an amperometric response as a result of light detection. This response must be converted to voltage so that we can perform signal modifications as needed. Since the electromagnetic/optical signal from the GFP emission is likely to be detected as a small current change, we must also amplify the response for noticeable detections. Moreover, we will likely rectify the signal to ensure that our microprocessor can obtain appropriate and accurate readings. Thus, we gain a circuit block diagram that looks as follows:

Circuit Block Diagram of the Necessary Signal Processing Steps

With a better understanding of how we want the signal to progress throughout our circuit, we can begin to sketch up some designs. After various rounds of sketching, reviewing literature, and understanding the current market designs for fluorometers, our team decided to base our circuit design on a paper focusing on DNA detection[12].



Electrical Schematic of Whole Device Circuit

Full circuit diagram using KiCAD, an electrical CAD software. CAD softwares are useful tools for electrical and physical design that allow for the creation of 2D and 3D drawings to be prototyped. There are a myriad different CAD softwares depending on the type of design, and KiCAD was chosen among multiple other electrical design CAD softwares available, because it is not only an industry standard tool, but is also open-source software meaning that it is publicly free to use.


We have adjusted the resistance and capacitance values for the individual parts as well as including a variable resistor/potentiometer to allow for adjusting the gain of the amplifier. This design choice lets us adjust the degree of amplification depending on the optical signal strength; so for weaker signals, we can increase the gain and acquire more sensitive data. Moreover, after amplification, we needed to adjust the signal so that it could be read by the Arduino microcontroller, hence we included a clipping circuit as a rectifying step to ensure that the signal values would not exceed 5V. This design choice future proofs the circuit as potential expansions including an analog-to-digital convert (ADC) could be included without damaging the component.

Prior to prototyping the circuit, we decided to use a virtual simulation to view how the circuit would perform. This would allow us to gain early feedback and reiterate on the design as needed. Using the MultiSim SPICE emulation software, the following circuit was produced with voltage probes on the colour indicated positions:

MultiSim Circuit Simulation of Whole Device

By perturbing or varying the photodiode current source (BPW37), we obtained in the following voltage plot measured with the virtual oscilloscope:

MultiSim Interactive Voltage Plot of Device Circuit Simulation

The plot shows the voltage values of the circuit at the colour coded positions on the circuit schematic. The first 0.200s, the photodiode, we had not changed the current source value. This indicated no change in light emission. From 0.400s to 1s, we had increased the current to indicate/simulate a high degree of light energy being detected. From 1s to ~1.5s, we decreased the current to indicate the opposite, lowering light energy being detected. The results of this testing demonstrate that our circuit works as intended.

Proof of Concept and Printed Circuit Boards:


Finally, after a rigorous and in-depth design cycle, we proceeded to build the proof of concept product using a breadboard. This would allow us to hot swap different parts as needed and adjust the circuit according to real-time feedback. In lieu of constructing the whole circuit, we found it sufficient to prototype the amplification block of the circuit since this would describe how the device would perform during the actual use case.


We decided to test this circuit using light from an LED in order to ensure the circuit's functionality. The following plot was produced:

Voltage response of the device to light perturbation over time. The first initial down trend was the measured voltage response to increasing the light intensity of an LED; after which, we completely turned off the LED to which the voltage flattened. Subsequently, we completely turned on the LED to full power and saw the voltage response go to a constant drop. Lastly, we slowly decreased the power and saw a final upward trend.

After validating and demonstrating a proof of concept for our device, the last step in our design process was to produce a production copy of our device on a printed circuit board (PCB). This would allow for high-throughput manufacturing as well as a more compact fit within the hardware chassis that we’ve also designed. Using KiCAD, we generated the manufacturing gerber files, the files necessary to print this circuit board, and have ordered it. Unfortunately, due to the current supply chain issue, we have not yet had the opportunity to test the PCB.

Our Printed Circuit Board (PCB)

Physical Device:

A) Cardboard Prototype

To create a device with sufficient dimensions to house all the hardware, we set out to 3D print a hardware piece that would fulfill this goal. We began by fitting our circuit components in a simple cardboard model. This fitting is to test how we wanted the device to assemble so cardboard was used as a very cheap, accessible, and simple prototyping build. Any material can be used for this intermediary step. Two cardboard prototypes were made of 10 cm and 13 cm cubes. Smaller details such as ports were drawn in for positioning. As the circuit components fit snugly within the 10cm cube, the CAD design proceeded with those dimensions. This is to ensure that the final design is simple and portable for field use in the future. Our cardboard model allowed us to define the design specifications for our 3D printed model, therefore our next step of prototyping was to make the CAD design.


B) CAD Modelling

The device is designed in two core parts: the bottom box that houses a majority of the circuitry and the lid that must have the LEDs positioned opposite of the photodiode. The bottom box will consist of three pieces. The bottom piece houses the circuitry with brackets that support Arduino and breadboard placements. Arduino connections to a computer will go through the port at the base. The top piece will close off the bottom circuitry and hold the photodiode breadboard. A hole in the corner will allow for wiring to connect to the lid unlike the original concept that had wiring traverse the outside of the hinge. These two pieces will snugly fit together sandwiching the bottom part of the hinge. This separation is to ensure easier 3D printing of the device as our 3D printing method is fused deposition modeling (FDM) with PLA filament. The final part of the assembly is the lid that will house the LEDs through the small loop in the center. The top half of the hinge will be attached to the lid as one piece. These specifications, which were checked for size compatibility with our Arduino and breadboard parts, were designed with Autodesk 360 Fusion. Drawings designed on CAD are commonplace for 3D printing with third-party providers.

Our parts were printed in PLA plastic with 0.2 mm resolution and 20% infill, and the Arduino and breadboard prototypes were fitted into the 3D printed device. The optical bandpass filter has not yet been delivered at this time of writing so the prototype seen here will have an additional 10x10 mm 525 nm FWHM 50 nm filter covering the photodiode on the breadboard.



Conclusion:

With the circuit soldered together and fitted to our 3D printed prototype, we hope that the device can serve as a portable and low-cost plate reader for wheat diagnostics into the field at the site of heat impact. As high temperatures activate our promoter, the GFP (or other fluorescent proteins) in our construct are expressed to indicate gene activation. With this construct in transgenic wheat, the fluorometer device would then be used in the field to check wheat crops for heat stress without the need to bring plant samples into a lab. This provides independent confirmation to farmers with their transgenic wheat.

However, the current prototype is far from perfect. There is still a need for the addition of a soft material to block out ambient light as the device clamps over a leaf as per our original concept design. The current prototype also connects to a laptop for power and outputs data there as well. The final design will be battery-powered and provide confirmation of construct activity with LED status lights and displays. Additionally, the device would also incorporate readings from other sensors like that of temperature and humidity. The GFP device would provide on-site support for monitoring the transgenic, heat-tolerant wheat.

For further details such as the complete list of parts and costs, as well as full documentation of our entire workflow, please refer to our centralized GitHub repository.


[1] Ricardo Ramirez-Gonzalez, Philippa Borrill et al. The transcriptional landscape of hexaploid wheat across tissues and cultivars Science 2018: Vol. 361, Issue 6403, eaar6089. doi:10.1126/science.aar6089

[2] Rieger, T. R., Morimoto, R. I., & Hatzimanikatis, V. (2005). Mathematical modeling of the eukaryotic heat-shock response: dynamics of the hsp70 promoter. Biophysical journal, 88(3), 1646–1658. https://doi.org/10.1529/biophysj.104.055301

[3] Zhang, X. Y., Trame, M. N., Lesko, L. J., & Schmidt, S. (2015). Sobol Sensitivity Analysis: A Tool to Guide the Development and Evaluation of Systems Pharmacology Models. CPT: pharmacometrics & systems pharmacology, 4(2), 69–79. https://doi.org/10.1002/psp4.6

[4] Nath, A. Enzyme Kinetics & Simulations. https://depts.washington.edu/wmatkins/kinetics/

[5] Jablonsky, J., Bauwe, H. & Wolkenhauer, O. (2011). Modeling the Calvin-Benson cycle. BMC Syst Biol 5, 185. https://doi.org/10.1186/1752-0509-5-185

[6] Yang, S.F. & Hoffman, N.E. (1984). Ethylene biosynthesis and its regulation in higher plants. Annual Review of Plant Physiology, 35, 155-189. https://doi.org/10.1146/annurev.pp.35.060184.001103

[7] "Symbolism and Terminology in Enzyme Kinetics - Recommendation 1981", School of Physical and Chemical Sciences, Queen Mary University of London

[8] Parthiban V, Gromiha MM and Schomburg D: CUPSAT: prediction of protein stability upon point mutations.

[9] Parthiban V, Gromiha MM, Hoppe C and Schomburg D: Structural Analysis and Prediction of Protein Mutant Stability using Distance and Torsion Potentials: Role of Secondary Structure and Solvent Accessibility.

[10] Parthiban V, Gromiha MM, Abhinandan M and Schomburg D: Computational modeling of protein mutant stability: analysis and optimization of statistical potentials and structural features reveal insights into prediction model development.

[11] S. Arnott, P.J. Campbell-Smith & R. Chandrasekaran. In Handbook of Biochemistry and Molecular Biology, 3rd ed. Nucleic Acids--Volume II, G.P. Fasman, Ed. Cleveland: CRC Press, (1976). pp. 411-422.

[12] Manna, R. (2019). Towards development of an integrated system for diagnosis of infectious diseases of the blood. https://doi.org/10.13140/RG.2.2.34351.28325