Modelling

Throughout our iGEM project, the team was keen to consider potential development avenues in silico, to allocate wet lab resources as efficiently as possible. Due to the nature of our system design, with respect to cell surface expression, we felt it important to understand how yield could be optimised following in-field deployment given this was something we wouldn’t be able to do in reality due to resource constraints faced by the team and legislative red tape within the United Kingdom.

When introducing a chosen protein of interest to bacterial chassis cells, the trade-offs faced are broadly understood within the field. We now know that unnatural gene expression can introduce a load onto an engineered system decreasing both the growth and expression of host proteins, an observed result referred to as “burden”. For the team to understand yield maximisation, it would be important to consider the trade-offs, and how they would affect our ability to express adequate amounts of cell-surface enzyme whilst maintaining growth. Given the nature of our cell surface expression, it would also be helpful to consider how both endogenous and exogenous proteins requiring free lipid for expression would compete.

Furthermore, given the nature of our deployment strategies, it would be important for the system to maintain some growth beyond deployment to allow for spraying inaccuracies. Initial findings confirmed that over-expression of our POI prevented noticeable growth from occurring, and in some cases stopped it altogether.

Introduction of Host Model:
For the development of our host dynamics, we based our core mathematics on a minimal mechanistic model presented by Darlington et al. [1] to effectively capture the microbial trade-offs with respect to bacterial gene expression. This host model aims to effectively capture three resource-limited processes as follows:,
(1) Finite energy levels, meaning the promotion of one energy dependent process will harm the rate of another.
(2) Finite levels of ribosomes capping the total rate of translation.
(3) Finite proteome, constant cell mass meaning the presence of one component will restrict the presence of others.

The host model looks to contain a set of core biological processes for the entirety of the proposed proteome in the form:

DNA -> mRNA -> Protein

Additional stages are required, especially with respect to cytoplasmic proteins that go on to express on the cell surface. It aims to be as simple as possible whilst maintaining the capture of processes that influence proteome composition and cell growth respectively.

The host model (no consideration for engineered additions) consists of 17 variables, all considering molecules per cell as their units. Whilst, other variables exist within the model space, only those structurally part of the cell, or internalised have been considered below.

• Ribosomal RNA, r
• Functional ribosomes, R
• Energy, e
• Internalised substrate, si

In addition to the above, the following exist for each type of protein, X ∈ {T, E, H, R}, representing respectively, transporter proteins, enzymes, host/housekeeping proteins, and ribosomes. These are distinguished between because of their type-specific roles within the cell and differing parameters used to influence their behaviour.

• mRNA, mx
• Translating complex, Cx
• Protein, Px

Note, due to the nature of our project considering cell surface expression(s) and lipid considerations, the “protein” element of T, pT, is later split into a cytoplasmic and membrane-expressed constituent, with a defined reversible process relating the two.

We look to capture a range of core cellular processes using a combination of the host variables above, and some external conditions using a system of predetermined ODEs (Appendix A)

Due to the importance of growth rate consideration within our system, the mathematics required to reach a value for lambda is demonstrated. The growth rate within this model describes the rate of division of individual cells into daughter cells and not the size increase of an individual cell. By defining the cell mass as total protein mass;

Chart

We can then obtain the system's ‘steady state’ growth by differentiating and setting to zero. A substitution can then be made for the differentials of Px and Cx respectively to return;

Chart

Where γ(E) is rate of protein elongation, as a function of ‘energy’ levels.

Lipid Consideration:
Due to the nature of our systems degradation strategy, it was important to introduce lipid considerations into the model as a fixed resource. Introducing membrane constraints has a noticeable impact on the proteome composition as the expression of one membrane protein potentially displaces another. Whilst the only two proteins that express on the cell surface are T and C respectively, both have substantial impacts on yield, as transporters allow internalisation of the external substrate, leading to increased energy levels, and C is the tool for target degradation. Whilst understanding of lipids and cell surface area is limited, certain experimental evidence suggests surface area can be expressed as a linear function of cell growth rate[2], meaning we can define surface using the following expressions, where lp1 and lp2 are both predetermined parameters respectively.

SurfaceArea= λ*lp1+lp2

total lipid=SurfaceArea*lp0

free lipid=totallipid-pC-pT

pC and pT are the two cell surface proteins which require lipid binding for expression. Alternatively, all proteins requiring lipid binding can be grouped pX, to simplify final equation. Whilst no value for lp0 has been found in experimentally in literature to give a final value for total lipid, the team ran various tests on the host model in isolation to obtain a value that we felt was sensible for lipid considerations. In order to do this, the team swept lp0 over a range of values for the host cell in isolation and concluded that from it would be fair to assume a cell may wish to maximise growth without needlessly over-synthesising lipid.

Chart
Figure 1. Growth rates given a range of lipid constraint. Lambda increases rapidly with lp0 up to ~580.

However, since we could not be sure that this value was entirely accurate, we must consider how a change in this value may change our design recommendations at later stages in the engineering cycle as our system is iterated by the wet lab team.

Implementation of Engineered Circuit:
To truly understand the dynamics behaviour of our system, we have tried to understand the protein behaviour, both with respect to its synthesis and binding lipid binding for surface expression. All component rates are subject to degradation and dilution with respect to growth, however, degradation is often to set to zero for the non-mRNA elements as they do not demonstrate sufficient instability.

We can consider the production of mRNA at a rate scaled by the cell's energy status.

Φ→mC

Ribosomes gives translation complexes, in a reversible reaction.

mC+pR ⇌cC

Translation complex produces a non-functional cytoplasmic protein in a non-reversible reaction.

cC→yC

Cytoplasmic protein binds to free lipid to produce cell surface functional enzyme in a reversible reaction.

yC+lp ⇌pC

In order to effectively capture a range of both growth, and subsequently degradation conditions, an additional input to the function allowed this to be changed where appropriate. This allowed relevant tests to be run over growth conditions appropriate to the environment the system was likely to find itself in at the time. The growth conditions found in the final model are as follows:
(1) Exponential growth model
(2) Batch culture (Monod growth)

Whilst, batch culture growth conditions were often used to reflect work performed in vitro, due to the nature of the finite external substrate present, exponential methods were useful for obtaining steady state system values, due to growth rates not tapering off as substrate is consumed.

In order for in silico predictions, and the model as a whole to be validated we have looked to line up in vitro growth curves with those generated by the model when considering batch culture as our environment. Firstly, we looked to overlay growth control (no POI expression) growth curves against our model host curves by turning circuit transcription off (wC = 0). As the magnitude of values differed, we had to standardise the data so that the curves would fit.

Chart
Figure 2. Model generated growth vs wet lab growth (Host in isolation).

Following this, wC was swept over a range to find a level of transcription which allowed in silico growth characteristics to be analogous to those seen in the wet lab environment. We found a rate of transcription within the model that allowed us to line up mathematically achieved growth rates with those seen in vitro. (wC = 35).

Chart
Figure 3. Model generated growth vs wet lab growth (Including POI expression).

Key Results:
Whilst a large number of the parameters contained within the model could not be changed in the lab, it was important to consider how they may affect our final design recommendation over reasonable ranges. In order to effectively optimise our system, it was important for the team to properly understand what our desired output would be. For the majority of the tests run below, this was decided to be degradation because of the remediation push that our project had behind it. Most of the tests were run with the “FLAG” input set to two, placing the system under batch culture environmental settings and meaning that the degradation was dictated using the function below, where xP is pesticide count pre-remediation.

Chart

Once a mechanic for degradation was established, it was important for the team to consider how a range of input parameters would affect this degradation time. Whilst variables like transcription rates seem like obvious starting points due to their ability for in vitro tunability, we would have to consider how a change in other parameters would impact our design recommendations; especially if we could not be certain of their exact value.When reviewing the literature for the degradation of our target pesticide using the methods used by the team, the binding affinities were not clear/appropriate for our application. As a result, we started by running multivariate sweeps and monitoring the impact on degradation time. Note, colour bar represents the time taken for 90% degradation of target pesticide in minutes.

Chart
Figure 4. Degradation efficacy for a range of wC and kC values - lp0 = 1e3.

For the plot generation with the tightest lipid constraint, we see a clear “wave” like behaviour presenting a window for which particular values of wC would allow for effective degradation over a larger range of possible binding affinities. This result would allow the team to design a more robust system that can deal with a greater range of potential degradation/binding environments, meaning performance can be maintained should any external factor influence the cell surface enzyme’s ability to break down the target.

Chart
Figure 5. Degradation efficacy for a range of wC and kC values - lp0 = 5e3.

As the lipid constraint is loosened, there is a clear improvement in system efficacy across the entirety of the plot surface. This is in line with our predictions, as an increase in lipid availability allows our protein of interest to co-exist at higher levels in parallel with the transporter protein, meaning cell surface enzyme expression can remain high without impacting the host's ability to internalise substrate, hence allowing growth rates to be maintained within the population.

Chart
Figure 6. Degradation efficacy for a range of wC and kC values - lp0 = 1e4.

Similar behaviour is further exaggerated when lipid constraints are expanded further, as expected due to the reasons mentioned above. When looking at identifying the behaviours displayed above, and why transcription is having such a large impact on degradation times, it is important to be able to visualise why the witnessed behaviours are occurring. In order to do this, the proteome composition of our engineered E.coli, can be plotted over a given time frame, with the aim of comparing and contrasting the presence of various constituent elements given a range of lipid constraints. The plots below demonstrate how the loosening of lipid constraint affects the two cell surface proteins, both with respect to their cytoplasmic and cell surface expression(s).

Chart
Figure 7. Membrane related protein(s) levels - lp0 = 1e3


Chart
Figure 8. Membrane related protein(s) levels - lp0 = 1e4

As suspected, an increase of lp0 results in the promotion of all 4 plotted elements. The limit of cell surface transporter protein (pT) is likely to be causing growth problems, as it is greatly reduced in the case where lp0 = 1e3. Comparatively, the presence of pT increases by roughly an order of magnate, as space becomes more readily available for the cytoplasmic T (yT) to insert into free lipid. Whilst impacts on growth due to substrate internalisation were blamed for the growth problems experienced by the system, the team was also keen to consider if the burden experienced because of the known molecular biology trade-offs were also playing a part. As mentioned previously, the expression of our POI alone would impact the cells’ ability to perform host processes due to various factors, including finite proteome availability and a fixed number of translating ribosomes.

To test this, growth rates were compared with:
(1) Full exogenous protein expression
(2) xOn set to zero so cytoplasmic POI doesn’t bind to lipid
(3) wC set to zero, turning expression of POI off entirely.

For full exogenous POI expression lambda fell to a value of 0.6506e-3. When cell surface binding was turned off, lambda rose significantly to 0.0147 demonstrating the impact of transporter protein displacement on colony growth rates. Furthermore, turning the circuit off entirely meant lambda rose again to 0.0259 showing the impact that our POI was having on the cell’s internal functions and processes from a typical synthetic biology ‘burden’ point of view. This allows us to make informed decisions about transcription rates depending on how firmly we believe that our lipid predictions are correct and whether we can find additional expertise to guide us further to a more accurate answer, be that in the form of literature or experts within the field.

Discussion:
Since a range of parameters within the model cannot be changed, it was important for us to make design recommendations across a potential range of both potential host and potential environmental conditions. As seen with the surf() plots, there is a distinct “wave” at which a certain transcription value gives the best possible degradation outcomes for a potential range of binding affinities between the target pesticide and expressed surface enzyme. As a result, going forwards into an additional engineering design cycle, the dry lab in silico team would be able to offer theoretical system changes to optimise degradation capabilities, whilst maintaining cell growth.

Our sweep of relevant parameters shows that optimal degradation performance is performed through reduced gene expression. At higher levels of expression with respect to our engineered gene of interest, the burden caused by the POI limits growth and, as a result, population. Conversely, at lower expression, the less burdened cells are free to grow rapidly, generating large populations capable of pesticide degradation with high efficacy. We have found this design rule holds over all presented enzyme-pesticide affinities, and for a range of different membrane constraints.

Methods:
Methods used by the team, from a mathematical point of view, are broadly described in Appendix A. To solve the system of ODEs proposed, MATLABs build in ODE solver ODE15s was used, due to its ability to generally handle stiff systems well. Furthermore, multivariate optimization was done using a range of parameter sweeping and optimization methods. A range of plots were generated at various stages, all of which were readily available within the basic MATLAB suite, including but not limited to, plot(), bar(), surf(), and mesh(). At some stages, the symbolic math toolbox add-on was used where appropriate, to visualise mathematics implemented and fill in for unknown variables where appropriate.

References:
1. A. Darlington, J.Kim, J. Jimemez, and D. Bates, "Dynamic allocation of orthogonal ribosomes facilitates uncoupling of co-expressed genes," Nat. Commun., vol. 9, e5415, 2018.
2. D. Serbanescu, N. Ojkic, and S. Banerjee, “Cellular resource allocation strategies for cell size and shape control in bacteria,” FEBS J., pp. 1-16, 2021.

Authors' Contributions:
Zak C. Fulk completed work with supervision from Alexander P.S. Darlington

Appendix A:
1. Please see full system of ODE’s derived below, as they were inserted into MATLAB for dynamic execution using built in ODE15s solver. This includes, only the dynamics for the host E.coli system.

Chart

Below is the additional mathematics designed to capture the additional insertion(s) of both host protein and additional engineered exogenous protein. This is due to host transporter protein needing to be split into a non-functional cytoplasmic protein (yT) and a functional cell-surface expressed protein (pT), following successful lipid binding.

Chart