Modeling

Summary

The goal of metabolic engineering for industrial applications is the overproduction of metabolites with the help of microorganisms. While the field originated from single modifications in metabolic pathways, today’s approaches to metabolic engineering take a much more systematic view of biological systems. Computational tools based on flux balance analysis (FBA) to analyze genome-scale metabolic models (GSMMs) have significantly advanced our capability to engineer microorganisms for industrial applications. We applied FBA to GSMMs for two purposes. First, we used FBA to find and evaluate suitable growth media for MonChassis’s engineered yeasts that maximize the flux towards monoterpenoid synthesis. We screened 125 and 53 carbon sources for the production of monoterpenoids with Saccharomyces cerevisiae and Yarrowia lipolytica, respectively. Because the oxygen uptake rate is a key rate-limiting factor when scaling up aerobic biotechnological processes, we further ranked the top 20 carbon sources by their required oxygen uptake rate. Based on this, we decided on oleic acid as the most promising carbon source for the potential upscaling of monoterpenoid production with MonChassis’s yeasts. However, wet lab testing of oleic acid as the sole carbon source did not show any α-pinene production. This was most likely due to improper execution of the monoterpenoid extraction, as the glucose control did not yield any detectable product either.

The second application of FBA to GSMMs was the identification of further genetic engineering targets for MonChassis’s yeast platform. To this end, we applied the flux scanning based on enforced objective flux (FSEOF) algorithm. We found that the cytosolic production of monoterpenoids in S. cerevisiae and Y. lipolytica requires increased cofactor supply, while a limited supply of fatty acids restrains peroxisomal production. Our results suggest that changing the carbon source from glucose to oleic acid could remove the precursor limitations for the peroxisomal production of monoterpenoids. These results underline the importance of growth media optimization.

Growth media optimization to scale up the monoterpenoid production with yeasts

Summary

For efficient biotechnological fermentation processes it is crucial to find optimal growth media. Growth media and related procedural parameters like oxygen transfer are not only driving cost factors, but have great influence on metabolic pathway regulation, too. Therefore, we optimized our growth media with flux balance analysis (FBA), focusing specifically on the carbon source to scale up the production of monoterpenoids in MonChassis. Using FBA we developed a simulation algorithm, with which we generated a tailored ranking of growth media carbon sources for each of our proposed strains. Carbon source candidates were ranked based on their ability to facilitate α-pinene production. Moreover, to minimize the costly oxygen transfer required for aerobic growth, we calculated the oxygen demand for the top-ranking carbon sources from the previous analysis. Exploiting the power of dry lab simulation, we identified promising carbon sources like oleic acid and tested them with one of our strains in the wet lab.

Motivation

Finding optimal reaction conditions for monoterpenoid production in MonChassis poses challenges regarding both the technical process engineering and the biological parameters. Among the latter, growth media are especially important to optimize, as they play a key role in the efficiency of the operation. First, the growth medium influences the regulation of major metabolic pathways. Second, when oxygen is considered part of the growth medium, it determines whether the organisms can grow aerobically. Lastly, growth medium components are decisive cost factors. All these parameters become increasingly relevant when the fermentation is scaled up. As scale-up is an integral part of MonChassis, we aim to find optimal growth media for our proposed production strains.

Growth media are composed mainly of minerals, essential amino acids, metal ions, dissolved oxygen, and a carbon source. The carbon source is especially important, as it is, next to oxygen availability, one of the most decisive factors for the metabolism. Over the past century, glucose, the most widely used carbon source, has proven to be well suited for the growth of many microorganisms. However, glucose is not ideal for every microorganism and every fermentation goal. It has been shown in the literature that alternative substrates can rival or even outperform glucose as the main carbon source in specific applications (Boonyanit et al., 2011; Farhi et al., 2011). Due to the similarity to our approach, an especially relevant case is presented by Farhi et al. (2011), who tested oleic acid as the sole carbon source for peroxisomal terpene production in Yarrowia lipolytica. With oleic acid, product yields almost 20% higher than with glucose were reached. This underlines the importance of optimizing growth media.

Methodology

When it comes to composing growth media, the number of substrate candidates and possible combinations is practically unlimited. Furthermore, different fermentation processes require different quantities of dissolved oxygen. Thus, the space of growth media compositions suited to grow microorganisms is vast. Exploring this space experimentally in the wet lab would require great financial and labor costs as well as much time, making this option infeasible for us. However, by in silico simulation of carbon source alternatives and their oxygen demands we narrowed down the screening space very efficiently.

For all four of our strains, Saccharomyces cerevisiae cytosol and peroxisome and Y. lipolytica cytosol and peroxisome, we built a corresponding genome-scale metabolic model (GSMM). The two cytosolic pathway options for the main monoterpenoid precursor, geranyl pyrophosphate (GPP) or neryl pyrophosphate (NPP), were not distinguished, because these isomers occupy the same position in the pathway towards α-pinene and are thus treated equally by the model. GSMMs (in the literature also called GEMs) are mathematical representations of the metabolism of an organism, capturing the connection between genes, corresponding metabolites, and reactions (Nielsen, 2017). For the implementation we used existing metabolic models as scaffolds, (Lu et al., 2019) for S. cerevisiae and (Mishra et al., 2018) for Y. lipolytica, which, to the best of our knowledge, are the most comprehensive models of the respective organisms available to date. Into these models we integrated all genetic manipulations proposed in the metabolic engineering of our strains. These consist of gene knockouts, overexpressions, and the introduction of heterologous genes. We implemented the models using the COBRApy Python library (Ebrahim et al., 2013). After integrating our heterologous genes, metabolites, and reactions, the two models for S. cerevisiae contained 4074 reactions, 2750 metabolites, and 1154 genes. The models for Y. lipolytica contained 1352 reactions, 1119 metabolites, and 660 genes. With the construction of these comprehensive models we aimed to represent the entire metabolism of the modeled organisms.

The method we used to analyze the models is flux balance analysis (FBA). FBA analyzes GSMMs by calculating the fluxes of all reactions in the metabolic network. We used it to simulate the performance of different carbon sources in the growth media. The concept of FBA is as follows. First, the metabolic network of the considered organism is translated into a stoichiometric matrix S of dimensions m × n, with one row for each of the m metabolites and one column for each of the n reactions. This forms the basic GSMM. In a second step, this matrix is multiplied with a vector v representing the reaction fluxes v₁ to v_n. This vector contains the unknowns that we want to determine. Finally, two integral components are added. The first is the assumption that the metabolism is in steady state. This means that the concentration of every metabolite stays constant, i.e., the production of a metabolite must equal its consumption. Thereby, we obtain the linear equation system S × v = 0. Second, an objective function is added. Because genome-scale networks contain more reactions than metabolites (n > m), the system is underdetermined and has more than one solution. To select an optimal solution, we define an objective function as the goal of the optimization problem. In our case it is simply the reaction whose flux we aim to maximize. The solution can then be found by linear programming (Orth et al., 2010); the solver we used was GLPK. This concept is depicted in figure 1.

**Figure 1: Concept of flux balance analysis (FBA).** **(1)** Starting from a metabolic network, *i.e.*, a number of chemical equations, **(2)** these equations are translated into a stoichiometric matrix S. This matrix is also called genome-scale metabolic model (GSMM). With the integral constraint of FBA that the metabolism is in steady state, the product of S and flux vector v is set to zero. **(3)** This turns FBA into a system of linear equations that can later be solved by linear programming. (Figure adapted from Orth et al., 2010)
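The optimization problem described above can be written compactly. The block below is the standard FBA formulation (Orth et al., 2010), given here as a sketch rather than the exact problem we passed to the solver. Here S has one row per metabolite and one column per reaction, and c is a vector that is 1 for the objective reaction (in our case α-pinene formation) and 0 elsewhere. The flux bounds lb and ub, which encode reaction reversibility and uptake limits such as the available carbon source and oxygen, are not named in the text above and are included as the usual assumption.

```latex
\begin{aligned}
\max_{v} \quad & c^{\mathsf{T}} v \\
\text{subject to} \quad & S\,v = 0, \\
& lb_i \le v_i \le ub_i, \qquad i = 1, \dots, n
\end{aligned}
```

Changing the growth medium in this picture means changing the bounds of the corresponding uptake reactions, while S and c stay the same.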

Once the optimization problem of maximizing the α-pinene production flux is formulated, growth rates and the theoretical maxima of this flux can be calculated for changing conditions with very little computational power. Such conditions can be different media compositions and oxygen availabilities, allowing us to compare the performance of different sets of parameters with respect to the defined objective.

To find optimal carbon sources for the growth media of our production strains, we ranked all considered carbon source candidates by the maximal α-pinene production flux that they facilitated. We tested 125 different carbon sources for both S. cerevisiae strains and 53 for both Y. lipolytica strains. The top 20 carbon sources of each modeled strain were then used for oxygen demand analysis. In this second step we calculated the oxygen demand of the respective strain with each carbon source. To do so, we gradually lowered the amount of oxygen available to the metabolism towards zero and calculated the maximal possible α-pinene production flux at every step. This additional analysis was motivated by the fact that oxygen transfer rates are increasingly difficult to maintain with increasing reactor volumes. As scale-up is an integral part of MonChassis, oxygen demand is an important factor for us to consider.
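The two-step procedure above can be sketched in a few lines of Python. This is a minimal illustration, not our actual implementation: the FBA run is abstracted as an injected callable `solve(carbon_source, oxygen_limit)`, and `toy_solve` with its numbers is a made-up stand-in so that the example runs without a GSMM. All function names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical solver signature: (carbon_source, oxygen_limit) -> maximal product flux.
# In practice this would wrap an FBA run on the GSMM; here it is an injected callable.
Solver = Callable[[str, float], float]


def rank_carbon_sources(sources: List[str], solve: Solver,
                        oxygen_limit: float = 100.0,
                        top: int = 20) -> List[Tuple[str, float]]:
    """Rank carbon sources by maximal product flux at ample oxygen."""
    scored = [(name, solve(name, oxygen_limit)) for name in sources]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top]


def oxygen_sweep(source: str, solve: Solver,
                 levels: List[float]) -> Dict[float, float]:
    """Maximal product flux of one carbon source at each oxygen limit."""
    return {level: solve(source, level) for level in levels}


# --- Toy stand-in for the FBA call (made-up numbers, NOT model results) ---
# Each entry: (carbon-limited flux ceiling, oxygen needed per unit of product flux).
TOY = {"oleic acid": (8.0, 10.0), "glucose": (3.0, 4.0), "ethanol": (5.0, 12.0)}


def toy_solve(source: str, oxygen_limit: float) -> float:
    ceiling, o2_per_flux = TOY[source]
    return min(ceiling, oxygen_limit / o2_per_flux)


ranking = rank_carbon_sources(list(TOY), toy_solve, top=2)
curve = oxygen_sweep("oleic acid", toy_solve, [100.0, 50.0, 25.0, 0.0])
print(ranking)
print(curve)
```

Passing the solver as an argument keeps the ranking and sweep logic independent of the modeling library, so the same functions could be reused for each of the four strain models.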

**Figure 2: Maximal α-pinene flux of the 20 top-ranking carbon sources and their oxygen demands.** For all four modeled strains (panels A to D), carbon sources were ranked by their ability to facilitate α-pinene production. Based on this ranking, the maximal α-pinene production flux of the top carbon sources was calculated for decreasing oxygen availability.

The rankings from the first step of this analysis and the results of the oxygen demand analysis can be seen in figure 2. At an oxygen availability of 100 mmol/gdcw/h, the α-pinene flux of all considered carbon sources is at its respective maximum. With decreasing oxygen availability, the α-pinene flux generally decreases as well, but the decline sets in at different oxygen availabilities and with different slopes for different carbon sources. This can be used to estimate the oxygen demand of a carbon source.
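One possible way to turn such a curve into a single number is to report the lowest oxygen availability at which the flux is still at its maximum. The sketch below assumes this reading of "oxygen demand"; the function name and both curves are made up for illustration and are not results from our models. The estimate is only as fine as the grid of oxygen levels in the sweep.

```python
from typing import Dict


def oxygen_demand(curve: Dict[float, float], tolerance: float = 1e-6) -> float:
    """Smallest sampled oxygen availability at which the flux is still maximal."""
    max_flux = max(curve.values())
    at_max = [o2 for o2, flux in curve.items() if flux >= max_flux - tolerance]
    return min(at_max)


# Made-up curves (oxygen availability -> maximal product flux), for illustration only.
curve_a = {100.0: 8.0, 80.0: 8.0, 60.0: 6.0, 40.0: 4.0, 20.0: 2.0, 0.0: 0.0}
curve_b = {100.0: 3.0, 80.0: 3.0, 60.0: 3.0, 40.0: 3.0, 20.0: 3.0, 0.0: 0.0}

print(oxygen_demand(curve_a))  # 80.0
print(oxygen_demand(curve_b))  # 20.0
```

In this toy case, the second carbon source has the lower maximal flux but keeps it down to much lower oxygen availability, which is the kind of trade-off the analysis is meant to expose.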

The ranking shows that oleic acid facilitates the highest overall α-pinene flux, with over 8 mmol/gdcw/h, for both S. cerevisiae strains. For Y. lipolytica, oleic acid reaches around 5 mmol/gdcw/h and ranks second for the cytosolic and fifth for the peroxisomal strain. Note that due to the different complexities of the underlying GSMMs of the two species, the absolute flux values are not directly comparable between species. Another high-ranking carbon source for all strains is stearate, also a C18 fatty acid.

Wet lab testing of oleic acid

Following the simulation step presented above, we translated the results into a wet lab experiment. We based the choice of the carbon source to test on the calculated ranking. However, this decision was made only after additional research into properties of the candidates that had not been considered up to this point. These include practical aspects like solubility, toxicity, and price. Oleic acid, one of the best-performing carbon sources regarding maximal α-pinene production flux, was selected as an exemplary candidate. Besides the predicted performance, this was also due to prior work demonstrating its particular suitability for peroxisomal terpene production in Y. lipolytica (Farhi et al., 2011). It has been shown that oleic acid predominantly triggers the fatty acid metabolism in organisms capable of utilizing it. Therefore, it has been hypothesized to favor the production of metabolites that branch off from the corresponding pathways (Farhi et al., 2011). Lastly, oleic acid is contained in some waste products like frying fat. This further increases its potential as an alternative to or partial replacement for glucose, as it could reduce operating costs significantly while potentially outperforming glucose at the same time.

With this experiment we aimed to measure the actual α-pinene concentrations produced by our yeast strains growing on oleic acid instead of glucose. This was intended as the final step in optimizing the growth media for our strains. Moreover, with the results of this analysis we aimed to validate the first, simulation-based step of our approach. This is especially important because we can only make use of this analysis in MonChassis after translating it from the dry lab to the wet lab.

For this experiment, we chose three different mixtures with varying compound ratios. The first mixture (1) contained 100% SD-Medium in which glucose was completely replaced by oleic acid. Oleic acid was used at the same concentration as glucose before (20 g/l). To improve the solubility of the fatty acid, we added 0.1% (v/v) Tween 80. The second mixture (2) was identical except that it contained 1% (v/v) Tween 80. This was done to check which Tween 80 concentration solubilizes oleic acid better. The third mixture (3) contained only 50% SD-Medium without glucose and again 20 g/l oleic acid. With this mixture we aimed to test whether oleic acid could also partially replace other expensive components of the growth media, like amino acids. Lastly, we added a control with 100% regular SD-Medium (including glucose). All four mixtures were inoculated in triplicate with the same preculture of strain S. cerevisiae_MS_8. We incubated the 12 cultures at standard conditions for 48 hours. After incubation, the optical densities at 600 nm were not measured because of the distorting solubilization effect of Tween 80. Instead, we counted cells by plating. From each grown culture we took 1 ml, diluted it 1:10⁶ and 1:(2 × 10⁶), and spread each dilution on SD-His-Leu-Trp agar plates. The calculated cell numbers showed that the cultures from mixtures (1) and (2) grew to only about a fifth (18%) of the cell density of the control cultures. The cultures from mixture (3) grew to half of it.

In the last step we extracted the cell contents with ethyl acetate, following the extraction protocol, and performed GC-MS measurements.

Unfortunately, we were not able to measure α-pinene in any of the twelve reaction mixtures. We consider improper execution of the extraction process the most likely reason. This is because even in the control we could not detect any product, although we would expect it to be present in quantities similar to those measured for the cytosolic or peroxisomal production strains. We presume that problems occurred during the extraction of α-pinene, either in the cell disruption with the Retsch mill or in the transfer of the supernatant into the measurement vials after cell lysis. These problems likely persisted in a second run of the experiment, in which we used multiple small glass beads for cell disruption with the Retsch mill, just as in our final version of the extraction protocol. In the first run we had used a single metal bead for the same purpose. The calculated cell numbers indicate that the absence of glucose reduces the ability of S. cerevisiae to grow. However, the cell count alone cannot tell us whether the cultures grew less efficiently, where efficiency in our terms comprises the ability to produce α-pinene, the independence from costly glucose, and the oxygen demand. Therefore, we cannot assess the suitability of oleic acid as an alternative carbon source to glucose for producing α-pinene.

Discussion

The chosen approach has the major advantage that costly and time-consuming wet lab experiments can be reduced to a minimum. As enzyme kinetics, which are difficult to determine, are not required for the analysis of GSMMs with FBA, prior acquisition of experimental data could be omitted. Due to this simplicity and the power of linear programming solvers, we were able to perform the simulation step of this approach within seconds on a normal laptop. This allowed us to explore the space of possible carbon sources much faster and with much less experimental effort, saving both time and resources. Only the final testing in the wet lab remains necessary to validate the prediction.

However, the underlying mathematical concept of FBA has clear limits. While abstracting from kinetic parameters sped up the implementation significantly, as touched on above, it also reduced the accuracy of the prediction in terms of exact product concentrations. Also, by using FBA, any physicochemical interactions between metabolites and/or non-metabolite structures outside of known stoichiometric reactions were not considered. Lastly, properties like solubility, stability, and toxicity of compounds were disregarded. The lack of toxicity consideration, for instance, resulted in the analysis ranking ethanol higher than we would expect from a biological point of view. Therefore, applying FBA to GSMMs for media optimization can give us a well-founded first suggestion, but we must further validate the computed predictions.

Conclusion

We conclude that by using in silico modeling it is possible to gain valuable insights into the properties and performance of growth media compounds. The identified carbon sources like oleic acid, stearate, and other long-chain fatty acids could be promising alternatives to glucose for the production of monoterpenoids. When scaling up the fermentation process, we could additionally benefit from the oxygen demand analysis that we performed. Even if the maximal α-pinene production with a carbon source like pyridoxine starts at a significantly lower level than that with the top-ranking fatty acids, it could potentially be the more economical choice for scale-up when oxygen levels start to drop naturally. The actual suitability of most of the predicted carbon sources is yet to be tested, though. The wet lab verification should therefore be repeated in the future.

Finding new genetic engineering targets with flux scanning based on enforced objective flux

In addition to optimizing the growth media for α-pinene production, we further aimed to improve MonChassis's yeast strains for monoterpenoid production by identifying other engineering targets. As genome-scale metabolic models (GSMMs) and flux balance analysis (FBA) do not incorporate any kinetic or regulatory information, it is unlikely that novel genetic engineering targets in the mevalonate pathway can be found by applying these methods. Kinetic and regulatory bottlenecks in this pathway were already addressed by our initial strategy, which includes the overexpression of rate-limiting enzymes and the expression of the truncated variant of the HMG-CoA reductase 1 (Bröker et al., 2018). However, reactions outside the mevalonate pathway can also be successful targets for genetic engineering, as modifications could, for example, increase the substrate pool and cofactor supply or reduce competing pathways.

Many tools are available to identify genetic engineering targets (Gu et al., 2019). Most tools, though, are only suitable for predicting the effect of knockouts, as the phenotype of knockouts is easier to predict (Choi et al., 2010). A prominent algorithm to identify overexpression and knockout targets is flux scanning based on enforced objective flux (FSEOF) (Choi et al., 2010). Like the growth media optimization, FSEOF relies on constraint-based FBA of GSMMs and thus works with the same assumptions and constraints as explained before.

FSEOF was performed as described before (Choi et al., 2010; Park et al., 2012) with the same GSMMs that we used for the growth media optimization. The first step of the FSEOF algorithm is the calculation of the initial fluxes through the biomass objective function v_Biomass and the reaction of interest v_{Target_initial}, which in this case was the formation of α-pinene from GPP. Additionally, we calculated the theoretical maximal flux towards α-pinene v_{Target_max} within the respective metabolic context of the four strains we designed as MonChassis’s yeast platform (note that we again do not treat GPP- and NPP-utilizing yeast strains as different strains). For this, we performed FBA of the respective GSMMs while setting the objective function to our reaction of interest. In the second step of the FSEOF algorithm, we optimized v_Biomass while adding a new constraint for FBA, which gradually enforced an increasing flux through our reaction of interest from v_{Target_initial} to near its theoretical maximum v_{Target_max}: v_{Enforced_product_flux} = v_{Target_initial} + (k/n) × (v_{Target_max} – v_{Target_initial}), with k = 1, 2, 3, …, n-1 and n ≥ 10. For each iteration, the fluxes of all reactions of the GSMM were stored. Subsequently, reaction targets for overexpression and downregulation were identified by linear regression of each reaction’s flux against the gradually enforced flux through the reaction of interest. The calculated slope of the regression line for each reaction was used to rank reaction targets for manipulation. A strong positive correlation between a reaction flux and the enforced flux through the reaction of interest indicates that overexpression of the encoding genes is necessary to achieve near-maximal product flux. In contrast, a strong negative correlation can be used to identify potential downregulation targets. Exchange reactions that are not modifiable by genetic engineering were removed prior to the analysis.

Other iGEM teams who would like to use the FSEOF algorithm can use our user-friendly command line tool to quickly identify targets for genetic manipulation.
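The scanning and ranking steps described above can be illustrated with a small, library-free Python sketch. It covers only the enforced-flux schedule and the slope-based ranking; the FBA runs that would produce the per-reaction fluxes are replaced by made-up flux profiles, and all function and reaction names are hypothetical. It is not the code of our command line tool.

```python
from typing import Dict, List, Tuple


def enforced_fluxes(v_initial: float, v_max: float, n: int = 10) -> List[float]:
    """Enforced product fluxes for k = 1 .. n-1, following the formula above."""
    return [v_initial + k / n * (v_max - v_initial) for k in range(1, n)]


def slope(x: List[float], y: List[float]) -> float:
    """Ordinary least-squares slope of y regressed on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def rank_targets(enforced: List[float],
                 fluxes: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Sort reactions by regression slope, most positive first."""
    slopes = {rxn: slope(enforced, values) for rxn, values in fluxes.items()}
    return sorted(slopes.items(), key=lambda item: item[1], reverse=True)


# Nine enforced flux levels between a toy initial flux of 0 and a toy maximum of 1.
steps = enforced_fluxes(v_initial=0.0, v_max=1.0, n=10)

# Made-up flux profiles standing in for the stored FBA solutions (NOT model results).
toy_fluxes = {
    "R_up": [2.0 * s + 1.0 for s in steps],    # rises with the enforced flux
    "R_flat": [1.0 for _ in steps],            # unaffected
    "R_down": [3.0 - 1.5 * s for s in steps],  # falls with the enforced flux
}

ranked = rank_targets(steps, toy_fluxes)
print([name for name, _ in ranked])
```

Reactions at the top of the list would be overexpression candidates and those at the bottom downregulation candidates. The published algorithm applies further selection criteria that are omitted here.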

The targets we identified for overexpression and downregulation depended on the compartment but were similar across the two yeast species (Fig. M2). To produce α-pinene in the cytosol of S. cerevisiae and Y. lipolytica, most of the highest-ranked overexpression targets produced cofactors that can be consumed within the endogenous mevalonate pathway. For the peroxisomal production of α-pinene, metabolic pathways for the synthesis of malonyl-CoA were overexpression targets in the metabolic background of both S. cerevisiae and Y. lipolytica. Malonyl-CoA is an essential substrate for the biosynthesis of fatty acids from sugars. The fatty acids are subsequently degraded to acetyl-CoA in the peroxisomes, where acetyl-CoA is converted to IPP/DMAPP by the heterologous (and endogenous) mevalonate pathway. The recombinantly expressed NPP/GPP synthase and α-pinene synthase then catalyze the conversion of IPP/DMAPP to α-pinene. Therefore, precursor supply appears to be the limiting step for the peroxisomal production of monoterpenoids. However, it is important to note that the FSEOF algorithm was applied here with glucose as the sole carbon source. Changing the carbon source from glucose to lipids is, therefore, likely to alter the potential overexpression targets, as increased fatty acid biosynthesis becomes less essential.

**Figure M2: Overview of overexpression targets for the four different modeled yeast strains of MonChassis.** The first ten reactions, ranked the most influential targets for genetic engineering, are displayed together with the mevalonate pathway. Green arrows visualize reactions for overexpression. Exchange reactions that cannot be modified by genetic engineering were removed prior to the analysis.

Even though the FSEOF algorithm was primarily developed to find overexpression targets, it can also identify targets for downregulation (Choi et al., 2010). Highly similar downregulation targets were identified for all four GSMMs subjected to the FSEOF algorithm. An overview of the top five overexpression and downregulation targets for our four yeast models can be found below. The highest-ranked downregulation targets catalyze reactions of the Krebs cycle and cofactor-consuming reactions. It is, again, important to note that FBA-based methods do not necessarily provide biologically meaningful results, in the sense that enforcing an increased flux through a reaction of interest can eliminate growth. Downregulation of the Krebs cycle is likely to be detrimental to the growth of yeasts. Thus, new downregulation targets should be identified using more sophisticated tools. Additionally, the next steps in finding further engineering targets should focus on the cofactor supply of the mevalonate pathway. For this, specialized tools like cofactor modification analysis (Lakshmanan et al., 2013) can be used.

The results presented here already provide valuable insights into the limiting steps within our designed pathways. Further, by modifying the identified genetic targets, we could overcome these limitations and further improve MonChassis’s yeast strains, bringing large-scale monoterpenoid production within reach.

Overview on the identified genetic engineering targets
