Summary

Computational tools based on genome-scale metabolic models (GSMMs) have significantly advanced our capability to find non-obvious engineering targets for microbial cell factories. Most available software tools focus on predicting gene knockout effects and do not include the identification of putative gene amplification targets. Flux scanning based on enforced objective flux (FSEOF) is a prominent algorithm for identifying gene amplification targets and has been successfully used to optimize cell factories. However, FSEOF has not been available as a stand-alone tool, and, to the best of our knowledge, no code repository implementing FSEOF is publicly available. We therefore developed a user-friendly command line tool that applies FSEOF to any metabolite of interest in a GSMM. Our tool requires only three inputs from the user: an SBML file of a GSMM, the biomass-reaction ID of the GSMM, and the ID of the reaction that should be improved. The output is an Excel file with ranked reaction targets for overexpression. Our software thus identifies non-obvious genetic engineering targets for amplification that affect a metabolite of interest within one hour. The software is freely available at our team’s GitLab repository.

Introduction

The goal of metabolic engineering for industrial applications is the overproduction of metabolites with the help of microorganisms. Starting with single modifications in metabolic pathways, modern metabolic engineering approaches include a much more systematic view of biological systems. This is primarily fueled by advances in computational methods (Woolston et al., 2013). One of the most prominent computational methods in metabolic engineering is flux balance analysis (FBA) for the analysis of genome-scale metabolic models (GSMMs). GSMMs are mathematical representations of all known chemical reactions within an organism. FBA calculates the respective fluxes through all the reactions of a GSMM based on specific mathematical constraints (Orth et al., 2010). To learn more about the general background of FBA and GSMMs, take a look at our modeling page.

Many tools using FBA with GSMMs are available to find genetic targets for metabolic engineering. Most of them, however, focus on predicting gene knockout effects and do not include the identification of putative gene amplification targets. Flux scanning based on enforced objective flux (FSEOF) is a prominent algorithm for identifying gene amplification targets and has been successfully used to optimize cell factories (Choi et al., 2010; Park et al., 2012). As no stand-alone FSEOF software tool is currently available and, to the best of our knowledge, no public code repositories can be found online, we decided to develop a user-friendly command line tool that utilizes the FSEOF algorithm for the identification of genetic overexpression and downregulation targets.

Considerations

When we started using FBA to analyze our MonChassis yeast strains, we quickly realized that available tools and methods can be challenging to use without prior experience with FBA and GSMMs. Many tools are available in the COBRA toolbox for MATLAB. However, using the COBRA toolbox requires a MATLAB license and knowledge of the MATLAB programming language. Even though iGEM teams had free access to MATLAB for the duration of their project in recent years, many members of the iGEM community would benefit from free access to software tools for metabolic engineering beyond their iGEM project. Hence, we wanted our software to be as accessible and easy to use as possible, especially for users with little experience with FBA and GSMMs. We aimed for software that can be used without programming knowledge and is built entirely on open-source libraries.

Implementation

To keep the software easy to use, the basic usage of the FSEOF algorithm requires only three inputs from the user: an SBML file of a GSMM, the biomass-reaction ID of the GSMM, and the ID of the reaction that should be improved. SBML files for the most common chassis organisms can be downloaded from existing databases (e.g. BioModels). If the reaction of interest is an endogenous reaction of the organism, no further modification of the downloaded SBML file is needed. If the reaction of interest is part of a heterologously introduced pathway, the reactions of this pathway need to be added to the SBML file. This can be achieved with various web-based applications for modifying SBML files, like Escher-FBA or Fluxer. Users with programming experience are advised to edit their SBML files with the COBRApy library. The biomass ID and the reaction ID can be found quickly by searching the SBML file for the term "biomass" and for the name of a metabolite involved in the reaction of interest. The FSEOF software returns an Excel file that contains the ranked overexpression and downregulation targets.
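
The ID lookup described above can also be scripted. The following is a minimal sketch that uses only the Python standard library. The shortened SBML snippet, the reaction IDs and names in it, and the helper name find_reaction_ids are made up for illustration; a real GSMM would be read from its .xml file instead. Some SBML files prefix their IDs, so the IDs found this way should be checked against the IDs the model reports after loading.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily shortened SBML snippet for illustration only.
# For a real model, read the file instead, e.g. ET.parse("yeast_gem.xml").getroot().
SBML_SNIPPET = """
<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" level="3" version="1">
  <model id="toy_model">
    <listOfReactions>
      <reaction id="r_0001" name="hexokinase" reversible="false"/>
      <reaction id="r_0002" name="biomass pseudoreaction" reversible="false"/>
      <reaction id="r_0003" name="lycopene exchange" reversible="false"/>
    </listOfReactions>
  </model>
</sbml>
"""


def find_reaction_ids(sbml_text, keyword):
    """Return (id, name) pairs of reactions whose ID or name contains keyword."""
    root = ET.fromstring(sbml_text.strip())
    keyword = keyword.lower()
    hits = []
    for element in root.iter():
        # Drop the XML namespace so the check does not depend on the SBML version.
        tag = element.tag.rsplit("}", 1)[-1]
        if tag != "reaction":
            continue
        reaction_id = element.get("id", "")
        name = element.get("name", "")
        if keyword in reaction_id.lower() or keyword in name.lower():
            hits.append((reaction_id, name))
    return hits


biomass_hits = find_reaction_ids(SBML_SNIPPET, "biomass")
product_hits = find_reaction_ids(SBML_SNIPPET, "lycopene")
print(biomass_hits)
print(product_hits)
```

If a search term returns several reactions, the reaction names usually make it clear which one is the biomass reaction or the reaction of interest.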

To build our software on the foundation of open-source libraries, we decided to implement our FSEOF software in Python using the COBRApy library (Ebrahim et al., 2013). COBRApy has most of the functionalities of the COBRA toolbox, but does not require a MATLAB license. Therefore, users with programming experience can easily adjust the software to their needs or implement the FSEOF algorithm in individual pipelines.
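
For readers who want to adapt the algorithm, the selection idea behind FSEOF (Choi et al., 2010) can be sketched in plain Python: reactions whose flux magnitude rises as the product flux is enforced at higher levels, without changing direction, are amplification candidates. This is a simplified illustration and not the code shipped in the repository. The flux profiles are made-up numbers standing in for FBA solutions, and the function name classify_profiles, the tolerance, the mirrored criterion for downregulation candidates, and the ranking by overall change are assumptions.

```python
def classify_profiles(profiles, tolerance=1e-9):
    """Split reactions into amplification and downregulation candidates.

    profiles maps a reaction ID to its flux at each enforced product-flux
    level, ordered from the lowest to the highest enforced level.
    """
    up, down = [], []
    for reaction_id, fluxes in profiles.items():
        # Skip reactions that reverse direction during the scan.
        if min(fluxes) < -tolerance and max(fluxes) > tolerance:
            continue
        magnitudes = [abs(v) for v in fluxes]
        steps = [b - a for a, b in zip(magnitudes, magnitudes[1:])]
        change = magnitudes[-1] - magnitudes[0]
        if all(s >= -tolerance for s in steps) and change > tolerance:
            up.append((reaction_id, change))
        elif all(s <= tolerance for s in steps) and change < -tolerance:
            down.append((reaction_id, change))
    # Rank by the overall change in flux magnitude (an assumed metric).
    up.sort(key=lambda item: item[1], reverse=True)
    down.sort(key=lambda item: item[1])
    return [r for r, _ in up], [r for r, _ in down]


# Made-up flux profiles over five enforced levels (not real model output).
toy_profiles = {
    "rxn_A": [1.0, 1.5, 2.0, 2.5, 3.0],       # magnitude rises
    "rxn_B": [4.0, 3.0, 2.0, 1.0, 0.5],       # magnitude falls
    "rxn_C": [2.0, 2.0, 2.0, 2.0, 2.0],       # unaffected
    "rxn_D": [-1.0, -0.5, 0.5, 1.0, 1.5],     # changes direction
    "rxn_E": [-0.5, -1.0, -2.0, -4.0, -6.0],  # magnitude rises in reverse direction
}

amplify, downregulate = classify_profiles(toy_profiles)
print("amplification candidates:", amplify)
print("downregulation candidates:", downregulate)
```

In a real pipeline, each profile would be filled with the flux values returned by the solver at every enforced level instead of hand-written numbers.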

Installation & Usage

Installing and running our FSEOF software for finding genetic targets for overexpression and downregulation takes eight steps:

  1. Make sure you have Python installed on your computer
  2. Download the FSEOF GitLab repository as a zip file
  3. Unzip the downloaded zip file
  4. Move the "wwu-muenster-main" folder to the desired destination on your computer
    Note: You can rename the folder if you like
  5. Paste the .xml file of your SBML model into the folder "wwu-muenster-main"
  6. Navigate to the folder in the terminal of your computer. Users without knowledge of navigation in the terminal have the following options:
    • Windows: Open the parent folder in the File Explorer, hold "Shift" and right-click on the "wwu-muenster-main" folder, then select "Open PowerShell here"
    • macOS: Open the parent folder in the Finder, right-click on the "wwu-muenster-main" folder, then select "New Terminal at Folder"
  7. In the terminal window that is now open, type in pip install -r requirements.txt and press enter. (Note: This is only necessary the first time you use the FSEOF software)
  8. Type in python run_FSEOF.py NameOfYourSBMLFile BiomassID reactionID and press enter. The results are stored in an Excel file
    • Example: python run_FSEOF.py yeast_gem.xml r_4041 r_4269

Additional options

Users can adjust the parameters of the FSEOF algorithm according to their needs with command line flags. However, this is not necessary for the basic functionality of the FSEOF software:

--steps


Default: 30


Adjust the number of steps that the FSEOF algorithm uses to gradually increase the enforced flux from its minimum value to its theoretical maximum.


Example: python run_FSEOF.py yeast_gem.xml r_4041 r_4269 --steps 40
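
To illustrate what the number of steps controls, the enforced flux levels can be pictured as a grid between the minimum and the theoretical maximum of the target reaction. The sketch below is an assumption-laden illustration, not the tool's code: the function name enforced_flux_levels is hypothetical, the flux values are made up, and it assumes evenly spaced levels that include both endpoints.

```python
def enforced_flux_levels(v_min, v_max, steps):
    """Return `steps` evenly spaced flux values from v_min to v_max (inclusive)."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    width = (v_max - v_min) / (steps - 1)
    return [v_min + i * width for i in range(steps)]


# Made-up minimum and maximum flux of the reaction of interest.
levels = enforced_flux_levels(0.0, 2.0, 5)
print(levels)
```

More steps give a finer grid and therefore smoother flux profiles, at the cost of one additional optimization per step.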

--constrainBiomass


Default: False


Add a constraint during FBA that restricts the flux through the biomass reaction of the GSMM at each iteration of the FSEOF algorithm. Constraining the flux through the biomass reaction can improve the accuracy of the results, as it reduces biologically implausible solutions (e.g. no growth). By default, the biomass flux is constrained to 95% of its maximal value. This value can be adjusted with the --changeBiomassConstrain flag.


Example: python run_FSEOF.py yeast_gem.xml r_4041 r_4269 --constrainBiomass

--changeBiomassConstrain


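Default: 0.95 (95% of the maximal biomass flux)


If the biomass is constrained with --constrainBiomass, this flag adjusts the fraction of the maximal biomass flux that is enforced. The value is given as a decimal fraction, e.g. 0.80 for 80%.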


Example: python run_FSEOF.py yeast_gem.xml r_4041 r_4269 --constrainBiomass --changeBiomassConstrain 0.80
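
The arithmetic behind the two biomass flags can be sketched as follows. The sketch treats the constraint as a lower bound on the biomass flux, which is an assumption made for illustration; the function name biomass_lower_bound and the maximal biomass flux of 2.0 are likewise made up.

```python
def biomass_lower_bound(max_biomass_flux, fraction=0.95):
    """Return the biomass flux that must at least be reached (assumed lower bound)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    return fraction * max_biomass_flux


# Made-up maximal biomass flux from an unconstrained FBA run.
max_growth = 2.0
default_bound = biomass_lower_bound(max_growth)                 # default fraction
relaxed_bound = biomass_lower_bound(max_growth, fraction=0.80)  # as with 0.80
print(default_bound, relaxed_bound)
```

A lower fraction leaves more room for flux towards the product, while a higher fraction keeps the predicted solutions closer to optimal growth.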

--useFVA


Default: False


If this flag is used, flux variability analysis (FVA) is used instead of FBA to find the targets. This might improve the accuracy of the results, but will drastically increase the software's runtime. For more information on using FVA for the FSEOF algorithm, users are referred to Park et al. (2012).


Example: python run_FSEOF.py yeast_gem.xml r_4041 r_4269 --useFVA

References

Choi, H.S. et al. (2010) ‘In Silico Identification of Gene Amplification Targets for Improvement of Lycopene Production’, Applied and Environmental Microbiology, 76(10), pp. 3097–3105. Available at: https://doi.org/10.1128/AEM.00115-10.

Ebrahim, A. et al. (2013) ‘COBRApy: COnstraints-Based Reconstruction and Analysis for Python’, BMC Systems Biology, 7(1), p. 74. Available at: https://doi.org/10.1186/1752-0509-7-74.

Orth, J.D., Thiele, I. and Palsson, B.Ø. (2010) ‘What is flux balance analysis?’, Nature Biotechnology, 28(3), pp. 245–248. Available at: https://doi.org/10.1038/nbt.1614.

Park, J.M. et al. (2012) ‘Flux variability scanning based on enforced objective flux for identifying gene amplification targets’, BMC Systems Biology, 6(1), p. 106. Available at: https://doi.org/10.1186/1752-0509-6-106.

Woolston, B.M., Edgar, S. and Stephanopoulos, G. (2013) ‘Metabolic Engineering: Past and Future’, Annual Review of Chemical and Biomolecular Engineering, 4(1), pp. 259–288. Available at: https://doi.org/10.1146/annurev-chembioeng-061312-103312.