Proposed Implementation

The main goal of our software toolkit is to enable researchers to create fieldable synthetic biology constructs by strategically matching a chassis with a target environment. Toward that end, we have developed an implementation plan with three stages. (1) Making our software readily available to the widest community of users: We have already uploaded our code to GitLab so that it is free to download and use by any researcher, educator, or student, and we will also host it on a web server. (2) Broad dissemination of its purpose and utility: We will focus our dissemination on synthetic biology researchers, reaching out not only to individual academics but also to synbio companies and organizations. We will aim to publish our models, code, and 16S results in a scholarly journal and present our research at conferences and forums in order to circulate our project more widely. (3) Continued improvement based on community input and feedback: We will continue the project indefinitely, improving our models and software, for example by incorporating more raw data as it becomes available.

Phase 1: Making Our Software Readily Available



Our software will be accessible through either a pre-hosted web interface with trained models or a programmatic Python interface.

The web interface will be hosted on a virtual private server (VPS), making it accessible to users without a computer science background. We will continuously update this hosted version with modifications made by our team or contributed by others to our open-source repository.

We have uploaded our software to a GitLab repository, which users can download to self-host the web interface, use the programmatic Python interface, or add their own data. Extended usage information is available in our README documentation, but these are the basic steps a user would follow to make use of the chassEASE repository:

Self-Hosted Web Interface

  1. Clone the repository using git clone https://gitlab.igem.org/2022/software-tools/william-and-mary.git
  2. Install Python and the Poetry build tool
  3. Run the web interface using ./interface.sh (Linux/MacOS) or ./interface.bat (Windows)

Programmatic Interface

  1. Clone the repository using git clone https://gitlab.igem.org/2022/software-tools/william-and-mary.git
  2. Install Python and the Poetry build tool
  3. Create an entrypoint Python file (entrypoint.py for example)
  4. Load an Environment and Subset

    import dataset_manager.environment
    import dataset_manager.subset
    
    # (A) Open the soil environment from a CSV file:
    environment = dataset_manager.environment.Environment("data/environments/soil.csv")
    
    # (B) OR, enumerate every environment chassEASE can find in its default search path ("data/environments")
    all_environments = dataset_manager.environment.get_all_environments()
    
    # Open the "species" subset of the open environment from the chassEASE search path ("data/subsets/soil/species.csv")
    subset = dataset_manager.subset.MetagenomicSubset(environment, "species")
    subset.setup()
                         
  5. Load or train relative abundance models

    import predictive_models.models
    from predictive_models.models.py_regression import PyRegressionHotModel
    from predictive_models.models.ann import ArtificialNeuralNetworkModel
    
    # (A) Open the hot-reload python regression from the chassEASE search path ("data/models/soil/species/py_regression")
    model = PyRegressionHotModel(environment)
    model.load("data/models/soil/species/py_regression", subset)
    
    # (B) OR, train a new ANN regression
    model = ArtificialNeuralNetworkModel(environment)
    model.train(subset)
    
    # You can also enumerate all relative abundance models available.
    # A separate variable name is used so the "model" loaded above is not overwritten.
    for model_specification in predictive_models.models.relative_abundance_models:
        available_model = model_specification()
    
        print("We just created the", available_model.name(), "model!")
    
  6. Generate relative abundance predictions

    import pandas as pd
    
    # Create a demonstration sample using every median value from the subset
    sample_values = dict()
    for param in environment.subset_param_names():
        try:
            sample_values[param] = subset.get_param_median(param)
        except Exception:
            # Fall back to 0 when a parameter isn't found in our subset. This can happen when the parameter is too sparse in the raw data source.
            sample_values[param] = 0
    
    # Convert the sample into a pandas Series, the format used by chassEASE
    sample = pd.Series(sample_values)
    
    # Infer the relative abundance of bacteria in this sample
    # (NOTE: Some models can return RA values less than 0 to indicate no presence in the sample)
    results: pd.Series = model.infer(sample, subset)
    results = results.sort_values()  # Sort the generated values by relative abundance (sort_values returns a new Series)
    
    print("Inferred relative abundance values:")
    print(results)
    
  7. Generate growth rate predictions using GEM models

    import predictive_models.fba_model
    
    # (A) Load a GEM model from a downloaded XML file.
    fba_model = predictive_models.fba_model.FbaModel("data/gems/iJN1463.xml")
    fba_model.load()
    
    # (B) OR, enumerate all GEMs available in the chassEASE default search path ("data/gems")
    gems_per_genus, gems_per_species = predictive_models.fba_model.load_gems()
    fba_model = gems_per_genus["Pseudomonas"]
    
    # Evaluate the FBA to get the predicted growth rate of the modeled bacteria in the conditions at "sample"
    growth_rate = fba_model.evaluate_fba(sample, subset.get_param_maxima())
    print("Predicted growth rate of Pseudomonas:", growth_rate)
    
  8. Run the entrypoint with Poetry

    Linux and MacOS:

    # Install the project dependencies into the Poetry environment
    poetry install
    
    # Ensure that python is aware of the chassEASE packages
    export PYTHONPATH=$PYTHONPATH:.
    
    # Run your entrypoint inside the Poetry environment
    poetry run python YOUR_ENTRYPOINT_FILE.py
    

    Windows:

    :: Install the project dependencies into the Poetry environment
    poetry install
    
    :: Ensure that python is aware of the chassEASE packages
    set PYTHONPATH=%cd%
    
    :: Run your entrypoint inside the Poetry environment
    poetry run python YOUR_ENTRYPOINT_FILE.py
    

Usage of additional programmatic features of chassEASE, including adding new environments, new raw data, new bacterial outputs, and new predictive models, is documented in the README.md file of the software repository.
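
As noted in step 6, some models can return relative abundance values below 0 to indicate no presence in the sample. The following is a minimal sketch of one way a user might post-process such output before ranking candidate chassis. It is not part of chassEASE: the helper name clean_relative_abundance, the example values, and the choice to renormalize the remaining values so they sum to 1 are all assumptions made for illustration, and whether renormalizing is appropriate may depend on the model used.

```python
import pandas as pd


def clean_relative_abundance(results: pd.Series, top_n: int = 3) -> pd.Series:
    """Hypothetical helper: clip negatives to 0, renormalize, return the top_n entries."""
    # Treat negative predictions as "not present" in the sample
    clipped = results.clip(lower=0.0)

    # Renormalize so the remaining values sum to 1 (an assumption, not chassEASE behavior)
    total = clipped.sum()
    if total > 0:
        clipped = clipped / total

    # Highest predicted relative abundance first
    return clipped.sort_values(ascending=False).head(top_n)


# Made-up stand-in for the Series returned by model.infer(sample, subset)
example_results = pd.Series(
    {"Pseudomonas": 0.30, "Bacillus": 0.15, "Streptomyces": -0.05, "Rhizobium": 0.05}
)

top_candidates = clean_relative_abundance(example_results, top_n=2)
print(top_candidates)
```

In a real entrypoint, example_results would be replaced by the results Series produced in step 6.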

Phase 2: Dissemination and Publishing



We will target our dissemination primarily at synthetic biology researchers, and we will also reach out to educators so that chassEASE can be used as an educational tool.

  • Dissemination to Synthetic Biology Researchers (including other iGEM teams):
    • We will reach out directly to synthetic biologists to explain the value of our software to the field and to encourage them to use it as a tool to design fieldable constructs. Throughout our human practices interviews this year, synthetic biologists such as Dr. Bryn Adams expressed interest in our software and confirmed that it would be useful to their research. With this insight, we will follow up with our Integrated Human Practices (IHP) contacts who are synthetic biologists to let them know that our software has been posted, and we will also conduct literature searches on fieldable synthetic biology to find additional researchers who may find our software useful. We hope that reaching out to these scientists one-on-one will foster communication about fieldability and inspire fieldable design.
    • We will write up our models, code, and 16S results as a journal article and submit it to synthetic biology journals in order to circulate it widely among the scientific community. We will also attend scientific conferences beyond the iGEM Jamboree to present our research to as many attendees as possible.
    • We will reach out to future iGEM teams and encourage them to use our software with the aim of inspiring the design of fieldable constructs. We have already pitched our software to PuiChing Macau’s team as a way to make their hydroponics project in this year’s competition more fieldable.
    • We will reach out to synthetic biology and technology companies to see if they are interested in our software, as they have the resources to disseminate it more broadly than we can alone.
  • Dissemination to educators and students:
    • We will reach out to educators from local schools, particularly those we have partnered with for our educational projects this year, as well as educators here at our college, to inform them about the utility of our software as an educational tool. We imagine our software being used in three settings: in introductory ecology and organismal biology lab courses, to show that a bacterium's survival is linked to the conditions of the environment it inhabits; in synthetic biology classrooms, as students walk through the process of designing an engineered organism and learn about chassis selection; and in computer science or data science classrooms, as an example of a data-driven computational approach to science.

Phase 3: Continued Improvement of chassEASE



This project is important to us, so we plan to continue our design-build-test process beyond this iGEM season. We have outlined several improvements we already plan to make to chassEASE, and we are excited to receive feedback from researchers as we disseminate our software so that we can adjust our code to fit the needs of the community.

  • Improving Accuracy
    • There is a need for more well-characterized 16S data from around the world and from diverse environments. As more data become available, we will incorporate them into our regression and neural network models to make them more accurate.
      • Reaching out to 16S researchers
        • Too often, 16S sequencing is conducted on environmental samples with no associated metadata (environmental parameters of the collection site). To ensure a continuous stream of usable 16S data with which to update our software, we will reach out to the 16S community and explain the importance of including environmental metadata when publishing 16S results.
  • Including More Chassis
    • There is a need in synthetic biology for more available chassis that can survive in a wide range of extreme environmental conditions and that more accurately reflect the diverse makeup of microbial communities. Unfortunately, many bacteria naturally found in the environment cannot survive in the lab, and therefore cannot be effectively studied, let alone engineered. As technology improves and more environmental bacteria become characterized and culturable, we plan to add them to our software, thus expanding the possible outputs.
  • Including More Environments
    • Currently, our software assists researchers in selecting a chassis for soil, marine, air, and human gut microbiomes. We hope to expand this into other microbial biomes as well, such as the vaginal microbiome.
  • Adjusting Parameters
    • We would like to add more parameters, particularly to our human gut microbiome model, to account for more variability, such as that arising from various diseases and disorders. Additionally, we recognize that one of our current parameters, Body Mass Index (BMI), is problematic, as it is often used inaccurately as a metric of health and has thus resulted in medical discrimination (Ramos Salas et al., 2017). We incorporated it into our model because it was the only parameter relating to body size for which we could find 16S data, and we wanted to make sure our software accounted for this diversity. However, if we can find data relating to body size using a less harmful metric, we will adjust this parameter accordingly.
  • Safety
    • Chassis selection is one roadblock to fieldable synthetic biology, and safety is another crucial issue. We plan to link every bacterium in our software to its Biosafety Level (BSL) in order to encourage researchers to think about safety while designing their constructs. Our team takes safety very seriously and recognizes the harm that fieldable synthetic biology can cause to the environment and to communities when safety is not appropriately considered.
  • Optimal Conditions Analysis
    • We plan to modify our software so that an optimal conditions analysis can be performed. This would allow researchers to input a specific chassis and receive the ideal environmental parameters for that chassis.
  • Finally, in line with the engineering cycle, we will incorporate feedback from the synthetic biology community into future iterations as our software is used.
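
The optimal conditions analysis proposed above is not yet implemented, but one simple way to approach it is an exhaustive grid search: evaluate a chassis-specific score at every combination of candidate environmental parameter values and keep the best one. The sketch below illustrates the idea under stated assumptions. The function find_optimal_conditions, the parameter names, and the toy scoring function are all hypothetical; in practice the score might wrap a trained relative abundance model or an FBA growth rate prediction for the chosen chassis.

```python
import itertools
from typing import Callable, Dict, List, Optional, Tuple

Conditions = Dict[str, float]


def find_optimal_conditions(
    score: Callable[[Conditions], float],
    grid: Dict[str, List[float]],
) -> Tuple[Optional[Conditions], float]:
    """Hypothetical helper: return the grid point with the highest score."""
    names = list(grid)
    best_conditions: Optional[Conditions] = None
    best_score = float("-inf")

    # Try every combination of the candidate parameter values
    for values in itertools.product(*(grid[name] for name in names)):
        conditions = dict(zip(names, values))
        current = score(conditions)
        if current > best_score:
            best_conditions, best_score = conditions, current

    return best_conditions, best_score


def toy_score(conditions: Conditions) -> float:
    """Made-up stand-in for a model prediction; peaks at pH 7 and 30 degrees C."""
    ph_penalty = (conditions["pH"] - 7.0) ** 2
    temperature_penalty = (conditions["temperature_c"] - 30.0) ** 2 / 100.0
    return -ph_penalty - temperature_penalty


# Illustrative candidate values only; real ranges would come from the environment's data
parameter_grid = {
    "pH": [5.0, 6.0, 7.0, 8.0],
    "temperature_c": [10.0, 20.0, 30.0, 40.0],
}

best_conditions, best_score = find_optimal_conditions(toy_score, parameter_grid)
print("Best conditions found:", best_conditions)
```

A grid search scales poorly as the number of parameters grows, so a real implementation might instead use a numerical optimizer or sample the parameter ranges observed in the training data.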

Sources



Ramos Salas, X., Alberga, A. S., Cameron, E., Estey, L., Forhan, M., Kirk, S. F. L., ... & Sharma, A. M. (2017). Addressing weight bias and discrimination: moving beyond raising awareness to creating change. Obesity Reviews, 18(11), 1323-1335.