Model

Models are powerful tools for imagining plausible future scenarios. The assumptions and results of a model also push us to reflect on the present situation and adjust our existing plans and experiments, which can in turn make our final goal more realistic.

There are 3 modeling sections in our project:

  • Visualization
    This section shows the structures of the proteins and their docking results, and offers guidance for future experiment design.

  • Description and Prediction of drug efficacy
    This section mainly describes the efficiency of protein expression, and tries to predict the scenario after a patient takes the drug.

  • Hardware
    This section shows the 3D model of the micro-capsule in which the yeast cells are sealed. The micro-capsule is made of shellac, whose pH-dependent solubility (it dissolves in a mildly basic environment) makes it well suited to releasing probiotics in the intestines.

Visualization

Overview:

We used online software to visualize the 3D protein structures and model their docking, and then evaluated the quality of the resulting models using several common parameters. This section aims to display the structures of the proteins synthesized in this project and to provide ideas and directions for future experiments and future teams.

Part 1: General pattern

Our team designed the DNA sequences for the fusion protein-nanobody complex ourselves, so the 3D structures of these newly designed proteins are unknown. A protein's 3D structure determines its function, which in our case means the nanobody's docking on the membrane and its binding to the antigen.

Protein structure prediction

Given the amino acid sequence of a protein, various algorithms can predict its 3D structure. These algorithms use machine learning to learn from databases of known amino acid sequences and structures. Inspecting the predicted structures gives an intuitive indication of whether our designs are feasible.

  • Prediction service

We use Robetta and AlphaFold to predict the structure of the proteins we designed.

Robetta1 is a public online protein structure prediction server developed by the Baker lab at the University of Washington. At its core is the Rosetta macromolecular modeling suite developed by the Rosetta Commons, a multi-institutional collaborative research and software development group. Robetta's primary service is to predict the 3-dimensional structure of a protein given its amino acid sequence.

AlphaFold2 predicts the 3D structure of a protein from its amino acid sequence, using a combination of bioinformatics and physical approaches. We ran our sequences in the AlphaFold (Phenix version)3 Colab notebook, which automatically returns the highest-scoring model. This Colab notebook is derived from ColabFold4 and the DeepMind AlphaFold2 Colab5.

Robetta1 and the AlphaFold Colab notebook3 both accept amino acid sequences in single-letter code and generate Protein Data Bank (PDB) files, which contain the atomic coordinates of the predicted structure. For each sequence, Robetta produces 5 models with different conformations that share the same score, whereas AlphaFold returns only the model with the highest score.

We view these PDB files in PyMOL6, a powerful tool for 3D structure visualization.
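PDB files are plain text with fixed-width columns, so the coordinates can also be read without a viewer. The following is a minimal sketch and not part of our original workflow: `read_ca_atoms` and `CaAtom` are hypothetical names, and the slices follow the column positions that the PDB format defines for ATOM records.

```python
from typing import List, NamedTuple


class CaAtom(NamedTuple):
    chain: str
    res_name: str
    res_seq: int
    x: float
    y: float
    z: float


def read_ca_atoms(pdb_text: str) -> List[CaAtom]:
    """Collect the C-alpha atom of every residue from PDB-format text."""
    atoms = []
    for line in pdb_text.splitlines():
        # Only ATOM records describe standard residues; skip HETATM, TER, etc.
        if not line.startswith("ATOM"):
            continue
        # Columns 13-16 hold the atom name (0-based slice 12:16).
        if line[12:16].strip() != "CA":
            continue
        atoms.append(CaAtom(
            chain=line[21],                # column 22: chain identifier
            res_name=line[17:20].strip(),  # columns 18-20: residue name
            res_seq=int(line[22:26]),      # columns 23-26: residue number
            x=float(line[30:38]),          # columns 31-38: x coordinate
            y=float(line[38:46]),          # columns 39-46: y coordinate
            z=float(line[46:54]),          # columns 47-54: z coordinate
        ))
    return atoms
```

For a file on disk, `read_ca_atoms(open("model.pdb").read())` would return one entry per residue; the file name is only a placeholder.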

Protein structure evaluation

The structures returned by Robetta and AlphaFold need to be assessed to decide which one is more credible and can be used in further analysis. We consider several parameters: the Ramachandran plot, overall G-factors, atomic Z-score RMS, the percentage of amino acids scoring >= 0.2 in the 3D/1D profile, the overall quality factor, and the Z-score. We use the online servers SAVES v6.07 and ProSA8 for the validation.

  • Parameters
  1. Ramachandran plot:
    The Ramachandran plot shows the backbone conformation (phi/psi angles) of the amino acid residues and is mainly used to evaluate model quality after homology modeling. It indicates whether the conformation of each amino acid is reasonable. Based on an analysis of 118 structures with a resolution of at least 2.0 Angstroms and an R-factor no greater than 20%, a good-quality model would be expected to have over 90% of its residues in the most favored regions.7
  2. Atomic Z-score RMS:
    The Z-score root mean square deviation (Z-score RMS) measures the "average magnitude of the volume irregularities in the structure."9 For a good model, the Z-score RMS should be around 1.0.
  3. Percentage of the amino acids having scored >= 0.2 in the 3D/1D profile:
    Indicates whether the atomic model is compatible with its amino acid sequence. It should be higher than 80% for a good model.7
  4. Overall quality factor:
    An overall score for the model provided by the SAVES server7, ranging from 0 to 100. It should be higher than 80 for a good model.10
  5. Z-score:
    Indicates whether the Z-score of the input structure is within the range of scores typically found for native proteins of similar size8.
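The numeric cut-offs listed above can be collected into a small checker. This is only a sketch: `ModelScores` and `check_model` are hypothetical names, and the tolerance that decides whether an atomic Z-score RMS counts as "around 1.0" (0.5 here) is our own assumption, since the criteria give no explicit bound. The ProSA Z-score is left out because it is judged against a size-dependent reference range rather than a fixed threshold.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ModelScores:
    rama_most_favored_pct: float  # % of residues in the most favored Ramachandran regions
    atomic_z_rms: float           # atomic Z-score RMS
    profile_3d1d_pct: float       # % of residues scoring >= 0.2 in the 3D/1D profile
    quality_factor: float         # overall quality factor, 0 to 100


def check_model(scores: ModelScores, z_rms_tolerance: float = 0.5) -> Dict[str, bool]:
    """Return a pass/fail flag for each numeric criterion.

    z_rms_tolerance is an assumed bound for "around 1.0", not a published value.
    """
    return {
        "ramachandran": scores.rama_most_favored_pct > 90.0,
        "atomic_z_rms": abs(scores.atomic_z_rms - 1.0) <= z_rms_tolerance,
        "profile_3d1d": scores.profile_3d1d_pct > 80.0,
        "quality_factor": scores.quality_factor > 80.0,
    }
```

For instance, `check_model(ModelScores(85.1, 1.536, 92.39, 82.89))` evaluates the yeast-20ipaD values reported in Part 2 against these cut-offs.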

All the structures shown later are the highest-scoring structures.

Protein structure alignment

After evaluating the models from Robetta1 and the AlphaFold Colab notebook3, we compare the nanobody structure in the model with its original structure using the alignment function in PyMOL6. If the RMSD (root mean square deviation) is very low, we regard the two structures as similar, which suggests that the nanobody in our fusion protein-nanobody complex will theoretically function well.
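To make the RMSD comparison concrete, the sketch below computes the RMSD between two matched sets of atomic coordinates after an optimal rigid-body superposition (the Kabsch algorithm). It is an illustration, not the PyMOL implementation: PyMOL's alignment also performs a sequence alignment and can discard outlier atoms, whereas `kabsch_rmsd` (a hypothetical name) assumes the two arrays already list corresponding atoms in the same order.

```python
import numpy as np


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal superposition of P onto Q."""
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")

    # Remove the translation by centering both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)

    # The SVD of the covariance matrix yields the optimal rotation.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so the result is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    # Rotate P and measure the remaining per-atom deviation.
    P_rot = Pc @ R.T
    return float(np.sqrt(np.mean(np.sum((P_rot - Qc) ** 2, axis=1))))
```

The result is in the same length unit as the input coordinates, and two copies of the same structure that differ only by a rotation and translation give a value of essentially zero.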

Part 2: Yeast-20ipaD visualization

We have successfully expressed the fusion protein-nanobody complex, yeast-20ipaD. The highest-scoring predicted structure of yeast-20ipaD is shown in Figure 1.

Figure 1 | 3D structure of yeast-20ipaD complex

The Ramachandran plot of the above structure shows that 85.1% of residues are in the most favored regions, 11.9% in the allowed regions, 1.9% in the generously allowed regions, and 1.1% in the disallowed regions. The atomic Z-score RMS is 1.536, which is slightly out of range; a very good model would have a value close to 1. The percentage of amino acids scoring >= 0.2 in the 3D/1D profile is 92.39%. The overall quality factor is 82.89, which is acceptable. The Z-score calculated by ProSA is -8.7, which falls within the range of scores typically found for experimentally determined (X-ray, NMR) structures of native proteins of similar size in the PDB. The average energy calculated over a 40-residue window is always below 0. Overall, the above structure can be regarded as a good model of yeast-20ipaD.

Figure 2 | Alignment of 20ipaD in the complex with its original structure.

The alignment between 20ipaD in the complex and the original structure of 20ipaD suggests that our complex should theoretically work well (Figure 2). The RMSD is 0.807, which means that fusion into the complex has little influence on the structure of 20ipaD.

Part 3: Docking modeling

Interaction information (binding sites) for 20ipaD and IpaD is taken from the paper "Single-Domain Antibodies Pinpoint Potential Targets within Shigella Invasion Plasmid Antigen D of the Needle Tip Complex for Inhibition of Type III Secretion"11. Since the alignment shows that 20ipaD in the complex is almost the same as its original structure, we assume that the binding sites are unchanged. Figure 3 shows the binding sites between 20ipaD and IpaD.

Figure 3 | Binding sites between 20ipaD and IpaD

We provided this interaction information to the online server HADDOCK to simulate the docking between the complex and IpaD. The resulting docking structure is shown in Figure 4.

Figure 4 | Docking between 20ipaD (in the complex) and IpaD

The Ramachandran plot of the above structure shows that 85.0% of residues are in the most favored regions, 12.7% in the allowed regions, 1.2% in the generously allowed regions, and 1.0% in the disallowed regions. The atomic Z-score RMS is 1.441, which is relatively close to 1. The percentage of amino acids scoring >= 0.2 in the 3D/1D profile is 87.61%. The overall quality factor is 85.53, which is acceptable. The Z-score calculated by ProSA is -7.16, which falls within the range of scores typically found for experimentally determined (X-ray, NMR) structures of native proteins of similar size in the PDB. The average energy calculated over a 40-residue window is always below 0. Overall, the above structure indicates that 20ipaD can theoretically bind IpaD well.

Figure 5 shows the alignment between our docking result and the original docking between 20ipaD and IpaD. The RMSD is 0.36, which is very small and acceptable.

Figure 5 | Alignment between yeast-20ipaD-IpaD and 20ipaD-IpaD.

Taken together, these results suggest that our yeast-20ipaD complex could theoretically bind IpaD well.

Part 4: Other combinations of probiotic-nanobody complex

In this section, we show other possible combinations of fusion protein and nanobody that might theoretically function well. We modeled and analyzed their 3D structures and docking scenarios, which could be helpful for future experiments and future teams.

Table 1 | Evaluation of other combinations of probiotic-nanobody complex

The E. coli Intimin protein complex has good results in 4 out of 5 quantitative parameters, making it a good model. First, its Ramachandran plot reveals a good distribution in the favored regions: 83% of its residues are in the core regions, 14.7% in the allowed regions, 1.3% in the generously allowed regions, and 0.9% in the disallowed regions. Second, the percentage of amino acids scoring >= 0.2 in the 3D/1D profile is 85.91%, higher than the 80% standard, so its atomic model is fairly compatible with its amino acid sequence. Third, its overall quality factor, which accounts for the error values of individual residues, is 92, surpassing the cut-off of 80. Fourth, its Z-score is within the range of scores found for native proteins of similar size. The main downside of this model is that its atomic Z-score RMS is not close to 1, indicating a greater average magnitude of volume irregularities in the structure.

The E. coli CsgA protein complex is a good model that fulfills 4 out of 5 quantitative parameters. Like Intimin, CsgA's Ramachandran plot displays a good distribution in the favored regions: 83.5% of residues are in the core regions, 12.7% in the allowed regions, 2.0% in the generously allowed regions, and 1.8% in the disallowed regions. The percentage of amino acids scoring >= 0.2 in the 3D/1D profile is 84.27%, higher than the 80% standard but slightly lower than for Intimin. Its overall quality factor is 91, exceeding the cut-off of 80. The Z-score is within the range of scores found for native proteins of similar size. As with Intimin, the atomic Z-score RMS of the CsgA model is not close to 1.

The Lacto tag 1 protein complex is a reasonable model that meets, or comes very close to meeting, 4 out of 5 quantitative parameters. Its Ramachandran plot displays an adequate distribution in the favored regions: 83.1% of residues are in the core regions, 15.2% in the allowed regions, 1.5% in the generously allowed regions, and 0.2% in the disallowed regions. The percentage of amino acids scoring >= 0.2 in the 3D/1D profile is 79.97%, marginally below the 80% standard and lower than for Intimin and CsgA. Its overall quality factor is 92, surpassing the cut-off of 80. The Z-score is within the range of scores found for native proteins of similar size. As with Intimin and CsgA, the atomic Z-score RMS of the Lacto tag 1 model is not close to 1.

Overall, all three docking jobs show satisfactory results and could be helpful for future experiments and future teams. In light of these promising findings, we encourage future researchers to test these models in the lab, starting with the E. coli Intimin model, which performed best among the three.

Summary

To conclude, we have selected the highest-scoring 3D model of yeast-20ipaD and of the docking structure between yeast-20ipaD and IpaD. Both were analyzed and judged to work well in theory. Moreover, we modeled three other possible combinations of fusion protein and nanobody, which might serve as guidance for future experiment design.

3

AlphaFold (Phenix version). <https://colab.research.google.com/github/phenix-project/Colabs/blob/main/alphafold2/AlphaFold2.ipynb>

9

Pontius, J., Richelle, J., & Wodak, S. J. (1996). Deviations from standard atomic volumes as a quality measure for protein crystal structures. Journal of molecular biology, 264(1), 121–136. https://doi.org/10.1006/jmbi.1996.0628

11

Barta, M. L., Shearer, J. P., Arizmendi, O., Tremblay, J. M., Mehzabeen, N., Zheng, Q., Battaile, K. P., Lovell, S., Tzipori, S., Picking, W. D., Shoemaker, C. B., & Picking, W. L. (2017). Single-domain antibodies pinpoint potential targets within Shigella invasion plasmid antigen D of the needle tip complex for inhibition of type III secretion. The Journal of biological chemistry, 292(40), 16677–16687. https://doi.org/10.1074/jbc.M117.802231

10

Messaoudi A, Belguith H, Ben Hamida J.(2013). Homology modeling and virtual screening approaches to identify potent inhibitors of VEB-1 β-lactamase. Theor Biol Med Model. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3668210/

Description and Prediction of drug efficacy

Overview

Wet lab results showed that we successfully expressed 20ipaD on the surface of the yeast. Our goal here is to describe the efficiency of expression and to predict what happens after the drug is released in the patient's jejunum.

Part 1: Protein expression efficiency

To describe how efficiently the yeast expresses 20ipaD, we manually counted the number of yeast cells that expressed 20ipaD and the total number of yeast cells, and calculated their ratio (Figure 1). From separate assessments conducted by two individuals, we found a 31.25% expression rate with a standard deviation of ±0.75.

Figure 1 | Fluorescent cells are yeast cells that clearly express 20ipaD; dark cells show no or very little expression.
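The expression rate is simply the ratio of expressing cells to all cells, averaged over the assessors. The sketch below shows that calculation; the cell counts are hypothetical placeholders (the raw counts are not given here), chosen only so that the output matches the reported 31.25% ± 0.75, and the use of the population standard deviation is an assumption about how the spread was computed.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def expression_rate(expressing: int, total: int) -> float:
    """Percentage of counted cells that show 20ipaD expression."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * expressing / total


def summarize(counts: List[Tuple[int, int]]) -> Tuple[float, float]:
    """Mean and population standard deviation of the per-assessor rates."""
    rates = [expression_rate(e, t) for e, t in counts]
    return mean(rates), pstdev(rates)


# Hypothetical counts (expressing cells, total cells) from two assessors.
hypothetical_counts = [(61, 200), (64, 200)]
avg, spread = summarize(hypothetical_counts)
print(f"{avg:.2f}% +/- {spread:.2f}")
```

With only two assessors the sample standard deviation would be noticeably larger than the population value, so the choice between the two should be stated alongside the result.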

Part 2: Prediction of post-medication scenario

In this section, we use mathematical modeling to predict the post-medication scenario. Specifically, we want to see how the number of floating Shigella changes after the drug is released. This depends on the natural growth of Shigella, the binding efficiency between 20ipaD and IpaD, the concentration of yeast cells, and the survival rate of yeast cells.

Yeast cells are sealed in micro-capsules made of shellac before being delivered into the body. The pH-dependent solubility of shellac, which dissolves in a mildly basic environment, makes it well suited to releasing probiotics in the jejunum.

There are several important assumptions for this model:

  • Each floating yeast cell carries one 20ipaD;
  • Each floating Shigella cell carries one IpaD;
  • The growth of Shigella cultured in BHI (Brain Heart Infusion) medium is similar to the growth of Shigella in the human jejunum;
  • Each 20ipaD binds one IpaD, and only once;
  • Dead yeast cells cannot display 20ipaD and cannot bind IpaD;
  • Yeast growth is not considered here, since a healthy immune system does not allow overgrowth of the yeast, and yeast grows very slowly in neutral or even slightly alkaline conditions12.

Based on these assumptions, two equations describe the change in the number of Shigella and yeast. They involve the following quantities:

  • the log count of the number of Shigella in the jejunum at time t after drug release;
  • the log count of the number of Shigella in the jejunum at time t before drug release;
  • the binding efficiency between 20ipaD and IpaD;
  • the log count of the number of yeast in the jejunum at time t after drug release;
  • the survival rate of yeast in the jejunum.

Detail:
The natural growth of Shigella in the jejunum can be described by the Gompertz equation13:

$$L(t) = A + C \exp\{-\exp[-B(t - M)]\}$$

$L(t)$: the log count of the number of Shigella in the jejunum at time $t$ before drug release;
$A$: the asymptotic log count as $t$ decreases indefinitely;
$C$: the asymptotic amount of growth (log number) that occurs as $t$ increases indefinitely;
$M$: the time at which the absolute growth rate is maximum;
$B$: the relative growth rate at $M$.
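The Gompertz curve is easy to evaluate numerically. The sketch below assumes its common form for bacterial growth, L(t) = A + C·exp(−exp(−B·(t − M))), with A the lower asymptote, C the total amount of growth, M the time of maximum absolute growth rate and B the relative growth rate at M. The function names are hypothetical, the log count is assumed to be base 10, and the parameter values in the usage lines are placeholders for illustration, not fitted values for Shigella.

```python
import math


def gompertz_log_count(t: float, A: float, C: float, B: float, M: float) -> float:
    """Log count at time t according to the Gompertz growth curve."""
    return A + C * math.exp(-math.exp(-B * (t - M)))


def cell_count(t: float, A: float, C: float, B: float, M: float) -> float:
    """Convert the log count to an absolute count (base-10 log is an assumption)."""
    return 10.0 ** gompertz_log_count(t, A, C, B, M)


# Placeholder parameters, for illustration only.
A, C, B, M = 3.0, 6.0, 0.5, 8.0
for hour in (0, 4, 8, 12, 24):
    print(hour, round(gompertz_log_count(hour, A, C, B, M), 3))
```

Two properties are worth noting: the curve starts near A and levels off at A + C, so the maximum population implied by a parameter set is 10^(A + C) under the base-10 assumption, and at t = M the log count equals A + C/e.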

The four Gompertz parameters depend on the environmental conditions: temperature, pH, and sodium chloride concentration.
In our case, the temperature is set to a default value; the pH is taken separately for the jejunum and the ileum14; and the sodium concentration in the jejunum is taken from the literature15.

This concentration is then converted into a percentage, which gives the sodium chloride concentration in the jejunum.
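The unit conversion can be sketched as follows, assuming the concentration is given in mmol/L and that the percentage means weight per volume (grams per 100 mL). The function name is hypothetical, and the example input is physiological saline rather than the jejunal value, which is not restated here.

```python
NACL_MOLAR_MASS_G_PER_MOL = 58.44  # molar mass of sodium chloride


def nacl_mmol_per_l_to_percent_wv(conc_mmol_per_l: float) -> float:
    """Convert a NaCl concentration from mmol/L to % (w/v), i.e. g per 100 mL."""
    if conc_mmol_per_l < 0:
        raise ValueError("concentration must be non-negative")
    grams_per_litre = conc_mmol_per_l / 1000.0 * NACL_MOLAR_MASS_G_PER_MOL
    return grams_per_litre / 10.0  # 1 L = 10 x 100 mL


# Sanity check: 154 mmol/L NaCl is physiological saline, roughly 0.9% (w/v).
print(round(nacl_mmol_per_l_to_percent_wv(154.0), 3))
```

This treats all sodium as sodium chloride, which is a simplification when the source value is a sodium ion concentration.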

Gompertz equation parameters and calculated growth curve values for Shigella can be found in the paper "Effect of Sodium Chloride, pH and Temperature on Growth of Shigella flexneri"13. The conditions in our situation do not exactly match any of the tabulated ones, but they fall between two tabulated sets of conditions, which we take as a lower limit and a higher limit. This gives us an upper limit and a lower limit for the growth of Shigella (Figure 2).

Figure 2 | Upper limit and lower limit for the natural growth of Shigella in the jejunum. The x-axis shows time in hours; the y-axis shows the log count of the number of Shigella.

The survival rate of yeast cells in the jejunum varies with time (Figure 3)14.

Figure 3 | Survival rate of yeast cells in the jejunum as a function of digestion time.

The survival rate peaks at one time point, when the largest fraction of the ingested yeast cells is viable, and then decreases with time until the end of digestion.

Part 3: Medication guidance

Our drug is designed to 'kill' Shigella before it invades the surface of the jejunum or, more realistically, to decrease the concentration of floating Shigella so that a mass invasion can be avoided or stopped as soon as possible. Based on the information from the wet lab and the mathematical modeling, we offer tentative medication guidance for patients and for managers of the healthcare system, so that our vision can be realized.

For a single micro-capsule, assume that it contains at least 10 billion CFU (colony-forming units)16, that is, at least 10 billion live yeast cells. Since the expression rate is about 31.25%, at least 3 billion of these live yeast cells display at least one 20ipaD on their surface. After the capsule reaches the jejunum, it dissolves and starts to release yeast cells. After 2 hours, about 0.9 billion of these yeast cells (30.8% of the intake) remain viable and able to display 20ipaD and bind IpaD.
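The arithmetic in this paragraph can be written out as a short calculation. The function name is hypothetical; the inputs are the figures quoted above (10 billion CFU per capsule, a 31.25% expression rate, and 30.8% viability after 2 hours).

```python
def viable_displaying_yeast(cfu_per_capsule: float,
                            expression_rate: float,
                            survival_rate: float) -> float:
    """Yeast cells per capsule that both display 20ipaD and are still viable."""
    for name, value in (("expression_rate", expression_rate),
                        ("survival_rate", survival_rate)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be a fraction between 0 and 1")
    return cfu_per_capsule * expression_rate * survival_rate


cells = viable_displaying_yeast(10e9, 0.3125, 0.308)
print(f"{cells / 1e9:.2f} billion")
```

Without intermediate rounding this gives about 0.96 billion cells; rounding the displaying population down to 3 billion first gives about 0.92 billion, which is the "about 0.9 billion" used in the text.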

For Shigella, since the upper and lower limits of natural growth are known, we can calculate the maximum population density from the corresponding growth curves. The upper limit of the maximum density is about 5 billion Shigella, and the lower limit is about 1 billion Shigella.

Therefore, we suggest that patients take 5-6 capsules every 5 hours once symptoms start, and that 1-5 capsules every 5 hours be distributed to people who might develop symptoms because they live with a patient or share a common lifestyle.
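The derivation of these capsule counts is not spelled out, so the following is only one plausible reading: under the one-to-one binding assumption, the number of capsules is the Shigella population divided by the roughly 0.9 billion viable 20ipaD-displaying yeast cells that one capsule delivers. The function name is hypothetical.

```python
import math
from typing import Tuple


def capsules_needed(shigella_count: float,
                    effective_yeast_per_capsule: float) -> Tuple[int, int]:
    """Lower and upper whole-capsule estimates, assuming one yeast cell binds one Shigella."""
    if effective_yeast_per_capsule <= 0:
        raise ValueError("effective_yeast_per_capsule must be positive")
    ratio = shigella_count / effective_yeast_per_capsule
    return math.floor(ratio), math.ceil(ratio)


# Upper and lower limits of the maximum Shigella density quoted in the text.
print(capsules_needed(5e9, 0.9e9))
print(capsules_needed(1e9, 0.9e9))
```

Under this reading the upper limit of 5 billion Shigella corresponds to the 5-6 capsules suggested for symptomatic patients, and the lower limit of 1 billion sets the bottom of the range suggested for people at risk.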

Summary & Future plan

To conclude, we have described the 20ipaD expression efficiency of the yeast cells using wet-lab results and predicted the post-medication scenario using mathematical modeling. The expression efficiency is about 31.25% ± 0.75, and the upper and lower limits of the natural growth of Shigella define a range for its growth in the jejunum. Assuming each capsule contains 10 billion CFU, we suggest that symptomatic patients take 5-6 capsules every 5 hours and that potential patients take 1-5 capsules every 5 hours.

In the future, more work can be devoted to determining the binding efficiency between yeast-20ipaD and IpaD, and how many CFU a single capsule needs to, and realistically can, contain. This work would help us better understand how the number of Shigella changes after medication, so that more precise medication guidance can be given.

12

Salari R, Salari R. (2017). Investigation of the Best Saccharomyces cerevisiae Growth Condition. Electron Physician. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5308499/

13

Zaika, L. L., Engel, L. S., Kim, A. H., & Palumbo, S. A. (1989). Effect of Sodium Chloride, pH and Temperature on Growth of Shigella flexneri. Journal of food protection, 52(5), 356–359. https://pubmed.ncbi.nlm.nih.gov/31003269/

14

Etienne-Mesmin L, Livrelli V, Privat M, Denis S, Cardot JM, Alric M, Blanquet-Diot S.(2011). Effect of a New Probiotic Saccharomyces cerevisiae Strain on Survival of Escherichia Coli O157:H7 in a Dynamic Gastrointestinal Model. Applied and Environmental Microbiology. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3028742/

15

Fordtran JS, Rector FC Jr, Carter NW.(1968) The mechanisms of sodium absorption in the human small intestine. J Clin Invest. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC297237/

16

Choosing the Best Probiotic: How Many CFUs is Enough?. (2022). Deerland. https://deerland.com/chew/choosing-best-probiotic-many-cfus-enough/

Hardware

Probiotics will be sealed in microcapsules (Figure 1) made of shellac before being delivered into the body. The pH-dependent solubility of shellac, which dissolves in a mildly basic environment, makes it well suited to releasing probiotics in the intestines.

Figure 1 | 3D model of the microcapsule

A microcapsule is a microscopic-scale encapsulation format. The probiotics are wrapped in materials that ensure their preservation, activation, release, and in vivo integrity, and the product is presented as a powder. "Microencapsulation is a technology that serves as a tool to protect the sensitive and expensive nutrients (Meyers et al., 1998), by providing them with a protective wall, which allows them to get released at a particular site, at a particular time, and under particular conditions."17 In short, microencapsulation ensures the safe delivery of a sensitive compound.

Shellac is a natural polymer that is generally recognized as safe (GRAS). It is insoluble in neutral and acidic aqueous solutions but dissolves under alkaline conditions18, so it is suitable for the controlled release of yeast cells in the intestine.

Research has shown that the microcapsules with the addition of shellac "contributed to better probiotic survivals after freeze drying, simulated digestion, heating and ambient storage, and whey protein isolate (WPI) addition had a synergistic effect."19

17

Choudhury, N., Meghwal, M., & Das, K. (2021). Microencapsulation: An overview on concepts, methods, properties and applications in foods. Food Frontiers, 2, 426– 442. https://doi.org/10.1002/fft2.94

19

Huang, X., Gänzle, M., Zhang, H., Zhao, M., Fang, Y. and Nishinari, K. (2021), Microencapsulation of probiotic lactobacilli with shellac as moisture barrier and to allow controlled release. J Sci Food Agric, 101: 726-734. https://doi.org/10.1002/jsfa.10685

18

J. Barbosa, S. Borges, M. Amorim, M.J. Pereira, A. Oliveira, M.E. Pintado, P. Teixeira.(2015). Comparison of spray drying, freeze drying and convective hot air drying for the production of a probiotic orange powder. J Funct Foods. https://doi.org/10.1016/j.jff.2015.06.001