This project aims to display chitosanase on the surface of E. coli cells to hydrolyze chitosan. The more chitosanase is present on the cell surface, the more effectively chitosan is hydrolyzed. The chitosanase gene on the plasmid we constructed for E. coli must come from another species, and the chitosanase is expressed as a fusion protein with InaK-N. We therefore want a chitosanase whose fusion protein is well suited to expression in E. coli and reaches a high yield. For this reason, the species source of the chitosanase gene is very important.
We use modeling to select a chitosanase that is suitable for expression by cell surface display on E. coli.
Previous studies have reported the relationship between mRNA sequences and the corresponding protein yields. We built a machine learning model that maps sequence information to the yield of the target protein (Figure 2). Using sequence features related to stability and translation efficiency (minimum free energy, codon adaptation index (CAI), etc.), we trained a random forest regression model on the paired sequences and protein yields (Figure 3).
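As an illustration of one of the sequence features named above, the following is a minimal sketch of how a CAI value could be computed for a coding sequence. CAI is the geometric mean of the relative adaptiveness of each codon, where a codon's relative adaptiveness is its usage count divided by the count of the most used synonymous codon. The codon counts, the two synonymous groups, and the function names are hypothetical and chosen only to keep the example short; they are not the reference table or the code used in this project.

```python
import math

# Hypothetical codon counts from a set of highly expressed host genes.
# These numbers are illustrative only, not real E. coli usage data.
CODON_COUNTS = {
    "GCT": 10, "GCC": 20, "GCA": 10, "GCG": 40,  # alanine
    "AAA": 30, "AAG": 10,                        # lysine
}
SYNONYMOUS_GROUPS = [
    ("GCT", "GCC", "GCA", "GCG"),
    ("AAA", "AAG"),
]


def relative_adaptiveness(codon_counts, synonymous_groups):
    """Return w[codon] = count / highest count among its synonymous codons."""
    weights = {}
    for group in synonymous_groups:
        max_count = max(codon_counts.get(codon, 0) for codon in group)
        if max_count == 0:
            continue  # no usage data for this amino acid
        for codon in group:
            weights[codon] = codon_counts.get(codon, 0) / max_count
    return weights


def codon_adaptation_index(sequence, weights):
    """Geometric mean of codon weights over the coding sequence.

    Codons without a positive weight (not in the table, or never used
    in the reference set) are skipped rather than scored.
    """
    seq = sequence.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    log_weights = [math.log(weights[c]) for c in codons if weights.get(c, 0) > 0]
    if not log_weights:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(log_weights) / len(log_weights))


WEIGHTS = relative_adaptiveness(CODON_COUNTS, SYNONYMOUS_GROUPS)
optimal = codon_adaptation_index("GCGAAA", WEIGHTS)     # only preferred codons
suboptimal = codon_adaptation_index("GCCAAA", WEIGHTS)  # one less-used codon
print(optimal, suboptimal)
```

A sequence built only from the most used codons scores 1, and replacing a codon with a less used synonym lowers the score. A feature of this kind, together with the minimum free energy, would form one row of the input to the regression model.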
The model achieved a mean absolute error (MAE) of 6.004875 and a root mean squared error (RMSE) of 7.646747. The difference between the RMSE and the experimental values is less than 2%, which verifies the accuracy of the model.
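For reference, the two error metrics reported here can be computed as in the sketch below. The measured and predicted values are hypothetical placeholders, not the data behind the figures reported above, and the function names are our own.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between measured and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    """Square root of the average squared difference; penalizes large errors more."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical measured and predicted yields, for illustration only.
measured = [90.0, 95.0, 100.0]
predicted = [92.0, 94.0, 97.0]

mae = mean_absolute_error(measured, predicted)
rmse = root_mean_squared_error(measured, predicted)
print(mae, rmse)
```

The RMSE is never smaller than the MAE, and a large gap between the two suggests that a few predictions are far off while most are close.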
Many species are currently known to carry chitosanase genes, including Bacillus amyloliquefaciens, Linderina pennispora, and Bacillus thuringiensis. We used the model to predict the yield of the corresponding fusion protein for each candidate:
Species | Predicted expression |
Bacillus amyloliquefaciens | 93.102 |
Bacillus sonorensis | 94.018 |
Bacillus halotolerans | 93.992 |
Streptomyces olivaceus | 94.524 |
Linderina pennispora | 94.402 |
Bacillus thuringiensis | 97.55 |
The predicted yield of the fusion protein differs between chitosanases from different species, and the fusion protein containing the chitosanase from Bacillus thuringiensis has the highest predicted yield. The chitosanase gene sequence from Bacillus thuringiensis was therefore considered for this project.
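The selection step can be reproduced from the table with a short script. The sketch below ranks the candidates by predicted expression and reports the margin between the top two. The values are copied from the table, while the variable names are our own.

```python
# Predicted expression values copied from the table above.
predicted_expression = {
    "Bacillus amyloliquefaciens": 93.102,
    "Bacillus sonorensis": 94.018,
    "Bacillus halotolerans": 93.992,
    "Streptomyces olivaceus": 94.524,
    "Linderina pennispora": 94.402,
    "Bacillus thuringiensis": 97.55,
}

# Sort candidates from highest to lowest predicted expression.
ranked = sorted(predicted_expression.items(), key=lambda item: item[1], reverse=True)
best_species, best_value = ranked[0]
runner_up_species, runner_up_value = ranked[1]

# Gap between the best and second-best candidate, rounded to table precision.
margin = round(best_value - runner_up_value, 3)

for species, value in ranked:
    print(f"{species}: {value}")
print(f"Selected: {best_species} (margin over {runner_up_species}: {margin})")
```

The margin is worth reading alongside the model error reported earlier, since differences between candidates that are small relative to that error are less reliable as a basis for ranking.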
In this model, we use machine learning to predict the yield of the target protein and thereby assist in selecting a suitable chitosanase gene sequence. The model predicts that the chitosanase gene from Bacillus thuringiensis forms an InaK-Chitosanase fusion protein with a high yield, which should allow chitosan to be hydrolyzed more efficiently. A limitation is that InaK-Chitosanase must be displayed on the cell surface to hydrolyze chitosan, and the model does not account for the process of presenting the fusion protein on the cell surface. Nevertheless, the results can still serve as a reference for selecting the chitosanase gene.
Combinatorial optimization of mRNA structure, stability, and translation for RNA-based therapeutics
Insights into promiscuous chitosanases: the known and the unknown
Surface Immobilization of Human Arginase-1 with an Engineered Ice Nucleation Protein Display System in E. coli