Engineering Success
Introduction
As explained, the algorithm we propose suggests ways to improve soil quality. One Machine Learning model is selected after the training phase. After the selection of the best model using the default parameters, hyperparameter tuning is applied iteratively and leads to the repetition of the design and build phase of the model.
For hyperparameter tuning, we use pipelines to assemble several steps that can be cross-validated together while setting different parameters. A pipeline is a utility that automates a machine learning workflow by sequentially applying a list of processing steps, such as preprocessing followed by model fitting.
This part of the algorithm aims to find the combination of hyperparameters that maximizes the model's performance during the test phase while preventing overfitting. As in the training phase, accuracy is the evaluation metric. The tuning code is also written in R, since R provides some of the best tools and packages for machine learning projects.
In the training phase, k-nearest neighbors (knn) was evaluated as the best model.
Design/Build
During the design phase, different design choices can be made in order to improve the performance of the model. Two of the most common are the preprocessing method and the hyperparameters.
- Change preprocessing method
- Change hyperparameters
  - The value of K
  - The distance metric
  - Weights
Standardization
During the training phase, the "center" and "scale" methods were used. "center" subtracts the mean of the predictor's data from the predictor values, while "scale" divides by the standard deviation. This preprocessing technique is known as standardization and is a common requirement for many machine learning estimators. It rescales each predictor to zero mean and unit variance without changing the shape of its distribution [1].
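As a minimal sketch of centering and scaling, written in Python rather than the R used by the project, the function below standardizes a single predictor. The function name and the sample values are hypothetical and only serve as illustration; the sample standard deviation (n - 1 denominator) is assumed.

```python
from statistics import mean, stdev

def standardize(values):
    """Center on the mean, then scale by the sample standard deviation."""
    mu = mean(values)
    sigma = stdev(values)  # assumes the values are not all identical
    return [(v - mu) / sigma for v in values]

# Hypothetical readings of one predictor, for illustration only.
temperatures = [10.0, 20.0, 30.0, 40.0]
scaled = standardize(temperatures)
print(scaled)  # values now have zero mean and unit standard deviation
```

The ordering of the values and the shape of their distribution are unchanged; only the location and spread are adjusted.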
Normalization
The method tested during these phases is the "range" transformation. Also known as min-max normalization, it shifts and rescales the values of each feature so that they end up ranging between 0 and 1. This technique is known to help the training of many models when input features have different ranges.
For example, in our case, the meteorological data of the field (0-40) and the results of the soil analysis of the samples taken (0-5) are in very different ranges. In a distance-based model such as knn, the first will intrinsically influence the result more because of its larger values. This does not necessarily mean it is more important as a predictor, so we normalize the data to bring all the variables to the same range.
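The range transformation can be sketched as follows, in Python rather than the project's R. The function name and the two sample features are hypothetical; they only mirror the 0-40 and 0-5 ranges mentioned above.

```python
def min_max_scale(values):
    """Shift and rescale values so that they range between 0 and 1."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A constant feature carries no information; map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]

# Hypothetical features on very different scales.
weather = [0.0, 10.0, 25.0, 40.0]
soil = [0.0, 1.0, 2.5, 5.0]

weather_scaled = min_max_scale(weather)
soil_scaled = min_max_scale(soil)
print(weather_scaled)
print(soil_scaled)
```

After scaling, both features span the same interval, so neither dominates a distance calculation simply because of its units.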
We use the exhaustive grid search technique for hyperparameter optimization. An exhaustive grid search takes the candidate values for each hyperparameter and evaluates every possible combination, using as many cross-validation folds as you would like to perform. It is a good way to determine the best hyperparameter values, but it quickly becomes time-consuming with every additional parameter value and cross-validation fold that you add.
The first step is to determine the value of K. The best value varies greatly from case to case and is not easy to find. A small value of K means that noise will have a higher influence on the result, while a large value makes prediction more computationally expensive.
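To make the idea concrete, here is a small, self-contained sketch of an exhaustive grid search with cross-validation around a toy nearest-neighbor classifier. It is written in Python, not the project's R, and every name, the toy data and the grid values are assumptions chosen for illustration.

```python
from collections import Counter
from itertools import product

def knn_predict(train, query, k, p):
    """Majority vote among the k nearest training rows (Minkowski order p)."""
    def dist(a, b):
        return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)
    nearest = sorted(train, key=lambda row: dist(row[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def cv_accuracy(data, k, p, folds):
    """Mean accuracy over `folds` interleaved cross-validation splits."""
    scores = []
    for f in range(folds):
        test = data[f::folds]
        train = [row for i, row in enumerate(data) if i % folds != f]
        hits = sum(knn_predict(train, x, k, p) == y for x, y in test)
        scores.append(hits / len(test))
    return sum(scores) / folds

def grid_search(data, grid, folds=3):
    """Evaluate every combination in `grid`; return (best_params, best_score)."""
    best_params, best_score = None, -1.0
    for k, p in product(grid["k"], grid["p"]):
        score = cv_accuracy(data, k, p, folds)
        if score > best_score:  # ties keep the first combination found
            best_params, best_score = {"k": k, "p": p}, score
    return best_params, best_score

# Hypothetical toy data: two well-separated classes.
data = [((0.0, 0.1), "a"), ((0.2, 0.0), "a"), ((0.1, 0.3), "a"),
        ((0.3, 0.2), "a"), ((0.2, 0.2), "a"), ((0.0, 0.3), "a"),
        ((1.0, 0.9), "b"), ((0.8, 1.0), "b"), ((0.9, 0.7), "b"),
        ((0.7, 0.8), "b"), ((0.8, 0.8), "b"), ((1.0, 0.7), "b")]
grid = {"k": [1, 3, 5], "p": [1, 2]}

best_params, best_score = grid_search(data, grid)
print(best_params, best_score)
```

With three values of k, two distance orders and three folds, this already fits 18 models, which illustrates how the cost multiplies with each added value.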
To calculate distances, three commonly used metrics are Euclidean, Manhattan, and Minkowski [2]. The Minkowski distance generalizes the other two: order 1 gives the Manhattan distance and order 2 gives the Euclidean distance.
Adding weights to the data points may or may not benefit the model. 'uniform' weights all neighbors equally, while 'distance' weights points by the inverse of their distance (i.e., nearer points have more weight than farther points).
The hyperparameter tuning code is written in R, since it provides some of the best tools and packages for writing machine learning training scripts.
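The three metrics can be sketched in a few lines of Python (not the project's R code). The function names are illustrative; Manhattan and Euclidean are expressed as Minkowski distances of order 1 and 2.

```python
def minkowski(a, b, p):
    """Minkowski distance of order p between two equal-length points."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def manhattan(a, b):
    """Minkowski distance of order 1: sum of absolute differences."""
    return minkowski(a, b, 1)

def euclidean(a, b):
    """Minkowski distance of order 2: straight-line distance."""
    return minkowski(a, b, 2)

origin, point = (0.0, 0.0), (3.0, 4.0)
print(manhattan(origin, point))     # 7.0
print(euclidean(origin, point))     # 5.0
print(minkowski(origin, point, 3))  # between the max coordinate gap and the Euclidean value
```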
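The difference between the two weighting schemes can be illustrated with a small voting function. This is a Python sketch under stated assumptions, not the project's R code: the function name, the neighbor distances and the class labels are all hypothetical.

```python
from collections import defaultdict

def vote(neighbors, weights="uniform"):
    """Pick a label from (distance, label) pairs of the k nearest points."""
    totals = defaultdict(float)
    for distance, label in neighbors:
        if weights == "uniform":
            totals[label] += 1.0  # every neighbor counts equally
        elif weights == "distance":
            # Inverse-distance weight; the floor avoids division by zero.
            totals[label] += 1.0 / max(distance, 1e-12)
        else:
            raise ValueError("weights must be 'uniform' or 'distance'")
    return max(totals, key=totals.get)

# Hypothetical neighbors: one very close point and two distant ones.
neighbors = [(0.1, "lime"), (2.0, "compost"), (2.5, "compost")]

print(vote(neighbors, "uniform"))   # compost: two votes against one
print(vote(neighbors, "distance"))  # lime: the close neighbor dominates
```

The same neighbors lead to different predictions, which is why the weighting scheme is worth including in the search grid.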
Test/Learn
Following the engineering design cycle, testing is the next step. As during the training phase, accuracy is used as the metric to determine whether the model has improved. Machine Learning models, and especially deep neural network architectures, have the ability to generalize well to previously unseen data [3]. Estimation of a variable's value beyond the initial observation range, based on its relationship with another variable, is called extrapolation. Even though our training set consists only of non-genetically modified organisms (non-GMOs), the testing process using genetically modified organisms (GMOs) can be considered meaningful, since the inputs (physicochemical characteristics, agronomical characteristics and data derived from metagenomic analysis) and outputs (soil quality improvement method) of the algorithm still have the same format. GMOs are living organisms whose genome has been engineered in the laboratory in order to favour the expression of desired physiological traits or the generation of desired biological products [4].
The last step is interpretation, the process in which we try to understand the predictions of the machine learning model. Interpreting our model is useful (i) as a debugging tool to explain the model's decisions and thus improve it, and (ii) as a way to gain enough confidence to use the model in the real world.
In the case of our model, knn, interpretability depends solely on whether someone can interpret a single instance of the dataset. Our instances consist of dozens of features, which makes them impossible to interpret directly. Techniques exist for reducing the number of features while keeping the most important information, the most common being principal component analysis (PCA) [5]. Using PCA and extracting two or three principal components, good explanations can be derived from visualizations. That enables us to learn from the process and repeat the engineering cycle with a better model.
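As a rough illustration of how a principal component is obtained, the Python sketch below (not the project's R code) computes the first component of a small dataset by power iteration on the covariance matrix. All names and data are hypothetical, and only the first component is extracted; further components would need an additional deflation step that is omitted here.

```python
import math

def covariance_matrix(rows):
    """Return (sample covariance matrix, mean-centered rows)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return cov, centered

def top_component(cov, iterations=200):
    """Power iteration: unit eigenvector of the largest eigenvalue.

    Assumes the covariance matrix is not all zeros.
    """
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def project(centered, component):
    """Score of each centered row along the given component."""
    return [sum(c * u for c, u in zip(row, component)) for row in centered]

# Hypothetical data: the second feature is twice the first, the third is constant.
rows = [(1.0, 2.0, 0.0), (2.0, 4.0, 0.0), (3.0, 6.0, 0.0), (4.0, 8.0, 0.0)]
cov, centered = covariance_matrix(rows)
pc1 = top_component(cov)
scores = project(centered, pc1)
print(pc1)
print(scores)
```

Plotting such scores for two or three components gives the kind of low-dimensional view from which explanations can be read.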
GMOs
The future applications of Synthetic Biology using genetically engineered microorganisms, such as applications in agriculture, go beyond the confines of a laboratory. Therefore, the issue of biological containment becomes increasingly important [6].
The rapid development of GMOs raises a number of biosecurity concerns related to environmental escape, their stability, resilience, performance and the potential exchange of transgenic DNA with native organisms in the ecosystem [7].
It is not possible to predict or completely control a GMO outside the laboratory, because the microorganism changes in order to adapt to the new circumstances and to endure competition and other abiotic and biotic stresses from the environment [7]. GMOs can also exchange genetic material efficiently with each other, which could be hazardous to human health and the environment.
Targeted intervention in microorganisms could result in even more targeted soil improvements. Thus, the use of GMOs could be a future development of our project. Moreover, this approach could further reduce the use of fertilizers and significantly improve the agronomic value of the final product, for example through increased yield. In this case, risk assessment would be necessary according to the EFSA Panel on Genetically Modified Organisms (GMO) [8], because GMOs could have adverse impacts on the environment and on human and animal health [6]. Nutritional assessment is also needed in order to ensure food safety and define the nutritional value according to the World Health Organization (WHO) [9].
We would like to point out some properties of Bacillus subtilis that could be optimized in the future. Bacillus subtilis is a Plant Growth Promoting Bacterium (PGPB) with many applications in agriculture. More specifically, it is able to improve nutrient availability, control plant pathogens, reduce abiotic stresses, increase plant defenses by triggering induced systemic resistance, produce a variety of compounds with antimicrobial properties [10], alter plant growth hormone homeostasis and increase plant tolerance against both drought and salt stress [11]. Therefore, Bacillus subtilis is considered an environmentally friendly and less expensive alternative to chemical fertilizers. It is very important to point out that not every strain of B. subtilis has the same features [12].
B. subtilis has an important role in enhancing the availability of three nutrients: nitrogen, phosphorus and iron [13]. B. subtilis fixes atmospheric nitrogen, promotes nitrogen fixation by other bacteria and improves the colonization of other native symbiotic rhizobacteria [14]. In addition, B. subtilis produces various organic acids that solubilize phosphorus into a form accessible to plants [15]. In this way, soil fertility increases. B. subtilis also causes an upregulation of the plant genes responsible for iron acquisition, and it increases the mobility of the element by acidifying the rhizosphere, resulting in higher levels of iron in plants [16, 17].
It is worth emphasizing that B. subtilis can affect plants in direct and indirect ways. B. subtilis can alter the levels of plant growth hormones by producing them itself or by secreting other substances that induce their production in plants [18].
A compound that has an important role in plant growth is auxin, which enhances root development but inhibits leaf expansion. According to studies, B. subtilis strains lead to higher auxin levels in roots and lower levels in leaves, resulting in plant growth [19]. B. subtilis also induces plant growth through the production of a polyamine called spermidine [20], and some B. subtilis strains produce cytokinin, which has an important role in plant growth and yield [18].
One strain of B. subtilis was found to reduce the damage caused by drought and salt stress and to improve resistance under those conditions. B. subtilis GOT9 improves the tolerance of A. thaliana and Brassica campestris to these stresses by altering the expression of plant genes, especially genes related to abscisic acid (ABA), which is very important in the regulation of stress [21].
Another compound that B. subtilis produces is surfactin, a cyclic lipopeptide with amphiphilic properties that participates in signal transmission, reduces surface tension and regulates the biosynthesis of fatty acids and carbon metabolism [22]. Furthermore, B. subtilis can produce exoenzymes like proteases and chitinases that disrupt the cell wall of fungi [23].
In addition, B. subtilis is able to reduce the virulence of pathogens [24] by interfering with quorum sensing (QS) signals [25]; specifically, it produces the enzyme AiiA, which inactivates QS autoinducers [26, 24].
To ensure that the bacteria remain contained and do not spread in the environment, we could equip our bacterial interface with a kill switch. A kill switch is a genetic circuit that tells a bacterium whether to live or die, depending on specific environmental conditions [27].
Several synthetic biology-mediated biocontainment strategies have been developed, but they are difficult to implement because bacteria can easily mutate their DNA and escape biocontainment. They should also be evaluated within the context of an industrial process, because they may be economically unviable [7].
Metabolic auxotrophy refers to the genetic knockout of a gene required for the synthesis of an essential metabolite. The absence of this metabolite, which is essential for bacterial growth, leads to cell death. This approach is effective for biocontainment in the laboratory, but tends to fail outside of strict laboratory conditions [7].
Another approach is the inducible control of systems deleterious to cell health. Under certain circumstances, expression of a toxic gene leads to cell death [7].
In toxin-antitoxin systems, the bacterial cell is killed when the toxin is present in higher concentrations than the antitoxin. More specifically, two genes encode the toxin and the corresponding antitoxin. In permissive conditions the toxin is repressed. When the environment changes, toxin expression increases and the low level of antitoxin expression cannot prevent a lethal level of free toxin [28, 27].
The "Deadman" and "Passcode" systems were designed for Escherichia coli, but can be altered to respond to the molecules of choice [29]. The "Deadman" system uses a continuous stimulus from the environment to repress a toxin. When this stimulus disappears, the repression is removed and the strain switches to toxin production, which reduces cell viability. The "Passcode" system uses transcriptional repression. In this approach multiple stimuli are needed to repress the toxin and another stimulus needs to be absent. Otherwise, the toxin will be expressed.
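The survival logic described for the two systems can be written as simple Boolean rules. The Python sketch below is only a toy logical model under that description, not a simulation of the genetic circuits; the function and parameter names are hypothetical.

```python
def deadman_survives(signal_present):
    """The toxin stays repressed only while the environmental signal is present."""
    toxin_expressed = not signal_present
    return not toxin_expressed

def passcode_survives(required_inputs, forbidden_input):
    """All required inputs must be present and the forbidden input absent."""
    toxin_repressed = all(required_inputs) and not forbidden_input
    return toxin_repressed

# Deadman: removing the signal triggers toxin production.
print(deadman_survives(True))   # True
print(deadman_survives(False))  # False

# Passcode: only one input combination keeps the toxin repressed.
print(passcode_survives([True, True], False))   # True
print(passcode_survives([True, False], False))  # False
print(passcode_survives([True, True], True))    # False
```

Seen this way, the "Passcode" design is stricter than the "Deadman" design because survival requires one specific combination of inputs rather than a single signal.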
Another strategy is the non-canonical amino acids (ncAA) approach. A rare codon is first reassigned to a synonymous codon to create an unused codon in the genome. This codon can then be used for the incorporation of a non-standard amino acid into essential genes [30]. Recoding can create a synthetic auxotroph. This approach has significant biocontainment advantages in reducing the implications of horizontal gene transfer (HGT), but has not been evaluated enough [7].
There is also the multi-layered kill switch, which is stable for at least 110 generations but requires external supplementation of survival factors [31].
It is worth emphasizing that any project requires several modes of action in a multilayered approach to prevent the escape of GMOs into the environment and to ensure enhanced safety and security.
We have also seen Bacillus subtilis kill switches designed by past iGEM teams. Team TU Darmstadt [32] has not only inspired us to improve our project, but also encouraged us to further study the different aspects of kill switch design and implementation.
References
1. RDocumentation: Pre-Processing of Predictors.
2. A Performance Comparison of Euclidean, Manhattan and Minkowski Distances in K-Means Clustering.
3. Kawaguchi, K., Kaelbling, L. P., & Bengio, Y. (2017). Generalization in deep learning. arXiv preprint arXiv:1710.05468.
4. Fridovich-Keil, Judith L. and Diaz, Julia M.
5. Jolliffe, Ian T., and Jorge Cadima. "Principal component analysis: a review and recent developments." Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 374.2065 (2016): 20150202.
6. Kawall, K., Cotter, J., & Then, C. (2020). Broadening the GMO risk assessment in the EU for genome editing technologies in agriculture. Environmental Sciences Europe, 32(1), 1-24.
7. Arnolds, K. L., Dahlin, L. R., Ding, L., Wu, C., Yu, J., Xiong, W., ... & Guarnieri, M. T. (2021). Biotechnology for secure biocontainment designs in an emerging bioeconomy.
8. EFSA Panel on Genetically Modified Organisms (GMO). (2011). Guidance on the risk assessment of genetically modified microorganisms and their products intended for food and feed use. EFSA Journal, 9(6), 2193. doi:10.2903/j.efsa.2011.2193.
9. World Health Organization. (2009). Foods derived from modern biotechnology (Ed. 2).
10. Wang, T., Liang, Y., Wu, M., Chen, Z., Lin, J., & Yang, L. (2015). Natural products from Bacillus subtilis with antimicrobial properties. Chinese Journal of Chemical Engineering, 23(4), 744-754.
11. Carvalho, F. P. (2017). Pesticides, environment, and food safety. Food and Energy Security, 6(2), 48-60.
12. Mukherjee, A. K., & Das, K. (2005). Correlation between diverse cyclic lipopeptides production and regulation of growth and substrate utilization by Bacillus subtilis strains in a particular habitat. FEMS Microbiology Ecology, 54(3), 479-489.
13. Hayat, R., Ali, S., Amara, U., Khalid, R., & Ahmed, I. (2010). Soil beneficial bacteria and their role in plant growth promotion: a review. Annals of Microbiology, 60(4), 579-598.
14. Elkoca, E., Kantar, F., & Sahin, F. (2007). Influence of nitrogen fixing and phosphorus solubilizing bacteria on the nodulation, plant growth, and yield of chickpea. Journal of Plant Nutrition, 31(1), 157-171.
15. Saeid, A., Prochownik, E., & Dobrowolska-Iwanek, J. (2018). Phosphorus Solubilization by Bacillus Species. Molecules, 23(11), 2897.
16. Freitas, M. A., Medeiros, F. H., Carvalho, S. P., Guilherme, L. R., Teixeira, W. D., Zhang, H., & Paré, P. W. (2015). Augmenting iron accumulation in cassava by the beneficial soil bacterium Bacillus subtilis (GBO3). Frontiers in Plant Science, 6, 596.
17. Zhang, H., Sun, Y., Xie, X., Kim, M. S., Dowd, S. E., & Paré, P. W. (2009). A soil bacterium regulates plant acquisition of iron via deficiency‐inducible mechanisms. The Plant Journal, 58(4), 568-577.
18. Arkhipova, T. N., Veselov, S. U., Melentiev, A. I., Martynenko, E. V., & Kudoyarova, G. R. (2005). Ability of bacterium Bacillus subtilis to produce cytokinins and to influence the growth and endogenous hormone content of lettuce plants. Plant and Soil, 272(1), 201-209.
19. Zhang, H., Kim, M. S., Krishnamachari, V., Payton, P., Sun, Y., Grimson, M., ... & Paré, P. W. (2007). Rhizobacterial volatile emissions regulate auxin homeostasis and cell expansion in Arabidopsis. Planta, 226(4), 839-851.
20. Xie, S. S., Wu, H. J., Zang, H. Y., Wu, L. M., Zhu, Q. Q., & Gao, X. W. (2014). Plant growth promotion by spermidine-producing Bacillus subtilis OKB105. Molecular Plant-Microbe Interactions, 27(7), 655-663.
21. Woo, O. G., Kim, H., Kim, J. S., Keum, H. L., Lee, K. C., Sul, W. J., & Lee, J. H. (2020). Bacillus subtilis strain GOT9 confers enhanced tolerance to drought and salt stresses in Arabidopsis thaliana and Brassica campestris. Plant Physiology and Biochemistry, 148, 359-367.
22. Sansinenea, E., & Ortiz, A. (2011). Secondary metabolites of soil Bacillus spp. Biotechnology Letters, 33(8), 1523-1538.
23. Yan, L., Jing, T., Yujun, Y., Bin, L. I., Hui, L. I., & Chun, L. I. (2011). Biocontrol efficiency of Bacillus subtilis SL-13 and characterization of an antifungal chitinase. Chinese Journal of Chemical Engineering, 19(1), 128-134.
24. Pan, J., Huang, T., Yao, F., Huang, Z., Powell, C. A., Qiu, S., & Guan, X. (2008). Expression and characterization of aiiA gene from Bacillus subtilis BS-1. Microbiological Research, 163(6), 711-716.
25. Helman, Y., & Chernin, L. (2015). Silencing the mob: disrupting quorum sensing as a means to fight plant disease. Molecular Plant Pathology, 16(3), 316-329.
26. Dong, Y. H., Xu, J. L., Li, X. Z., & Zhang, L. H. (2000). AiiA, an enzyme that inactivates the acylhomoserine lactone quorum-sensing signal and attenuates the virulence of Erwinia carotovora. Proceedings of the National Academy of Sciences, 97(7), 3526-3531.
27. Stirling, F., Bitzan, L., O'Keefe, S., Redfield, E., Oliver, J. W., Way, J., & Silver, P. A. (2017). Rational design of evolutionarily stable microbial kill switches. Molecular Cell, 68(4), 686-697.
28. Engelberg-Kulka, H., Amitai, S., Kolodkin-Gal, I., & Hazan, R. (2006). Bacterial programmed cell death and multicellular behavior in bacteria. PLoS Genetics, 2(10), e135.
29. Chan, C. T., Lee, J. W., Cameron, D. E., Bashor, C. J., & Collins, J. J. (2016). 'Deadman' and 'Passcode' microbial kill switches for bacterial containment. Nature Chemical Biology, 12(2), 82-86.
30. Rovner, A. J., Haimovich, A. D., Katz, S. R., Li, Z., Grome, M. W., Gassaway, B. M., ... & Isaacs, F. J. (2015). Recoded organisms engineered to depend on synthetic amino acids. Nature, 518(7537), 89-93.
31. Gallagher, R. R., Patel, J. R., Interiano, A. L., Rovner, A. J., & Isaacs, F. J. (2015). Multilayered genetic safeguards limit growth of microorganisms to defined environments. Nucleic Acids Research, 43(3), 1945-1954.
32. iGEM TU Darmstadt 2020 - Kill_Switch