Designing the optimal conditions for the cell-free system
A cell-free system (CFS) is an in vitro system consisting of cellular components extracted from lysed cells together with the other elements necessary for synthesis of the target protein [1]. Although CFS is already well known for fast protein expression, small reaction volumes, and relatively low cost, many laboratories continue to optimize their own systems to make lab-made CFS even more efficient and robust [2]. Our detection kit must be quick, robust, and affordable, so we decided to optimize our CFS ourselves.
Our CFS consists of five components: Master Mix, Amino Acids, Cell Extract, Plasmid Construct, and Extra Components (RNA Inhibitor and Maltose). The objective of the optimization was to identify the best combination of concentrations for four essential reagents of the Master Mix: ATP, Mg-Acetate, 3-PGA, and PEG.
The first phase optimized one reagent at a time. GFP expression was measured at five concentrations of a single reagent while the concentrations of the other reagents were held constant. The following are the sets of reagent concentrations used in our experiments.
The optimization experiments were conducted according to the following procedure.
Here are the results of the single-reagent optimization experiments.
Figure 1 Single reagent optimization: ATP
Figure 2 Single reagent optimization: Mg
Figure 3 Single reagent optimization: 3-PGA
Figure 4 Single reagent optimization: PEG
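Reading the optimum off a single-reagent sweep amounts to choosing the concentration with the highest mean GFP signal. The sketch below illustrates this under stated assumptions: the function name `best_concentration`, the three replicates per point, and every number in `example_sweep` are invented placeholders, not our measurements.

```python
from statistics import mean

def best_concentration(sweep):
    """Return the concentration whose replicate readings have the highest mean."""
    return max(sweep, key=lambda conc: mean(sweep[conc]))

# Placeholder sweep for one reagent:
# concentration (arbitrary units) -> replicate GFP readings (arbitrary units).
# These values are invented for illustration only.
example_sweep = {
    0.5: [110, 120, 115],
    1.0: [210, 190, 200],
    1.5: [320, 300, 310],
    2.0: [250, 260, 240],
    2.5: [150, 140, 160],
}

optimum = best_concentration(example_sweep)
print(optimum)
```

Running the same function once per reagent would give the four single-reagent optima that serve as the starting point for the next phase.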
The next phase of optimization was to investigate the non-linearity in our cell-free protein synthesis (CFPS) reactions using a deep learning model, the recurrent neural network (RNN). An RNN is a class of artificial neural network in which connections between nodes can form a cycle, so that the output of some nodes affects subsequent input to the same nodes. This lets the network exhibit temporal dynamic behavior. Derived from feedforward neural networks, RNNs can use their internal state (memory) to process variable-length sequences of inputs, which makes them applicable to tasks such as unsegmented, connected handwriting recognition or speech recognition. RNNs are theoretically Turing complete and can run arbitrary programs to process arbitrary sequences of inputs.
An accurate and abundant dataset is key to developing a reliable deep learning model. Since we could not find an open-source dataset specific enough for our CFPS, we decided to generate an in-house dataset. The following is how we planned to prepare it.
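To make the recurrence concrete, here is a minimal sketch of an Elman-style RNN forward pass in plain Python. It is not the model we designed: the hidden size, the random untrained weights, the function name `rnn_forward`, and the choice to feed four reagent concentrations as a length-four sequence are all assumptions made for illustration.

```python
import math
import random

def rnn_forward(sequence, w_in, w_rec, w_out, bias):
    """Run a small Elman-style RNN over a sequence of scalars and return one output."""
    hidden = [0.0] * len(bias)  # internal state (memory), starts at zero
    for x in sequence:
        new_hidden = []
        for j in range(len(hidden)):
            # Recurrent connection: the previous hidden state feeds the current step.
            recurrent = sum(w_rec[j][k] * hidden[k] for k in range(len(hidden)))
            new_hidden.append(math.tanh(w_in[j] * x + recurrent + bias[j]))
        hidden = new_hidden
    # Read a single prediction off the final hidden state.
    return sum(w_out[j] * hidden[j] for j in range(len(hidden)))

rng = random.Random(0)  # fixed seed so the untrained weights are reproducible
H = 3                   # hypothetical hidden size
w_in = [rng.uniform(-1, 1) for _ in range(H)]
w_rec = [[rng.uniform(-1, 1) for _ in range(H)] for _ in range(H)]
w_out = [rng.uniform(-1, 1) for _ in range(H)]
bias = [0.0] * H

# The same weights handle sequences of different lengths.
print(rnn_forward([1.5, 12.0, 30.0, 2.0], w_in, w_rec, w_out, bias))
print(rnn_forward([1.5, 12.0], w_in, w_rec, w_out, bias))
```

The hidden state is the only thing carried from one step to the next, which is what lets one set of weights process inputs of any length.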
Taking the optimal concentration of each reagent obtained in Phase 1 as the reference point, every possible combination of reagent concentrations was calculated using a Python program.
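Enumerating every combination of reagent concentrations is a Cartesian product over the levels chosen for each reagent. The sketch below shows one way to do this with `itertools.product`. The optima in `optima`, the three `multipliers`, and the function name `build_grid` are hypothetical stand-ins, not the values or code used in our planning.

```python
from itertools import product

# Hypothetical Phase 1 optima (units omitted); not measured values.
optima = {"ATP": 1.5, "Mg-Acetate": 12.0, "3-PGA": 30.0, "PEG": 2.0}
# Assumed concentration levels, expressed as multiples of each optimum.
multipliers = [0.5, 1.0, 1.5]

def build_grid(optima, multipliers):
    """Return one dict per combination, mapping reagent name to concentration."""
    names = list(optima)
    levels = [[round(optima[name] * m, 6) for m in multipliers] for name in names]
    return [dict(zip(names, combo)) for combo in product(*levels)]

grid = build_grid(optima, multipliers)
print(len(grid))  # 3 levels ** 4 reagents = 81 combinations
print(grid[0])
```

The grid grows exponentially with the number of reagents (levels to the power of reagents), so a full experimental sweep becomes costly quickly.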
Unfortunately, we were not able to obtain this dataset due to limited time and resources. The design of the model structure and the training method are nevertheless explained in the following section. Dummy data was created to stand in for the experimental data. An arbitrary optimal concentration set was chosen:
For each data point, a random concentration value was assigned to each reagent using the randint function. The GFP expression level for that set of concentrations was then assigned according to the set's distance from the optimum, d, and the number of reagents at near-optimal concentration, n. Both d and n were defined in terms of the optimal concentration and the current concentration of each reagent. A dummy dataset containing a total of 10,000 data points was created and fed into the training model.
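One plausible way to generate such a dummy dataset is sketched below. It assumes d is the Euclidean distance between a concentration set and the optimum, and that n counts the reagents lying within a fixed tolerance of their optimum. These two definitions, the values of `OPTIMUM` and `TOLERANCE`, the randint range, and the scoring formula in `dummy_expression` are all illustrative assumptions, not the ones used in this project.

```python
import math
import random

OPTIMUM = [10, 20, 30, 40]  # hypothetical optimal set, arbitrary units
TOLERANCE = 2               # assumed threshold for "near optimal"

def distance(conc, optimum=OPTIMUM):
    """Euclidean distance between a concentration set and the optimum (assumed form of d)."""
    return math.sqrt(sum((o - c) ** 2 for o, c in zip(optimum, conc)))

def near_optimal_count(conc, optimum=OPTIMUM, tol=TOLERANCE):
    """Number of reagents within tol of their optimum (assumed form of n)."""
    return sum(abs(o - c) <= tol for o, c in zip(optimum, conc))

def dummy_expression(conc):
    """Placeholder GFP score: higher when d is small and n is large."""
    return (1 + near_optimal_count(conc)) / (1 + distance(conc))

def make_dataset(size, seed=0):
    """Draw random concentration sets with randint and attach a dummy GFP score."""
    rng = random.Random(seed)
    rows = []
    for _ in range(size):
        conc = [rng.randint(0, 50) for _ in OPTIMUM]
        rows.append((conc, dummy_expression(conc)))
    return rows

dataset = make_dataset(10_000)
print(len(dataset))
```

Whatever scoring rule is used, it is worth checking that the score actually varies smoothly with the inputs before training, since a model cannot learn a trend the data does not contain.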
The structure and detailed information about our dataset are provided in this section.
Figure 5 Dataset Overview
Unfortunately, the desired trend between the reagent concentration set and the GFP expression level was not observed in our dummy dataset. As a result, the loss value of the model did not decrease in a meaningful way.
Figure 6 Loss value after each training
Despite the unforeseen results from our dummy dataset, past research [2] that applied a machine-learning algorithm to multi-reagent optimization increased the CFS protein expression level more than threefold. More accurate and efficient data processing and machine/deep-learning algorithms are available today. With state-of-the-art algorithms and a real experimental dataset, we expect that a similar result could have been obtained for our CFS optimization. Furthermore, the true value of machine/deep-learning-based optimization is that it not only captures the hidden mathematical trend in the CFS but also predicts results for regions that have not been tested experimentally.
Although we were not able to complete the multi-reagent optimization in this project, our effort may offer inspiration or a starting point for future iGEM teams. Projects that involve CFS with a wider range of reagent concentrations may especially benefit from our work. Such projects can experimentally explore only a small portion of their possible input domain, yet a trained model would let them gain insight into the general trend across their entire CFS with significantly less time and resources than a full experimental dataset would require. They would also be able to conduct more detailed analyses and draw more precise conclusions about their CFS optimization more efficiently.
[1] Batista, A. C., Soudier, P., Kushwaha, M., & Faulon, J.-L. (2021, February 21). Optimising protein synthesis in cell‐free systems, a Review. The Institute of Engineering and Technology. Retrieved October 8, 2022, from https://ietresearch.onlinelibrary.wiley.com/doi/10.1049/enb2.12004
[2] Caschera, F., Bedau, M. A., Buchanan, A., Cawse, J., de Lucrezia, D., Gazzola, G., Hanczyc, M. M., & Packard, N. H. (2011, April 21). Coping with complexity: Machine learning optimization of cell‐free ... Wiley Online Library. Retrieved October 8, 2022, from https://onlinelibrary.wiley.com/doi/10.1002/bit.23178
[3] Zhang, L., Lin, X., Wang, T., Guo, W., & Lu, Y. (2021, July 6). Development and comparison of cell-free protein synthesis systems derived from typical bacterial chassis - bioresources and Bioprocessing. SpringerOpen. Retrieved October 8, 2022, from https://bioresourcesbioprocessing.springeropen.com/articles/10.1186/s40643-021-00413-2