Our team, Korea_HS, chose “Data Security in DNA Data Storage” as our main theme. We researched and ran various experiments to show how a TFAM-DNA complex in aqueous solution can stabilize the DNA used in DNA data storage. Building on this idea, the Human Practices team aimed to spread knowledge of DNA data storage, data security, and synthetic biology to the public through a range of public engagement activities.
Because our team needed further advice on how to create a TFAM-DNA complex to improve DNA data storage, we consulted experts in fields related to our project. We asked stakeholders from different companies and disciplines for their expert knowledge and their opinions on our project. We interviewed five stakeholders: Seongjun Yoon (CEO of Fortuga Bio), Jaehan Park (General Manager of data storage at Samsung), Sungjun Lee (member of a software company), Teagsang Yoo (employee at SK biopharmaceuticals), and Yoseb Song (MIT postdoc), hoping to gain information that could strengthen our project. To summarize what we learned through the interviews, we compiled documents and shared them with high school students for further education.
Our first interviewee was Seongjun Yoon, the CEO of Fortuga Bio. We hoped to gain foundational expert knowledge on the impact of DNA data storage on the world. Because Fortuga Bio is a research-driven biotechnology company specializing in therapeutic cancer treatments, we expected the interview to give us a clearer understanding of how DNA data storage could affect companies specifically. Through this interview, our team learned about the fundamental issues that DNA-based data storage shares with conventional methods, the obstacles to its commercialization, its economic and spatial advantages, and its potential to create a new field of study and a widespread technology.
CEO Yoon first underscored that ethical and data security concerns exist in both conventional and DNA-based data storage. He explained that because both storage methods rely on the same basic supporting technology, DNA-based data storage is still susceptible to data leaks. This information helped us identify areas in which to improve our product and ways to do so.
Further, CEO Yoon addressed the current limits to commercializing DNA-based data storage: at present, no company uses DNA storage as its primary storage medium. However, he shared a positive outlook for its future and widespread use, mentioning that some companies already use the technology on a smaller scale and are continuing its research and development.
CEO Yoon then explained how the practical and spatial advantages of DNA-based data storage might affect companies. He predicted that, for companies handling big data, the technology would be financially appealing compared to conventional methods because of its greater stability over a prolonged period of safe storage, its higher storage density and therefore smaller space requirements, and its simpler copying method through PCR. Such financial efficiency, CEO Yoon explained, strongly suggests that the novel technology will be commercialized in the future.
For companies that do not handle such data and for the general public, CEO Yoon speculated that the effects would come from the development of a new field: the creation of new industries, diversification of the workforce, and increased public engagement with data. He speculated that if DNA-based data storage is commercialized for the public, today’s cloud methods of saving data on the web could change drastically. He expected the demand for companies that rent and operate servers to decrease, while new companies specializing in DNA-based data storage and its commercial uses would emerge. Moreover, CEO Yoon commented that workers in traditional storage fields would be free either to remain or to learn to handle the new technology, diversifying the workplace and increasing employment opportunities for both the new generation and existing workers. Finally, through the commercialization of such technology, he envisioned a public that is better educated about and more engaged with data storage.
Inspired by this newly gained knowledge of contemporary limitations, we sought further understanding of the current state of DNA-based data storage and its applications and implications for its most likely users, larger data companies, in our next interview with Jaehan Park, the General Manager for data storage and network at Samsung Co.
To understand how current data storage systems function and how DNA data storage could be implemented in larger corporations, Korea_HS team members sought advice from Jaehan Park, a General Manager who oversees data storage and network at Samsung T&C. Overall, Korea_HS members enjoyed an informative and helpful discussion with Mr. Park on the increasing need for data storage, current data storage methods, and their limitations.
He first highlighted that it is difficult to retrieve data from current storage methods after time has passed. He further mentioned that DNA-based data storage might not be widely used by companies because of the lag in retrieving the data and converting it from a DNA sequence to binary data. To allow proper protein folding, we put forward the idea of cleaving certain amino acids that were not necessary for our TFAM protein design, which was effectively implemented as seen in our Integrated Human Practices.
Furthermore, Mr. Park addressed how current data storage methods contribute to climate change and pollution. He mentioned that because hard disks do not biodegrade, their disposal contaminates the environment. Storage devices are therefore disposed of through professional disposal companies, which also adds to the cost.
When asked about future applications of DNA-based data storage, Mr. Park stated that more cost-efficient and space-efficient storage media are in demand, especially as new technologies such as artificial intelligence and big data develop. He also noted that data storage is needed in almost every aspect of daily life (e.g., hospitals and companies). If DNA-based data storage is commercialized, Mr. Park predicts that it could be applied in a wide range of fields, especially considering its endurance under extreme conditions.
Having learned about the limitations, the needs, and the current methods through this interview, we wanted a better understanding of how DNA data storage could be implemented. In our next interview, with Sungjun Lee, a member of SAP, we learned about the possible implementations of DNA data storage.
To gain further insight into how DNA data storage can be implemented, we conducted an interview with Sungjun Lee, an employee at SAP, a software company in South Korea. As part of the data storage department, he was able to tell us about the possible implementations and limitations of DNA data storage.
When asked about the benefits of DNA data storage compared to current data storage techniques, Mr. Lee confidently stated that storing data in DNA, with TFAM ensuring the stability of the DNA, has the potential to become the main medium of data storage. He highlighted the effectiveness of DNA data storage for long-term data. However, because retrieval is more complicated than in the current data storage model, he stated that DNA data storage would face limitations in replacing HDDs and SSDs in the short run for services that require OLTP. Despite these limitations, he remarked that DNA data storage models could be implemented by large cloud service providers such as AWS or GCP if DNA data storage provides a better mechanism for retrieval and stability.
Through further research and Mr. Lee’s interview, we noticed that data storage centers are largely exposed to UV radiation from sunlight and H2O2 from the air. When DNA is exposed to these stress factors, its nucleotide sequence can be altered, damaging the data stored in the DNA. To address the question of stability under real conditions raised by Mr. Lee, we conducted a series of extensive stress tests. These stress tests allowed us to show that the TFAM-DNA complex can withstand the stress factors and secure the data in the DNA even in extreme conditions.
Having completed these stress tests under extreme conditions, we were able to show that data in DNA storage could be secured. We then wanted advice on the general application of our project, so we interviewed Teagsang Yoo, a worker at SK biopharmaceuticals, for his advice and comments on our product.
Towards the end of our project, we approached Teagsang Yoo, a worker at SK biopharmaceuticals, to seek advice on the general application of our product. Mr. Yoo emphasized that if DNA-based data storage systems become applicable on a large scale, they will not only allow a more efficient use of space but also make positive contributions to the environment. However, he stated that, because of the financial costs required to develop a DNA-based data storage center, we cannot confirm without further research that DNA-based data storage can be used by large-scale companies.
To address the financial question, we did more research comparing the costs of current data storage centers with the costs required for DNA-based data storage centers. According to Forbes, the cost of sequencing a whole human genome has dropped to $600 today, and the company providing DNA-based data storage has an estimated revenue of about a billion dollars. They claimed that because DNA data storage centers do not require extensive conditions to maintain the stability of the data, the monetary benefits gained through the shift in storage outweigh the annual costs required to maintain the storage center. This addressed the question posed by Teagsang Yoo on the financial costs of DNA-based data storage centers and gave us more confidence in the cost-effectiveness of our product.
To gain insight into the theoretical part of our project, we interviewed Yoseb Song, a postdoc at the Massachusetts Institute of Technology (MIT). Dr. Song explained the principles of our project and the theory behind them. Through him, we also received the opinions of fellow researchers regarding synthetic biology and DNA-based data storage.
Dr. Song explained how the field of DNA-based data storage has developed. The concept of storing binary data in DNA has existed for some time, but the only occasion on which it was actually applied in real life was roughly two years ago. Based on this, Dr. Song further addressed the potential of DNA-based data storage, especially considering the limitations of current data storage methods.
Current data storage methods such as hard disks store data as binary code, while DNA uses a quaternary form based on the four nucleotide bases: adenine, thymine, guanine, and cytosine. Additionally, because of their degradation rate, current data storage media cannot store data for substantial amounts of time. Dr. Song contrasted this with what DNA-based data storage is capable of: storing data for many decades while remaining stable.
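The binary-versus-quaternary contrast can be illustrated with a minimal sketch in which every two bits map to one base. The mapping below (00 to A, 01 to C, 10 to G, 11 to T) and the function names are assumptions chosen for illustration; they are not the encoding scheme used in our project, and practical schemes add error correction and sequence constraints that are omitted here.

```python
# Hypothetical 2-bits-per-base mapping, chosen only for illustration.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Convert binary data into a DNA base string (4 bases per byte)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(dna: str) -> bytes:
    """Convert a DNA base string produced by encode() back into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


sequence = encode(b"Hi")
print(sequence)          # CAGACGGC
print(decode(sequence))  # b'Hi'
```

The decoding step is the extra DNA-to-binary conversion that the interviewees described as adding to retrieval time.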
Through the interview with Mr. Jaehan Park, our team learned the significance of correct disposal methods and how incorrectly disposed data storage can affect our world. Thus, we created a DNA disposal security booklet that explains how biohazards can be disposed of properly, integrating the stakeholders’ feedback and our own research. The information in this 12-page booklet is divided into six main parts: “What is Biohazardous Waste”; “Environmental Issues”; “Prevention Methods”; “Specific Guidelines”; “Disposal Outside of Lab”; “Conclusion: Future Outlook.” Within these parts, it discusses specific topics ranging from harm to wildlife and humans to how DNA disposal may change with commercialization. Our team anticipates that this booklet will improve our project by promoting ethical and safe decisions that could positively affect the stakeholders.
Through the interviews conducted with experts in synthetic biology and DNA-based data storage, our team deepened our understanding of our own project and of how it can be further improved to affect the world positively. From the interviews with the experts, our team learned the following limitation of our project: increased complexity in the data retrieval process, caused by the lack of stability of DNA data storage and the extra step of converting the DNA sequence code into binary code. This instability limits the technology’s use in companies, since HDDs and SSDs are more convenient, and makes commercialization difficult because of concerns about data leaks and damage. It is further worsened by exposure to UV radiation and H2O2 in current data centers, which damages the data within the DNA.
To increase the stability of our DNA-based data storage, our team improved our project design in the following way.
TFAM, also known as Transcription Factor A Mitochondrial, is a DNA-binding protein that is essential for activating transcription in the mitochondria. It functions by adjusting the stability, packaging, and replication of the mitochondrial genome. TFAM has the ability to encapsulate the mitochondrial DNA and secure the information in the DNA from stress factors.
Amino acids are molecules that link together to form proteins. In general, they are essential in the human body because they make up proteins, including those that help the body break down food. More specifically, each part of a protein’s amino acid sequence serves a different purpose in the function of the entire protein. Different regions are connected to each other by linkers, and some parts are removed when they are no longer needed.
Regarding the amino acid sequence of TFAM, the 43rd to 50th and the 122nd to 152nd amino acids act as linkers, which in general support the structure of the protein. The 50th to 122nd amino acids form HMG box A, and the 152nd to 223rd amino acids form HMG box B. HMG boxes mediate the non-specific or sequence-specific DNA binding and the folding of TFAM, which supports TFAM in serving its functions. The 223rd to 246th amino acids act as a tail, which also plays an important role in transcription and in the stability of the protein structure. The MTS (mitochondrial targeting sequence), also described as a mitochondrial signal peptide, is a short peptide that directs the transport of a protein to the mitochondria. It can be found in mitochondrial proteins such as TFAM.
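As a compact summary of the layout described above, the following sketch stores the regions as a lookup table. The residue numbers come from the text; the MTS range of 1 to 42 follows from the "first 42 amino acids" described in our design. The labels "linker 1" and "linker 2" and the function name are hypothetical. Because the text gives shared boundary residues (50, 122, 152, 223), the sketch assumes inclusive ranges and reports both neighbouring regions at a boundary.

```python
# Region boundaries as stated in the text (1-based, inclusive).
# Labels "linker 1" / "linker 2" are illustrative names, not official terms.
TFAM_REGIONS = [
    ("MTS", 1, 42),
    ("linker 1", 43, 50),
    ("HMG box A", 50, 122),
    ("linker 2", 122, 152),
    ("HMG box B", 152, 223),
    ("tail", 223, 246),
]

TFAM_LENGTH = 246


def regions_at(position: int) -> list[str]:
    """Return the names of all regions that contain a 1-based residue position."""
    if not 1 <= position <= TFAM_LENGTH:
        raise ValueError(f"position must be between 1 and {TFAM_LENGTH}")
    return [name for name, start, end in TFAM_REGIONS if start <= position <= end]


print(regions_at(10))   # ['MTS']
print(regions_at(100))  # ['HMG box A']
print(regions_at(50))   # ['linker 1', 'HMG box A']
```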
At the very beginning of our project, when we first chose to work with TFAM, we thought that every amino acid group in TFAM-204 was equally important to the function of TFAM. We therefore first planned to replicate TFAM as a whole, with 246 amino acids in total.
However, after a discussion with Yoseb Song, an MIT researcher, we learned that the first 42 amino acids would serve no function in our design and might even prevent TFAM from performing its role by limiting its folding. Through further research, we found that the first 42 amino acids serve as an MTS and are cleaved once TFAM enters the mitochondria. Because we mix TFAM directly with the DNA vector (in human cells, the DNA that TFAM binds is inside the mitochondria), we did not need the MTS to guide the protein to the mitochondria.
Thus, we decided to remove the first 42 amino acids of TFAM. This not only made the procedures more convenient but also helped ensure that TFAM could fold and function properly, since the extra amino acids might have interfered with TFAM and caused it to malfunction.
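The length arithmetic behind this truncation can be sketched as follows. The function name and the placeholder string are hypothetical; the placeholder is not the real TFAM sequence and only stands in for a 246-residue protein so that the resulting length can be checked.

```python
FULL_LENGTH = 246  # residues in full-length TFAM, as stated in the text
MTS_LENGTH = 42    # N-terminal targeting sequence removed in our design


def remove_mts(protein_seq: str, mts_length: int = MTS_LENGTH) -> str:
    """Return the sequence without its first `mts_length` residues."""
    if len(protein_seq) <= mts_length:
        raise ValueError("sequence is not longer than the targeting sequence")
    return protein_seq[mts_length:]


# Placeholder only: "X" repeated 246 times, NOT the actual TFAM sequence.
placeholder_tfam = "X" * FULL_LENGTH
truncated = remove_mts(placeholder_tfam)
print(len(truncated))  # 204
```

The remaining 204 residues match the "204" in the name TFAM-204.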
In conclusion, having learned that not every amino acid group in TFAM-204 is equally important for TFAM to function in our TFAM-DNA complex, we removed the first 42 amino acids, which serve as the targeting sequence, and used only the rest when testing the effect of the TFAM protein. Through this modification of our project, our team anticipates an increase in the use of DNA-based data storage, as the change directly simplifies its use in companies. Since the major limitation of DNA-based data storage that the stakeholders commented on was its instability, our improvement directly targeted this flaw by eliminating extra amino acids that could potentially undermine the function of the storage. With this limitation addressed, DNA-based data storage becomes more stable and functional at larger scales, increasing the chance of commercialization and use by companies or agencies that handle large amounts of data.
To improve our project, Korea_HS reached out to different areas of synthetic biology. We conducted interviews with Seongjun Yoon (the CEO of Fortuga Bio), Jaehan Park (the General Manager for Data Storage at Samsung), Sungjun Lee (a member of the data storage department of a company), Teagsang Yoo (an employee of SK biopharmaceuticals), and Yoseb Song (an MIT postdoc). From the interviews, we gained valuable insights into data storage, such as its ethics, its potential, and its possible downsides. The most common remark among the experts was that DNA storage is a double-edged sword: although it can store immense amounts of data for a considerable duration, it can also have major downsides, such as low efficiency and possible environmental impact from improperly disposed biohazardous waste.
After gaining insight into some of the drawbacks, Korea_HS attempted to address those points. First, we tackled the potential environmental impact of TFAM-DNA data storage by creating the “DNA Disposal Security Doc.” In this document, we laid out how laboratories and facilities can dispose of DNA-related products safely and why safe disposal matters.
Another major limitation of DNA storage brought up in the interviews was its low efficiency. The majority of the interviewees pointed out that TFAM might not function in lab conditions. In response, we attempted to address this problem by shortening the TFAM protein, removing 42 amino acids to make the process noticeably more efficient.
In essence, over the course of the interviews, we examined our product thoroughly, discovered important downsides, and responded to them by making direct changes to our product. Team members distributed this information to the public, especially to high school students at school, furthering the communication aspect of our project.
Biohazardous and Medical Waste Overview. (2020, December). UC San Diego Blink. Retrieved October 4, 2022, from https://blink.ucsd.edu/safety/research-lab/hazardous-waste/disposal-guidance/medical/index.html
Brenner, B. (2017, September 17). The effects of biohazard waste on the environment. MedPro Disposal. Retrieved October 4, 2022, from https://www.medprodisposal.com/blog/the-effects-of-biohazard-waste-on-the-environment/
Chapter 9: Biohazardous and medical waste disposal. (n.d.). Boston University Research Support. Retrieved October 4, 2022, from https://www.bu.edu/researchsupport/compliance/ibc/resources/biosafety-manual/chapter-09-biohazardous-and-medical-waste-disposal/
Kang, I., Chu, C. T., & Kaufman, B. A. (2018). The mitochondrial transcription factor TFAM in neurodegeneration: emerging evidence and mechanisms. FEBS Letters, 592(5), 793–811. https://doi.org/10.1002/1873-3468.12989
Kunze, M., & Berger, J. (2015). The similarity between N-terminal targeting signals for protein import into different organelles and its evolutionary relevance. Frontiers in Physiology, 6. https://doi.org/10.3389/fphys.2015.00259
Management of BSL 1 recombinant DNA waste. (2011, April). Environment, Health and Safety Information for the Berkeley Campus. Retrieved October 4, 2022, from https://ehs.berkeley.edu/sites/default/files/76recombdna.pdf
Reddy, M. K. (2022, August 22). Amino acid. Britannica. Retrieved October 4, 2022, from https://www.britannica.com/science/amino-acid
University of Washington. (n.d.). Biohazard waste. Environmental Health and Safety. Retrieved October 4, 2022, from https://www.ehs.washington.edu/biological/biohazardous-waste