DESCRIPTION

Limitations of Current Data Storage Model


The volume of data created, captured, copied, and consumed worldwide from 2010 to 2020, with forecasts from 2021 to 2025.

Newly produced digital data has accumulated relentlessly, easily outpacing our capacity to process and store it. Although much of this data is considered worthless, some valuable information still needs to be preserved long term. Since corporations and organizations must save and manage data, data storage centers emerged to house this massive amount of information physically.

The basic storage method used in nearly all computing systems today is the binary system: a numbering scheme in which each digit of a code is either a 0 or a 1. Data storage centers that use this system serve three primary purposes: computing, storage, and networking. They process data, store it on hard drives, and share it with other centers. By storing data physically, data storage centers aim to keep information persistent and accessible so that corporations or organizations can recall data or store additional information when necessary. Corporations can also continuously update their data with the help of the facilities in data storage centers.
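As a small illustration of the binary scheme described above (an assumed example, not part of any data center's actual pipeline), the following Python snippet shows how a short piece of text becomes a string of 0s and 1s:

```python
# Each character maps to its 8-bit binary (ASCII) code.
message = "DNA"
bits = "".join(format(ord(ch), "08b") for ch in message)
print(bits)  # 010001000100111001000001
```

Every byte of stored data, whether text, an image, or a program, is ultimately held as such a sequence of binary digits on the center's drives.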

Despite the current surge in the use of data storage centers, the model has several flaws that our industry must consider for the foreseeable future. First, data storage centers are not robust: they cannot store everyone's information forever. Existing centers require technical equipment to secure information and extensive facilities to keep the hardware and software running, including power subsystems, ventilation and cooling systems, backup generators, uninterruptible power supplies (UPS), and cables connecting to external networks. Moreover, to protect a data center against external attacks such as intrusions and distributed denial-of-service (DDoS) attacks, the right tools and services, such as firewalls, must be in place before an attack occurs. Physical machinery, in particular, can remain in usable condition for only a limited time, which makes the present model less cost-effective in the long term.

Second, the density of information stored in current data storage centers is considerably low. In other words, the amount of data stored is relatively small compared to the physical space it occupies. Compared with an alternative method, DNA data storage, the present system falls short in this respect. DNA can store up to 10¹⁸ bytes per mm³, about six orders of magnitude denser than the densest storage method available today, making it a far more promising medium.
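To put that density figure in perspective, a quick back-of-the-envelope calculation (using only the 10¹⁸ bytes per mm³ figure cited above) shows how little volume a zettabyte of data would occupy in DNA:

```python
zettabyte = 1e21        # bytes
dna_density = 1e18      # bytes per mm^3, the figure cited above
volume_mm3 = zettabyte / dna_density
print(volume_mm3)       # 1000.0 mm^3, i.e. about one cubic centimetre
```

A conventional data center holding the same zettabyte would, by the six-orders-of-magnitude gap above, need on the order of a million times more physical volume.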

Lastly, the modern data storage model requires a specialized reader to access stored data. To examine and edit the information, a dedicated reader must be manufactured and allotted for each storage medium. However, given the size and quantity of the data, producing specialized readers is not economical. Considering that the pursuit of profit is the main goal of corporations, this lock-and-key mechanism is not efficient enough.

Long-Term Implications for the Current Data Storage Model

Hard drives require large amounts of electricity and produce a considerable amount of greenhouse gas. While they have offered people convenience, the greenhouse gases produced and the electricity consumed by traditional storage methods have contributed to climate change. “Since the mid-1900s, climate change is getting more serious every year at an exponential rate,” and “according to the 2022 IPCC report, if we do not take a turn now, we will never be able to go back, and climate change is only going to get more serious” (Jihyeon (Cherry) Sung, climate activist). Although people know that hard drives are not the most environmentally friendly way to store data, they continue to use them because of the limitations of other well-known storage methods. Hence, we were motivated to fabricate a newer, better storage medium that secures data while minimizing harm to the environment and maximizing the safety and quality of the data.


Data of CO2 emissions produced by data centers, PCs and peripherals, networks and devices, and selected countries.

Collected data of data center-related electricity usage in Nigeria, Colombia, Argentina, Egypt, South Africa, Indonesia, and the UK.

Among the many pre-existing methods for addressing the limitations of the current DNA storage method, we chose to preserve DNA more efficiently by producing TFAM protein using synthetic biology.

In attempts to store data in DNA more cost-efficiently and conveniently, freezing-and-thawing storage methods, also known as working storage methods, are currently being developed. These methods freeze DNA carrying data that does not require frequent access, storing it in solid form. When the data needs to be accessed, the frozen DNA must be thawed at a specific temperature so the data can be decoded from its nucleotide sequence. Although the limitations of the current working storage method outweigh its benefits, the lack of alternatives has made it the most widely used technique for storing DNA.

The main reason people consider freezing and thawing a solution for DNA data storage is that it stores data in DNA more stably and accessibly than traditional DNA data storage methods. It also served as an alternative to archival storage methods, from which information is hard to retrieve, and it was the best available option for short-term storage, where DNA is otherwise kept in solution without any protective mechanism.

Despite these benefits, freezing and thawing still has significant limitations, including its inefficiency and factors that can damage the data in the DNA. We therefore developed the TFAM-DNA storage method, which improves on the freeze/thaw method in many respects.

The first reason we chose synthetic biology over freezing and thawing is that it requires fewer steps and less energy. To prevent DNA degradation in the freezing-and-thawing method, DNA must be stored as a precipitate in ethanol at a very low temperature, and the ethanol must be removed before use, which consumes tremendous time when storing and retrieving DNA data. With our redesigned TFAM protein, however, DNA can be stored even in aqueous solution, making data storage more convenient and saving considerable energy.

The second reason for choosing synthetic biology over freezing and thawing is safety. Even though freezing and thawing is considered a safe way to store DNA, some factors may still harm the data in the DNA. Nucleic acids are most stable at approximately pH 7.5 to 8. As freezing begins, however, small regions of pure water freeze first, which can shift the pH and damage nucleic acid strands passing through the remaining liquid zones. To prevent such damage, we developed the TFAM-DNA storage method, in which DNA can be stored in aqueous solution without concern for pH changes.

In short, the TFAM-DNA storage method, developed through synthetic biology, is the most promising way to store DNA, surpassing the freezing-and-thawing method in many respects.

Our Solution: Improving the DNA Data Storage Model

Climate change is being aggravated by the data storage methods many people use. By making DNA-based data storage more stable and effective, we aim to contribute to solving associated problems such as climate change. Our project, storing data in DNA by forming a TFAM-DNA complex, can yield various benefits. First, TFAM contributes to the DNA's stability, making DNA-based data storage more reliable. TFAM is a cellular protein that forms a complex encapsulating mitochondrial DNA. If TFAM can serve the same function for data-carrying DNA, the DNA would be better protected from various stress factors. Moreover, DNA-based data storage methods have high information density and can effectively store long-term data such as historical records. They are also known to be eco-friendlier than the current data storage models.

However, the DNA data storage method has several limitations. One is that the DNA must be kept in a cold environment. Even so, it would be a better alternative than current storage methods such as hard drives and SSDs, which require immense energy and produce copious amounts of greenhouse gases, worsening climate change.

One of the main limitations of the current DNA storage method is the DNA's lack of stability when exposed to stress factors such as UV light or H2O2. To tackle this issue, we developed a TFAM-DNA complex and tested the DNA's stability by exposing the complex to UV light and H2O2. Stability was examined by recalling the data from the stressed DNA and comparing it to the original image we encoded.


An image of TFAM-204.

We produced the TFAM protein mainly via the following procedures: amplifying the TFAM protein coding sequence through PCR, digesting and ligating the TFAM DNA into the pET28 vector using BamHI and XhoI restriction enzymes, harvesting BL21(DE3) E. coli to collect the protein, and purifying the overexpressed protein.

Step 1. Amplification of TFAM gene from human cDNA via PCR

First, among the four TFAM mRNA isoforms, we chose the TFAM-204 mRNA coding sequence, the major isoform, which produces the longest amino acid sequence (246 aa) and is the most widely used. Of those 246 amino acids, the first through 43rd are translated into the mitochondrial targeting sequence (MTS), which helps the TFAM protein travel from the cytosol into the mitochondria. In our experiment this region is removed, because TFAM will be attached directly to the data-carrying DNA. TFAM cDNAs from different cell origins, A172 (brain), MCF7 (breast), MKN45 (stomach), and A549 (lung), were amplified via PCR and run on an agarose gel to compare which cell line yields the most TFAM product. After selecting the best TFAM cDNA, we amplified it by PCR for later insertion into the E. coli vector, then ran agarose gel electrophoresis to confirm that the TFAM cDNA had been amplified correctly. The PCR solution was then purified to remove primers, DNA polymerase, and other residues left over after PCR by adding buffers to a filter column and centrifuging it.


The process of PCR that replicates TFAM DNA.

Comparing TFAM cDNAs from different cell origins.
Step 2: Digestion/Ligation of TFAM DNA into pET28 Vector

DNA Ligation

After obtaining the purified TFAM-204 mRNA CDS, we digested the purified TFAM DNA and the E. coli pET28 vector with the BamHI and XhoI restriction enzymes so that the TFAM DNA could be inserted into the pET28 vector and drive TFAM production in E. coli. We used the pET28 vector in particular because it encodes an N-terminal His-tag, which we use to purify the protein later in the experiment. The vector carrying the TFAM DNA was then transformed into E. coli. Here, the lacI gene and the DE3 element, which carries the T7 RNA polymerase gene under the lac promoter integrated into the E. coli genome, play the key role. When IPTG is added to the E. coli, the lac repressor releases the lac promoter, activating the gene that produces T7 RNA polymerase. The T7 polymerase then transcribes the target gene in our vector, the TFAM DNA, making the E. coli produce the TFAM protein.

Step 3: Harvesting BL21(DE3) E. coli for Protein Collection

After growing the E. coli, TFAM protein was collected from the harvested E. coli sample solution. We harvested E. coli from four different colonies and ran SDS-PAGE on each sample to select the colony producing the most protein. We also tested whether IPTG serves its purpose of removing the lac repressor and inducing protein production by comparing two samples, one with IPTG and one without.


The SDS-PAGE gel test performed to identify the E. coli sample that produces the most TFAM protein and to test the effectiveness of IPTG induction.
Step 4: Purification of TFAM Protein

After selecting the best TFAM-producing sample, we purified the protein using Ni-NTA magnetic nanobeads. When we designed the TFAM construct, the pET28 vector added a His-tag to the protein's N-terminus. Because the His-tag binds the nickel on the magnetic nanobeads, we purified the protein via the following steps: equilibrating the Ni-NTA magnetic nanobeads, binding the protein to the beads, washing the beads, and eluting the target protein. Using SDS-PAGE, we then analyzed whether the purification process was effective.

Step 5: Protein Concentration Quantification

Finally, a Bradford assay was used to quantify the amount of TFAM protein in the solution. Coomassie Brilliant Blue G-250 (CBBG) is used as the reporting dye because it binds the protein and shifts in color from maroon to blue. The absorbance of the CBBG-protein mixture was then measured with a spectrophotometer and compared to a Bovine Serum Albumin (BSA) standard curve to determine how much protein had been produced.


Bovine Serum Albumin (BSA) Standard Curve
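Reading a concentration off the BSA standard curve amounts to a simple linear fit and interpolation. The sketch below uses made-up standard points, not our measured BSA values, purely to illustrate the calculation:

```python
# Hypothetical standards: (concentration in ug/mL, absorbance at 595 nm).
standards = [(0, 0.00), (250, 0.25), (500, 0.50), (1000, 1.00)]

def fit_line(points):
    """Least-squares fit of absorbance vs. concentration."""
    n = len(points)
    sx = sum(x for x, _ in points); sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points); sxy = sum(x * y for x, y in points)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept

slope, intercept = fit_line(standards)

def concentration(absorbance):
    """Invert the standard curve for an unknown sample."""
    return (absorbance - intercept) / slope

print(concentration(0.40))  # 400.0 ug/mL with these made-up standards
```

The same inversion, with the measured BSA points in place of the hypothetical ones, gives the TFAM concentration of each purified sample.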
Step 6: TFAM-DNA Complex Stress Test

After quantifying the protein, we added the TFAM protein to the DNA solution at the optimal TFAM:pSmile binding molar ratio, 113.47:1, so that the protein protects the DNA most effectively. We then conducted a stress test, applying UV and H₂O₂ stress, both of which readily damage DNA, to naked DNA and to the TFAM-DNA complex to check how effectively the TFAM protein protects the DNA. Finally, the DNA sequence was analyzed by Sanger sequencing. The analyzed sequence was converted into binary code and decoded back into the original image to check the integrity of the DNA in each sample and the effectiveness of the TFAM protein.


Applying UV stress to the TFAM-DNA complex samples
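The final decoding step, from nucleotide sequence back to binary and then to the original file, can be sketched as follows. The 2-bit mapping here (A=00, C=01, G=10, T=11) is an assumption for illustration only; the actual mapping is defined by our encoding scheme:

```python
# Assumed 2-bit mapping, for illustration; the real scheme is project-specific.
NT_TO_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}
BITS_TO_NT = {v: k for k, v in NT_TO_BITS.items()}

def bytes_to_sequence(data):
    """Encode raw bytes as a nucleotide string, two bits per base."""
    bits = "".join(format(b, "08b") for b in data)
    return "".join(BITS_TO_NT[bits[i:i + 2]] for i in range(0, len(bits), 2))

def sequence_to_bytes(seq):
    """Decode a sequenced strand back into the original bytes."""
    bits = "".join(NT_TO_BITS[nt] for nt in seq.upper())
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

strand = bytes_to_sequence(b"DNA")
print(strand)                     # CACACATGCAAC
print(sequence_to_bytes(strand))  # b'DNA' recovered intact
```

In the stress test, a sample passes if this round trip reproduces the original image bytes; corrupted bases in the sequenced strand show up as differences from the original.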

2022 iGEM: COVID-19 and Our Project

COVID-19 has been a prevalent issue globally, impacting not only our iGEM team but also teams in other countries. Since last year, iGEM meetings have typically taken place over Zoom; now we are slowly returning to in-person meetings, or at least attempting to include more of them. This is great news for our team, because in-person meetings offer benefits that online sessions cannot. Among the significant issues the pandemic caused were constant online meetings via Zoom and limits on the number of people allowed in the lab for experiments. Online meetings were certainly convenient and comfortable for everyone; however, our iGEM team would have benefited from more in-person meetings.


Korea HS’s First In-Person Meeting

The difference between meeting online and offline can be significant, revealing one's personality, work ethic, and behavior. For most meetings, such as collaboration, human practices, SDG, and entrepreneurship, online meetings were an ideal way to meet every week. In large meetings where all our members participate, however, an in-person meeting is preferable; it can be difficult to stay focused for the full two hours when a meeting is held online. Overall, we were grateful for the opportunity to have at least some in-person meetings, as we could not have any last year.

Furthermore, the wet lab experiments were an issue. To be frank, we did complete the wet lab portion of the experiment within two months; however, the pandemic caused some delays, along with COVID-19 protocols limiting how many people could enter the lab. It would have been optimal to conduct the lab experiments before collaborating with other teams, given how often other teams asked about our wet lab. Fortunately, after completing our wet lab, we shared the results with our teammates. Thus, COVID-19's largest impacts were on the meeting schedule and the lab experiments, through both personnel limits and delays.

Sources:

Bits and binary - Introducing binary - GCSE Computer Science Revision. (n.d.). BBC Bitesize. Retrieved October 2, 2022, from https://www.bbc.co.uk/bitesize/guides/zwsbwmn/revision/1

What Is a Data Center? (2022, August 3). Cisco. Retrieved October 2, 2022, from https://www.cisco.com/c/en/us/solutions/data-center-virtualization/what-is-a-data-center.html

NCBI - WWW Error Blocked Diagnostic. (n.d.). Retrieved October 3, 2022, from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8191772/

Yasar, K., Loshin, P., & Lutkevich, B. (2022, April 25). data center. SearchDataCenter. Retrieved October 2, 2022, from https://www.techtarget.com/searchdatacenter/definition/data-center

Ionkov, L. B. S. (2021, May 28). DNA: The Ultimate Data-Storage Solution. Scientific American. Retrieved October 3, 2022, from https://www.scientificamerican.com/article/dna-the-ultimate-data-storage-solution/

Hidden Environmental Impacts of New Technologies. (2018, July 26). New York League of Conservation Voters. Retrieved October 2, 2022, from https://nylcv.org/news/hidden-environmental-impacts-new-technologies/#:%7E:text=The%20manufacturing%20of%20disks%2C%20external,where%20they%20do%20not%20biodegrade.

Matange, K., Tuck, J. M., & Keung, A. J. (2021). DNA stability: a central design consideration for DNA data storage systems. Nature communications, 12(1), 1-9.