MODEL

Background Information and Purpose of Our Model

DNA-based data storage is a rapidly evolving technology that could address the rising energy and space demands of current information storage media such as hard disk drives and solid-state drives. Compared with these media, DNA-based data storage has been recognized as more cost- and space-efficient. However, the limited stability of DNA restricts its potential in future data storage systems. To address this limitation, we developed a novel method to store DNA in aqueous solution with enhanced stability. We form a complex between DNA and Transcription Factor A, Mitochondrial (TFAM) in aqueous solution. TFAM encapsulates the DNA to protect it from UV irradiation and oxidative stress, which could enhance the stability of DNA during storage and retrieval. The TFAM-DNA complex was formed with replicated human lung cDNA and purified TFAM protein.

TFAM Protein Design


[Figure 1] Representation of the final TFAM construct and its domains used in this experiment.

TFAM residues 43-246 were cloned into pET28 (an E. coli expression vector with an N-terminal histidine tag) to produce His-tagged TFAM protein. The first 42 amino acids of TFAM guide the protein into the mitochondria. Because TFAM binds the data-storing DNA directly in our experiment, these first 42 residues were removed from the construct. In Figure 1, “L” represents the linker domain, which connects the two HMG-box domains, and “HMG BOX” represents a high mobility group box domain. These tandem HMG-box domains allow TFAM to play a crucial role in the expression, maintenance, and organization of the mitochondrial genome [1].

The mature TFAM protein contains two HMG boxes separated by a linker, followed by a charged C-terminal tail. The HMG-box domains allow TFAM to bind, wrap, and bend DNA without sequence specificity, while the C-terminal tail is required for activation of promoter-specific mtDNA transcription. The TFAM coding sequence for residues 43-246 was cloned into the pET28 vector to overexpress TFAM protein in E. coli (Figure 1). This vector fuses a 6×His tag to the N-terminus of the TFAM protein.
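To make the residue numbering concrete, the sketch below shows how the 1-based, inclusive range 43-246 maps onto a protein sequence and how an N-terminal tag is prepended. It is a hypothetical illustration: the helper names are ours, the input sequence must be supplied by the user, and the tag is simplified to six histidines rather than the full pET28 leader.

```python
# Hypothetical sketch: derive a His-tagged TFAM(43-246) construct from a
# full-length TFAM protein sequence supplied by the user.
# ASSUMPTION: the tag is simplified to six histidines; the actual pET28
# N-terminal leader (spacer and protease site) is not modelled here.

HIS_TAG = "HHHHHH"  # simplified 6xHis tag (placeholder)


def truncate(full_length: str, start: int = 43, end: int = 246) -> str:
    """Return residues start..end (1-based, inclusive) of a protein sequence."""
    if not 1 <= start <= end <= len(full_length):
        raise ValueError("residue range lies outside the sequence")
    # 1-based inclusive [start, end] corresponds to the 0-based slice [start - 1, end)
    return full_length[start - 1:end]


def his_tagged_construct(full_length: str) -> str:
    """Prepend the simplified His tag to TFAM residues 43-246."""
    return HIS_TAG + truncate(full_length)


if __name__ == "__main__":
    # Placeholder 246-residue string standing in for the real TFAM sequence.
    placeholder = "M" + "A" * 41 + "K" + "A" * 203
    mature = truncate(placeholder)
    print(len(mature), mature[0])  # residue 43 becomes the first residue
```

Because full-length human TFAM has 246 residues, the truncated protein has 204 residues, i.e. 612 bp of coding sequence before the tag and stop codon are added.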

DNA-TFAM modeling


[Figure 2] Mitochondrial DNA storage model: Transcription Factor A, Mitochondrial (TFAM)-DNA complex.

TFAM is abundant enough to cover the entire mtDNA and plays a histone-like role in mitochondria. TFAM encapsulation may therefore protect the DNA from external stresses such as H2O2 and UV (Figure 2).

The mitochondrial genome contains three promoters: the light strand promoter (LSP), heavy strand promoter 1 (HSP1), and heavy strand promoter 2 (HSP2) [1]. When TFAM binds to these promoters (LSP or HSP), it distorts the mitochondrial DNA into a U-turn. In our complex, however, the DNA was not distorted into a U-turn [2]. In our project, TFAM does not bind the DNA at a specific promoter but combines with it non-specifically, so no U-turn is formed; instead, TFAM and DNA bind compactly.

DNA Data Storage Modeling


[Figure 3] Converting a binary image (smiley face) to a DNA sequence.

A binary image such as our smiley face is a digital image composed of two colors, black and white. It can therefore be represented as binary code by treating white pixels as 0 and black pixels as 1. In this research, the smiley face image was converted into binary code using this representation. The binary code was then converted into a DNA sequence in which guanine (G) or cytosine (C) represents 0 and adenine (A) or thymine (T) represents 1. Based on this nucleotide sequence, a DNA strand was synthesized and cloned into a pBHA vector (pBHA/smile) (Figure 3).
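The image-to-DNA conversion can be sketched in a few functions. The mapping (white = 0, black = 1; G or C = 0, A or T = 1) comes from the text, but the text does not say how G is chosen over C (or A over T). As a hypothetical rule, this sketch picks whichever base differs from the previous one, which avoids runs of identical bases (homopolymers are a common source of synthesis and sequencing errors). The 5×5 smiley is a toy pattern, not the image used in this project.

```python
from __future__ import annotations

# Minimal sketch of the image -> binary -> DNA mapping and its inverse.
# Mapping from the text: white = 0, black = 1; G or C = 0, A or T = 1.
# ASSUMPTION (hypothetical rule): the text does not specify G vs C (or A vs T),
# so this sketch picks the base that differs from the previous one.

PAIRS = {"0": ("G", "C"), "1": ("A", "T")}
BASE_TO_BIT = {"G": "0", "C": "0", "A": "1", "T": "1"}


def image_to_bits(image: list[list[int]]) -> str:
    """Flatten a binary image row by row (0 = white, 1 = black)."""
    return "".join(str(pixel) for row in image for pixel in row)


def bits_to_dna(bits: str) -> str:
    """Encode each bit as one base, never repeating the previous base."""
    seq: list[str] = []
    for bit in bits:
        first, second = PAIRS[bit]  # KeyError for anything other than "0"/"1"
        seq.append(second if seq and seq[-1] == first else first)
    return "".join(seq)


def dna_to_bits(seq: str) -> str:
    """Decode bases back to bits: G/C -> 0, A/T -> 1."""
    return "".join(BASE_TO_BIT[base] for base in seq.upper())


def bits_to_image(bits: str, width: int) -> list[list[int]]:
    """Reshape a bit string into rows of the given width."""
    if len(bits) % width:
        raise ValueError("bit string length is not a multiple of the width")
    return [[int(b) for b in bits[i:i + width]] for i in range(0, len(bits), width)]


if __name__ == "__main__":
    # Toy 5x5 smiley for illustration only; not the image used in this project.
    smiley = [
        [0, 1, 0, 1, 0],
        [0, 1, 0, 1, 0],
        [0, 0, 0, 0, 0],
        [1, 0, 0, 0, 1],
        [0, 1, 1, 1, 0],
    ]
    dna = bits_to_dna(image_to_bits(smiley))
    print(dna)
    assert bits_to_image(dna_to_bits(dna), width=5) == smiley
```

Decoding only needs to know which pair a base belongs to, so any G/C or A/T choice rule gives the same round trip.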


[Figure 4] The smiley face DNA sequence cloned into the pBHA plasmid vector (pSmile DNA).

Sanger sequencing is used to check whether the DNA sequence is protected and can be retrieved after exposure to various stresses. For this purpose, the smiley DNA sequence was cloned into the pBHA plasmid (1987 bp) to produce the pSmile plasmid (2087 bp) (Figure 4).

TFAM-pSmile Binding Ratio Modeling


[Figure 5] Electrophoretic mobility shift assay (EMSA) results identifying the optimal mol ratio of TFAM to DNA for complex formation.

The pSmile DNA bands shifted upward, indicating that the TFAM-pSmile complex was successfully formed. The molar concentrations of TFAM and pSmile used in each sample, together with the calculated TFAM:DNA mol ratio, are indicated in the table above the agarose gel image.
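The mol ratios in the table come from converting the loaded amounts of TFAM and pSmile into moles. The sketch below shows that conversion under stated assumptions: about 650 g/mol per base pair of double-stranded DNA (a common approximation) and a molecular weight for the His-tagged TFAM construct that the user must supply. The 26 kDa molecular weight and the loaded masses in the example are hypothetical placeholders, not values from our experiment.

```python
# Minimal sketch: convert the mass of each component loaded in an EMSA lane
# into picomoles and report the TFAM:DNA mol ratio.
# ASSUMPTIONS: ~650 g/mol per base pair of dsDNA (approximation) and a
# user-supplied molecular weight for His-TFAM(43-246).

BP_MASS_G_PER_MOL = 650.0  # average mass of one base pair of dsDNA (approximation)
PSMILE_BP = 2087           # pSmile plasmid length from the text


def dsdna_pmol(mass_ng: float, length_bp: int) -> float:
    """Picomoles of a double-stranded DNA molecule from its mass in ng."""
    # ng / (g/mol) = 1e-9 mol = 1e3 pmol
    return mass_ng * 1e3 / (length_bp * BP_MASS_G_PER_MOL)


def protein_pmol(mass_ng: float, mw_da: float) -> float:
    """Picomoles of protein from its mass in ng and molecular weight in Da."""
    return mass_ng * 1e3 / mw_da


def mol_ratio(tfam_pmol: float, dna_pmol: float) -> float:
    """TFAM:DNA mol ratio, i.e. TFAM molecules per DNA molecule."""
    return tfam_pmol / dna_pmol


if __name__ == "__main__":
    HYPOTHETICAL_TFAM_MW_DA = 26_000  # placeholder, not a measured value
    dna = dsdna_pmol(200, PSMILE_BP)                   # hypothetical 200 ng pSmile
    tfam = protein_pmol(2_000, HYPOTHETICAL_TFAM_MW_DA)  # hypothetical 2 µg TFAM
    print(f"TFAM:pSmile = {mol_ratio(tfam, dna):.2f}:1")
```

Replacing the placeholder molecular weight with the value computed from the actual construct sequence is what makes the ratios comparable across lanes.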

Equation (1) describes the simple equilibrium in which free TFAM and DNA are in equilibrium with the TFAM-DNA complex:

$$\mathrm{TFAM} + \mathrm{DNA} \rightleftharpoons \mathrm{TFAM\text{-}DNA\ complex} \qquad (1)$$
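The equilibrium in equation (1) can be made quantitative with the law of mass action. Treating binding as a simple 1:1 reaction with a dissociation constant K_d, defined as [TFAM]free × [DNA]free / [complex], the equilibrium complex concentration is the smaller root of a quadratic. This is a hypothetical illustration: the text reports no K_d, and real TFAM binds many sites on each plasmid, so the 1:1 model is only a simplification.

```python
import math

# Minimal sketch: equilibrium amount of TFAM-DNA complex for equation (1),
# treated as simple 1:1 binding, TFAM + DNA <=> complex.
# ASSUMPTIONS (hypothetical): a single dissociation constant kd describes the
# binding; real TFAM binds many sites per plasmid, so this is only illustrative.
# All concentrations must share the same units (e.g. nM).


def complex_concentration(total_tfam: float, total_dna: float, kd: float) -> float:
    """Solve kd = (total_tfam - c) * (total_dna - c) / c for the physical root c."""
    b = total_tfam + total_dna + kd
    # For kd >= 0 the discriminant is >= (total_tfam - total_dna)**2 >= 0.
    disc = b * b - 4.0 * total_tfam * total_dna
    return (b - math.sqrt(disc)) / 2.0


def fraction_dna_bound(total_tfam: float, total_dna: float, kd: float) -> float:
    """Fraction of DNA present in the complex at equilibrium."""
    return complex_concentration(total_tfam, total_dna, kd) / total_dna
```

The smaller root is the physical one because the complex cannot exceed either total concentration; with kd = 0 it reduces to the smaller of the two totals.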

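A previous study indicates that the TFAM:mtDNA (16569 bp) mol ratio is ~900:1 in human placental mitochondria [3]. Assuming that the binding affinity of purified TFAM for pSmile DNA is similar to that of endogenous TFAM for mtDNA in human mitochondria, the TFAM:DNA mol ratio must be normalized by DNA size. The expected mol ratio for pSmile can therefore be calculated using equation (2):

$$R_{\mathrm{pSmile}} = R_{\mathrm{mtDNA}} \times \frac{L_{\mathrm{pSmile}}}{L_{\mathrm{mtDNA}}} \qquad (2)$$

where $R$ denotes the TFAM:DNA mol ratio and $L$ the DNA length in bp ($R_{\mathrm{mtDNA}} \approx 900$, $L_{\mathrm{mtDNA}} = 16569$ bp, $L_{\mathrm{pSmile}} = 2087$ bp).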

Using equation (2), the calculated TFAM:pSmile mol ratio is 113.47:1. Interestingly, the EMSA results showed the maximum band shift between mol ratios of 100.79 and 115.19, indicating that the maximum binding capacity of purified TFAM for pSmile DNA lies between these ratios (Figure 5).
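The length normalization can be checked with a few lines of arithmetic: the expected ratio is the literature TFAM:mtDNA ratio multiplied by the pSmile length over the mtDNA length, then compared with the EMSA window. The sketch assumes, as the text does, that binding scales linearly with DNA length; the function names are illustrative. With the rounded ~900:1 value the result is approximately 113:1, which falls inside the window.

```python
from __future__ import annotations

# Minimal sketch of the length normalization: scale the literature TFAM:mtDNA
# mol ratio by DNA length to estimate the TFAM:pSmile ratio.
# ASSUMPTION: TFAM binding scales linearly with DNA length, as stated in the text.

MTDNA_BP = 16_569      # human mtDNA length [3]
PSMILE_BP = 2_087      # pSmile plasmid length
TFAM_PER_MTDNA = 900   # approximate TFAM:mtDNA mol ratio in placental mitochondria [3]
EMSA_WINDOW = (100.79, 115.19)  # mol ratios bracketing the maximum band shift (Figure 5)


def expected_ratio(target_bp: float,
                   reference_ratio: float = TFAM_PER_MTDNA,
                   reference_bp: float = MTDNA_BP) -> float:
    """TFAM:DNA mol ratio for a DNA of target_bp, normalized by length."""
    return reference_ratio * target_bp / reference_bp


def within_window(ratio: float, window: tuple[float, float] = EMSA_WINDOW) -> bool:
    """True if the ratio lies inside the EMSA window (inclusive)."""
    low, high = window
    return low <= ratio <= high


if __name__ == "__main__":
    r = expected_ratio(PSMILE_BP)
    print(f"expected TFAM:pSmile = {r:.2f}:1, inside EMSA window: {within_window(r)}")
```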


[Figure 6] Retrieving the smiley face DNA sequence using Sanger sequencing.

Undamaged and damaged pSmile DNA may yield different reads when subjected to Sanger sequencing (Figure 6). Because exposure to H2O2 or UV radiation is known to damage DNA and may break the double strand, damaged DNA may yield less sequence information than undamaged pSmile DNA.
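One way to quantify how much of the image survives a stress treatment is to decode the Sanger read with the same G/C = 0, A/T = 1 mapping and count how many reference bits it reproduces. The sketch below is hypothetical: it assumes the read has already been trimmed of vector sequence and starts at the first base of the insert, and that ambiguous base calls appear as "N".

```python
# Minimal sketch: decode a Sanger read of the pSmile insert back to bits and
# measure how many bits of the reference were recovered.
# ASSUMPTIONS (hypothetical): the read is trimmed to start at the first base of
# the insert, and ambiguous base calls are reported as "N".

BASE_TO_BIT = {"G": "0", "C": "0", "A": "1", "T": "1"}


def decode_read(read: str) -> str:
    """G/C -> 0, A/T -> 1; any other call (e.g. N) becomes '?'."""
    return "".join(BASE_TO_BIT.get(base, "?") for base in read.upper())


def bit_recovery(reference: str, read: str) -> float:
    """Fraction of reference bits reproduced by the read.

    Positions beyond the end of a short read, and ambiguous calls, count as
    not recovered.
    """
    ref_bits = decode_read(reference)
    read_bits = decode_read(read)
    matches = sum(1 for r, q in zip(ref_bits, read_bits) if r == q and r != "?")
    return matches / len(ref_bits)
```

Because a substitution within the same pair (G↔C or A↔T) does not change the decoded bit, this bit-level metric is more tolerant than base-level sequence identity; a truncated read from broken DNA lowers it in proportion to the missing length.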

3D Modeling

Methodology

ResearchGate [4] was used to predict the 3D structure of the folded TFAM protein. In the predicted structure, alpha helices are shown in blue and loop structures in orange.

Results

The predicted folded structure of TFAM is shown below.


[Figure 7]

References

[1] Ngo, Huu B. et al. “Tfam, a mitochondrial transcription and packaging factor, imposes a U-turn on mitochondrial DNA.” Nature Structural & Molecular Biology vol. 18,11 (2011): 1290-1296. doi: 10.1038/nsmb.2159

[2] Rubio-Cosials, Anna et al. “Protein Flexibility and Synergy of HMG Domains Underlie U-Turn Bending of DNA by TFAM in Solution.” Biophysical Journal vol. 114,10 (2018): 2386-2396. doi: 10.1016/j.bpj.2017.11.3743

[3] Alam, Tanfis Istiaq et al. “Human mitochondrial DNA is packaged with TFAM.” Nucleic Acids Research vol. 31,6 (2003): 1640-1645. doi: 10.1093/nar/gkg251