Model

Introduction

To predict the minimum number of P. destructans cells in a sample that can be detected by our ARROWE system, we built a statistical model explaining the relative efficiency of uptake and integration of environmental DNA (eDNA) into the ADP1 genome. We first measured the transformation frequencies of two antibiotic detector ADP1 strains when different concentrations of their target DNA sequences were present. We then fit a model relating the eDNA concentration to the number of transformed cells using these datasets. Finally, we estimated the detection limit of the detector when placed in an environment with P. destructans DNA. In addition, we undertook a literature review to sketch a mechanistic model of the steps involved in natural transformation of eDNA by ADP1 to understand how this process might be improved.

Why Build a Model?

Modeling is useful to make predictions about a system or environment without investing time, money, and resources into engineering and constructing a system that could fail upon use. Throughout the engineering cycle, models of the system inform further experiments and help in creating a robust product. Our model of the ARROWE system serves to help us understand how our system works—and how laboratory data could set expectations for results in the environment. Modeling also helps us identify potential avenues by which we could engineer ADP1 to improve its transformation frequency.

Defining Terms

ADP1’s natural competence, or ability to pick up foreign DNA sequences from its environment and integrate them into its genome, makes it a useful chassis for genetic engineering and environmental monitoring. We created two antibiotic gene detector strains for testing transformation in a lab setting, one for the nptII gene and one for the TEM-1 gene. These strains each have an inactivated version of the antibiotic resistance gene created through a deletion in the bacterium’s chromosome. ADP1’s uptake and integration of a copy of the antibiotic resistance gene from its environment is confirmed once it repairs this broken gene and makes itself resistant to the antibiotic. Plating a transformation on media containing this antibiotic will indicate the number of colony forming units that were successfully transformed. Plating on nonselective media gives the total number of colony forming units present. The ratio of the two gives the transformation frequency (Tf).

Describing the Model

Our model makes predictions about the rate of change of transformation frequency given changing concentrations of DNA in the environment. ADP1’s ability to integrate foreign DNA fragments into its genome is dependent on many factors, including the size of the fragment, the existing homology between the fragment and ADP1’s own genome, and the metabolic state of the cell. We can use statistics to simplify these factors into one parameter that measures the overall efficiency, giving us insights about our system without the extraordinary difficulty of capturing each individual component.

Our model uses a hyperbolic curve that is inspired by the Michaelis-Menten enzyme kinetics equation:

$$ v = \frac{V_{max}[S]}{K_M + [S]} $$

In this equation, v is the velocity of the reaction, Vmax is the maximum rate of the reaction, [S] is the concentration of the substrate, and Km is the concentration of the substrate when the reaction occurs at half maximum speed (Vmax/2). In our system, the “velocity” of the reaction is successful transformation of a cell from the uptake of environmental DNA and [S] is the extracellular concentration of DNA. For our purposes, the [S] factor in the denominator can be ignored: since we are looking for minimal detection limits, the substrate concentration will never be large enough to affect the overall reaction.

$$ T_f = \frac{T_{fmax}[DNA]}{[DNA] + [DNA_{½}]} $$

We are interested in detecting small quantities of DNA that are far below the saturating concentrations, so we take the limit of this equation where [DNA] << [DNA1/2]:

$$ T_f = \frac{T_{fmax}[DNA]}{[DNA_{½}]} $$

Taking the logarithm of both sides and refactoring the equation, we get

$$ log_{10}(T_f) = log_{10}([DNA]) + log_{10}(\frac{T_{fmax}}{[DNA_{½}]}) $$

This equation has the form of a line: y = mx + b. If this model fits, then performing a linear regression on our log transformed data with log[DNA] as x and log(Tf) as y should give an equation in this form. The slope should equal 1 with the efficiency constant, log(Tf / [DNA]), as the y-intercept.

Our Data

Our model was built on experimental results from trials performed by our team testing the transformation frequency of different gene cassettes providing antibiotic resistance. These cassettes are different sizes; each has different lengths of homology to ADP1’s genome. The results showed differences in the transformation efficiency between the antibiotic cassettes. Transformation frequency data was collected over a wide range of concentrations of the transforming cassette DNA.

Methods and Results

We are interested in the number of molecules of eDNA that could be detected by ARROWE. Since this number is directly proportional to the concentration, we converted the tested concentrations of DNA to the total number of DNA molecules present, then log10-transformed these numbers and their corresponding transformation frequencies. We then removed the highest concentration data value, as this trial was reaching saturation conditions and started to violate the linearity of our model. (It was outside the range in which substrate concentration in the denominator was negligible and could be ignored). We then performed our linear regression using R and found the following equations for each detector.

$$ log_{10}(T_{fnptII}) = log_{10}(Mol_{nptII}) - 12.00 $$ $$ log_{10}(T_{TEM-1}) = log_{10}(Mol_{TEM-1}) - 12.29 $$


Fig. 1. Fit of model to antibiotic detector strain transformation frequencies in assays that varied the concentration of the antibiotic gene cassette DNA added.

To validate our model, we tested whether the slope of the best-fit lines was significantly different from 1 if we fit an alternative linear model that allowed the slope to vary. We did the regression such that it gave a p-value for the likelihood that the null model of a slope of 1 could be rejected. The p-value for the TEM-1 detector was 0.0732 after correcting for multiple testing through Bonferroni correction, and the p-value for the nptII detector was 0.261. Thus, we do not reject the null hypothesis that the slope of the linear regressions is 1. That is, our model for transformation frequency is valid in this range that includes only relatively low DNA concentrations.

Our results give us other useful information. Firstly, they demonstrate that the efficiency of uptake between our two antibiotic detector strains is very similar. This was expected because the DNA cassettes that were transformed have similar size and the detectors have similar homology to their targets. As such, the mechanisms of competence would interact with each detector in a similar manner and produce close efficiency value. Another useful piece of information to be drawn from these results is that we can extrapolate the minimum number of DNA molecules encoding an antibiotic resistance gene that could be detected as one transformant cell. Since a transformation reaction has about 1 billion (109) cells, we predict that we could detect as few as 1,000 copies of the nptII gene or 1,950 copies of the TEM-1 gene.

Finally, we can use this model to estimate the detection limit for the WNS detector strain: the minimum number of P. destructans genomes needed to trigger the transformation of one ADP1 cell. Each P. destructans genome has X copies of the DNA sequence we built into the genome of our WNS detector ADP1 strain. In a system with 1 billion cells of ADP1, our minimum concentration results predict that 2,764 copies of this sequence must be present for detection. This translates into the genomes of 1-3 cells of P. destructans . Therefore, our model suggests that the WNS detector ADP1 might have a similar sensitivity (~250 fg for detection) for detecting P. destructans eDNA as modern PCR-based detection techniques (142 fg for detection), based upon research detailed by Urbina et al . [5]

However, it is important to note that this is a best-case scenario, and there are several reasons that detection is likely to be less efficient for the WNS detector under realistic conditions. One potentially confounding factor is that the design of the WNS detector requires a homologous recombination event. This recombination is expected to be less efficient because it deletes a large insert from the genome versus repairing a small change to a gene as is the case for the antibiotic detectors. Other non-ideal conditions that would be expected to lower transformation efficiency would include factors like unstable or degrading eDNA; off-target DNA sequences competing with uptake of the DNA matching the detector; soil or other material in a sample interfering with DNA uptake. These factors can be tested in future experiments to refine our prediction of the sensitivity of ARROWE.

Mechanics

In our detector system, ADP1 is successfully transformed—in other words, successful detection of environmental DNA (eDNA) occurs—when ADP1 cells uptake the correct DNA fragment from the environment and integrate the fragment into their genome via homologous recombination. To explore how ADP1 might be engineered for improved transformation frequency, we conducted a literature exploration of DNA uptake and recombination, resulting in a qualitative model of the transformation process.

Fig. 2. Graphical representation of ADP1’s uptake protein machinery.

From eDNA’s existence in the environment to its integration into the ADP1 genome, there are 5 main steps:

1. eDNA uptake.

On the surface of ADP1 cells are “pseudopili”: appendages resembling the Type IV pilus (T4P) used for DNA uptake in other bacterial species. During uptake, an eDNA fragment first binds ComC, a protein at the peripheral tip of the pseudopilus. An ATPase catalyzes the retraction of the pseudopilus, pulling the eDNA through ComQ, a channel protein in the pseudopilus, through the outer membrane into the periplasmic space. The periplasmic protein ComEA binds the eDNA to prevent its backward diffusion through the outer membrane, and ComEA supports the DNA’s movement through the periplasm towards the inner membrane. [1]

2. DNA translocation into cytoplasm.

From the periplasm, DNA enters the cytoplasm through the inner membrane channel ComA. This DNA translocation through ComA is coupled with processing by an unidentified DNAse. After an initial cut is made to the double-stranded DNA, one strand gets degraded, while the other enters the cytoplasm from the 3’ to 5’ direction. The net rate of DNA uptake is estimated to be 60 nucleotides per second. [1],[4]

3. DNA bound by DprA, then RecA.

After being internalized, single-stranded (ss) DNA may be degraded by exonucleases, with an estimated 500 nucleotides degraded per internalized fragment. The DNA-processing protein DprA binds internalized ssDNA to protect it from further degradation. DprA also recruits and loads the recombination protein RecA onto the ssRNA. [2]

4. Homology search conducted by RecA.

RecA constructs a transformation heteroduplex, identifying homologous sequences between the ssDNA and genomic DNA via one-dimensional sliding. The mRNA and biological processes involved in the construction of this recombination complex are similar to or the same as those involved in other DNA repair processes. [2],[3]

5. Homologous recombination of DNA into genome.

Upon discovering homology between the ssDNA and genomic DNA, RecA recruits RecT and related recombination proteins. These proteins build a ternary heteroduplex, which integrates the ssDNA into the genome. One end of the ssDNA strand joins covalently to the genome, while the other end the strand could remain unrecombined until other homologous recombination repair proteins help ligate it to the genome. Once the single strand is recombined, other DNA recombination proteins use it as a template for replicating the second strand, after which the DNA is successfully integrated into the genome. [1],[2]

We can pinpoint several areas at which transformation frequency might be improvable. For example, overexpression of some competence proteins may increase environmental DNA uptake, or overexpression of DprA may reduce the degradation of internalized DNA. The understanding of ADP1's competence and homologous recombination mechanisms gained from this qualitative model can guide future engineering efforts for the optimization of ADP1 as an easily transformable chassis.

References

[1] Averhoff, Beate, et al. “Natural Transformation in Gram-Negative Bacteria Thriving in Extreme Environments: From Genes and Genomes to Proteins, Structures and Regulation.” Extremophiles, vol. 25, no. 5-6, 20 Sept. 2021, pp. 425–436, 10.1007/s00792-021-01242-z.

[2] de Vries, Johann, and Wilfried Wackernagel. “Integration of Foreign DNA during Natural Transformation of Acinetobacter Sp. By Homology-Facilitated Illegitimate Recombination.” Proceedings of the National Academy of Sciences of the United States of America, vol. 99, no. 4, 19 Feb. 2002, pp. 2094–2099, www.ncbi.nlm.nih.gov/pmc/articles/PMC122324/, 10.1073/pnas.042263399.

[3] Hepp, Christof, and Berenike Maier. “Kinetics of DNA Uptake during Transformation Provide Evidence for a Translocation Ratchet Mechanism.” Proceedings of the National Academy of Sciences, vol. 113, no. 44, 17 Oct. 2016, pp. 12467–12472, 10.1073/pnas.1608110113.

[4] Palmen, Ronald, and Klaas J Hellingwerf. “Uptake and Processing of DNA by Acinetobacter Calcoaceticus – Areview.” Gene, vol. 192, no. 1, June 1997, pp. 179–190, 10.1016/s0378-1119(97)00042-5.

[5] Shuey, Megan M., et al. “Highly Sensitive Quantitative PCR for the Detection and Differentiation of Pseudogymnoascus Destructans and Other Pseudogymnoascus Species.” Applied and Environmental Microbiology, vol. 80, no. 5, Mar. 2014, pp. 1726–1731, 10.1128/aem.02897-13.

Code

The following script and files were used to run our analysis:

nptII data

TEM-1 data

R script: copy from the .pdf to a .Rmd file for best results

  
```{r libraries}
library(tidyverse)
library(car)
library(emmeans)
```


```{r loading-data}
#pulled from our experimental data
nptii <- read.csv('nptiidata.csv')
#nptii <- na.omit(nptii)
tem1 <- read.csv('tem1data.csv')
#tem1 <- na.omit(tem1)
#loading data in tidy formatting
```

```{r log-scaling}
#creating empty vectors and logscaling each column of the experimental data

nptii$MolLog <- as.numeric(log10(nptii$Mol))
nptii$TfLog <- as.numeric(log10(nptii$Tf))
#log scaling each column of original data and append these columns to the df
nptii <- as.data.frame(nptii)
nptii <- na.omit(nptii)
#cleaning the df and ensuring it is the right type

tem1$MolLog <- as.numeric(log10(tem1$Mol))
tem1$TfLog <- as.numeric(log10(tem1$Tf))
#log scaling each column of original data and append these columns to the df
tem1 <- as.data.frame(tem1)
tem1 <- na.omit(tem1)
#cleaning the df and ensuring it is the right type

```

```{r removing-outliers}
#making a new dataframe of each detector without the "saturation" concentration values from each trial
nptiidel <- nptii[!(row.names(nptii) %in% c("1","6","11")),]
nptiidel$Fit <- nptiidel$TfLog - nptiidel$MolLog
tem1del <- tem1[!(row.names(tem1) %in% c("1","6","11")),]
tem1del$Fit <- tem1del$TfLog - tem1del$MolLog
#we'll fit the model to these "del" dataframes - the saturation values fall outside the scope of our model and aren't included

```

```{r fit-and-summary}

#the model, even without setting a slope, should have a slope close to 1, as the logarithmic version of the Michaelis Menten eq. is log(Transformation Frequency) = log([DNA]) + ([TfMax]/[DNA 1/2]), where the last expression is analogous to Vmax/K0

#making linear models of the flat and unflattened data

lmflattem1 <- lm(Fit ~ MolLog, data = tem1del)
#summary(lmflattem1)

#setting slope to 1 for the lm: regression will only match the intercept
m <- 1
lmsettem1 <- lm(TfLog - m*MolLog ~ 1, data = tem1del)
#summary(lmsettem1)


lmflatnptii <- lm(Fit ~ MolLog, data = nptiidel)
#summary(lmflatnptii)
m <- 1
lmsetnptii <- lm(TfLog - m*MolLog ~ 1, data = nptiidel)

summary(lmsetnptii)
summary(lmsettem1)

```
$$

  log(T_{f TEM-1})=log(Mol_{TEM-1}) - 12.29

$$

$$
  log(T_{f nptII})=log(Mol_{nptII}) - 12.00
$$

#showing each detector and efficiency value in the equation using TeX 

```{r}
nptiisat <- nptii[c(1, 6, 11), ]
nptiidel <- nptii[-c(1, 6, 11), ]
tem1sat <- tem1[c(1, 6, 11), ]
tem1del <- tem1[-c(1, 6, 11), ]
#creating new dataframes for each detector: one with all unsaturation values, one with only saturation values, to make graphing more convenient
```


```{r charting}
#displaying the data

colors <- c("nptII" = "red", "TEM-1" = "blue3", "nptII saturation" = "coral", "TEM-1 saturation" = "cornflowerblue")
#editing ggplot argument to create a proper legend for the data
ggplot() +
  geom_point(data = tem1del, aes(x=MolLog, y=TfLog, color = "TEM-1")) +
  geom_point(data = tem1sat, aes(x=MolLog, y=TfLog, color = "TEM-1 saturation")) +
  geom_abline(data = tem1, intercept = -12.29, slope = 1, group=1, color = "darkblue") +
  geom_point(data = nptiidel, aes(x=MolLog, y=TfLog, color = "nptII")) +
  geom_point(data = nptiisat, aes(x=MolLog, y=TfLog, color = "nptII saturation")) +
  geom_abline(data = tem1, intercept = -12, slope = 1, group = 1, color = "darkred") + 
  labs(x = "Log10 of Number of DNA Molecules", y = "Log10 of Transformation Frequency", color = "Antibiotic")+
  scale_color_manual(values = colors) + 
  theme_bw()
#graphing the points and regression lines for each detector


```