Measurement
Overview

The measurement tools our team worked on this year were mainly aimed at processing large amounts of data and analyzing the reproducibility of experiments. In recent decades there has been rising concern that many published scientific results fail the test of reproducibility, especially in early 2022. We wrote algorithms that illustrate the degree of reproducibility of data, based on power analysis and other statistical principles. In addition, since higher reproducibility requires more parallel groups and a clearly documented data-analysis process, we provide analysis tools for teams that want to increase the amount of data but lack the means for high-throughput analysis, so that more teams can focus on the reproducibility of their experiments. The high-throughput analysis tools include ANOVA-based multi-group comparisons, t-test-based comparisons of means, image processing of fluorescence intensity, and more.

Background

Early in the project, our wetlab group shared data from a pre-experiment to determine the growth curve of *E. coli* at a group meeting. The results of that pre-experiment were unsatisfactory and showed several uncharacteristic fluctuations, a far cry from what is reported in the references. Inspired by this, we realized that we needed a set of criteria for our subsequent experiments against which their reliability could be analyzed, not only to provide a sound mathematical basis for drawing our conclusions, but also to measure whether those conclusions had sufficient scientific validity. Since the reproducibility of research was a hot topic at the time, our team planned to develop a more scientific means of measurement built around the pursuit of experimental reproducibility.

Standardization

Most conclusion-drawing experiments can be separated into two types: those that draw conclusions from description and those that draw conclusions from comparison. By the definition of reproducibility, the results of both should be obtainable again with a high degree of reliability when the study is replicated. Thus, any result should be documented by making all data and the code used to calculate it available [1], in such a way that the computations can be executed again with identical results. In addition, to measure the informative value and reproducibility of our findings for subsequent teams, we can calculate the probabilities of type I and type II errors as references. High-throughput analysis tools are an effective way to keep both error probabilities low, and they are therefore also part of our measurement toolkit.

The criteria for experiments that draw conclusions by description are as follows:

  1. Automate as much as practicable, avoiding manual intervention where feasible.[2]
  2. Rigorously document the procedure during the experiment.[3]
  3. Perform high-throughput experiments and analysis in a timely manner whenever possible.

The criteria for experiments that draw conclusions by comparison are as follows:

  1. Predict the required sample size by power analysis before the experiment.
  2. Rigorously document the procedure during the experiment.
  3. Perform a posteriori analysis of the data in a timely manner.
  4. Verify the validity of the assumptions on which the analysis relies.
  5. Perform a power analysis of the reliability of the conclusions after the results are obtained.

These criteria are meant to standardize measurements as much as possible, and high-throughput experiments and analysis reduce the chance, even before the experiment, that the conclusions arise from type I or type II errors.

To predict and analyze experimental data more effectively, we built a standardized data-analysis workbook called the 'standardized excel'. Sheet1 is for predicting the required sample size and Sheet2 is for analyzing the data of a comparison experiment.

In Sheet1, five parameters are required. The first is the name of your project, which does not affect the calculation. The second is the number of groups, which determines the prediction method. The third is the predicted significance level between samples, with four options from 'low' to 'very high' representing a predicted mean difference between groups ranging from insignificant to significant. The fourth and fifth are the acceptable chances of a type I error (rejecting a null hypothesis that is actually true) and a type II error (failing to reject a null hypothesis that is actually false). If you want more reliable conclusions, choose 'strict' for both to predict a high-power experiment; if you wish to use as small a sample size as possible while still achieving a certain level of reliability, choose 'loose' for both; if you know what these two parameters mean, you can choose based on your experiments. Fill in Sheet1 and run the pre-written algorithm, which carries out the first of the criteria (sample-size prediction).


Sheet 1: prediction sheet
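
The mapping from these qualitative options to the numbers handed to the power calculation can be sketched in R roughly as follows. This is a minimal illustration only: the intermediate option names, the Cohen-style effect-size benchmarks, and the 'strict'/'loose' values of 0.05/0.9 and 0.1/0.7 are assumptions, not the exact contents of the spreadsheet code.

```r
# Minimal sketch of the Sheet1 prediction step (illustrative mapping, not the exact tool code)
library(pwr)

predict_sample_size <- function(k,
                                effect = c("low", "medium", "high", "very high"),
                                type1  = c("strict", "loose"),
                                type2  = c("strict", "loose")) {
  effect <- match.arg(effect); type1 <- match.arg(type1); type2 <- match.arg(type2)

  # assumed Cohen's f benchmarks for the four qualitative effect-size options
  f     <- c("low" = 0.10, "medium" = 0.25, "high" = 0.40, "very high" = 0.50)[effect]
  alpha <- if (type1 == "strict") 0.05 else 0.10   # acceptable type I error rate
  power <- if (type2 == "strict") 0.90 else 0.70   # 1 - acceptable type II error rate

  # balanced one-way ANOVA; for k = 2 this is equivalent to a two-sided t-test (F = t^2)
  pwr.anova.test(k = k, f = f, sig.level = alpha, power = power)$n
}

predict_sample_size(k = 2, effect = "very high", type1 = "strict", type2 = "loose")
```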

In Sheet2, apart from the name parameter and the acceptable chance of a type I error, you only need to enter the experimental data into the table in groups. The algorithm will match the appropriate analysis to the input data type. Fill in the sheet and run the pre-written algorithm, which carries out the third, fourth, and fifth of the criteria.


Sheet 2: analysis sheet

The analysis tools are as follows:

We first assume, as the null hypothesis, that all levels of the independent variable have the same influence on the dependent variable.

If the input group number `k = 2`, we analyze the data with Student's t-test; if `k \geq 3`, we use ANOVA. Moreover, if a difference is found in a `k \geq 3` comparison, the tool supports manually continuing with Tukey's HSD to identify which group differs from the others. It also supports manually switching to ANCOVA. These manual method selections and analyses could be automated as well, but for various reasons we have not finished that.
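
The selection logic can be sketched in R as follows; the function name, the made-up data, and the equal-variance choice are illustrative assumptions, not the exact code behind the spreadsheet.

```r
# Sketch of the analysis selection (illustrative; not the exact spreadsheet code)
# 'value' is the measured response, 'group' is a factor with k levels
analyze_groups <- function(value, group) {
  group <- as.factor(group)
  k <- nlevels(group)

  if (k == 2) {
    # two groups: Student's t-test with pooled variance (so that F = t^2 holds)
    t.test(value ~ group, var.equal = TRUE)
  } else {
    # three or more groups: one-way ANOVA, then Tukey HSD to see which groups differ
    fit <- aov(value ~ group)
    print(summary(fit))
    TukeyHSD(fit)
  }
}

# usage with made-up data: three groups of five replicates
set.seed(1)
d <- data.frame(group = rep(c("A", "B", "C"), each = 5),
                value = c(rnorm(5, 1), rnorm(5, 1.5), rnorm(5, 3)))
analyze_groups(d$value, d$group)
```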

Formula and calculation of ANOVA and Student's t test:

`F = MS_B/MS_W`

`\text{where } MS_B=\frac{\sum_{i=1}^k n_i(\bar{x_i}-\bar{x})^2}{k-1}`

`MS_W=\frac{\sum_{i=1}^k\sum_{j=1}^{n_i}(x_{ij}-\bar{x_i})^2}{N-k}`

`\text{when } k=2, t=\sqrt{F}`

From F or t we obtain the p-value of the ANOVA or t-test. If p < 0.05, we conclude that the independent variable has a different influence on the dependent variable across groups. In addition, we also provide visual analysis for ANCOVA and ANOVA.
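
As a worked check of these formulas, the following R snippet computes F, t, and the p-value directly from two illustrative groups of measurements (the numbers are made up):

```r
# Worked check of the formulas above on made-up data: two groups of four measurements
x <- list(c(1.1, 0.9, 1.0, 1.2), c(1.6, 1.4, 1.5, 1.7))

k <- length(x); n_i <- sapply(x, length); N <- sum(n_i)
xbar_i <- sapply(x, mean); xbar <- mean(unlist(x))

MS_B <- sum(n_i * (xbar_i - xbar)^2) / (k - 1)
MS_W <- sum(sapply(x, function(g) sum((g - mean(g))^2))) / (N - k)

F_stat <- MS_B / MS_W
p <- pf(F_stat, df1 = k - 1, df2 = N - k, lower.tail = FALSE)
c(F = F_stat, t = sqrt(F_stat), p = p)   # when k = 2, t = sqrt(F) and p matches a two-sided t-test
```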

For verifying the validity of the assumptions, a Q-Q plot (qqPlot) checks the normality assumption and the Bartlett test checks the homoscedasticity assumption. Besides these two common tests, we have attached some of the more specialized hypothesis tests used in our analysis. We recommend that subsequent teams carry out these assumption checks whenever they use the same type of analysis tool and, if possible, keep improving the tool.
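
In R these two checks can be run on the residuals of the fitted model, for example as in the sketch below; `qqPlot` comes from the `car` package, and the data frame is the same illustrative one as in the earlier sketch.

```r
# Sketch of the assumption checks on an ANOVA fit (illustrative data)
library(car)   # provides qqPlot with a confidence envelope

set.seed(1)
d <- data.frame(group = factor(rep(c("A", "B", "C"), each = 5)),
                value = c(rnorm(5, 1), rnorm(5, 1.5), rnorm(5, 3)))
fit <- aov(value ~ group, data = d)

qqPlot(residuals(fit))                  # normality: residuals should stay inside the envelope
bartlett.test(value ~ group, data = d)  # homoscedasticity: p > 0.05 supports equal variances
```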

After the conclusion analysis, we use power analysis to assess the reliability of the conclusion.

Formula and calculation of parameters in power analysis

`ES_{experiment}=\sqrt{\frac{\sum_{i=1}^k(\bar{x_i}-\bar{x})^2}{k\cdot MS_W}}`

With the parameters `ES_{experiment}`, `k`, and `n` fixed, adjusting `\alpha` gives the corresponding power.
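
The effect size and the resulting power can be computed from an ANOVA fit as in the following sketch (same illustrative data as above; the significance level 0.05 is just an example):

```r
# Sketch: effect size from the ANOVA fit (formula above), then the corresponding power
library(pwr)

set.seed(1)
d <- data.frame(group = factor(rep(c("A", "B", "C"), each = 5)),
                value = c(rnorm(5, 1), rnorm(5, 1.5), rnorm(5, 3)))
fit <- aov(value ~ group, data = d)

k      <- nlevels(d$group)
n      <- min(table(d$group))                # per-group sample size (balanced design)
xbar_i <- tapply(d$value, d$group, mean)     # group means
MS_W   <- deviance(fit) / df.residual(fit)   # within-group mean square

ES <- sqrt(sum((xbar_i - mean(d$value))^2) / (k * MS_W))
pwr.anova.test(k = k, n = n, f = ES, sig.level = 0.05)$power
```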

The next section gives examples of analyzing our experimental data with these tools and describes our criteria in more detail.

Examples
1. Identification of viability of immobilized bacteria (describing experiment)

In this experiment, we smear immobilized *E. coli* after lysis, use the brightness of its expressed fluorescent protein as a measure of survival, and measure the fluorescence intensity by taking pictures under a fluorescence microscope.

In this experiment, we focus on whether the ability of individual bacteria to express proteins is affected during immobilization culture. The fluorescence intensity of the green fluorescent protein is used here as a measure, and when the average fluorescence intensity of an individual bacterium remains stable during the culture, it is an indication that it can maintain its production in the immobilized culture.

As required for reproducibility, a describing experiment should be automated as much as practicable, avoiding manual intervention where feasible. To meet this criterion, we used the watershed algorithm to process the grayscale images and identify individual bacteria. We then calculated the fluorescence intensity of each bacterium as the average brightness of the pixels inside its watershed region. Finally, by averaging the relative grayscale fluorescence intensities of all bacteria each day, we obtained a trend of their viability.


The leftmost image is the original; on the right, the cell boundaries determined by the watershed algorithm are marked in blue. The fluorescence intensity of each individual is averaged from the pixels within its blue boundary.
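
A minimal sketch of this per-cell pipeline using the Bioconductor package EBImage is shown below. The file name, the Otsu threshold, and the choice of EBImage itself are illustrative assumptions; our actual script may differ in these details.

```r
# Sketch of the per-cell fluorescence pipeline (EBImage; file name and threshold are illustrative)
# install with: BiocManager::install("EBImage")
library(EBImage)

img  <- readImage("day1_gfp.tif")        # hypothetical fluorescence micrograph
gray <- channel(img, "gray")             # grayscale intensity image

mask   <- gray > otsu(gray)              # global threshold to separate cells from background
labels <- watershed(distmap(mask))       # watershed on the distance map splits touching cells

# mean brightness of the pixels inside each watershed region = per-cell fluorescence intensity
feats    <- computeFeatures.basic(labels, ref = gray)
per_cell <- feats[, "b.mean"]
mean(per_cell)                           # daily average relative fluorescence intensity
```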

Compared with measuring OD values or sampling fluorescence, the image-processing tool provides individual-focused, high-throughput analysis. Such a tool increases confidence in our conclusions and helps ensure reproducibility.

2. Experiment on the production of sucrose by *S. elongatus* (comparison experiment using the 'standardized excel')
A Priori Analysis

In the experiments on induced CscB expression, we first performed sample-size prediction based on power analysis. We formulated the null hypotheses that we wished to reject, based on the literature and the purpose of the experiment:

Whether CscB is induced in HL7942 has no influence on sucrose production. Whether CscB is induced in HL7942 is the independent variable, and the presence of IPTG and NaCl are covariates.

By estimating the significance level between samples as 'very high', the acceptable chance of a type I error as 'strict', and the acceptable chance of a type II error as 'loose', we obtained the parameters for sample-size prediction. These error requirements were relaxed because the project design contains many experiments, making it difficult to run a large number of parallel groups for each experimental condition.

Before doing the power analysis with the estimates described above, we first introduce several concepts used in power analysis:

  • number of groups (`k`) is the number of experimental groups being compared.
  • sample size (`n`) is the number of observations or replicates included in each group of a statistical sample.
  • effect size (`ES`) is a value measuring the strength of the relationship between two variables.
  • sig.level (`\alpha`) is the probability of a type I error, i.e. rejecting a null hypothesis that is actually true.
  • power (`1-\beta`) is the probability of correctly rejecting a false null hypothesis, where `\beta` is the probability of a type II error.

In this experiment we perform a balanced one-way ANOVA. Comparison between two groups means the number of groups is 2 (`k=2`); a very high significance level between samples means the effect size is 0.5 (`ES=0.5`) [4]; a strict chance of type I error means the sig.level is 0.05 (`\alpha=0.05`); and a loose chance of type II error means the power is 0.7. Using the package 'pwr' in R, it was predicted that 13.37 samples are needed for each group (`n=13.37`). In our experiment we chose 12 for convenience (`n=12`).
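
For transparency, this prediction likely corresponds to a pwr call of the following form (a sketch; the spreadsheet wraps such a call in the algorithm described above):

```r
# A priori sample-size prediction for the CscB experiment (balanced one-way ANOVA, k = 2)
library(pwr)
pwr.anova.test(k = 2, f = 0.5, sig.level = 0.05, power = 0.7)
# n comes out at roughly 13.4 per group (the 13.37 quoted above), rounded down to 12 in practice
```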

During Experiment

Please check the corresponding section in Wetlab.

A Posteriori Analysis

Take the post-72h group as an example. Entering the experimental data into the standardized excel in groups, the HL7942-CscB group has a higher mean OD480/OD685 (0.05832) than HL7942 (0.02558). ANCOVA showed that the addition of NaCl was associated with OD480/OD685 (p=0.000449 < 0.05). Controlling for the addition of NaCl, CscB was associated with OD480/OD685 (p=6.35e-07 < 0.05).

Before considering our conclusion reliable, we verified the validity of the ANCOVA assumptions, including the normality assumption, the homoscedasticity assumption, and the homogeneity of regression slopes.

As shown in the Q-Q plot, the data fall within the 95% confidence interval, indicating that the assumption of normality is satisfied.


qqPlot

The Bartlett test also showed that the homoscedasticity assumption is satisfied (p=0.0725).

In addition, the interaction effects were not significant (CscB:NaCl p=0.0126, CscB:IPTG p=0.4971, NaCl:IPTG p=0.9934, CscB:NaCl:IPTG p=0.9435), supporting the homogeneity of regression slopes.

To quantify repeatability and reliability, a post hoc power analysis is effective.

The calculated effect size was 0.3133 (`ES=0.3133`) and the sample size was 12 (`n=12`); with the sig.level set to 0.1 (by choosing 'loose' in the standardized excel), the corresponding power was 0.4384. This means that teams wishing to cite or replicate this experiment should be aware of the possibility that our conclusion was reached through a type I or type II error.
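
This post hoc value can be reproduced with a pwr call of the form below (a sketch using the reported numbers):

```r
# Post hoc power for the post-72h comparison, using the values reported above
library(pwr)
pwr.anova.test(k = 2, n = 12, f = 0.3133, sig.level = 0.1)
# power comes out around 0.44 (0.4384 quoted above): an effect of this size
# would be detected less than half the time
```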

References

[1] Harris, J.K., Combs, T.B., Johnson, K.J., Carothers, B.J., Luke, D.A., and Wang, X. (2019). Three Changes Public Health Scientists Can Make to Help Build a Culture of Reproducible Research. Public Health Reports 134, 109-111. 10.1177/0033354918821076.

[2] The Practice of Reproducible Research: Case Studies and Lessons from the Data-Intensive Sciences. (2018). 1st Edition (University of California Press).

[3] Kühne, M., and Liehr, A.W. (2009). Improving the Traditional Information Management in Natural Sciences. Data Science Journal 8, 18-26.

[4] Sawilowsky, S.S. (2009). New Effect Size Rules of Thumb. Journal of Modern Applied Statistical Methods 8(2), Article 26.