Software

Introduction

    Good software is required to guide users through our project OMEGA and to integrate it more tightly with our hardware. We designed a workflow that helps users quickly understand and apply our research to diagnose whether their shrimp have AHPND. The monitoring device records the result of the cell-free detection system, but video analysis is not trivial: it is a computationally intensive task that has been studied extensively in computer vision. Our software therefore separates the front end from the back end. After recording finishes (Fig. 1a), the video is processed by the back-end algorithms for paper-chip positioning (Fig. 1b) and data filtering (Fig. 1c). The output data is then ready for modeling and analysis (Fig. 1d), and the analysis result is sent to the front-end software for visualization (Fig. 1e). This front-end/back-end separation not only offloads the heavy video-analysis task from the user's device but also makes customer-service follow-up easier for users who adopt our AHPND solution.
Fig. 1 The front-end/back-end separation structure of our software. a The monitoring device records the result of the cell-free detection system. b (Back-End Algorithm) The paper-chip positioning algorithm determines the positions of the samples. c (Back-End Algorithm) The filter reduces the noise in the collected data. d (Back-End Algorithm) Data modeling and analysis mine the information hidden in the data. e (Front-End Software) The analysis result is sent to the visualization software.

Back-End Algorithm

    Let us take Video A, which contains three samples with different NanoLuc concentrations, as an example to show how the back-end algorithm determines the positions of the samples in the video, denoises the data, and estimates the concentration of NanoLuc or β-lactamase.

Paper-Chip Positioning Algorithm

    Since the paper chip is round, the traditional Circle Hough Transform (CHT) algorithm is selected as the baseline. To apply the CHT, Video A is split into single picture frames, and all 5578 frames are converted to grayscale as pre-processing. To speed up detection, the CHT only starts working once the environment is dark enough, which is judged from the overall grayscale level of each frame. The CHT then detects the circles hidden in each frame. The detection result is shown in Fig. 2: the CHT recognizes paper chips in 3177 of the 5578 frames. However, the CHT is a simple detector, and its result suffers from missed detections in both the spatial dimension (Fig. 2a) and the temporal dimension (Fig. 2b) compared with the ideal case.
Fig. 2 The performance of CHT. a The number of circles in each detected picture. b The frame index in each detected picture.
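    As a concrete illustration of this step, the sketch below shows how the frame-wise CHT could be run with OpenCV; the darkness threshold and the HoughCircles parameters are placeholder values for illustration, not our calibrated settings.

import cv2
import numpy as np

# A minimal sketch of the CHT stage, assuming OpenCV; the darkness threshold (40)
# and the HoughCircles parameters below are illustrative, not our tuned values.
def detect_chips(video_path: str, dark_threshold: float = 40.0):
    cap = cv2.VideoCapture(video_path)
    detections = []          # (frame_index, x, y, r) for every circle found
    frame_idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Only run the CHT once the scene is dark enough (mean-grayscale gate).
        if gray.mean() < dark_threshold:
            circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1, minDist=50,
                                       param1=50, param2=30, minRadius=10, maxRadius=60)
            if circles is not None:
                for x, y, r in circles[0]:
                    detections.append((frame_idx, float(x), float(y), float(r)))
        frame_idx += 1
    cap.release()
    return detections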
    Starting from the CHT result, we first analyze its three key outputs: X, the x-coordinate of a detected circle; Y, the y-coordinate of a detected circle; and r, the radius of a detected circle. Exploratory data analysis of X, Y, and r is shown in Fig. 3. Because the paper chips have a fixed spatial layout, the three samples can be easily distinguished once all the CHT results are stacked together. The X, Y, and r of each sample can then be estimated with Gaussian mixture models, each using a different number of clusters: the number of clusters for the distribution of X equals the total number of test samples, while for Y it equals the number of sample rows. All paper chips are the same size, so the distribution of r contains only one Gaussian component.
Fig. 3 Exploratory data analysis of CHT's result. a The distribution of X. b The distribution of Y. c The distribution of r.
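    A minimal sketch of this clustering step is given below, assuming scikit-learn's GaussianMixture and the detection list produced by the CHT sketch above; the helper names and default cluster counts (three samples, one row) are ours for illustration.

import numpy as np
from sklearn.mixture import GaussianMixture

def cluster_centers(values: np.ndarray, n_clusters: int) -> np.ndarray:
    """Fit a 1-D Gaussian mixture and return its sorted component means."""
    gmm = GaussianMixture(n_components=n_clusters, random_state=0)
    gmm.fit(values.reshape(-1, 1))
    return np.sort(gmm.means_.ravel())

def summarize_detections(detections, n_samples=3, n_rows=1):
    xs = np.array([d[1] for d in detections])
    ys = np.array([d[2] for d in detections])
    rs = np.array([d[3] for d in detections])
    x_centers = cluster_centers(xs, n_samples)  # one component per sample column
    y_centers = cluster_centers(ys, n_rows)     # one component per sample row
    r_center = cluster_centers(rs, 1)           # all chips share one radius
    return x_centers, y_centers, r_center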
    To increase the sensitivity of the paper-chip positioning algorithm, the number of clusters used for the distribution of X is set larger than the actual number of samples. Two steps then identify the real positions: (1) merge any two clusters whose distance is less than the diameter of a paper chip; (2) after merging, solve the following optimization problem:
$$\min \operatorname{Var}\left[\, C_2 - C_1,\ C_3 - C_2,\ \ldots,\ C_n - C_{n-1} \,\right]$$
where ${C_1} < {C_2} < \cdots < {C_{n - 1}} < {C_n}$ is a (not necessarily contiguous) subsequence of the sorted, merged cluster centers whose length equals the number of samples, so that the selected centers are as evenly spaced as possible.
    Taking Video A as an example, if the number of clusters for the distribution of X is set to 8, the cluster centers are [318.68, 327.56, 389.72, 247.94, 256.52, 398.6, 0, 0]. In the first step of the algorithm, 318.68 and 327.56 are merged, 247.94 and 256.52 are merged, 389.72 and 398.6 are merged, and the two zeros are merged. In the second step, the spurious cluster at 0 is discarded by the subsequence selection. The correct sample positions (X centers) are finally found to be [252.23, 323.12, 394.16].
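    The two steps can be reproduced on the Video A centers with a short script such as the following; the chip diameter of 40 px and the greedy merging strategy are assumptions for illustration, not necessarily our exact implementation.

import itertools
import numpy as np

def merge_close(centers, diameter):
    """Step (1): greedily merge sorted centers that lie within one chip diameter."""
    merged = []
    for c in sorted(centers):
        if merged and c - merged[-1][-1] < diameter:
            merged[-1].append(c)
        else:
            merged.append([c])
    return [float(np.mean(group)) for group in merged]

def select_positions(centers, n_samples):
    """Step (2): pick the n_samples centers whose adjacent gaps have the lowest variance."""
    best, best_var = None, np.inf
    for combo in itertools.combinations(sorted(centers), n_samples):
        var = float(np.var(np.diff(combo)))
        if var < best_var:
            best, best_var = combo, var
    return list(best)

raw = [318.68, 327.56, 389.72, 247.94, 256.52, 398.6, 0, 0]
merged = merge_close(raw, diameter=40)        # -> [0.0, 252.23, 323.12, 394.16]
print(select_positions(merged, n_samples=3))  # -> [252.23, 323.12, 394.16]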
    This algorithm remains sensitive to samples with low light intensity. It is also highly tolerant of operating mistakes such as moving the paper chips, which in turn gives the hardware the ability to perform multi-channel parallel detection. The algorithm has further been validated on Video B through Video F in the Model.

Data Smoothing Algorithm

    The data collected after paper-chip positioning is shown in Fig. 4a. The data is noisy because the recording is out of focus, which is common for low-cost devices. A Savitzky-Golay filter with two hyperparameters, a window length of 3001 and a polynomial degree of 4, is used to denoise the data; its performance is shown in Fig. 4b. The denoised data is then used for modeling, from which a new NanoLuc luminescence decay kinetics is discovered.
Fig. 4 Data extraction from Video A. a Before denoising. b After denoising.
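    In practice, the filtering is a single call to SciPy's Savitzky-Golay implementation; the sketch below assumes the per-frame intensity has already been extracted at one chip position and uses the hyperparameters quoted above.

import numpy as np
from scipy.signal import savgol_filter

def denoise(intensity: np.ndarray,
            window_length: int = 3001,
            polyorder: int = 4) -> np.ndarray:
    # savgol_filter requires window_length > polyorder and a signal at least
    # as long as the window (Video A has 5578 frames, so 3001 is valid).
    return savgol_filter(intensity, window_length=window_length, polyorder=polyorder)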
    The filter performs well at denoising, but its two hyperparameters, the window length and the polynomial degree, deserve a closer look. Since the NanoLuc luminescence decay kinetics has been determined, we try to solve the following optimization problems to find the best window length, polynomial degree, or assignment strategy:
$$\min_{WL,\,m} \sum_{i=1}^{n} \left[ \mathrm{filter}_{WL,m}(u_i) - \mathrm{DataSim}\!\left(\frac{du}{dt} = -0.0016\,u^{2}\right) \right]^2$$
$$\min_{\beta} \sum_{i=1}^{n} \left[ \mathrm{filter}_{\widetilde{WL},\widetilde{m}}(u_i) - \mathrm{DataSim}\!\left(\frac{du}{dt} = -0.0016\,u^{1+\beta}\right) \right]^2$$
where ${u_i}$ is the data before denoising, $WL$ denotes the window length, and $m$ represents the polynomial degree. The first optimization problem yields the calibrated window length $\widetilde{WL}$ and calibrated polynomial degree $\widetilde{m}$ using prior knowledge of the standard sample (100 % NanoLuc), while the second problem estimates the NanoLuc concentration of the other samples, thereby testing the performance of the calibrated window length and polynomial degree. The procedure is carried out on Video B and Video C, and the results are shown in the following table.
    Video        Ground Truth    Window Length    Polynomial Degree    Prediction
    B (4422 f)   50%             3513             3                    62%
    B (4422 f)   25%             3513             3                    31%
    B (4422 f)   10%             3513             3                    12%
    C (3446 f)   30%             2479             2                    28%
    B (4422 f)   50%             4063             Original (4)         57%
    B (4422 f)   25%             4063             Original (4)         27%
    B (4422 f)   10%             4063             Original (4)         9%
    C (3446 f)   30%             2571             Original (4)         9%
    B (4422 f)   50%             Original (3001)  Original (4)         54%
    B (4422 f)   25%             Original (3001)  Original (4)         23%
    B (4422 f)   10%             Original (3001)  Original (4)         10%

    f: frames

    From the table we find that the shorter video is much more sensitive to the window length, while the longer video is much more sensitive to the polynomial degree, which tells us how to choose the objective in the calibration problem. As for the best setting, the original filter hyperparameters used when parameterizing the ODE model (window length 3001, degree 4) estimate the NanoLuc concentration better than the hyperparameters obtained from the calibration optimization.
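    To make the estimation step concrete, here is a minimal sketch of the second optimization problem using SciPy. The interpretation of β as the relative NanoLuc concentration, the bounded search over [0, 1], the initial value taken from the first filtered point, the frame-index time axis, and the Video B hyperparameters (3513, 3) are assumptions for illustration rather than our exact implementation.

import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import minimize_scalar
from scipy.signal import savgol_filter

K = 0.0016  # decay rate constant quoted in the text

def simulate_decay(beta: float, u0: float, t: np.ndarray) -> np.ndarray:
    """Simulate the decay law du/dt = -K * u**(1 + beta) on the time grid t."""
    rhs = lambda _t, u: -K * u ** (1.0 + beta)
    sol = solve_ivp(rhs, (t[0], t[-1]), [u0], t_eval=t, rtol=1e-6)
    return sol.y[0]

def estimate_beta(raw: np.ndarray, window_length: int = 3513, polyorder: int = 3) -> float:
    """Fit beta so the simulated curve matches the filtered trace (least squares)."""
    filtered = savgol_filter(raw, window_length, polyorder)
    t = np.arange(len(filtered), dtype=float)

    def loss(beta: float) -> float:
        sim = simulate_decay(beta, filtered[0], t)
        return float(np.sum((filtered - sim) ** 2))

    return minimize_scalar(loss, bounds=(0.0, 1.0), method="bounded").x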

Concentration Estimation

    Now that the data can be extracted correctly and smoothed by denoising, it is time to obtain the concentrations of NanoLuc or β-lactamase. Because the two proteins work by different mechanisms, our discussion is divided into two parts.
    Building on the hyperparameter tuning in the data smoothing algorithm and the discussion of the learnable parameter, the NanoLuc concentration estimation method is proposed: (1) analyze the video with the paper-chip positioning algorithm and filter the data with user-chosen hyperparameters; (2) parameterize the ODE model with the denoised data to obtain the learnable parameter; (3) solve the second optimization problem from the data smoothing section to estimate the concentration.
    Estimating the concentration of β-lactamase is much easier than estimating that of NanoLuc. The light intensity is collected and processed with the same protocol as for NanoLuc, but the data filter is not required: because colorimetry is used to estimate the β-lactamase concentration, the noise caused by being out of focus is eliminated during background removal. Instead, a Gaussian distribution is fitted to the scan time-series data produced by the hardware to obtain the true absorbance, which is then converted into a concentration via the calibration curve.
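    A minimal sketch of this Gaussian fit with SciPy is shown below; the choice of the peak amplitude as the absorbance and the linear calibration-curve placeholders (slope, intercept) are assumptions for illustration, not our measured calibration.

import numpy as np
from scipy.optimize import curve_fit

def gaussian(t, amplitude, center, width, baseline):
    return amplitude * np.exp(-((t - center) ** 2) / (2.0 * width ** 2)) + baseline

def absorbance_from_scan(t: np.ndarray, signal: np.ndarray) -> float:
    """Fit a Gaussian to the scan time series and return its peak amplitude."""
    p0 = [signal.max() - signal.min(), t[np.argmax(signal)], (t[-1] - t[0]) / 10, signal.min()]
    popt, _ = curve_fit(gaussian, t, signal, p0=p0)
    return float(popt[0])

def beta_lactamase_concentration(absorbance: float, slope: float = 1.0, intercept: float = 0.0) -> float:
    # Linear calibration curve: concentration = slope * absorbance + intercept (placeholder values).
    return slope * absorbance + intercept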

Front-End Software

    Once the service computer finishes processing the video collected by the hardware and the concentration of each sample has been calculated, all the data are sent to OneNET, an Internet of Things platform, where they wait for the user to fetch them. We designed an Android application to visualize and store the experimental data. The project source code can be downloaded at https://gitlab.igem.org/2022/software-tools/xmu-china.
    The following is a step-by-step introduction to our software.
    After downloading and installing the app, users should first register and log in.
    On the settings page, users can then specify the details of the data they want, including the product ID, the device ID, and the start time of the data; the port ID and the amount of data can also be designated.
    The application then sends a request to the corresponding OneNET account in the cloud. Once the request passes verification, the required data are sent back and stored in the application.
    The fetched data are organized by fields such as product ID, device ID, and detection time, and users can view them entry by entry in the Fetch interface.
    Users can also query historical data as a chart in the History interface and export them to an Excel file together with some statistical results.