Overview


To identify a suitable biomarker for the early detection of non-small cell lung cancer (NSCLC), our team followed the milestones of our project design. Because the DIAS detection platform is a liquid biopsy test, we needed a biomarker whose expression is elevated in patients' blood samples compared to controls. This led us to miRNAs. miRNAs are non-coding RNA molecules, 18-24 nucleotides long, that act as key regulators of post-transcriptional gene expression and could serve as biomarkers for the early detection of lung cancer in patients' blood. Circulating miRNAs have long been a focus of research as cancer biomarkers (Hayes et al., 2014), including in lung cancer (Cohen et al., 2018; Sozzi et al., 2014) and, more specifically, in non-small cell lung cancer (Li et al., 2014). They are expressed at levels detectable by existing techniques in whole blood (Ulivi et al., 2013) and plasma (Sozzi et al., 2014) in the early stages of the disease, which supports their utility as biomarkers for the early detection of lung cancer.

Literature Research


Before running the bioinformatic analysis, we searched the literature for miRNA biomarkers implicated in the early detection of lung cancer. Although the role of circulating miRNAs in lung cancer progression is the subject of extensive research, the results of some studies are inconsistent and the information is scattered. To identify miRNA biomarkers with high sensitivity, specificity and consistency, we therefore relied on recent systematic reviews and meta-analyses, which consolidate the available evidence, rather than on primary clinical studies. The candidate miRNAs had to fulfill the following criteria to be considered suitable for our diagnostic device:

Taking these criteria into account, we ended up with the miRNAs presented in Table 1.

Table 1. Candidate miRNAs for the early detection of lung cancer, based on the literature

miRNA     Consistency (%)   Number of patients (up/down)   Total number of independent studies   Sensitivity   1-Specificity
miR-20a   71.4              802/220                        9                                     0.82          0.18
miR-10b   100               976/0                          3                                     0.80          0.12
miR-223   100               1011/0                         7                                     0.80          0.20
miR-17    87.5              615/676                        10                                    0.75          0.20

Bioinformatic Analysis


During the literature search, we discovered a miRNA-based diagnostic model for the prediction of lung cancer introduced by Keisuke Asakura and colleagues (Asakura et al., 2020). This model was based on 1,566 lung cancer samples from the National Cancer Center (NCC) biobank and 1,774 non-cancer serum samples from NCC and Yokohama Minoru Clinic (YMC). The dataset is available from the Gene Expression Omnibus (GEO) under the accession “GSE137140”. miRNA profiling was performed by microarray on the 3D-Gene Human miRNA V21_1.0.0 platform.

To further analyze the dataset, we performed miRNA Differential Expression Analysis (DEA) in RStudio using the R package limma (Ritchie et al., 2015). For the lung cancer subgroup, we included only the 1566 pre-operative lung cancer samples and excluded the post-operative lung cancer samples, since the purpose of the analysis was the early detection of miRNAs in lung cancer patients' blood samples. Principal Component Analysis (PCA) of the miRNA expression data showed sufficient separation of the two groups (Disease state: Lung cancer, pre-operation and Disease state: Non-cancer control), as displayed in Figure 1. PCA is a statistical procedure that reduces the dimensionality of a large dataset while retaining most of its variation, which allows the user to visualize patterns in the data.

Figure 1. Principal Component Analysis (PCA) plot for 1566 pre-operation lung cancer samples and 1744 non-cancer samples, showing adequate separation of the two groups (group labels are presented on the right of the image). The two clusters are apparent, with some non-cancer samples falling within the lung cancer cluster. The PCA was performed with R 4.1.0.
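The dimensionality reduction described above can be illustrated with a minimal sketch. It is written in plain Python for illustration only and is not the R code used for Figure 1. The toy data and the helper names (`center`, `covariance`, `first_component`) are assumptions made for this example, and power iteration stands in for the full decomposition that a statistics package would perform.

```python
import math

def center(rows):
    """Subtract the per-feature mean from every sample."""
    n = len(rows)
    dims = len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(dims)]
    return [[row[j] - means[j] for j in range(dims)] for row in rows]

def covariance(centered):
    """Sample covariance matrix (features x features) of centered data."""
    n = len(centered)
    dims = len(centered[0])
    return [
        [sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(dims)]
        for i in range(dims)
    ]

def first_component(cov, iterations=200):
    """Approximate the leading eigenvector of cov by power iteration."""
    # Fixed starting vector; assumed not to be orthogonal to the answer.
    vec = [1.0] * len(cov)
    for _ in range(iterations):
        nxt = [sum(c * v for c, v in zip(row, vec)) for row in cov]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
    return vec

# Toy data: 4 samples x 2 features (made-up numbers, not expression values).
samples = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
centered = center(samples)
pc1 = first_component(covariance(centered))

# Each sample's coordinate along the first principal component.
scores = [sum(x * w for x, w in zip(row, pc1)) for row in centered]
print("PC1 direction:", pc1)
print("PC1 scores:", scores)
```

In a real analysis each sample would have one value per miRNA rather than two features, and the first two component scores would be plotted against each other to look for group separation.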

Subsequently, to visualize all the miRNAs in the dataset, we constructed a volcano plot. A volcano plot is a type of scatter plot that represents the differential expression of features, with the log fold change (logFC) on the x-axis (positive values indicate that a miRNA is more highly expressed in lung cancer samples than in control samples) and the B value on the y-axis. The B value is the log-odds that a miRNA (or gene) is differentially expressed. For instance, if B = 1.5, then the odds are exp(1.5) = 4.48 and the probability of being differentially expressed is 4.48/(1+4.48) = 0.82 (82% probability). If B = 0, then the probability is 50%.
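The conversion from a B value to a probability can be checked with a short sketch. The function name `b_to_probability` is an assumption made for this illustration and is not part of limma or of our R script.

```python
import math

def b_to_probability(b):
    """Convert a log-odds B value into a probability of differential expression."""
    # Logistic function, written in two branches so that math.exp
    # never overflows for very large positive or negative B values.
    if b >= 0:
        return 1.0 / (1.0 + math.exp(-b))
    odds = math.exp(b)
    return odds / (1.0 + odds)

for b in (0.0, 1.5):
    odds = math.exp(b)
    print(f"B = {b}: odds = {odds:.2f}, probability = {b_to_probability(b):.2f}")
```

For very large B values the probability is numerically indistinguishable from 1.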

Based on the differential expression analysis, we selected the subset of miRNAs that fulfilled the following criteria: P.Val < 0.05 and logFC > 2. These strict thresholds were chosen so that only miRNAs with strong evidence of differential expression would be retained. Figure 2 shows the volcano plots of all miRNAs and of the significantly differentially expressed miRNAs.

Figure 2. Volcano plots of miRNAs. A) Volcano plot of all miRNAs in the dataset. B) Subset of plot A containing the most significantly differentially expressed miRNAs, which fulfill the criteria P.Val < 0.05 and logFC > 2. C) miR-17-3p highlighted in the volcano plot of the most significant miRNAs. x-axis: logFC value; y-axis: B value. Plots were constructed using R 4.1.0.
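The selection step behind Figure 2B can be sketched as a simple filter. The rows below are made-up placeholders rather than values from GSE137140, and the field names `p_value` and `logFC` as well as the function `select_significant` are assumptions for this illustration; the actual analysis was carried out in R.

```python
# Hypothetical rows imitating a differential expression results table.
results = [
    {"miRNA": "miR-A", "logFC": 3.1, "p_value": 0.001},   # passes both cutoffs
    {"miRNA": "miR-B", "logFC": 2.5, "p_value": 0.2},     # not significant
    {"miRNA": "miR-C", "logFC": 1.2, "p_value": 0.0001},  # fold change too small
    {"miRNA": "miR-D", "logFC": -2.8, "p_value": 0.001},  # lower in cancer samples
]

def select_significant(rows, p_cutoff=0.05, logfc_cutoff=2.0):
    """Keep rows with p_value below p_cutoff and logFC above logfc_cutoff."""
    return [
        row for row in rows
        if row["p_value"] < p_cutoff and row["logFC"] > logfc_cutoff
    ]

selected = select_significant(results)
print([row["miRNA"] for row in selected])
```

Because the criterion is logFC > 2 rather than |logFC| > 2, only miRNAs that are more highly expressed in the lung cancer group are retained, which matches the requirement for a biomarker with increased expression in patients' blood.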

One of the most significant miRNAs, as depicted in Figure 2C, is miR-17-3p. According to Keisuke Asakura and colleagues, miR-17-3p was the best single miRNA for the detection of lung cancer in the discovery set of their diagnostic model, with 93.3% sensitivity and 88.5% specificity. Our differential expression analysis likewise showed that miR-17-3p is one of the most significant miRNAs based on logFC value, with logFC = 6.397699, P.value = 0 and B = 3216.779. Taking the results of our analyses together with the diagnostic model of Keisuke Asakura and colleagues, we concluded that the elevated levels of miR-17-3p in lung cancer patients can be used as an accurate biomarker for the early detection of lung cancer. We therefore chose to study our diagnostic system with both miR-17-3p and miR-17-5p, since previous reports revealed that the two miR-17 strands work synergistically and noted the considerable value of miR-17-5p as a biomarker for the detection of lung cancer in its early stages (Borzi et al., 2021).

R-Based Guided Tool


Guidelines for the R script usage in miRNA Differential Expression Analysis or Gene Expression Analysis

Given the importance of bioinformatic analysis for identifying and validating biomarkers, we developed an R script that can be used by anyone with basic knowledge of R programming. The script can be applied to the differential expression analysis of miRNAs or genes in any disease or condition, and the user needs to change only five points. At every step that requires the user's input, the script automatically displays the available options. The five points of the script that the user must modify are described below:

Bibliography


[1]

Asakura, K., Kadota, T., Matsuzaki, J., Yoshida, Y., Yamamoto, Y., Nakagawa, K., Takizawa, S., Aoki, Y., Nakamura, E., Miura, J., Sakamoto, H., Kato, K., Watanabe, S. and Ochiya, T., (2020) "A miRNA-based diagnostic model predicts resectable lung cancer in humans with high accuracy." Communications Biology, 3(1).

[2]

Borzi, C., Ganzinelli, M., Caiola, E., Colombo, M., Centonze, G., Boeri, M., Signorelli, D., Caleca, L., Rulli, E., Busico, A., Capone, I., Pastorino, U., Marabese, M., Milione, M., Broggini, M., Garassino, M., Sozzi, G. and Moro, M., (2021) "LKB1 Down-Modulation by miR-17 Identifies Patients With NSCLC Having Worse Prognosis Eligible for Energy-Stress–Based Treatments." Journal of Thoracic Oncology, 16(8), pp.1298-1311.

[3]

Cohen, J., Li, L., Wang, Y., Thoburn, C., Afsari, B., Danilova, L., Douville, C., Javed, A., Wong, F., Mattox, A., Hruban, R., Wolfgang, C., Goggins, M., Dal Molin, M., Wang, T., Roden, R., Klein, A., Ptak, J., Dobbyn, L., Schaefer, J., Silliman, N., Popoli, M., Vogelstein, J., Browne, J., Schoen, R., Brand, R., Tie, J., Gibbs, P., Wong, H., Mansfield, A., Jen, J., Hanash, S., Falconi, M., Allen, P., Zhou, S., Bettegowda, C., Diaz, L., Tomasetti, C., Kinzler, K., Vogelstein, B., Lennon, A. and Papadopoulos, N., (2018) "Detection and localization of surgically resectable cancers with a multi-analyte blood test." Science, 359(6378), pp.926-930.

[4]

Hayes, J., Peruzzi, P. and Lawler, S., (2014) "MicroRNAs in cancer: biomarkers, functions and therapy." Trends in Molecular Medicine, 20(8), pp.460-469.

[5]

Li, M., Zhang, Q., Wu, L., Jia, C., Shi, F., Li, S., Peng, A., Zhang, G., Song, X. and Wang, C., (2014) "Serum miR-499 as a novel diagnostic and prognostic biomarker in non-small cell lung cancer." Oncology Reports, 31(4), pp.1961-1967.

[6]

Ritchie, M., Phipson, B., Wu, D., Hu, Y., Law, C., Shi, W. and Smyth, G., (2015) "limma powers differential expression analyses for RNA-sequencing and microarray studies." Nucleic Acids Research, 43(7), pp.e47-e47.

[7]

Sozzi, G., Boeri, M., Rossi, M., Verri, C., Suatoni, P., Bravi, F., Roz, L., Conte, D., Grassi, M., Sverzellati, N., Marchiano, A., Negri, E., La Vecchia, C. and Pastorino, U., (2014) "Clinical Utility of a Plasma-Based miRNA Signature Classifier Within Computed Tomography Lung Cancer Screening: A Correlative MILD Trial Study." Journal of Clinical Oncology, 32(8), pp.768-773.

[8]

Ulivi, P., Foschi, G., Mengozzi, M., Scarpi, E., Silvestrini, R., Amadori, D. and Zoli, W., (2013) "Peripheral Blood miR-328 Expression as a Potential Biomarker for the Early Diagnosis of NSCLC." International Journal of Molecular Sciences, 14(5), pp.10332-10342.