Overview
On our journey to identify the ideal biomarker for the early detection of non-small cell lung cancer, our team followed the basic milestones of our project design. Since the DIAS detection platform is a liquid biopsy test, we needed a biomarker that shows increased expression levels in patients' blood samples compared to controls. This led us to miRNAs. miRNAs, non-coding RNA molecules 18-24 nucleotides in length, are key regulators of post-transcriptional gene expression and could serve as biomarkers for the early detection of lung cancer in patients' blood. Indeed, circulating miRNAs have been a main focus of research for over a decade regarding their role as biomarkers in cancer (Hayes et al., 2014), including lung cancer (Cohen et al., 2018; Sozzi et al., 2014) and, more specifically, non-small cell lung cancer (Li et al., 2014). They are expressed at levels detectable by existing techniques in whole blood (Ulivi et al., 2013) and plasma (Sozzi et al., 2014) in the early stages of the disease, which supports their utility as biomarkers for the early detection of lung cancer.
Literature Research
Prior to the bioinformatic analysis, we searched the literature for miRNA biomarkers implicated in the early detection of lung cancer. Although the role of circulating miRNAs in lung cancer progression is the subject of extensive research, the results of some studies are inconsistent and the information is scattered. Thus, to identify miRNA biomarkers showing high sensitivity, specificity and consistency, we relied on recent systematic reviews and meta-analyses, which consolidate the available information, rather than on primary clinical studies. The candidate miRNAs had to fulfill the following criteria to be considered suitable for our diagnostic device:
- High consistency, which means that the direction of deregulation (upward or downward) in lung cancer patients compared to healthy individuals should be consistent among the results of primary clinical studies.
- High sensitivity, which is a measure of how well the deregulated levels of the miRNA identify the individuals who truly have lung cancer (true positives).
- High specificity, which is a measure of how well the normal (not deregulated) levels of the miRNA designate an individual who does not have the disease as negative for lung cancer (true negatives).
- High evidence, which means that many studies with a high number of individual participants have been conducted for a given miRNA.
Taking into account the aforementioned criteria, we ended up with the miRNAs presented in Table 1.
miRNA | Consistency (%) | Number of patients (up/down) | Total number of independent studies | Sensitivity | 1-Specificity |
miR-20a | 71.4 | 802/220 | 9 | 0.82 | 0.18 |
miR-10b | 100 | 976/0 | 3 | 0.80 | 0.12 |
miR-223 | 100 | 1011/0 | 7 | 0.80 | 0.20 |
miR-17 | 87.5 | 615/676 | 10 | 0.75 | 0.20 |
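As a brief illustration of how the Sensitivity and 1-Specificity columns relate to diagnostic counts, the following R sketch computes both from a confusion matrix. The counts are hypothetical and chosen only for illustration; they are not taken from any of the cited studies.

```r
# Hypothetical confusion-matrix counts for a single miRNA test.
# These numbers are illustrative only and do not come from Table 1.
tp <- 82  # lung cancer patients with deregulated miRNA levels (true positives)
fn <- 18  # lung cancer patients with normal miRNA levels (false negatives)
tn <- 88  # controls with normal miRNA levels (true negatives)
fp <- 12  # controls with deregulated miRNA levels (false positives)

# Sensitivity: fraction of true patients that the test flags as positive.
sensitivity <- tp / (tp + fn)

# Specificity: fraction of true controls that the test reports as negative.
specificity <- tn / (tn + fp)

# 1 - Specificity is the false positive rate, the quantity tabulated above.
false_positive_rate <- 1 - specificity

print(c(sensitivity = sensitivity,
        specificity = specificity,
        one_minus_specificity = false_positive_rate))
```

A good candidate therefore combines a high sensitivity with a low 1-Specificity value.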
Bioinformatic Analysis
During the search process, we discovered a miRNA-based diagnostic model for the prediction of lung cancer introduced by Keisuke Asakura and colleagues (Asakura et al., 2020). This model was based on 1,566 lung cancer samples from the National Cancer Center (NCC) biobank and 1,774 non-cancer serum samples from NCC and Yokohama Minoru Clinic (YMC). The dataset is available from the Gene Expression Omnibus (GEO) under the accession number "GSE137140". The miRNA analysis was performed by miRNA microarray using the 3D-Gene Human miRNA V21_1.0.0 platform.
To further analyze the dataset, we performed miRNA Differential Expression Analysis (DEA) in RStudio using the R package limma (Ritchie et al., 2015). For the lung cancer subgroup, we included only the 1,566 pre-operative lung cancer samples and ignored the post-operative lung cancer samples, since the purpose of the analysis was the early detection of lung cancer through miRNAs in patients' blood samples. Principal Component Analysis (PCA) is a statistical procedure that reduces the dimensionality of a large dataset while retaining its most significant information, allowing the user to visualize patterns in the data. PCA on the miRNA expression data revealed sufficient discrimination of the two groups (Disease state: Lung cancer, pre-operation and Disease state: Non-cancer control), as displayed in Figure 1.
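The two steps described above can be sketched in R as follows. This is a minimal sketch on simulated data, not our actual script: the expression matrix, the group sizes and the simulated miRNA names are all assumptions made for illustration, and nothing here is read from GSE137140. The PCA uses `prcomp` from base R; the limma part runs only if the limma package is installed.

```r
# Simulated log2 expression matrix: 200 miRNAs (rows) x 40 samples (columns).
# All values are synthetic and only stand in for a real GEO expression matrix.
set.seed(1)
group <- factor(rep(c("control", "cancer"), each = 20),
                levels = c("control", "cancer"))
n_mirna <- 200
expr <- matrix(rnorm(n_mirna * length(group), mean = 6, sd = 1),
               nrow = n_mirna,
               dimnames = list(paste0("miR-sim-", seq_len(n_mirna)),
                               paste0("S", seq_along(group))))

# Make the first 20 simulated miRNAs more highly expressed in the cancer group.
is_cancer <- group == "cancer"
expr[1:20, is_cancer] <- expr[1:20, is_cancer] + 3

# PCA: prcomp expects samples in rows, so the matrix is transposed.
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)
pc1 <- pca$x[, 1]
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained[1:2], 3))

# Differential expression with limma (skipped if limma is not installed).
if (requireNamespace("limma", quietly = TRUE)) {
  design <- model.matrix(~ group)            # columns: (Intercept), groupcancer
  fit <- limma::eBayes(limma::lmFit(expr, design))
  top <- limma::topTable(fit, coef = "groupcancer", number = Inf)
  print(head(top))                           # logFC, P.Value, adj.P.Val, B, ...
}
```

Plotting `pca$x[, 1]` against `pca$x[, 2]`, colored by `group`, gives a sample-level view of the kind shown in a PCA figure.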
Subsequently, to visualize all the miRNAs registered in the dataset, we constructed a volcano plot. A volcano plot is a type of scatter plot that represents the differential expression of features. The x-axis shows the log fold change (logFC; positive values indicate that a miRNA is more highly expressed in lung cancer samples than in control samples) and the y-axis shows the B value. The B value is the log-odds of a miRNA (or gene) being differentially expressed. For instance, if B = 1.5, then the odds are exp(1.5) = 4.48 and therefore the probability of being differentially expressed is 4.48/(1+4.48) = 0.82 (82% probability). If B = 0, then the probability is 50%.
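The conversion from a B value to a probability can be checked with a few lines of R. This is a small sketch; the helper name `b_to_prob` is our own and is not part of limma. It uses `plogis` from base R, which evaluates exp(B)/(1 + exp(B)) in a numerically stable way, so very large B values do not overflow.

```r
# Convert a B statistic (log-odds of differential expression) to a probability.
# plogis(b) equals exp(b) / (1 + exp(b)) but stays finite for very large b.
b_to_prob <- function(b) {
  plogis(b)
}

# B = 0 gives even odds; B = 1.5 gives odds of about 4.48.
b_values <- c(0, 1.5)
odds <- exp(b_values)
probs <- b_to_prob(b_values)

print(data.frame(B = b_values, odds = round(odds, 2), probability = round(probs, 2)))
```

For very large B values the probability is numerically indistinguishable from 1.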
Based on the differential expression analysis we distinguished a subset of total miRNAs which fulfilled the following criteria: P.Val < 0.05 and logFC > 2. The criteria above were chosen for strict control of the dataset so we could extract the miRNAs with a high score of differential expression. In Figure 2 the volcano plots of all miRNAs and the significant differentially expressed miRNAs are illustrated.
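The filtering step can be sketched in base R as shown below. The results table is hypothetical: the miRNA names and values are invented for illustration, and the column names `logFC`, `P.Value` and `B` are assumed to follow the layout of a limma `topTable` output.

```r
# Hypothetical differential expression results (illustrative values only).
results <- data.frame(
  miRNA   = c("miR-A", "miR-B", "miR-C", "miR-D"),
  logFC   = c(6.4, 2.5, 0.8, -3.1),
  P.Value = c(1e-300, 0.01, 0.001, 1e-10),
  B       = c(3000, 2.1, 4.0, 50),
  stringsAsFactors = FALSE
)

# Apply the two criteria from the text: P value < 0.05 and logFC > 2.
keep <- results$P.Value < 0.05 & results$logFC > 2
results$Significant <- ifelse(keep, "Significant", "Not significant")

# Subset of miRNAs that pass both thresholds.
significant <- results[keep, ]
print(significant)
```

Note that the one-sided threshold logFC > 2 keeps only upregulated miRNAs. A strongly downregulated miRNA such as the hypothetical miR-D is excluded, which matches the requirement for a biomarker with increased expression in patients' blood.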
One of the most significant miRNAs, as depicted in Figure 2C, is miR-17-3p. According to Keisuke Asakura and colleagues, their diagnostic model identified miR-17-3p as the best single miRNA in the discovery set for the detection of lung cancer, with 93.3% sensitivity and 88.5% specificity. Indeed, our differential expression analysis also showed that miR-17-3p is one of the most significant miRNAs based on its logFC value, with logFC = 6.397699, P.Value = 0 (as reported by the analysis) and B = 3216.779. Taking into account the results of our analyses combined with the diagnostic model of Keisuke Asakura and colleagues, we concluded that the elevated levels of miR-17-3p in lung cancer patients can be used as an accurate biomarker for the early detection of lung cancer. We therefore chose to study our diagnostic system with miR-17-3p, and with miR-17-5p as well, since previous reports revealed that both miR-17 strands work synergistically and highlighted the considerable value of miR-17-5p as a biomarker for lung cancer detection in early stages (Borzi et al., 2021).
R-Based Guided Tool
Guidelines for the R script usage in miRNA Differential Expression Analysis or Gene Expression Analysis
Considering the significance of bioinformatic analysis in the identification and validation of biomarkers, we developed an R script that can be used by anyone with basic knowledge of R programming. The script can be used for the differential expression analysis of miRNAs or genes in any disease or condition. The user needs to change only five points in the code. At every step that requires the user's intervention, the code is configured to display the available options automatically. The five points of the code that must be modified by the user are described below:
- In the command
  dataset <- "GSEXXXXXX"
  the user must replace XXXXXX with the accession number of the chosen dataset, for instance "GSE137140".
- The user runs the commands up to the command
  gse <- gse[[Y]]
  In the brackets, Y must be replaced with the number corresponding to the desired platform on which the analysis was conducted. The available platforms are shown by the previous command, platforms.
- In the command
  sampleInfo <- select(sampleInfo, group = "Z", patients = patients)
  the user must replace Z with the name of the column in which the states of the samples are shown. For instance, to analyze the differential expression of miRNAs in patients with lung cancer compared to controls, we choose the column in which each sample is labeled with its state, i.e., whether it comes from a lung cancer patient or a control.
- The next commands that the user must change are
  state1 <- ifelse(sampleInfo$group == "A", 1, 0)
  and
  state2 <- ifelse(sampleInfo$group == "B", 0, 1)
  In the first command, "A" must be set to the exact disease condition as described in the group column. state1 is always the disease condition. In the second command, "B" must be set to the control, exactly as described in the group column. state2 must always be the control condition.
- Lastly, in the command
  top500$label <- ifelse(top500$Significant == "Significant", top500$K, "")
  K must be replaced with the name of the column in which the miRNAs or genes are listed by name or ID.
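To make the last two points concrete, the sketch below applies the same `ifelse` commands to toy tables in base R. The group labels, the sample IDs and the column name `miRNA_ID` are placeholders chosen for illustration; in a real run they must match the values found in the chosen dataset.

```r
# Toy sample table; group labels and patient IDs are placeholders.
sampleInfo <- data.frame(
  group    = c("Lung cancer, pre-operation", "Non-cancer control",
               "Lung cancer, pre-operation", "Non-cancer control"),
  patients = c("P1", "P2", "P3", "P4"),
  stringsAsFactors = FALSE
)

# "A" is the disease label and "B" is the control label, copied exactly
# as they appear in the group column.
state1 <- ifelse(sampleInfo$group == "Lung cancer, pre-operation", 1, 0)
state2 <- ifelse(sampleInfo$group == "Non-cancer control", 0, 1)

# Toy results table; here K is the hypothetical column name miRNA_ID.
top500 <- data.frame(
  miRNA_ID    = c("miR-A", "miR-B", "miR-C"),
  Significant = c("Significant", "Not significant", "Significant"),
  stringsAsFactors = FALSE
)

# Only significant entries receive a text label (e.g. for a volcano plot).
top500$label <- ifelse(top500$Significant == "Significant", top500$miRNA_ID, "")

print(data.frame(sampleInfo, state1, state2))
print(top500)
```

In a dataset with exactly two groups, state1 and state2 produce the same 0/1 coding (1 for disease, 0 for control). They differ only if the group column contains additional labels, so a mismatch is a quick sign that a label was mistyped or that extra sample groups are present.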
Bibliography
Asakura, K., Kadota, T., Matsuzaki, J., Yoshida, Y., Yamamoto, Y., Nakagawa, K., Takizawa, S., Aoki, Y., Nakamura, E., Miura, J., Sakamoto, H., Kato, K., Watanabe, S. and Ochiya, T., (2020) "A miRNA-based diagnostic model predicts resectable lung cancer in humans with high accuracy." Communications Biology, 3(1).
Borzi, C., Ganzinelli, M., Caiola, E., Colombo, M., Centonze, G., Boeri, M., Signorelli, D., Caleca, L., Rulli, E., Busico, A., Capone, I., Pastorino, U., Marabese, M., Milione, M., Broggini, M., Garassino, M., Sozzi, G. and Moro, M., (2021) "LKB1 Down-Modulation by miR-17 Identifies Patients With NSCLC Having Worse Prognosis Eligible for Energy-Stress–Based Treatments." Journal of Thoracic Oncology, 16(8), pp.1298-1311.
Cohen, J., Li, L., Wang, Y., Thoburn, C., Afsari, B., Danilova, L., Douville, C., Javed, A., Wong, F., Mattox, A., Hruban, R., Wolfgang, C., Goggins, M., Dal Molin, M., Wang, T., Roden, R., Klein, A., Ptak, J., Dobbyn, L., Schaefer, J., Silliman, N., Popoli, M., Vogelstein, J., Browne, J., Schoen, R., Brand, R., Tie, J., Gibbs, P., Wong, H., Mansfield, A., Jen, J., Hanash, S., Falconi, M., Allen, P., Zhou, S., Bettegowda, C., Diaz, L., Tomasetti, C., Kinzler, K., Vogelstein, B., Lennon, A. and Papadopoulos, N., (2018) "Detection and localization of surgically resectable cancers with a multi-analyte blood test." Science, 359(6378), pp.926-930.
Hayes, J., Peruzzi, P. and Lawler, S., (2014) "MicroRNAs in cancer: biomarkers, functions and therapy." Trends in Molecular Medicine, 20(8), pp.460-469.
Li, M., Zhang, Q., Wu, L., Jia, C., Shi, F., Li, S., Peng, A., Zhang, G., Song, X. and Wang, C., (2014) "Serum miR-499 as a novel diagnostic and prognostic biomarker in non-small cell lung cancer." Oncology Reports, 31(4), pp.1961-1967.
Ritchie, M., Phipson, B., Wu, D., Hu, Y., Law, C., Shi, W. and Smyth, G., (2015) "limma powers differential expression analyses for RNA-sequencing and microarray studies." Nucleic Acids Research, 43(7), pp.e47-e47.
Sozzi, G., Boeri, M., Rossi, M., Verri, C., Suatoni, P., Bravi, F., Roz, L., Conte, D., Grassi, M., Sverzellati, N., Marchiano, A., Negri, E., La Vecchia, C. and Pastorino, U., (2014) "Clinical Utility of a Plasma-Based miRNA Signature Classifier Within Computed Tomography Lung Cancer Screening: A Correlative MILD Trial Study." Journal of Clinical Oncology, 32(8), pp.768-773.
Ulivi, P., Foschi, G., Mengozzi, M., Scarpi, E., Silvestrini, R., Amadori, D. and Zoli, W., (2013) "Peripheral Blood miR-328 Expression as a Potential Biomarker for the Early Diagnosis of NSCLC." International Journal of Molecular Sciences, 14(5), pp.10332-10342.