
Parts Overview

LncRNAs play important roles in many biological processes, such as dosage compensation, epigenetic regulation, cell cycle regulation, and cell differentiation. LINC00857 was found to regulate cell proliferation, migration, invasion, and tumor growth in lung cancer [1] and has more recently been shown to play an oncogenic role in gastric, bladder, liver, and esophageal cancers [2-6]. Inspired by this, we used bioinformatics methods to analyze the expression of LINC00857 in various cancers and to assess its potential as a biomarker for detection. We then combined LINC00857 expression with clinical data and constructed models for diagnosis and prognosis in various cancers. We found that LINC00857 is differentially expressed in a variety of cancers and is a potential biomarker for both diagnosis and prognosis.

Gene Expression Analysis

We used the built-in R (Version 3.6.4) tool of Sangerbox [7] to plot UCSC XENA RNA-seq data in TPM (transcripts per million) format for The Cancer Genome Atlas (TCGA) and the Genotype-Tissue Expression (GTEx) project; these data were uniformly processed with the Toil pipeline [8]. We then analyzed LINC00857 expression in all TCGA cancers at different pathological stages using the GEPIA2 (Gene Expression Profiling Interactive Analysis, Version 2) tool. Expression values were transformed as log2(TPM + 1) for the violin plots. For tumors without normal sample data in TCGA, we used the GTEx dataset to evaluate the difference in LINC00857 expression between tumor and normal samples. LINC00857 expression differed between tumor and normal samples (Figure 2A, P<0.05). We then used GEPIA2 to investigate the differential expression of LINC00857 between tumors and adjacent normal samples. As shown in Figure 2B, the expression level of LINC00857 in KIRP, LUAD, PAAD and STAD tumor samples was higher than in the corresponding control samples. We also used exoRBase 2.0 to evaluate the expression of LINC00857 in human blood. As shown in Figure 2C-D, LINC00857 expression in human blood samples differed in BRCA, CRC, GBM, HCC, OV, PAAD and ESCC.

Figure 1. Interactive Body Map. The median expression of tumor and normal samples in body map by GEPIA2.
Figure 2. Expression levels of LINC00857 in human tumors. (A) LINC00857 expression in TCGA tumors relative to the corresponding normal samples (GTEx database). *p < 0.05; **p < 0.01; ***p < 0.001. (B) Expression level of LINC00857 in TCGA tumors vs. adjacent samples (if available) as visualized by GEPIA2. (C-D) Expression level of LINC00857 in human blood.
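The log2(TPM + 1) transform and a simple tumor-versus-normal comparison can be sketched as follows. This is a minimal illustration in Python rather than the Sangerbox or GEPIA2 workflow itself; the function names and the TPM values are hypothetical and are not project data.

```python
import math
from statistics import median


def log2_tpm(tpm_values):
    """Apply the log2(TPM + 1) transform used for the expression plots."""
    return [math.log2(v + 1.0) for v in tpm_values]


def median_difference(tumor_tpm, normal_tpm):
    """Difference of median log2(TPM + 1) between tumor and normal samples."""
    return median(log2_tpm(tumor_tpm)) - median(log2_tpm(normal_tpm))


# Hypothetical TPM values, for illustration only (not project data).
tumor = [7.0, 15.0, 31.0, 63.0]
normal = [0.0, 1.0, 3.0, 7.0]
diff = median_difference(tumor, normal)
print(f"median log2(TPM + 1) difference: {diff:.2f}")
```

Adding 1 before taking the logarithm keeps samples with zero TPM defined (they map to 0) while compressing the large dynamic range of RNA-seq data.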

We also analyzed the relationship between LINC00857 expression and tumor pathological stage using the GEPIA2 tool. The results suggest stage-specific changes in LINC00857 expression in many tumor types, including UCEC, COAD, ESCA, KIRC, KIRP, LUSC, PAAD, SKCM, BLCA, LUAD and TGCT (Figure 3, all P<0.05).

Figure 3. Stage-dependent expression level of LINC00857. Main pathological stages (stage I, stage II, stage III, and stage IV) of BLCA, COAD, ESCA, KIRC, KIRP, LUAD, LUSC, PAAD, SKCM, TGCT and UCEC were assessed and compared using GEPIA2. Expression levels are shown as log2(TPM + 1).

Genetic Alteration Analysis

We used the cBioPortal tool to collect alteration frequency data across all TCGA tumors. Survival data, including PFS (progression-free survival), were compared across all TCGA cancer types between patients with and without LINC00857 genetic alterations. Human cancers develop through the accumulation of genetic alterations, so we next explored LINC00857 alterations in human tumor samples. According to our analysis, LINC00857 alterations occurred in a variety of cancers, with “Amplification” as the primary type (Figure 4A). This is consistent with the differential expression described above. To understand whether genetic alterations in LINC00857 are related to the clinical survival prognosis of patients, we compared survival between altered and unaltered cases across tumor types. Patients with genetic alterations of LINC00857 showed worse PFS (P=7.928e-3) than patients without LINC00857 alterations (Figure 4B).

Figure 4. Mutational status of LINC00857 in TCGA tumors. We analyzed the mutational status of LINC00857 in TCGA tumors using the cBioPortal tool. (A) Alteration frequency with mutation type and mutation site. (B) Correlation between mutation status and PFS.
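A comparison of survival between altered and unaltered patients is commonly made with a two-group log-rank test. The sketch below is a plain-Python illustration of that test under the assumption that it matches the kind of comparison reported above; it is not cBioPortal's code, and the follow-up times are hypothetical.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value).

    events are 1 (event observed) or 0 (censored).
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n = sum(1 for ti, _, _ in data if ti >= t)               # at risk, both groups
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)  # at risk, group A
        d = sum(1 for ti, e, _ in data if ti == t and e)         # events at time t
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical PFS times in months (not project data); all events observed.
altered = ([1, 2, 3, 4], [1, 1, 1, 1])
unaltered = ([10, 11, 12, 13], [1, 1, 1, 1])
chi2, p = logrank_test(*altered, *unaltered)
print(f"chi-square = {chi2:.3f}, p = {p:.4f}")
```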

Classifying Cancer Versus Normal Samples

We next assessed whether LINC00857 expression could be used for cancer diagnosis. We used R (Version 3.6.3) with the “pROC” package (Version 1.17.0.1) to build ROC curves. The ROC curve indicated that LINC00857 expression had good predictive power, with an area under the curve (AUC) of 0.929 (95% confidence interval [CI] = 0.911–0.947), to discriminate lung cancer samples from normal samples (Figure 5A). LINC00857 expression also discriminated well between tumor and normal samples in ACC, COADREAD, DLBC, ESCA, GBM, KICH, KIRP, LAML, PAAD, PRAD, READ, STAD, THYM and UCS.

Figure 5. ROC curves for classifying cancer versus normal samples in the TCGA database.
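The AUC has a direct interpretation: it is the probability that a randomly chosen cancer sample has a higher score than a randomly chosen normal sample. The sketch below computes it that way in plain Python. It is an illustration, not the pROC implementation; it assumes that higher expression indicates cancer, and the scores are hypothetical.

```python
def roc_auc(case_scores, control_scores):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    Ties between a case and a control count as half a correct pair.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical log2(TPM + 1) values (not project data).
tumor_scores = [3.0, 4.0, 5.0]
normal_scores = [1.0, 2.0, 3.0]
auc = roc_auc(tumor_scores, normal_scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation, which is the scale on which the values reported above should be read.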

Survival Analysis

We also assessed whether LINC00857 expression could be used to predict prognosis. Survival analyses were performed in GEPIA2 using RNA-seq and prognostic data from all TCGA tumors. OS (overall survival) and DFS (disease-free survival) were shown as Kaplan-Meier plots, and hazard ratios (HRs) were calculated for the target gene. High expression of LINC00857 was associated with a good OS prognosis in KIRP (Figure 6A), and DFS analysis (Figure 6E) likewise showed that high expression of LINC00857 is associated with a good prognosis in KIRP. In contrast, high expression of LINC00857 was associated with a worse OS prognosis in LIHC, LUAD and PAAD (Figure 6B-D), and with a worse DFS prognosis in the same three cancers (Figure 6F-H). Based on this work, we found that LINC00857 is a potential prognostic biomarker in various cancers.

Figure 6. Relationship between LINC00857 expression level and patient survival in GEPIA2. The relationship between LINC00857 expression and overall survival (A-D) and disease-free survival (E-H) was assessed in GEPIA2. Significant results are shown as Kaplan-Meier curves.
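The Kaplan-Meier curves behind such plots can be sketched in a few lines. The example below splits patients into high and low expression groups and estimates a survival curve for each. It is a plain-Python illustration rather than GEPIA2's code; the median cutoff is an assumption made here, and the patient data are hypothetical.

```python
from statistics import median


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    events are 1 (event observed) or 0 (censored).
    """
    curve = []
    survival = 1.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


def split_by_median(expression, times, events):
    """Split patients into (high, low) groups at the median expression (assumed cutoff)."""
    cutoff = median(expression)
    high = [(t, e) for x, t, e in zip(expression, times, events) if x > cutoff]
    low = [(t, e) for x, t, e in zip(expression, times, events) if x <= cutoff]
    return high, low


# Hypothetical patients (not project data): expression, follow-up months, event flag.
expression = [1.0, 2.0, 6.0, 7.0]
times = [30, 40, 5, 12]
events = [0, 1, 1, 1]
high, low = split_by_median(expression, times, events)
high_curve = kaplan_meier([t for t, _ in high], [e for _, e in high])
low_curve = kaplan_meier([t for t, _ in low], [e for _, e in low])
print("high expression:", high_curve)
print("low expression:", low_curve)
```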

Expectation

Atherosclerosis

This part could lead to the identification of potential diagnostic lncRNA markers for atherosclerosis patients.

Identification of DEGs

The gene expression profiles of atherosclerosis (GSE202625) were downloaded from the Gene Expression Omnibus (GEO) database. Differentially expressed genes (DEGs) were identified with the R package “DESeq2” [9]. A total of 186 DEGs were identified in the combined atherosclerosis dataset, of which 97 were upregulated and 89 were downregulated. The volcano plot of the atherosclerosis DEGs is shown in Figure 7. Among them, 15 lncRNAs are candidate diagnostic markers for atherosclerosis.

Figure 7. The volcano map of GSE202625.
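A volcano plot labels each gene by its fold change and its multiple-testing-adjusted p-value. The sketch below shows that labelling step with a Benjamini-Hochberg adjustment in plain Python. It does not reproduce DESeq2's statistical model; the cutoffs (|log2 fold change| of at least 1, adjusted p below 0.05) are assumptions chosen for illustration, since the thresholds used in the analysis are not stated here, and the input values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def classify_genes(log2_fold_changes, p_values, lfc_cutoff=1.0, alpha=0.05):
    """Label each gene 'up', 'down' or 'ns' (cutoffs are illustrative assumptions)."""
    labels = []
    for lfc, q in zip(log2_fold_changes, benjamini_hochberg(p_values)):
        if q < alpha and lfc >= lfc_cutoff:
            labels.append("up")
        elif q < alpha and lfc <= -lfc_cutoff:
            labels.append("down")
        else:
            labels.append("ns")
    return labels


# Hypothetical per-gene results (not project data).
lfc = [2.0, -1.5, 3.0, 0.2]
pvals = [0.001, 0.002, 0.5, 0.9]
labels = classify_genes(lfc, pvals)
print(labels)
```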

Machine Learning

LASSO (least absolute shrinkage and selection operator) is a regression technique that performs variable selection and regularization to improve the predictive accuracy and interpretability of a statistical model [10, 11]. We used the R package “glmnet” [12] to perform LASSO regression. The lncRNAs selected by LASSO were considered candidate hub lncRNAs for atherosclerosis diagnosis. Six lncRNAs (MIR193BHG, LINC01309, HMBOX1-IT1, LINC01389, LINC02156 and LINC01187) were identified as the final biomarkers (Figure 8A, B).

Figure 8. Machine learning in screening candidate diagnostic biomarkers for atherosclerosis. (A, B) Biomarkers screening in the Lasso model.
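How LASSO selects variables can be seen in a small coordinate-descent sketch: each coefficient is shrunk by a soft-threshold, and weak predictors are set exactly to zero. The example below is plain Python for a linear (Gaussian) model with centered data and no intercept. These are simplifying assumptions made for illustration; it is not the glmnet implementation or the model family used in the analysis, and the data are hypothetical.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam; values within [-lam, lam] become 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta, with beta starting at zero
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            norm = sum(v * v for v in col) / n
            if norm == 0.0:
                continue
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, residual)) / n
            new_beta = soft_threshold(rho, lam) / norm
            delta = new_beta - beta[j]
            if delta != 0.0:
                residual = [r - c * delta for c, r in zip(col, residual)]
                beta[j] = new_beta
    return beta


# Hypothetical centered data: y depends strongly on feature 0, weakly on feature 1.
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [2.1, 1.9, -1.9, -2.1]
beta = lasso_coordinate_descent(X, y, lam=0.5)
print(beta)  # the weak feature's coefficient is shrunk to exactly zero
```

Features whose coefficients remain non-zero at the chosen penalty are the ones retained as candidate markers; larger values of `lam` retain fewer features.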

Diagnostic Value Assessment

We used R (Version 3.6.3) with the “pROC” package (Version 1.17.0.1) to evaluate diagnostic performance for atherosclerosis. The ROC curve indicated that expression of the selected lncRNAs had good predictive power, with an area under the curve (AUC) of 0.941 (95% confidence interval [CI] = 0.881–1.000), to discriminate atherosclerosis from normal samples (Figure 9). Therefore, the lncRNAs we selected showed high accuracy in diagnosing atherosclerosis.

Figure 9. The ROC curve of candidate lncRNAs.

References

[1] Wang, L., He, Y., Liu, W., Bai, S., Xiao, L., Zhang, J., Dhanasekaran, S.M., Wang, Z., Kalyana-Sundaram, S., Balbin, O.A., et al. Non-coding RNA LINC00857 is predictive of poor patient survival and promotes tumor progression via cell cycle regulation in lung cancer. Oncotarget, 2016, 7: 11487–11499.

[2] Dudek, A.M., van Kampen, J.G.M., Witjes, J.A., Kiemeney, L.A.L.M., and Verhaegh, G.W. LINC00857 expression predicts and mediates the response to platinum-based chemotherapy in muscle-invasive bladder cancer. Cancer Med, 2018, 7: 3342–3350.

[3] Pang, K., Ran, M.J., Zou, F.W., Yang, T.W., and He, F. Long non-coding RNA LINC00857 promotes gastric cancer cell proliferation and predicts poor patient survival. Oncol. Lett, 2018, 16: 2119–2124.

[4] Xia, C., Zhang, X.-Y., Liu, W., Ju, M., Ju, Y., Bu, Y.-Z., Wang, W., and Shao, H. LINC00857 contributes to hepatocellular carcinoma malignancy via enhancing epithelial-mesenchymal transition. J. Cell. Biochem, 2019, 120: 7970–7977.

[5] Su, W., Wang, L., Niu, F., Zou, L., Guo, C., Wang, Z., Yang, X., Wu, J., Lu, Y., Zhang, J., et al. LINC00857 knockdown inhibits cell proliferation and induces apoptosis via involving STAT3 and MET oncogenic proteins in esophageal adenocarcinoma. Aging (Albany NY), 2019, 11: 2812–2821.

[6] Wang, L., Cao, L., Wen, C., Li, J., Yu, G., and Liu, C. LncRNA LINC00857 regulates lung adenocarcinoma progression, apoptosis and glycolysis by targeting miR-1179/SPAG5 axis. Hum. Cell, 2020, 33: 195–204.

[7] Shen, et al. Sangerbox: A comprehensive, interaction-friendly clinical bioinformatics analysis platform. iMeta, 2022, 1(3): e36.

[8] Vivian J, Rao AA, Nothaft FA, Ketchum C, Armstrong J, Novak A, Pfeil J, Narkizian J, Deran AD, Musselman-Brown A, Schmidt H, Amstutz P, Craft B, Goldman M, Rosenbloom K, Cline M, O'Connor B, Hanna M, Birger C, Kent WJ, Patterson DA, Joseph AD, Zhu J, Zaranek S, Getz G, Haussler D, Paten B. Toil enables reproducible, open source, big biomedical data analyses. Nat Biotechnol, 2017, 35(4):314-316.

[9] Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol, 2014, 15(12):550.

[10] Zhou Y, Shi W, Zhao D, Xiao S, Wang K, Wang J. Identification of Immune-Associated Genes in Diagnosing Aortic Valve Calcification With Metabolic Syndrome by Integrated Bioinformatics Analysis and Machine Learning. Front Immunol, 2022, 13:937886.

[11] Yang C, Delcher C, Shenkman E, Ranka S. Machine Learning Approaches for Predicting High Cost High Need Patient Expenditures in Health Care. BioMed Eng Online, 2018, 17(Suppl 1):131.

[12] Zhang M, Zhu K, Pu H, Wang Z, Zhao H, Zhang J, Wang Y. An Immune-Related Signature Predicts Survival in Patients With Lung Adenocarcinoma. Front Oncol, 2019, 9:1314.