Results

Single-cell analysis reveals a mesenchymal type cell subpopulation.


Figure1 Figure1

Figure 1. scRNA-seq reveals a mesenchymal type cell subpopulation

Left: The single cell RNA sequencing reveals 43 distinct cell subpopulations. Right: The higher expression of mesenchymal markers SNAI1, MMP7 and SNAI2 in cluster 35.

We collect tumor samples from 22 hepatocellular carcinoma (HCC) patients and perform scRNA-seq on these samples. After data integration and denosing, we find 43 distinct cell populations1-3. Among these cell populations, cluster 35 highly expresses mesenchymal cell markers SNAI1, MMP7, SNAI2, etc. These genes are believed to be involved in triggerring cell epithelial to mensenchymal transition (EMT) program, helping tumor cell destroy extracelluar matrix and eventually contributing to tumor metastasis and invasion4.

Trajectory inference and lineage clustering find potent paths toward mesenchymal state.


Figure2 Figure2

Figure 2. Graph based methods reveals potent paths toward mesenchymal state.

Left: Graph based methods find potent paths toward EMT, different path clusters are marked with different colors. Right: Measure path simmilarity with Hausdorff distance and project the reuslts with mds reduction.

Believing that cluster 35 is in mesenchymal state and may contribute to tumor metastasis, we next qusetion that what gene expression patterns will happen during EMT5-6. Though there are many availble algorithrms currently, we decide to develop a new unsupervised method to solve this problem. Our method involves the following steps:
1. Project cells into low dimensions with decomposition methods such as PCA, Diffusion Map or UMAP.
2. Pre-cluster cells into small groups with a revised k-means clustering method.
3. Measure the connectivity between small groups with Gaussian kernel.
4. Use graph-based pathfinding algorithrm to find paths toward the target cell group.
5. Measure path simmilarity with Hausdorff distance and cluster paths with k-means
As a result, we successfully find distint paths toward cluster 35, as you can see in the Figure 2.

Validation on hemopoietic cell differentiation dataset reconstructs three distint differentiation types.


Figure3 Figure3 Figure3 Figure3 Figure3

Figure 3. Heatmap shows potent fates of hematopoietic stem cell differentiation

The 5 potent differentiation fates of hematopoietic stem cell is plotted with heatmaps. The left most of each heatmap indicates the original stem state, which then differentiate into erthrocytes (2, 3, 4), monocytes (5), neutrophils (1) respectively.

In order to validate our method, we decide to test the same pipeline in a hematopoietic cell differentiation dataset includes 2730 hematopoietic cells7. According to the expression pattern of known markers, we find that our method successfully reconstructs erthrocyte, monocyte and neutrophil differentiation pathways in an unsupervised manner (Figure 3). This motivates us to further analyze the expression patterns found during EMT of HCC samples.

GAM fitness reveals different gene expression patterns toward EMT of HCC cells.


Figure4 Figure4

Figure 4. GAM to autonomously find different expression patterns of scaled gene counts

Left: The GAM fitted gene expression patterns are projected into 2-dimension with mds reduction. Right: GAM fitted scaled counts across pseudotime.

In order to filter out possible gene expression pattern that are revalent with EMT program in different lineages, we use generalized addictive model (GAM) to fit the scaled gene counts acrocess the path8. Similar method has been discribed by Koen Van den Berge et al9. Here, we manually set a R2 of 0.6 as a threshold of good fit model, which will then be considered in later analysis. Using this method, we find three gene expression patterns toward EMT, including down regulation, up regulation ,down first and up next regulation (Figure 4. Right).

Lieange 6 is revalent to better clinical outcome.


Figure5 Figure5

Figure 5.Lieange 6 is revalent to better clinical outcome and specific inflammatory reaction expression pattern.

Left: Kaplan-Meier analysis reveals that lineage 6 (orange line) is associated with better clinical outcomes. Right: Lineage 6 specific down regulation pattern involved a set of inflammatory reaction related genes.

We next ask that if there is any lineage associated with clinical outcomes and what is its specific expression pattern. In order to do this, We collect bulk RNA-seq data with clinical outcome information from HCC database and impute their fraction of different lineages with previous described method10-11. Kaplan-Meier analysis finds that higher fraction of lineage 6 in patients is associated with better clinical outcomes (Figure 5. Left). A t-test analysis filters lineage 6 specific expression patterns and gene ontology analysis is performed to enrich these expression patterns12 (Figure 5. Right). As a result, we find this lineage has a enrichment of downregulated inflammation genes during EMT, which may inhibit tumor cells from entering mesenchymal state and eventually causing metastasis.

Future

In summary, our method provides a new insight into the cell differentiation dynamics. The major darwback of our project is that the result of method needs to be further analyzed and validated at clinics and cell lines. We hope our method could help doctors and researchers to better analyze the dynamics of complex pathological samples, which will eventually be used to help more patients suffering from tumor or other diseases.

Reference

  • 1. van Dijk D, Sharma R, Nainys J, Yim K, Kathail P, Carr AJ, Burdziak C, Moon KR, Chaffer CL, Pattabiraman D, Bierie B, Mazutis L, Wolf G, Krishnaswamy S, Pe'er D. Recovering Gene Interactions from Single-Cell Data Using Data Diffusion. Cell. 2018 Jul 26;174(3):716-729.e27. doi: 10.1016/j.cell.2018.05.061. Epub 2018 Jun 28. PMID: 29961576; PMCID: PMC6771278..
  • 2. Wolf FA, Angerer P, Theis FJ. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 2018 Feb 6;19(1):15. doi: 10.1186/s13059-017-1382-0. PMID: 29409532; PMCID: PMC5802054.
  • 3. Hao Y, Hao S, Andersen-Nissen E, Mauck WM 3rd, Zheng S, Butler A, Lee MJ, Wilk AJ, Darby C, Zager M, Hoffman P, Stoeckius M, Papalexi E, Mimitou EP, Jain J, Srivastava A, Stuart T, Fleming LM, Yeung B, Rogers AJ, McElrath JM, Blish CA, Gottardo R, Smibert P, Satija R. Integrated analysis of multimodal single-cell data. Cell. 2021 Jun 24;184(13):3573-3587.e29. doi: 10.1016/j.cell.2021.04.048. Epub 2021 May 31. PMID: 34062119; PMCID: PMC8238499.
  • 4. Dongre A, Weinberg RA. New insights into the mechanisms of epithelial-mesenchymal transition and implications for cancer. Nat Rev Mol Cell Biol. 2019 Feb;20(2):69-84. doi: 10.1038/s41580-018-0080-4. PMID: 30459476.
  • 5. Street K, Risso D, Fletcher RB, Das D, Ngai J, Yosef N, Purdom E, Dudoit S. Slingshot: cell lineage and pseudotime inference for single-cell transcriptomics. BMC Genomics. 2018 Jun 19;19(1):477. doi: 10.1186/s12864-018-4772-0. PMID: 29914354; PMCID: PMC6007078.
  • 6. Wolf FA, Hamey FK, Plass M, Solana J, Dahlin JS, Göttgens B, Rajewsky N, Simon L, Theis FJ. PAGA: graph abstraction reconciles clustering with trajectory inference through a topology preserving map of single cells. Genome Biol. 2019 Mar 19;20(1):59. doi: 10.1186/s13059-019-1663-x. PMID: 30890159; PMCID: PMC6425583.
  • 7. Paul F, Arkin Y, Giladi A, Jaitin DA, Kenigsberg E, Keren-Shaul H, Winter D, Lara-Astiaso D, Gury M, Weiner A, David E, Cohen N, Lauridsen FK, Haas S, Schlitzer A, Mildner A, Ginhoux F, Jung S, Trumpp A, Porse BT, Tanay A, Amit I. Transcriptional Heterogeneity and Lineage Commitment in Myeloid Progenitors. Cell. 2015 Dec 17;163(7):1663-77. doi: 10.1016/j.cell.2015.11.013. Epub 2015 Nov 25. Erratum in: Cell. 2016 Jan 14;164(1-2):325. PMID: 26627738.
  • 8. Servén D., Brummitt C. (2018). pyGAM: Generalized Additive Models in Python. Zenodo. DOI: 10.5281/zenodo.1208723
  • 9. Van den Berge K, Roux de Bézieux H, Street K, Saelens W, Cannoodt R, Saeys Y, Dudoit S, Clement L. Trajectory-based differential expression analysis for single-cell sequencing data. Nat Commun. 2020 Mar 5;11(1):1201. doi: 10.1038/s41467-020-14766-3. PMID: 32139671; PMCID: PMC7058077.
  • 10. Lian Q, Wang S, Zhang G, Wang D, Luo G, Tang J, Chen L, Gu J. HCCDB: A Database of Hepatocellular Carcinoma Expression Atlas. Genomics Proteomics Bioinformatics. 2018 Aug;16(4):269-275. doi: 10.1016/j.gpb.2018.07.003. Epub 2018 Sep 25. PMID: 30266410; PMCID: PMC6205074.
  • 11. Chen B, Khodadoust MS, Liu CL, Newman AM, Alizadeh AA. Profiling Tumor Infiltrating Immune Cells with CIBERSORT. Methods Mol Biol. 2018;1711:243-259. doi: 10.1007/978-1-4939-7493-1_12. PMID: 29344893; PMCID: PMC5895181.
  • 12. Gene Ontology Consortium. The Gene Ontology resource: enriching a GOld mine. Nucleic Acids Res. 2021 Jan 8;49(D1):D325-D334. doi: 10.1093/nar/gkaa1113. PMID: 33290552; PMCID: PMC7779012.