Figure 1. scRNA-seq reveals a mesenchymal type cell subpopulation
Left: The single cell RNA sequencing reveals 43 distinct cell subpopulations. Right: The higher expression of mesenchymal markers SNAI1, MMP7 and SNAI2 in cluster 35.
We collect tumor samples from 22 hepatocellular carcinoma (HCC) patients and perform scRNA-seq on these samples. After data integration and denosing, we find 43 distinct cell populations1-3. Among these cell populations, cluster 35 highly expresses mesenchymal cell markers SNAI1, MMP7, SNAI2, etc. These genes are believed to be involved in triggerring cell epithelial to mensenchymal transition (EMT) program, helping tumor cell destroy extracelluar matrix and eventually contributing to tumor metastasis and invasion4.
Figure 2. Graph based methods reveals potent paths toward mesenchymal state.
Left: Graph based methods find potent paths toward EMT, different path clusters are marked with different colors. Right: Measure path simmilarity with Hausdorff distance and project the reuslts with mds reduction.
Believing that cluster 35 is in mesenchymal state and may contribute to tumor metastasis, we next qusetion that what gene expression patterns will happen during EMT5-6. Though there are many availble algorithrms currently, we decide to develop a new unsupervised method to solve this problem. Our method involves the following steps:
1. Project cells into low dimensions with decomposition methods such as PCA, Diffusion Map or UMAP.
2. Pre-cluster cells into small groups with a revised k-means clustering method.
3. Measure the connectivity between small groups with Gaussian kernel.
4. Use graph-based pathfinding algorithrm to find paths toward the target cell group.
5. Measure path simmilarity with Hausdorff distance and cluster paths with k-means
As a result, we successfully find distint paths toward cluster 35, as you can see in the Figure 2.
Figure 3. Heatmap shows potent fates of hematopoietic stem cell differentiation
The 5 potent differentiation fates of hematopoietic stem cell is plotted with heatmaps. The left most of each heatmap indicates the original stem state, which then differentiate into erthrocytes (2, 3, 4), monocytes (5), neutrophils (1) respectively.
In order to validate our method, we decide to test the same pipeline in a hematopoietic cell differentiation dataset includes 2730 hematopoietic cells7. According to the expression pattern of known markers, we find that our method successfully reconstructs erthrocyte, monocyte and neutrophil differentiation pathways in an unsupervised manner (Figure 3). This motivates us to further analyze the expression patterns found during EMT of HCC samples.
Figure 4. GAM to autonomously find different expression patterns of scaled gene counts
Left: The GAM fitted gene expression patterns are projected into 2-dimension with mds reduction. Right: GAM fitted scaled counts across pseudotime.
In order to filter out possible gene expression pattern that are revalent with EMT program in different lineages, we use generalized addictive model (GAM) to fit the scaled gene counts acrocess the path8. Similar method has been discribed by Koen Van den Berge et al9. Here, we manually set a R2 of 0.6 as a threshold of good fit model, which will then be considered in later analysis. Using this method, we find three gene expression patterns toward EMT, including down regulation, up regulation ,down first and up next regulation (Figure 4. Right).
Figure 5.Lieange 6 is revalent to better clinical outcome and specific inflammatory reaction expression pattern.
Left: Kaplan-Meier analysis reveals that lineage 6 (orange line) is associated with better clinical outcomes. Right: Lineage 6 specific down regulation pattern involved a set of inflammatory reaction related genes.
We next ask that if there is any lineage associated with clinical outcomes and what is its specific expression pattern. In order to do this, We collect bulk RNA-seq data with clinical outcome information from HCC database and impute their fraction of different lineages with previous described method10-11. Kaplan-Meier analysis finds that higher fraction of lineage 6 in patients is associated with better clinical outcomes (Figure 5. Left). A t-test analysis filters lineage 6 specific expression patterns and gene ontology analysis is performed to enrich these expression patterns12 (Figure 5. Right). As a result, we find this lineage has a enrichment of downregulated inflammation genes during EMT, which may inhibit tumor cells from entering mesenchymal state and eventually causing metastasis.
In summary, our method provides a new insight into the cell differentiation dynamics. The major darwback of our project is that the result of method needs to be further analyzed and validated at clinics and cell lines. We hope our method could help doctors and researchers to better analyze the dynamics of complex pathological samples, which will eventually be used to help more patients suffering from tumor or other diseases.