CNS图表复现20—第三次分群，以T细胞为例 / 四六文摘

前面我们展现了 CNS图表复现08—肿瘤单细胞数据第一次分群通用规则，然后呢，第二次分群的上皮细胞可以细分恶性与否，免疫细胞呢，细分可以成为: B细胞，T细胞，巨噬细胞，树突细胞等等。实际上每个免疫细胞亚群仍然可以继续精细的划分，以文章为例：

Macrophages from lung tumor biopsies (n = 1,379) were clus- tered into 5 distinct groups (Figure S7A) followed by differential gene expression in each resulting cluster
T cells and NK cells (n = 2,226) were analyzed in the same manner as macrophages and resulted in 5 distinct T/NK cell pop- ulations .

这就是正文的图表5：

首先是Macrophages细分：

各个细胞亚群的比例展示及标记基因小提琴图：

各个细胞亚群的分布情况及标记基因热图；

然后是T cells and NK cells的细分

各个细胞亚群的比例展示及标记基因小提琴图：

各个细胞亚群的分布情况及标记基因热图；

尝试复现T细胞的亚群细分

文章提到的是

Macrophages from lung tumor biopsies (n = 1,379) were clus- tered into 5 distinct groups (Figure S7A) followed by differential gene expression in each resulting cluster
T cells and NK cells (n = 2,226) were analyzed in the same manner as macrophages and resulted in 5 distinct T/NK cell pop- ulations .

首先我们拿到T cells and NK cells数据集

rm(list=ls()) options(stringsAsFactors = F) library(Seurat) library(ggplot2) ### 来源于：CNS图表复现02—Seurat标准流程之聚类分群的step1-create-sce.R load(file = 'first_sce.Rdata')


### 来源于 step2-anno-first.R 

load(file = 'phe-of-first-anno.Rdata')
sce=sce.first

table(phe$immune_annotation)

# #  immune (CD45+,PTPRC), epithelial/cancer (EpCAM+,EPCAM), and stromal (CD10+,MME,fibo or CD31+,PECAM1,endo)
genes_to_check = c("PTPRC","CD19",'PECAM1','MME','CD3G', 'CD4', 'CD8A' )

p <- DotPlot(sce, features = genes_to_check,

             assay='RNA' )  

p

table(phe$immune_annotation,phe$seurat_clusters)
cells.use <- row.names(sce@meta.data)[which(phe$immune_annotation=='immune')]

length(cells.use)

sce <-subset(sce, cells=cells.use)  

sce
# 实际上这里需要重新对sce进行降维聚类分群，为了节省时间

# 我们直接载入前面的降维聚类分群结果，但是并没有载入tSNE哦

## 来源于：CNS图表复现05—免疫细胞亚群再分类 

load(file = 'phe-of-subtypes-Immune-by-manual.Rdata')

sce@meta.data=phe

table(phe$immuSub)
# 需要背景知识

# https://www.abcam.com/primary-antibodies/immune-cell-markers-poster

# NK Cell* CD11b+, CD122+, NK1.1+, NKG2D+, NKp46+

# NCR1 Gene. Natural Cytotoxicity Triggering Receptor 1; NK-P46; Natural Killer Cell P46-Related Protein; 

# NKG2D is a transmembrane protein belonging to the NKG2 family of C-type lectin-like receptors. NKG2D is encoded by KLRK1 gene
# T Cell* CD3+ 

genes_to_check = c('CD3G', 'CD4', 'CD8A', 'FOXP3',

                   'TNF','IFNG','KLRK1','NCR1')

p <- DotPlot(sce, features = genes_to_check,group.by = 'seurat_clusters') + coord_flip()

p

cells.use <- row.names(sce@meta.data)[which(phe$immuSub=='Tcells')] length(cells.use) sce <-subset(sce, cells=cells.use) sce # 26485 features across 4555 samples within 1 assay

然后走降维聚类分群流程

挑选出来了 4555 个细胞，走标准流程即可：

sce <- NormalizeData(sce, normalization.method = "LogNormalize", scale.factor = 10000) GetAssay(sce,assay = "RNA") sce <- FindVariableFeatures(sce, selection.method = "vst", nfeatures = 2000) # 步骤 ScaleData 的耗时取决于电脑系统配置（保守估计大于一分钟） sce <- ScaleData(sce) sce <- RunPCA(object = sce, pc.genes = VariableFeatures(sce)) DimHeatmap(sce, dims = 1:12, cells = 100, balanced = TRUE) ElbowPlot(sce) sce <- FindNeighbors(sce, dims = 1:15) sce <- FindClusters(sce, resolution = 0.2) table(sce@meta.data$RNA_snn_res.0.2) sce <- RunUMAP(object = sce, dims = 1:15, do.fast = TRUE) DimPlot(sce,reduction = "umap",label=T) # 这个时候分成了6群

查亚群的标记基因

如果我们去 https://www.abcam.com/primary-antibodies/t-cells-basic-immunophenotyping 可以看到

Killer T cells (cytotoxic T lymphocytes: CD8+) , Effector or memory
Helper T cells (Th cells: CD4+), TH1,2,17
Regulatory T cells (Treg cells: CD4+, CD25+, FoxP3+, CD127+)
interferon gamma (IFN-γ) and tumor necrosis factor alpha (TNF-α), two factors produced by activated T cells
Helper Th1,IFNγ, IL-2, IL-12, IL-18

比较简单的CD4,CD8,以及FOXP3各自代表的T细胞亚群。比较复杂的可以是：张泽民团队的单细胞研究把T细胞分的如此清楚

但是很不幸，在这个数据集里面它们都不是主要的细胞亚群决定因素。

genes_to_check = c('CD3G', 'CD4', 'CD8A', 'FOXP3', 'TNF','IFNG','KLRK1','NCR1') p1 <- DotPlot(sce, features = genes_to_check,group.by = 'seurat_clusters') + coord_flip() p1 p2=DimPlot(sce,reduction = "umap",label=T) library(patchwork) p1+p2

出图如下：

文章呢，选取的其实是 biopsy_site == "Lung" 的T cells and NK cells (n = 2,226) ，区分成为5 distinct T/NK cell pop- ulations .

> table(sce@meta.data$biopsy_site)

Adrenal Brain Liver LN lung Lung Pleura 226 2 586 700 485 1692 864

所以我们只能是继续挑选子集走流程。代码如下：

sce <- NormalizeData(sce, normalization.method = "LogNormalize", scale.factor = 10000) GetAssay(sce,assay = "RNA") sce <- FindVariableFeatures(sce, selection.method = "vst", nfeatures = 2000) # 步骤 ScaleData 的耗时取决于电脑系统配置（保守估计大于一分钟） sce <- ScaleData(sce) sce <- RunPCA(object = sce, pc.genes = VariableFeatures(sce)) DimHeatmap(sce, dims = 1:12, cells = 100, balanced = TRUE) ElbowPlot(sce) sce <- FindNeighbors(sce, dims = 1:15) sce <- FindClusters(sce, resolution = 0.5) table(sce@meta.data$RNA_snn_res.0.5) sce <- RunUMAP(object = sce, dims = 1:15, do.fast = TRUE) genes_to_check = c('CD3G', 'CD4', 'CD8A', 'FOXP3', 'TNF','IFNG','KLRK1','NCR1') p1 <- DotPlot(sce, features = genes_to_check,group.by = 'seurat_clusters') + coord_flip() p1 p2=DimPlot(sce,reduction = "umap",label=T) library(patchwork) p1+p2

可以看到，并没有本质上的区别：

复现失败了！唯一看起来比较类似的就是 FOXP3 positive Tregs in cluster 5，哈哈哈哈！

Cluster0: generic dont really know.
Cluster1: Exhaustion in the pD cluster (1) and the PD/PR cluster (4). (PR/PD feature)
Cluster2: Cluster 2 is cytotoxic (GZMK, EOMES, CD8pos, IFNG) (Less in PD)
Cluster3: Cluster 3 is enriched for NK cells (Naive feature)
Cluster4: Exhaustion in the pD cluster (1) and the PD/PR cluster (4). (PR/PD feature)
Cluster5: FOXP3 positive Tregs in cluster 5 (Less in PR)

CNS图表复现20—第三次分群，以T细胞为例

首先是Macrophages细分 ：