什么，你需要1T内存！！！ / 四六文摘

最近接了一个61个10x的单细胞转录组样品项目，使用以前的流程，自动进行质量控制，降维聚类分群，本来应该是分分钟的事情，但是在一个步骤居然卡死了，我看了的这个函数，doubletFinder_v3 ，是去除单细胞转录组里面的双细胞作用，报错如下所示：

> sce.all.filt <- doubletFinder_v3(sce.all.filt, pN = 0.25, pK = 0.09, nExp = nExp, PCs = 1:10) Loading required package: fields Loading required package: spam Loading required package: dotCall64 Loading required package: grid Spam version 2.6-0 (2020-12-14) is loaded. KernSmooth 2.23 loaded Copyright M. P. Wand 1997-2009 [1] "Creating 96208 artificial doublets..." [1] "Creating Seurat object..." [1] "Normalizing Seurat object..." Performing log-normalization 0% 10 20 30 40 50 60 70 80 90 100% [----|----|----|----|----|----|----|----|----|----| **************************************************| [1] "Finding variable genes..." Calculating gene variances 0% 10 20 30 40 50 60 70 80 90 100% [----|----|----|----|----|----|----|----|----|----| **************************************************| Calculating feature variances of standardized and clipped values 0% 10 20 30 40 50 60 70 80 90 100% [----|----|----|----|----|----|----|----|----|----| **************************************************| [1] "Scaling data..." Centering and scaling data matrix |=========================================================| 100% [1] "Running PCA..." [1] "Calculating PC distance matrix..." Error: cannot allocate vector of size 1103.4 Gb >

原来是因为我把 61 个10X单细胞转录组样品项目的全部单细胞数据合并后，细胞数量太多了，一次性走doubletFinder_v3 函数，超出了它的限制。

也就是说，先合并样品，再走doubletFinder_v3 ，连我的服务器也无法hold住，所以我先把61 个10X单细胞转录组样品拆分开来，然后每个10x样品自己独立走doubletFinder_v3 函数，代码如下：

#拆分为个seurat子对象 sce.all.list <- SplitObject(sce.all, split.by = "orig.ident") sce.all.list


library(DoubletFinder)

#sce.all.filt=sce.all.list[[1]]

phe_lt <- lapply(names(sce.all.list), function(x){

  sce.all.filt=sce.all.list[[x]]

  sce.all.filt = FindVariableFeatures(sce.all.filt)

  sce.all.filt = ScaleData(sce.all.filt, 

                           vars.to.regress = c("nFeature_RNA", "percent_mito"))

  sce.all.filt = RunPCA(sce.all.filt, npcs = 20)

  sce.all.filt = RunTSNE(sce.all.filt, npcs = 20)

  sce.all.filt = RunUMAP(sce.all.filt, dims = 1:10)

nExp <- round(ncol(sce.all.filt) * 0.04) # expect 4% doublets sce.all.filt <- doubletFinder_v3(sce.all.filt, pN = 0.25, pK = 0.09, nExp = nExp, PCs = 1:10) # name of the DF prediction can change, so extract the correct column name. DF.name = colnames(sce.all.filt@meta.data)[grepl("DF.classification", colnames(sce.all.filt@meta.data))] p5.dimplot=cowplot::plot_grid(ncol = 2, DimPlot(sce.all.filt, group.by = "orig.ident") + NoAxes(), DimPlot(sce.all.filt, group.by = DF.name) + NoAxes()) p5.dimplot ggsave(filename=paste0("doublet_dimplot_",x,".png"), plot=p5.dimplot) p5.vlnplot=VlnPlot(sce.all.filt, features = "nFeature_RNA", group.by = DF.name, pt.size = 0.1) p5.vlnplot ggsave(paste0("doublet_vlnplot_",x,".png"), plot=p5.vlnplot) print(table(sce.all.filt@meta.data[, DF.name] )) #过滤doublet phe=sce.all.filt@meta.data phe })

虽然是运行了一天一夜才完成。有了结果，就可以去除双细胞啦，代码如下：

kpCells=unlist(lapply(phe_lt, function(x){ #table(x[,ncol(x)]) rownames(x[ x[,ncol(x)]=='Singlet', ]) })) kp = colnames(sce.all) %in% kpCells table(kp) sce=sce.all[,kp]

这样我还是拿到了最终的 sce 对象，后续分析就没有什么问题了！如下所示的一个简单的降维聚类分群图：

什么，你需要1T内存！！！

相关推荐