
This is how I used to introduce the dataset in the scRNAseq R package:

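  • The package ships the Pollen et al. 2014 dataset of human single cells, divided into four groups: neural progenitor cells (“NPC”) differentiated from pluripotent stem cells, plus “GW16”, “GW21”, and “GW21+3” cells, which are labelled by gestational week. Understanding these takes some biology background; skip it if you are not interested.
  • The package is 50.6 MB, so downloading takes a little while; install and load it first.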

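This dataset is well known: as of January 2019 it had nearly 400 citations, and people developing R packages and algorithms since then have often tested on it. The SinQC paper, for example, says: "We applied SinQC to a highly heterogeneous scRNA-seq dataset containing 301 cells (mixture of 11 different cell types) (Pollen et al., 2014)." The expression matrix here was produced by RSEM (Li and Dewey 2011) against the hg38 RefSeq transcriptome. There are 130 libraries in total, because each cell was sequenced twice at different depths.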

At the time I also pointed out that the Pollen et al. 2014 dataset essentially comes from a treasure trove of a web page.

library(scRNAseq)
## ----- Load Example Data -----
data(fluidigm)
ct <- floor(assays(fluidigm)$rsem_counts)
ct[1:4,1:4]


Then came a very big update: Created: May 25, 2016; Compiled: 2020-05-07

Previously, data(fluidigm) was enough to load the Pollen et al. 2014 dataset; now you call the function ReprocessedFluidigmData() to download it:

library(scRNAseq)
fluidigm <- ReprocessedFluidigmData()
fluidigm
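To carry the old snippet over to the new interface, something like the following should work. It is only a sketch: it needs a network connection for the ExperimentHub download, and the assay name rsem_counts and the colData columns Biological_Condition and Coverage_Type are what I expect the object to contain, so check assayNames() and colData() on your own copy first.

```r
library(scRNAseq)
library(SingleCellExperiment)

# Download (or read from the local ExperimentHub cache) the Pollen et al. data
fluidigm <- ReprocessedFluidigmData()

# List the assays in the object; "rsem_counts" is assumed to be one of them
assayNames(fluidigm)

# RSEM reports fractional expected counts, so round them down to integers
ct <- floor(assay(fluidigm, "rsem_counts"))
ct[1:4, 1:4]

# Cross-tabulate cell group against sequencing depth
# (both column names are assumptions; see colData(fluidigm) for the real ones)
table(fluidigm$Biological_Condition, fluidigm$Coverage_Type)
```

The only real change from the old code is that the object comes from a function call rather than data(), and that assay() pulls a single matrix out by name.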


  • ReprocessedFluidigmData() provides 65 cells from Pollen et al. (2014).
  • ReprocessedTh2Data() provides 96 T helper cells from Mahata et al. (2014).
  • ReprocessedAllenData() provides 379 cells from the mouse visual cortex, which is a subset of the data from Tasic et al. (2016).
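All three functions return the same kind of object, so they can be loaded and compared in one go. This is a rough sketch that assumes a working network connection; the variable names are mine, and I am deliberately not quoting the expected dimensions here.

```r
library(scRNAseq)
library(SingleCellExperiment)

# Each call downloads one reprocessed dataset from ExperimentHub
datasets <- list(
  fluidigm = ReprocessedFluidigmData(),
  th2      = ReprocessedTh2Data(),
  allen    = ReprocessedAllenData()
)

# Number of genes (rows) and libraries (columns) in each dataset
n_rows <- vapply(datasets, nrow, integer(1))
n_cols <- vapply(datasets, ncol, integer(1))
data.frame(genes = n_rows, libraries = n_cols)

# Assays stored in each object
lapply(datasets, assayNames)
```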




  • inst/scripts/make-X-Y-data.Rmd, an R Markdown report that creates all components of a SingleCellExperiment. X should be the last name of the first author of the relevant study while Y should be the name of the biological system.
  • inst/scripts/make-X-Y-metadata.R, an R script that creates a metadata CSV file at inst/extdata/metadata-X-Y.csv. Metadata files should follow the format described in the ExperimentHub documentation.
  • R/XYData.R, an R source file that defines a function XYData() to download the components from ExperimentHub and creates a SingleCellExperiment object.
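To make the last item concrete, here is a toy sketch of the shape such a function might take. ToyExampleData() is a hypothetical name, and the ExperimentHub download is replaced by components simulated in memory; this is not how the package itself implements its loaders, only an illustration of assembling counts, column metadata, and row metadata into one SingleCellExperiment.

```r
library(S4Vectors)
library(SingleCellExperiment)

# Hypothetical stand-in for an R/XYData.R function. A real one would fetch
# these components from ExperimentHub; here they are simulated in memory.
ToyExampleData <- function(n_genes = 5, n_cells = 4) {
  # Count matrix: genes in rows, cells in columns
  counts <- matrix(
    rpois(n_genes * n_cells, lambda = 10),
    nrow = n_genes,
    dimnames = list(
      paste0("gene", seq_len(n_genes)),
      paste0("cell", seq_len(n_cells))
    )
  )
  # Per-cell metadata, one row per column of the count matrix
  coldata <- DataFrame(
    condition = rep(c("A", "B"), length.out = n_cells),
    row.names = colnames(counts)
  )
  # Per-gene metadata, one row per row of the count matrix
  rowdata <- DataFrame(
    symbol = rownames(counts),
    row.names = rownames(counts)
  )
  SingleCellExperiment(
    assays = list(counts = counts),
    colData = coldata,
    rowData = rowdata
  )
}

sce <- ToyExampleData()
sce
```

Keeping the assembly inside one function means users only ever see a ready-made object, which is the same experience ReprocessedFluidigmData() gives.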
