Four public scRNA-seq datasets for testing algorithms
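Single-cell sequencing was named "Method of the Year" for 2013 by Nature Methods. scRNA-seq has become increasingly popular: more than a thousand related papers were published between 2015 and 2017 alone. Four public scRNA-seq datasets (the Patel, Klein, Zeisel, and Usoskin datasets) can be used to test the various data-processing algorithms for single-cell transcriptomes.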
Patel scRNA-seq dataset.
To characterize intra-tumoral heterogeneity and redundant transcriptional patterns in glioblastoma tumors, Patel et al. efficiently profiled 5,948 expressed genes of 430 cells from five dissociated human glioblastomas using the SMART-Seq protocol. The filtered and centered-normalized data along with the corresponding cell labels were downloaded from https://hemberg-lab.github.io/scRNA.seq.datasets/. As described in this study, we report clustering into five clusters corresponding to the five different dissociated tumors from which cells were extracted. We did not perform any other normalization or gene selection on this dataset.
Klein scRNA-seq dataset.
Klein et al. characterized the transcriptome of 2,717 cells (Mouse Embryonic Stem Cells, mESCs) across four culture conditions (control and 2, 4, or 7 days after leukemia inhibitory factor, LIF, withdrawal) using InDrop sequencing. Gene expression was quantified with Unique Molecular Identifier (UMI) counts (essentially tags that identify individual molecules, allowing removal of amplification bias). The raw UMI counts and cell labels were downloaded from https://hemberg-lab.github.io/scRNA.seq.datasets/. After filtering out lowly expressed genes (10,322 genes remaining after removing genes that have fewer than 2 counts in 130 cells) and Counts Per Million (CPM) normalization to reduce cell-to-cell variation in sequencing, we report clustering into four cell sub-populations, corresponding to the four culture conditions.
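The gene filtering and CPM normalization described above can be sketched roughly as follows. This is a minimal illustration, not the original authors' code. It assumes that "fewer than 2 counts in 130 cells" means a gene is kept only if it has at least 2 counts in at least 130 cells, and that CPM is computed from per-cell totals of the filtered matrix. The function names `filter_genes` and `cpm_normalize` and the toy matrix are hypothetical.

```python
import numpy as np


def filter_genes(counts, min_count=2, min_cells=130):
    """Keep genes with at least `min_count` counts in at least `min_cells` cells.

    counts: array of shape (n_cells, n_genes) holding raw UMI counts.
    Returns the filtered matrix and the boolean mask of kept genes.
    Assumption: this is one reading of the filtering rule in the text.
    """
    counts = np.asarray(counts)
    # For each gene (column), count the cells that reach the threshold.
    cells_expressing = (counts >= min_count).sum(axis=0)
    keep = cells_expressing >= min_cells
    return counts[:, keep], keep


def cpm_normalize(counts):
    """Scale each cell (row) so that its counts sum to one million."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # leave all-zero cells at zero instead of dividing by 0
    return counts / totals * 1e6


# Toy example: 3 cells x 4 genes, with thresholds scaled down to match.
toy = np.array([
    [5, 0, 3, 1],
    [2, 0, 4, 0],
    [0, 1, 6, 2],
])
filtered, kept = filter_genes(toy, min_count=2, min_cells=2)
cpm = cpm_normalize(filtered)
print(kept)
print(cpm.sum(axis=1))
```

For a real dataset the matrix would be loaded from the downloaded files, and the thresholds would follow the values quoted for that dataset (for example 130 cells for Klein, 30 cells for Zeisel).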
Zeisel scRNA-seq dataset.
Zeisel et al. collected 3,005 mouse cells from the primary somatosensory cortex (S1) and the hippocampal CA1 region, using the Fluidigm C1 microfluidics cell capture platform. Gene expression was quantified with UMI counts. The raw UMI counts and metadata (batch, sex, labels) were downloaded from http://linnarssonlab.org/cortex. We filtered out lowly expressed genes (7,364 genes remaining after removing genes that have fewer than 2 counts in 30 cells) and applied CPM normalization. We report clustering into the nine major classes identified in the study.
Usoskin scRNA-seq dataset.
Usoskin et al. collected 622 cells from the mouse dorsal root ganglion using a robotic cell-picking setup and sequenced them with a 5' single-cell tagged reverse transcription (STRT) method. Filtered (9,195 genes) and normalized data (expressed as Reads Per Million) were downloaded with full sample annotations from http://linnarssonlab.org/drg. We report clustering into four neuronal cell types.