Seurat 4.0 || Your single-cell data analysis toolbox has a new release
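Male,
a handsome guy you only meet once you have grown up,
steady, easygoing, generous, and reliable.
One bioinformatics bond, one skill tree.
Core member of 生信技能树, contributing writer for 单细胞天地, 简书 (Jianshu) creator, single-cell data scientist.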
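As single-cell technologies mature, more and more of the information inside a single cell is being revealed. In the transcriptome era, we described a single cell as a box of RNA, with cell type being the result of gene-specific expression. Now we can describe a single cell as a reactor for the central dogma: DNA, RNA, chromatin accessibility (ATAC), membrane proteins, and more are all produced and react within the cell.
To keep pace with the development of single-cell technology (and, in a sense, to drive it), and to broaden our view as far as possible, the single-cell data analysis toolbox Seurat has been updated to version 4.0 (beta).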
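Integrative multimodal analysis (new algorithms for single-cell multimodal analysis)
The ability to measure multiple data types simultaneously from the same cell, known as multimodal analysis, represents a new and exciting frontier for single-cell genomics. In Seurat v4, we introduce weighted nearest neighbor (WNN) analysis, an unsupervised strategy that learns the information content of each modality in each cell and defines cell state based on a weighted combination of the two modalities. In our new preprint, we generate a CITE-seq dataset containing paired measurements of the transcriptome and 228 surface proteins, and use WNN to define a multimodal reference of human PBMC. You can use WNN to analyze multimodal data from a variety of technologies, including CITE-seq, ASAP-seq, 10X Genomics ATAC + RNA, and SHARE-seq.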
Preprint: Integrated analysis of multimodal single-cell data (https://satijalab.org/v4preprint)
Vignette: Multimodal clustering of a human bone marrow CITE-seq dataset (https://satijalab.org/seurat/v4.0/weighted_nearest_neighbor_analysis.html)
Portal: Click here (https://atlas.fredhutch.org/nygc/multimodal-pbmc/)
Dataset: Download here (https://atlas.fredhutch.org/data/nygc/multimodal/pbmc_multimodal.h5seurat)
Schematic overview of multimodal integration using Weighted Nearest Neighbor analysis (WNN). (A, B) Independent analysis of transcriptome (A) and protein (B) modalities from a CITE-seq dataset of cord blood mononuclear cells. Blue dot marks the same target cell in (A) and (B). Red dots denote the k=20 nearest neighbors to the target cell based on the transcriptome (A) or protein (B) modalities. (C) The RNA neighbors are averaged together to predict the molecular contents of the target cell, which can be compared to the actual measurements. Since the RNA neighbors represent a mixture of different T cell subsets, there is substantial error between predicted and measured protein expression levels for CD4 and CD8. (D) Same as in (C), but averaging protein neighbors. Since protein neighbors are all CD8 T cells, the predicted values are close to the actual measurements. We can therefore infer that for this target cell, the protein data is most useful for defining cell state, and assign it a higher protein modality weight. As described in Supplementary Methods, we perform the prediction and comparison steps in low-dimensional space. (E) We can integrate the modalities by constructing a Weighted Nearest Neighbor (WNN) graph, based on a weighted average of protein and RNA similarities. UMAP visualization and clustering of this graph. (F) Median RNA and protein modality weights for all cell types in the dataset. Modality weights were calculated for each cell without knowledge of cell type labels.
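To make the predict-and-compare idea in panels (C) and (D) concrete, here is a minimal toy sketch in plain Python. It is not Seurat's implementation and does not use the published weighting formula: the function names, the tiny hand-made dataset, and the "softmax of negative prediction error" weighting are all assumptions made for illustration. For one target cell, it averages the neighbors found in each modality, measures how well that average predicts the target's measured profile, and gives the modality with the smaller error the larger weight.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn(profiles, target, k):
    """Indices of the k cells closest to `target`, excluding the target itself."""
    others = [i for i in range(len(profiles)) if i != target]
    others.sort(key=lambda i: euclidean(profiles[i], profiles[target]))
    return others[:k]

def mean_profile(profiles, idx):
    """Average the profiles of the cells listed in `idx`."""
    dims = len(profiles[0])
    return [sum(profiles[i][d] for i in idx) / len(idx) for d in range(dims)]

def modality_weights(rna, protein, target, k):
    """Toy weighting (an assumption, not the WNN formula):
    lower prediction error for a modality's neighbors -> higher weight."""
    errors = {}
    for name, space in (("rna", rna), ("protein", protein)):
        nbrs = knn(space, target, k)
        # Predict the target's RNA and protein profiles from this neighbor set
        # and add up the two prediction errors.
        errors[name] = (
            euclidean(mean_profile(rna, nbrs), rna[target])
            + euclidean(mean_profile(protein, nbrs), protein[target])
        )
    exps = {m: math.exp(-e) for m, e in errors.items()}
    total = sum(exps.values())
    return {m: v / total for m, v in exps.items()}

# Hypothetical data: cells 0-3 are CD8 T cells, cells 4-7 are CD4 T cells.
# RNA coordinates barely separate the subsets; protein (CD4, CD8) does.
rna = [[1.0, 1.0], [1.05, 1.0], [0.9, 1.3], [1.3, 1.2],
       [1.1, 1.0], [1.0, 1.1], [0.9, 0.9], [1.4, 0.8]]
protein = [[0.1, 5.0], [0.2, 5.1], [0.0, 4.9], [0.1, 5.2],
           [5.0, 0.1], [5.1, 0.2], [4.9, 0.0], [5.2, 0.1]]

weights = modality_weights(rna, protein, target=0, k=3)
print(weights)
```

In this toy dataset the RNA neighbors of cell 0 mix CD4 and CD8 cells, so their average predicts its CD4/CD8 protein levels poorly, while its protein neighbors are all CD8 cells. The protein modality therefore receives nearly all of the weight, which mirrors the conclusion drawn in the caption.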
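Rapid mapping of query datasets to references (high-quality reference datasets and a web version are now available)
We introduce Azimuth, a workflow that uses high-quality reference datasets to rapidly map new scRNA-seq datasets (queries). For example, you can map any scRNA-seq dataset of human PBMC onto our reference, automating visualization, cluster annotation, and differential expression. Azimuth can be run within Seurat or through a standalone web application that requires no installation or programming experience.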
Vignette: Mapping scRNA-seq queries onto reference datasets (https://satijalab.org/seurat/v4.0/reference_mapping.html)
Web app: Automated mapping, visualization, and annotation of scRNA-seq datasets from human PBMC (https://satijalab.org/azimuth/)
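The general idea of annotating a query by mapping it onto a labeled reference can be illustrated with a much simpler stand-in technique: majority-vote k-nearest-neighbor label transfer. The sketch below is not how Azimuth or Seurat performs mapping. The function name `transfer_labels`, the 2-D coordinates, and the cell-type labels are all hypothetical, and it assumes the query and reference cells already live in a shared low-dimensional space.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def transfer_labels(ref_points, ref_labels, query_points, k=3):
    """For each query cell, take the majority label among its k nearest
    reference cells and report the fraction of votes as a crude score."""
    results = []
    for q in query_points:
        order = sorted(range(len(ref_points)),
                       key=lambda i: euclidean(ref_points[i], q))
        votes = Counter(ref_labels[i] for i in order[:k])
        label, count = votes.most_common(1)[0]
        results.append((label, count / k))
    return results

# Hypothetical annotated reference: three well-separated cell types.
ref_points = [[0.0, 0.0], [0.3, 0.1], [0.1, 0.4],   # B cells
              [5.0, 5.0], [5.2, 4.9], [4.8, 5.3],   # CD4 T cells
              [0.0, 5.0], [0.2, 5.1], [0.1, 4.7]]   # monocytes
ref_labels = ["B"] * 3 + ["CD4 T"] * 3 + ["Mono"] * 3

# Hypothetical unlabeled query cells.
query_points = [[0.2, 0.2], [4.9, 5.0], [0.1, 4.9]]

for label, score in transfer_labels(ref_points, ref_labels, query_points):
    print(label, score)
```

A vote fraction well below 1.0 would flag a query cell that sits between reference populations. Real workflows also report per-cell confidence for this reason, though they compute it differently.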
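Speed and continuity
Speed and usability updates: we have made some minor changes in v4, primarily to improve the performance of Seurat v4 on large datasets. These changes substantially improve speed and reduce memory requirements, without adversely affecting downstream results. We also provide a detailed description of the key changes. Users who wish to fully reproduce existing results can do so by continuing to install Seurat v3.
We believe that users familiar with Seurat v3 should be able to transition smoothly to Seurat v4. Although we have introduced a large amount of new functionality, existing workflows, functions, and syntax are largely unchanged in this update. In addition, Seurat objects previously generated in Seurat v3 can be loaded seamlessly into Seurat v4 for further analysis.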
Reference:
https://satijalab.org/seurat/