Re-analyzing data that someone else published in Nature and publishing it in Science
OK, I admit the title is clickbait.
The paper is: Single-cell SNP analyses and interpretations based on RNA-Seq data for colon cancer research, published in 2016 in Scientific Reports. The public data it uses come from: Wu, A. R. et al. Quantitative assessment of single-cell RNA-sequencing methods. Nat Methods 11, 41–46, doi: 10.1038/nmeth.2694 (2014).
The authors downloaded public data, focused on the data analysis, and described the procedure in great detail:
In this study, we have collected RNA-Seq data sets from 96 single cells, 4 bulk samples of HCT116 cancer cells and 1 bulk normal sigmoid colon sample.
First, we used the single-cell RNA-Seq data to call SNPs using three SNP callers, analyzed the evolutional stress on Gene Ontology (GO) Slim terms, and compared the profiles of SNPs, which were enriched on chr11 and chr17, among the 83 selected single-cells.
Second, by implementing GO analysis, SNP enrichments were shown in several GO Slim terms such as signal.transduction, while obvious cell heterogeneities were also observed.
Third, we selected 175 cancer-related genes curated from previous studies and we detected that the SNPs were enriched in some of these genes in cancer-related pathways, even though not all of them were consistently identified. In colon cancer-related pathways such as the TGF-β and p53 signaling pathways, we found a list of mutated genes, some of which showed SNP enrichments. We speculated that these cancer-related genes and pathways might play key roles in the occurrence and metastasis of colon cancer.
Finally, to examine the differences in the identified SNPs based on single-cell and bulk samples, we performed SNP analyses on RNA-Seq data using bulk cancer and normal samples. By comparing these results at the single-cell and bulk levels, it was clear that single-cell analyses were not only capable of recapitulating the bulk analyses results such as SNP profiles, cancer-related genes and pathways, but also specialized in detecting some variances and genetic features such as single-cell specific variations in BMP7, CYCS and some 14-3-3 protein-encoding genes in the subpopulations of single cells. Additionally, we searched for fusion genes in the single-cell sets and found specific transcripts that might have potential to accelerate tumor progression. Together, these comparisons revealed the globally consistent but locally different cell-to-cell and cell-to-bulk SNP variations.
Data sources:
For the bulk colon cancer samples, we downloaded RNA-Seq data for 4 bulk cancer samples from the HCT116 cell line from NCBI (BioProject ID: 222225)
RNA-Seq data for 96 single HCT116 cells were downloaded from National Center for Biotechnology Information (NCBI) for colon cancer SNP analysis.
For the bulk normal samples, the RNA-Seq data (GSM1010974) were downloaded from the Reference Epigenome Mapping Center.
Data processing workflow:
In SNP calling step, we called SNPs using three SNP callers, the GATK, SAMTools and GeMS for the sake of higher accuracy and sensitivity since no specific SNP calling algorithm has been customized for single-cell RNA-Seq data.
Parallel-QC -> STAR ->
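The quoted passage says three SNP callers were used for accuracy and sensitivity, but it does not say how their results were combined. As a rough sketch of one common option, the Python below keeps the sites reported by at least a chosen number of callers. The function names `parse_vcf_sites` and `sites_by_support`, the majority-vote rule, and the toy records are all assumptions for illustration, not the paper's method. It also assumes each caller's output has already been converted to VCF-style tab-separated records.

```python
from collections import Counter


def parse_vcf_sites(vcf_text):
    """Return the set of (chrom, pos, ref, alt) sites in VCF-style text."""
    sites = set()
    for line in vcf_text.splitlines():
        # Skip blank lines and VCF header/meta lines.
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        # A record may list several alternate alleles separated by commas.
        for allele in alt.split(","):
            sites.add((chrom, int(pos), ref, allele))
    return sites


def sites_by_support(call_sets, min_callers=2):
    """Keep sites reported by at least `min_callers` of the given callers."""
    counts = Counter()
    for sites in call_sets.values():
        counts.update(sites)
    return {site for site, n in counts.items() if n >= min_callers}


# Toy records (made up for illustration, not real calls).
gatk = "#CHROM\tPOS\tID\tREF\tALT\nchr11\t100\t.\tA\tG\nchr17\t200\t.\tC\tT\n"
samtools = "chr11\t100\t.\tA\tG\nchr17\t300\t.\tG\tA\n"
gems = "chr11\t100\t.\tA\tG\nchr17\t200\t.\tC\tT\n"

call_sets = {
    "GATK": parse_vcf_sites(gatk),
    "SAMtools": parse_vcf_sites(samtools),
    "GeMS": parse_vcf_sites(gems),
}

union = sites_by_support(call_sets, min_callers=1)      # most sensitive
majority = sites_by_support(call_sets, min_callers=2)   # a compromise
strict = sites_by_support(call_sets, min_callers=3)     # most conservative
print(len(union), len(majority), len(strict))
```

Raising `min_callers` trades sensitivity for precision: the union keeps every candidate site, while requiring all three callers keeps only the sites they all agree on.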