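After the TCGA tutorial: reproducing the survival curves from an SCI paper (notes from an outstanding apprentice)
Goal: split TCGA breast cancer (BRCA) patients by TINAGL1 expression and check whether the expression level is associated with survival.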
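1. Download survival data for TCGA patients grouped by TINAGL1 expression
About the site: OncoLnc integrates various types of TCGA RNA expression data with patient clinical data, and provides survival-analysis data and plots.
Open the site and enter the target gene and the percentiles used to divide patients into low- and high-expression groups:
Enter the target gene.
Select the BRCA dataset and click "Yes please".
Enter 50 and 50. Patients are then split by TINAGL1 expression into a low group (bottom 50%) and a high group (top 50%).
Click "click here" to download the survival data for BRCA patients in the high and low TINAGL1 groups. The file is named "BRCA_64129_50_50.csv".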
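OncoLnc performs the percentile split on its server. The sketch below shows what a 50/50 split means. It assumes a plain cut at the median, and OncoLnc's exact handling of ties is not documented here. The expression values are made up for illustration.

```r
# Hypothetical expression values for 8 patients (made-up numbers, not TCGA data)
expr <- c(12.1, 55.3, 8.7, 40.2, 23.9, 61.0, 30.5, 17.4)

# Assumed rule: cut at the 50th percentile (the median);
# values above it go to "High", the rest to "Low"
cutoff <- quantile(expr, probs = 0.5)
group  <- ifelse(expr > cutoff, "High", "Low")

table(group)  # with 8 distinct values, each group gets 4 patients
```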
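2. Reproduce the survival comparison of high- vs. low-TINAGL1 BRCA patients in R
Code adapted from https://github.com/jmzeng1314/tcga_example
In R, the survival package makes survival analysis easy. You only need to learn three functions and use them fluently:
Surv: creates a survival object
survfit: fits Kaplan-Meier survival curves, or survival curves adjusted by a Cox model
survdiff: tests for survival differences between groups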
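The following sketch shows the three functions on a small made-up data set. The column names Days, Status and Group mirror the OncoLnc file, but the numbers are invented.

```r
library(survival)

# Made-up toy data: follow-up in days, event status (1 = died, 0 = censored), group
toy <- data.frame(
  Days   = c(100, 250, 400, 520, 800, 150, 300, 610, 900, 1200),
  Status = c(1,   1,   0,   1,   0,   1,   0,   0,   1,   0),
  Group  = rep(c("High", "Low"), each = 5),
  stringsAsFactors = FALSE
)

# Surv(): builds the survival response; censored times print with a trailing "+"
s <- Surv(toy$Days, toy$Status)
print(s)

# survfit(): Kaplan-Meier estimate for each group
km <- survfit(Surv(Days, Status) ~ Group, data = toy)
print(km)                 # n, events and median survival per group
summary(km, times = 365)  # estimated survival probability at one year

# survdiff(): log-rank test comparing the groups
lr <- survdiff(Surv(Days, Status) ~ Group, data = toy)
print(lr)
```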
rm(list = ls())  # clear the workspace
options(stringsAsFactors = FALSE)
library(ggplot2)
library(survival)
library(survminer)
a <- read.table('BRCA_64129_50_50.csv', sep = ',', fill = TRUE, header = TRUE)
da <- a
table(da$Status)
# Alive  Dead
#   871   135
da$Status <- ifelse(da$Status == "Dead", 1, 0)  # event indicator: 1 = dead, 0 = censored (alive)
survf <- survfit(Surv(Days, Status) ~ Group, data = da)
ggsurvplot(survf, conf.int = FALSE, pval = TRUE)  # KM curves by group, with the p-value printed on the plot
Ding, there's the plot! It matches the result from the website.
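ggsurvplot(pval = TRUE) prints a p-value on the plot. As far as I know, survminer uses the log-rank test for this by default. The sketch below gets a log-rank p-value directly from survdiff(). It uses made-up data with the same Alive/Dead coding as the OncoLnc file.

```r
library(survival)

# Made-up toy data (not the OncoLnc file)
toy <- data.frame(
  Days   = c(100, 250, 400, 520, 800, 150, 300, 610, 900, 1200),
  Status = c("Dead", "Dead", "Alive", "Dead", "Alive",
             "Dead", "Alive", "Alive", "Dead", "Alive"),
  Group  = rep(c("High", "Low"), each = 5),
  stringsAsFactors = FALSE
)

# Same recoding as above: "Dead" -> 1 (event), anything else -> 0 (censored)
toy$Status <- ifelse(toy$Status == "Dead", 1, 0)

# Log-rank test; the chi-square statistic has (number of groups - 1) degrees of freedom
lr <- survdiff(Surv(Days, Status) ~ Group, data = toy)
df <- length(lr$n) - 1
p_value <- pchisq(lr$chisq, df = df, lower.tail = FALSE)
p_value
```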
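3. Survival analysis in R after splitting breast cancer into subtypes
Background:
Profiling breast cancer with DNA microarrays has made it possible to build a classification system based on gene expression profiles.
Five major breast cancer subtypes have been identified from DNA microarray expression profiles. These are ER-positive/HER2-negative tumors (luminal A and luminal B), ER-negative/HER2-negative tumors (basal-like), HER2-positive tumors, and normal-like tumors that resemble normal breast tissue. In retrospective analyses, these expression subtypes showed different relapse-free survival and overall survival.
Download the subtype information for breast cancer patients from cBioPortal.
Choose a dataset with a relatively large number of samples.
If you select several datasets at once, the site warns that samples may overlap. It is usually enough to analyze one dataset at a time.
Click the Plots button, choose the clinical attribute for tumor subtype, download the data, and save the file as "plot (2).txt".
Process the data in R, using the Normal subtype as an example: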
b <- read.table('plot (2).txt', sep = '\t', fill = TRUE, header = TRUE)  # reading txt vs. csv: only the sep argument differs
head(b)  # inspect the structure of b
colnames(b) <- c('Patient', 'Subtype', 'Expression', 'mutation')  # rename the columns of b
b$Patient <- substr(b$Patient, 1, 12)  # keep the 12-character TCGA patient ID
tmp <- merge(a, b, by = 'Patient')  # join a and b on Patient
head(tmp)
#        Patient Days Status Expression.x Group   Subtype Expression.y mutation
# 1 TCGA-3C-AAAU 4047  Alive       174.05   Low BRCA_LumA      -0.7159
# 2 TCGA-3C-AALI 4005  Alive       243.61   Low BRCA_Her2      -0.6316
# 3 TCGA-3C-AALJ 1474  Alive       202.18   Low BRCA_LumB      -0.6818
# 4 TCGA-3C-AALK 1448  Alive       716.59  High BRCA_LumA      -0.0582
# 5 TCGA-4H-AAAK  348  Alive       469.79   Low BRCA_LumA      -0.3574
# 6 TCGA-5L-AAT0 1477  Alive       621.79  High BRCA_LumA      -0.1731
table(tmp$Subtype)
dat <- tmp[tmp$Subtype == 'BRCA_Normal', ]  # select the target subtype
library(ggplot2)
library(survival)
library(survminer)
dat$Status <- ifelse(dat$Status == "Dead", 1, 0)  # 1 = dead, 0 = censored
table(dat$Status)
#  0  1
# 26  6
sfit <- survfit(Surv(Days, Status) ~ Group, data = dat)
sfit
summary(sfit)
ggsurvplot(sfit, conf.int = FALSE, pval = TRUE)
And out comes the plot.
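The merge step only works if both tables share patient IDs. cBioPortal typically lists sample barcodes, which carry extra characters after the 12-character TCGA patient ID, so the barcodes need trimming with substr(). The sketch below shows that trimming and the inner join performed by merge(). All barcodes and values in it are hypothetical.

```r
# Hypothetical OncoLnc-style table: one row per patient (12-character barcodes)
a <- data.frame(
  Patient = c("TCGA-AA-0001", "TCGA-AA-0002", "TCGA-AA-0003"),
  Days    = c(400, 1200, 850),
  Status  = c("Dead", "Alive", "Alive"),
  Group   = c("High", "Low", "Low"),
  stringsAsFactors = FALSE
)

# Hypothetical cBioPortal-style table: sample barcodes with a "-01" suffix
b <- data.frame(
  Patient = c("TCGA-AA-0001-01", "TCGA-AA-0003-01", "TCGA-AA-0009-01"),
  Subtype = c("BRCA_Normal", "BRCA_LumA", "BRCA_Basal"),
  stringsAsFactors = FALSE
)

# Keep the first 12 characters (TCGA-XX-XXXX) so both tables use patient IDs
b$Patient <- substr(b$Patient, 1, 12)

# merge() is an inner join by default: only patients present in both tables remain
tmp <- merge(a, b, by = "Patient")
tmp

# Subset a single subtype
dat <- tmp[tmp$Subtype == "BRCA_Normal", ]
```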
Handle the other subtypes the same way. Change the target subtype name in the R code to get the curves for:
BRCA_Basal
BRCA_Her2
BRCA_LumA
BRCA_LumB
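You can also loop over the subsets instead of editing the subtype name by hand. The sketch below uses a made-up merged table. logrank_p is a hypothetical helper, not part of the original code, and it reports only log-rank p-values rather than drawing curves.

```r
library(survival)

# Made-up merged table (not real TCGA data): two subtypes, each with High/Low groups
tmp <- data.frame(
  Days    = c(100, 250, 400, 520, 800, 150, 300, 610, 900, 1200,
              90, 330, 470, 700, 980, 200, 380, 560, 760, 1100),
  Status  = c("Dead", "Dead", "Alive", "Dead", "Alive",
              "Dead", "Alive", "Alive", "Dead", "Alive",
              "Alive", "Dead", "Dead", "Alive", "Alive",
              "Dead", "Alive", "Dead", "Alive", "Alive"),
  Group   = rep(rep(c("High", "Low"), each = 5), times = 2),
  Subtype = rep(c("BRCA_LumA", "BRCA_Basal"), each = 10),
  stringsAsFactors = FALSE
)

# Hypothetical helper: log-rank p-value for High vs. Low within one subset
logrank_p <- function(df) {
  df$Status <- ifelse(df$Status == "Dead", 1, 0)
  lr <- survdiff(Surv(Days, Status) ~ Group, data = df)
  pchisq(lr$chisq, df = length(lr$n) - 1, lower.tail = FALSE)
}

subtypes <- unique(tmp$Subtype)
p_values <- sapply(subtypes, function(s) logrank_p(tmp[tmp$Subtype == s, ]))
p_values  # one p-value per subtype, named by subtype
```

To draw the curves, pass each subset to survfit() and ggsurvplot(), as in the code above.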
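Takeaways:
1. First, many thanks to Jimmy for the tutorials and code. They are truly conscientious work: if you follow them step by step, you will be able to reproduce good-looking figures.
2. Thanks also to Jimmy for his patient guidance while I was learning. He is a strong programmer and a very experienced teacher. He could pinpoint the problem in my code right away, which helped more than searching a dozen tutorials.
3. Of course, I still need to learn to search on my own and keep practicing to improve my bioinformatics data-mining skills. Hands-on practice teaches far, far more than reading alone.
4. I will keep learning from Jimmy, and I hope to make more nice figures and share what I learn.
5. Starting out and advancing in bioinformatics with Jimmy is surely a wise choice. Let's all keep going!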
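Postscript:
Note that the survival information in different databases actually conflicts. See the related post "Which web tool for TCGA survival analysis is best?"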