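One analysis, one figure, and the rest is all storytelling
I recently came across a TCGA data-mining paper by Korean authors, titled "Classification of Genes Based on Age-Related Differential Expression in Breast Cancer" and published in 2017. Using the breast cancer patient data in the TCGA database, the researchers grouped the patients by age into 3 groups: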
Young patients were defined as ≤45 years of age; elderly patients were defined as those ≥60 years of age. The rest of the patients were defined as “intermediate.”
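The age cutoffs quoted above can be written as a small function. This is only a minimal sketch: the function name `age_group` and the label strings are my own choices for illustration, not anything taken from the paper or from TCGA.

```python
def age_group(age: float) -> str:
    """Assign a patient to an age group using the cutoffs quoted from the paper.

    The function name and the returned labels are hypothetical.
    """
    if age <= 45:
        return "young"
    if age >= 60:
        return "elderly"
    # Everything strictly between 45 and 60 is the "intermediate" group.
    return "intermediate"


if __name__ == "__main__":
    for age in (30, 45, 46, 59, 60, 72):
        print(age, age_group(age))
```

Note that both cutoffs are inclusive, so a 45-year-old counts as young and a 60-year-old counts as elderly.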
The full analysis workflow is as follows:
A total of 5,962 genes in class A were defined as significant DEGs in breast cancer, and 13,684 in class B were nonsignificant. Ones who want to find biomarkers or driver genes are likely to investigate only genes in class A. However, we classified the genes of each class once again into eight groups, based on the pattern of p-values, which were calculated separately for every age group (secondary classification in Fig. 2). After a second round of classification, the genes were eventually divided into 16 classes (A1–B8) (Supplementary Table 1).
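To make the arithmetic behind the 16 classes concrete: two primary classes (significant or not overall) times eight secondary groups, which is what you get from a significant/nonsignificant call in each of the three age groups (2 × 2³ = 16). The sketch below is an assumption-laden illustration, not the authors' code: the 0.05 cutoff, the function name `classify_gene`, and in particular the numbering of the patterns 1 to 8 are all hypothetical. The paper's actual mapping is in its Fig. 2 and Supplementary Table 1, which are not reproduced here.

```python
from itertools import product

ALPHA = 0.05  # assumed significance cutoff, not taken from the paper
AGE_GROUPS = ("young", "intermediate", "elderly")

# Enumerate the 2**3 = 8 significance patterns across the three age groups.
# The numbering 1..8 is hypothetical and need not match the paper's.
PATTERNS = {
    pattern: index + 1
    for index, pattern in enumerate(product((True, False), repeat=len(AGE_GROUPS)))
}


def classify_gene(p_overall: float, p_by_age: dict, alpha: float = ALPHA) -> str:
    """Return a label such as 'A1' or 'B8' for one gene.

    p_overall: p-value from the tumor-vs-normal test on all patients.
    p_by_age:  p-values from the same test run separately in each age group.
    """
    primary = "A" if p_overall < alpha else "B"
    pattern = tuple(p_by_age[group] < alpha for group in AGE_GROUPS)
    return f"{primary}{PATTERNS[pattern]}"


if __name__ == "__main__":
    # Made-up p-values, for illustration only.
    print(classify_gene(0.001, {"young": 0.01, "intermediate": 0.02, "elderly": 0.03}))
    print(classify_gene(0.5, {"young": 0.9, "intermediate": 0.9, "elderly": 0.9}))
```

Seen this way, the "secondary classification" is nothing more than a lookup over three yes/no significance calls per gene.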
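The differential expression analysis splits the genes into two classes. Differential analysis should be familiar to everyone by now; my series of posts from six years ago on mining public expression microarray databases covers basically all of it: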
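Understanding how GEO data is organized and how to download it: one post is enough
Understanding how the SRA database is organized: one post is enough
Getting an expression matrix from the GEO database: one post is enough
GSEA analysis: one post is enough (standalone version + R version)
Differential analysis based on group information: one post is not enough for this one
Annotating the results of a differential analysis: one post is enough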
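Then, of all things, they go on to classify the genes by the pattern of their p-values. Good grief, it is so awkward that it makes my head hurt!
The authors' headline result is simply that the genes fall into 16 groups, that each group has its own annotation from biological function databases, and that a few of those groups can separate the cancer samples from the normal controls!
Calling this a filler paper would be an insult to the word "filler"!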