100 pan-cancer papers explained: a new way to classify tumor patients
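To analyze the commonalities, differences, and emerging themes across tumors of different types and tissues of origin, TCGA launched the Pan-Cancer initiative at a meeting held in Santa Cruz, California, on October 26-27, 2012. Reference: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6000284/ I have also recorded a video tutorial series on this topic: the TCGA knowledge graph video tutorials (TCGA知识图谱视频教程, with direct links on Bilibili and YouTube).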
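Earlier in this series we covered a pan-cancer study published in one of the CNS (Cell, Nature, Science) flagship journals, Nature. 2013 Oct. This study is in the same league, published in Cell. 2014 Aug. The paper is: Multiplatform analysis of 12 cancer types reveals molecular classification within and across tissues of origin. Getting TCGA pan-cancer data mining into a CNS flagship journal is no small feat. The study is mainly an expression-based classification, and it sets out to clarify how useful the long-standing tissue- and organ-based classification still is in today's multi-omics field.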
The cohort comprises 3,527 specimens from 12 cancer types, profiled with six data types (five genome-wide platforms and one proteomic platform). These are the ones I have mentioned many times in my TCGA course:
whole-exome DNA sequence (Illumina HiSeq and GAII)
DNA copy-number variation (Affymetrix 6.0 microarrays)
DNA methylation (Illumina 450,000-feature microarrays)
genome-wide mRNA levels (Illumina mRNA-seq)
microRNA levels (Illumina microRNA-seq)
protein levels for 131 proteins and/or phosphorylated proteins (Reverse Phase Protein Arrays; RPPA)
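This write-up belongs to the series of 100 pan-cancer paper walkthroughs and was first published at: http://www.bio-info-trainee.com/4132.html
Classification by integrating multi-omics data
The classification algorithm proposed by the authors is Cluster-Of-Cluster-Assignments (COCA).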
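To make the idea behind COCA more concrete, here is a minimal sketch in Python. As I understand the method, each platform is first clustered on its own, the per-platform cluster memberships are encoded as binary indicator vectors, and the samples are then reclustered on those vectors. The input layout (`platform_labels`), the function names, and the toy labels are all hypothetical. I believe the published analysis reclusters with a consensus-clustering procedure; plain average-linkage clustering on Hamming distance is swapped in here only to keep the example short.

```python
from itertools import combinations

def indicator_matrix(platform_labels):
    """One-hot encode each platform's cluster labels into one binary row per sample."""
    samples = sorted({s for labels in platform_labels.values() for s in labels})
    columns = sorted({(p, c) for p, labels in platform_labels.items() for c in labels.values()})
    # A sample missing from a platform simply gets zeros in that platform's columns.
    rows = {s: [int(platform_labels[p].get(s) == c) for p, c in columns] for s in samples}
    return samples, columns, rows

def hamming(a, b):
    """Number of positions at which two binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def average_linkage(samples, rows, k):
    """Merge the two closest clusters (mean pairwise Hamming distance) until k remain."""
    clusters = [[s] for s in samples]
    while len(clusters) > k:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            total = sum(hamming(rows[a], rows[b]) for a in clusters[i] for b in clusters[j])
            dist = total / (len(clusters[i]) * len(clusters[j]))
            if best is None or dist < best[0]:
                best = (dist, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return clusters

# Toy per-platform cluster assignments (invented for illustration).
platform_labels = {
    "mRNA": {"s1": "m1", "s2": "m1", "s3": "m2", "s4": "m2"},
    "methylation": {"s1": "d1", "s2": "d1", "s3": "d2", "s4": "d2"},
    "CNV": {"s1": "c1", "s2": "c2", "s3": "c2", "s4": "c2"},
}

samples, columns, rows = indicator_matrix(platform_labels)
clusters = average_linkage(samples, rows, k=2)
print(clusters)
```

In this toy input, s1 and s2 disagree on the CNV platform but agree on the other two, so they still end up in the same integrated cluster. That is the point of clustering the cluster assignments rather than the raw data: no single platform dominates.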
Integrating the data presupposes an understanding of each data type:
Copy number data were summarized at the gene level using GISTIC 2.0 and t-tests for every gene were performed for each COCA subtype.
DNA Methylation probes were associated with any gene that fell in the +/-1500 bp region surrounding gene transcriptional start sites.
Genes with differential mRNA expression were identified using a SAM analysis on the RSEM values.
Genes with differentially expressed protein products were determined by running a t-test on the 131 protein forms represented on the RPPA data.
Differentially expressed miRNAs for each COCA subtype were identified using a Wilcoxon rank-sum test based on the miRNA-Seq data.
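Several of the steps listed above are simple enough to sketch in plain Python: mapping a methylation probe to genes whose TSS lies within +/-1500 bp, and comparing one COCA subtype against all remaining samples with a t statistic or a rank-sum statistic. This is only an illustration under assumptions. The one-versus-rest design, the use of Welch's variant of the t-test, the function names, and all coordinates and values are hypothetical and not taken from the paper. SAM, which the paper used for mRNA, is not reproduced here.

```python
from math import sqrt
from statistics import mean, variance

def genes_near_probe(probe, tss_by_gene, window=1500):
    """Genes whose TSS is within +/-window bp of the probe on the same chromosome."""
    chrom, pos = probe
    return sorted(g for g, (c, tss) in tss_by_gene.items()
                  if c == chrom and abs(pos - tss) <= window)

def one_vs_rest(values, subtype_of, subtype):
    """Split per-sample values into the chosen subtype and everything else."""
    in_group = [v for s, v in values.items() if subtype_of[s] == subtype]
    rest = [v for s, v in values.items() if subtype_of[s] != subtype]
    return in_group, rest

def welch_t(in_group, rest):
    """Welch's t statistic (unequal variances assumed)."""
    se = sqrt(variance(in_group) / len(in_group) + variance(rest) / len(rest))
    return (mean(in_group) - mean(rest)) / se

def rank_sum_u(in_group, rest):
    """Mann-Whitney U, the statistic behind the Wilcoxon rank-sum test; ties count 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0
               for x in in_group for y in rest)

# Invented coordinates and values, for illustration only.
tss_by_gene = {"GENE_A": ("chr1", 10_000), "GENE_B": ("chr1", 12_000), "GENE_C": ("chr2", 10_500)}
mapped = genes_near_probe(("chr1", 11_200), tss_by_gene)

values = {"s1": 5.0, "s2": 6.0, "s3": 7.0, "s4": 1.0, "s5": 2.0, "s6": 3.0}
subtype_of = {"s1": "COCA1", "s2": "COCA1", "s3": "COCA1",
              "s4": "COCA2", "s5": "COCA2", "s6": "COCA2"}
in_group, rest = one_vs_rest(values, subtype_of, "COCA1")
t = welch_t(in_group, rest)
u = rank_sum_u(in_group, rest)
print(mapped, round(t, 3), u)
```

In a real analysis these statistics would be computed for every gene, protein, or miRNA and turned into multiple-testing-corrected p-values. A statistics library is the better choice for that than hand-written functions.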
The final classification is as follows:
The paper focuses its discussion on:
COCA1-LUAD-enriched
COCA2-Squamous
COCA3-BRCA/Luminal
COCA4-BRCA/Basal-like
COCA7-COAD/READ
COCA8-BLCA
The complete figure covers all 13 classes.
Moreover, the survival analysis for this classification is significant.
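For readers who want to see what "significant" means for a survival comparison, below is a minimal sketch of a two-group log-rank test in plain Python. This post does not say which test the paper used, and comparing all 13 classes at once would need the multi-group version, so treat this as an assumption-laden illustration. The function name and the toy follow-up times are invented.

```python
from math import erfc, sqrt

def logrank_chi2(times_a, events_a, times_b, events_b):
    """Two-group log-rank chi-square statistic (1 degree of freedom).

    events are 1 for an observed death and 0 for a censored observation.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    observed_a = expected_a = var = 0.0
    for t in event_times:
        n = sum(1 for u, _, _ in data if u >= t)                 # at risk, both groups
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)    # at risk, group A
        d = sum(1 for u, e, _ in data if u == t and e)           # deaths at t, both groups
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            var += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    return (observed_a - expected_a) ** 2 / var

# Invented follow-up times (months): group A dies early, group B later.
chi2 = logrank_chi2([2, 3, 4, 5], [1, 1, 1, 1],
                    [6, 7, 8, 9], [1, 0, 1, 1])
# Upper tail of a chi-square distribution with 1 degree of freedom.
p_value = erfc(sqrt(chi2 / 2))
print(round(chi2, 3), round(p_value, 4))
```

A small p-value here only says that the survival curves differ more than chance would explain. It says nothing about whether the classification adds information beyond tissue of origin, which is the more interesting question for this paper.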
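PARADIGM analysis
The method comes from the paper (2010). Inference of patient-specific pathway activities from multi-dimensional cancer genomics data using PARADIGM. Bioinformatics. Introductions to this algorithm are very scarce.
What appears in this paper is mainly the PARADIGM pathway networks and the PARADIGM SuperPathway, as shown below: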
HotNet analysis
Like PARADIGM above, this is an algorithm with a history: first came HotNet, then HotNet2, and most recently Hierarchical HotNet:
Hierarchical HotNet: identifying hierarchies of altered subnetworks …
https://academic.oup.com/bioinformatics/article/34/17/i972/5093236
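The HotNet family rests on the idea of letting "heat" (for example, a gene's mutation score) diffuse over a protein interaction network, so that connected groups of altered genes stand out. The sketch below illustrates the insulated diffusion step as I understand it from HotNet2: each node keeps a fraction beta of its own heat and passes the rest to its neighbors. It is not the authors' implementation. The toy network, the heat scores, the value beta=0.4, and the function name are all assumptions made for illustration, and the later steps that extract subnetworks are left out.

```python
def insulated_diffusion(adjacency, heat, beta=0.4, iterations=200):
    """Iterate f <- beta * heat + (1 - beta) * W f until it settles.

    W is the adjacency matrix with each column divided by that node's degree,
    so a node splits the heat it passes on equally among its neighbors.
    """
    degree = {v: len(neighbors) for v, neighbors in adjacency.items()}
    f = dict(heat)
    for _ in range(iterations):
        f = {v: beta * heat[v]
                + (1 - beta) * sum(f[u] / degree[u] for u in adjacency[v])
             for v in adjacency}
    return f

# Toy undirected path network A - B - C - D with all heat placed on A.
adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
heat = {"A": 1.0, "B": 0.0, "C": 0.0, "D": 0.0}

f = insulated_diffusion(adjacency, heat)
for node in sorted(f):
    print(node, round(f[node], 4))
```

In this toy run the heat decays with distance from A, and the total amount of heat stays constant. Genes that are close in the network to several hot genes therefore accumulate heat from multiple sources, which is what makes altered subnetworks detectable.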
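Afterword
This study was, after all, published in Cell itself, and it is genuinely hard to read, although the figures are certainly beautiful. The problem is that its conclusions are not very practical; the paper mostly shows off the authors' skill at writing SCI papers.
Of course, if you want to go beyond what the Pan-Cancer initiative has already published, then it is well worth following me through all 100 of these pan-cancer papers!
See my index of the 100 pan-cancer paper walkthroughs: http://www.bio-info-trainee.com/4132.html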
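TCGA tutorials (continuously updated list)
28 TCGA tutorials: getting TCGA data with the R package cgdsr (cBioPortal)
28 TCGA tutorials: getting TCGA data with the R package RTCGA (offline, pre-packaged version)
28 TCGA tutorials: getting TCGA data with the R package RTCGAToolbox (FireBrowse portal)
28 TCGA tutorials: bulk-downloading all TCGA data (UCSC Xena)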