Downloading MSigDB database files from the Broad website is a hassle

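In two earlier tutorials, one on borrowing the escape package's methods for visualizing GSVA or ssGSEA result matrices and one on running GSEA on a single-cell expression matrix, I mentioned that MSigDB (Molecular Signatures Database) defines collections of known gene sets: http://software.broadinstitute.org/gsea/msigdb You have to register before you can download them.

However, the vignette of the escape package (GitHub: ncborcherding/escape), at http://www.bioconductor.org/packages/release/bioc/vignettes/escape/inst/doc/vignette.html, provides a ready-packaged copy of the MSigDB data. If you read its documentation closely, that packaging actually depends on msigdbr_7.2.1.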

Getting all gene sets in MSigDB

All gene sets in MSigDB are bundled by the escape package (GitHub: ncborcherding/escape). The database (http://software.broadinstitute.org/gsea/msigdb) is organized into eight collections, H and C1 to C7:

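H: hallmark gene sets, 50 sets in total, the most commonly used;
C1: positional gene sets, grouped by chromosomal position, 326 sets in total, rarely used;
C2: curated gene sets, expert-curated from pathway databases, the literature, and similar sources;
C3: motif gene sets, mainly two parts: microRNA targets and transcription factor targets;
C4: computational gene sets, defined by mining cancer-related microarray data;
C5: GO gene sets, Gene Ontology, in three parts: BP (biological process), CC (cellular component), and MF (molecular function);
C6: oncogenic signatures, cancer signature gene sets, mostly derived from published microarray data in NCBI GEO;
C7: immunologic signatures, immune-related gene sets.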

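    library(escape)
    # Retrieve the hallmark (H) collection, then print the result
    GS <- getGeneSets(library = "H")
    GS

The MSigDB gene sets are returned as a list of GSEABase GeneSet objects; the call above retrieves the hallmark gene sets.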

The source is the msigdbr package

Installation is very simple:

install.packages("msigdbr")

But msigdbr turned out to be smaller than I had expected:

    Installing package into 'C:/Users/win10/Documents/R/win-library/4.0'
    (as 'lib' is unspecified)
    trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.0/msigdbr_7.2.1.zip'
    Content type 'application/zip' length 6737651 bytes (6.4 MB)
    downloaded 6.4 MB

    package 'msigdbr' successfully unpacked and MD5 sums checked

As always, the way to learn an R package is to read its documentation, here: https://cran.r-project.org/web/packages/msigdbr/vignettes/msigdbr-intro.html

    Documentation for package 'msigdbr' version 7.2.1
    DESCRIPTION file.
    User guides, package vignettes and other documentation.

    Help Pages
    msigdbr                 Retrieve the gene sets data frame
    msigdbr_collections     List the collections available in the msigdbr package
    msigdbr_show_species    List the species available in the msigdbr package
    msigdbr_species         List the species available in the msigdbr package

The documentation is very short.

Running the code below makes the usage clear; there is really not much more to explain:

    library(msigdbr)
    # All gene sets in the database can be retrieved without specifying a collection/category.
    all_gene_sets = msigdbr(species = "Mus musculus")
    head(all_gene_sets)
    msigdbr_species()
    all_gene_sets = msigdbr(species = "Homo sapiens")

It all comes down to wrappers and objects: the escape package provides the getGeneSets function, and msigdbr provides the msigdbr function.
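Many enrichment tools expect gene sets as a named list rather than a long data frame, so here is a minimal sketch of that conversion. It assumes the table has `gs_name` and `gene_symbol` columns, as the msigdbr output does. The helper name `gene_sets_to_list` and the toy data are hypothetical and only for illustration; they are not part of msigdbr or escape.

```r
# Toy stand-in for the data frame returned by msigdbr(); the memberships
# below are illustrative only, not real MSigDB content.
toy_gene_sets <- data.frame(
  gs_name = c("SET_A", "SET_A", "SET_B", "SET_B", "SET_B"),
  gene_symbol = c("TP53", "MYC", "EGFR", "TP53", "TP53"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: collapse a long-format table into a named list,
# one character vector of unique gene symbols per gene set.
gene_sets_to_list <- function(df) {
  lapply(split(df$gene_symbol, df$gs_name), unique)
}

toy_list <- gene_sets_to_list(toy_gene_sets)
str(toy_list)

# With msigdbr installed, the same helper should apply to real data.
# The argument name below follows msigdbr 7.2.1; newer releases may differ.
# hallmark <- msigdbr(species = "Homo sapiens", category = "H")
# hallmark_list <- gene_sets_to_list(hallmark)
```

The toy table keeps the example runnable without installing anything; swap in the commented lines to try it on the hallmark collection.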

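R: a cornerstone of bioinformatics

Be sure to watch the full 10 hours of tutorial videos on Bilibili, and see the learning roadmap materials in this GitHub repository: https://github.com/jmzeng1314/R_bilibili . Some excellent study notes are also worth a look, for example https://mubu.com/doc/2KUiSCfVsg

Beginner, 10 exercises: http://www.bio-info-trainee.com/3793.html
Intermediate requirements: http://www.bio-info-trainee.com/3750.html
Advanced, complete 20 exercises: http://www.bio-info-trainee.com/3415.html
Statistics series, 30 exercises: http://www.bio-info-trainee.com/4385.html
Visualization series, 30 exercises: http://www.bio-info-trainee.com/4387.html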
