ampvis2, an R package for analysing and visualising 16S rRNA amplicon data

ampvis2: an R package to analyse and visualise 16S rRNA amplicon data

Kasper S. Andersen, Rasmus H. Kirkegaard, Søren M. Karst, Mads Albertsen

doi: https://doi.org/10.1101/299537

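Microbial community analysis based on 16S rRNA gene amplicon sequencing underpins many microbial ecology studies, and many methods and pipelines exist for turning the raw sequencing data into OTU tables. ampvis2 is an R package for analysing microbial community data in OTU-table format. It emphasises simplicity, reproducibility, and bundling sample metadata with the data (similar to phyloseq), and it is built around a small set of intuitive commands. Its distinctive features include flexible heatmaps and simplified ordination. Because plots are generated with ggplot2, the resulting publication-ready figures are easy to customise. ampvis2 also includes functions for interactive visualisation, which helps with larger, more complex data sets.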

Summary: Microbial community analysis using 16S rRNA gene amplicon sequencing is the backbone of many microbial ecology studies. Several approaches and pipelines exist for processing the raw data generated through DNA sequencing and converting the data into OTU-tables. Here we present ampvis2, an R package designed for analysis of microbial community data in OTU-table format with focus on simplicity, reproducibility, and sample metadata integration, with a minimal set of intuitive commands. Unique features include flexible heatmaps and simplified ordination. By generating plots using the ggplot2 package, ampvis2 produces publication-ready figures that can be easily customised. Furthermore, ampvis2 includes features for interactive visualisation, which can be convenient for larger, more complex data.

ampvis2 architecture: [figure]

GitHub repository and documentation:

https://github.com/MadsAlbertsen/ampvis2

https://madsalbertsen.github.io/ampvis2/articles/ampvis2.html

Installing and loading the package

# install.packages("remotes")
# remotes::install_github("MadsAlbertsen/ampvis2")
library("ampvis2")

The website provides two example data files

Download: https://figshare.com/articles/Minimal_example_data_for_ampvis2/6139352

myotutable <- read.csv("example_otutable.csv", check.names = FALSE)
mymetadata <- read.csv("example_metadata.csv", check.names = FALSE)
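To make the expected input concrete, here is a minimal base-R sketch of the layout the two example files appear to follow: one row per OTU, one column per sample, taxonomy columns from Kingdom to Species at the end, and a metadata table whose first column holds the sample IDs. The toy values, the object names (`toy_otutable`, `toy_metadata`) and the exact column layout are assumptions for illustration; check the ampvis2 documentation for the authoritative format.

```r
# Toy OTU table: one row per OTU, one column per sample, then taxonomy columns.
# All values are made up for illustration.
toy_otutable <- data.frame(
  OTU     = c("OTU_1", "OTU_2", "OTU_3"),
  SampleA = c(120L, 30L, 0L),
  SampleB = c(80L, 0L, 45L),
  Kingdom = c("k__Bacteria", "k__Bacteria", "k__Bacteria"),
  Phylum  = c("p__Chloroflexi", "p__Actinobacteria", "p__Proteobacteria"),
  Class   = NA_character_,
  Order   = NA_character_,
  Family  = NA_character_,
  Genus   = NA_character_,
  Species = NA_character_,
  stringsAsFactors = FALSE
)

# Toy metadata: sample IDs in the first column, one row per sample.
toy_metadata <- data.frame(
  SampleID = c("SampleA", "SampleB"),
  Plant    = c("Aalborg West", "Aalborg East"),
  stringsAsFactors = FALSE
)

# Columns that are neither the OTU ID nor taxonomy are treated as samples here.
tax_levels  <- c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species")
sample_cols <- setdiff(names(toy_otutable), c("OTU", tax_levels))

# Every sample column should have a matching metadata row, and vice versa.
ids_match <- setequal(sample_cols, toy_metadata$SampleID)
ids_match
```

A check like `ids_match` is a cheap way to catch mismatched sample names before handing the two tables to `amp_load()`.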

Inspecting the data structure and filtering

### Data structure
library(ampvis2)
d <- amp_load(otutable = myotutable, metadata = mymetadata)
d
data("MiDAS")
MiDAS
MiDAS$refseq

## Data filtering
MiDASsubset <- amp_subset_samples(MiDAS, Plant %in% c("Aalborg West", "Aalborg East"))
MiDASsubset <- amp_subset_samples(MiDAS,
  Plant %in% c("Aalborg West", "Aalborg East") & !SampleID %in% c("16SAMP-749"),
  minreads = 10000
)
MiDAS_Chloroflexi_Actinobacteria <- amp_subset_taxa(MiDAS,
  tax_vector = c("p__Chloroflexi", "p__Actinobacteria")
)
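The `minreads` argument used above drops samples with too few reads. The idea can be sketched in base R on a made-up count matrix. This is only a conceptual illustration, not ampvis2's internal code, and the matrix `counts` and its values are invented.

```r
# Made-up read counts: rows are OTUs, columns are samples.
counts <- matrix(
  c(9000, 4000, 500,
    3000, 2000, 100,
     500, 7000,  50),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("OTU_1", "OTU_2", "OTU_3"), c("S1", "S2", "S3"))
)

# Total number of reads in each sample.
reads_per_sample <- colSums(counts)

# Keep only the samples that reach the read threshold.
minreads <- 10000
kept <- counts[, reads_per_sample >= minreads, drop = FALSE]

# S1 (12500 reads) and S2 (13000 reads) remain; S3 (650 reads) is dropped.
colnames(kept)
```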

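Trying the heatmap the authors recommend

The authors build the heatmap with ggplot2 and also support composite heatmap layouts, which is a pleasant surprise.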

## Try the heatmap the authors especially recommend
# Load example data
data("AalborgWWTPs")

# The data object is packaged in a way similar to phyloseq
amp_heatmap(AalborgWWTPs, group_by = "Plant")

# Heatmap of 20 most abundant Genera (by mean) grouped by WWTP, split by Year,
# values not plotted for visibility, phylum name added and colorscale adjusted manually
amp_heatmap(AalborgWWTPs,
  group_by = "Plant",
  facet_by = "Year",
  plot_values = FALSE,
  tax_show = 20,
  tax_aggregate = "Genus",
  tax_add = "Phylum",
  color_vector = c("white", "red"),
  plot_colorscale = "sqrt",
  plot_legendbreaks = c(1, 5, 10)
)
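What "most abundant genera (by mean)" means can be sketched in base R: convert counts to percent of reads per sample, sum OTUs that share a genus, then rank genera by their mean across samples. This is a conceptual sketch on invented data, assuming relative abundances in percent. It is not a copy of what `amp_heatmap()` does internally.

```r
# Made-up counts: rows are OTUs, columns are samples.
counts <- matrix(
  c(50, 10,
    30, 60,
    15, 20,
     5, 10),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("OTU_1", "OTU_2", "OTU_3", "OTU_4"), c("S1", "S2"))
)

# Hypothetical genus assignment for each OTU (same order as the rows).
genus <- c("g__A", "g__B", "g__A", "g__C")

# Convert counts to percent of reads within each sample.
rel_abund <- sweep(counts, 2, colSums(counts), "/") * 100

# Sum the OTUs that belong to the same genus.
by_genus <- rowsum(rel_abund, group = genus)

# Rank genera by mean abundance across samples and keep the top two.
mean_abund <- sort(rowMeans(by_genus), decreasing = TRUE)
top_genera <- names(mean_abund)[1:2]
top_genera
```

In the call above, `tax_aggregate = "Genus"` and `tax_show = 20` play the roles of the grouping step and the top-N cut-off.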

# Heatmap with known functional information about the Genera shown to the right
# By default this information is retrieved directly from midasfieldguide.org
# but you can provide your own with the function_data argument as shown in the
# textmap
amp_heatmap(AalborgWWTPs,
  group_by = "Plant",
  tax_aggregate = "Genus",
  plot_functions = TRUE,
  functions = c("PAO", "GAO", "AOB", "NOB")
)

# A raw text version of the heatmap can be printed or saved as a data frame with textmap = TRUE.
# Notice the function_data is now retrieved from the MiF data frame
textmap <- amp_heatmap(AalborgWWTPs,
  group_by = "Plant",
  tax_aggregate = "Genus",
  plot_functions = TRUE,
  function_data = MiF,
  functions = c("PAO", "GAO", "AOB", "NOB"),
  textmap = TRUE
)
textmap

The authors' boxplot

## Boxplot
amp_boxplot(MiDASsubset)

amp_boxplot(MiDASsubset, group_by = "Period", tax_show = 5, tax_add = "Phylum")

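## The ordination the authors highlight
# legend.position = "none" is the documented ggplot2 value for hiding the legend
amp_ordinate(MiDASsubset,
  type = "pcoa",
  distmeasure = "bray",
  sample_color_by = "Plant",
  sample_colorframe = TRUE,
  sample_colorframe_label = "Plant"
) + theme(legend.position = "none")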

amp_ordinate(MiDASsubset,
  type = "pcoa",
  distmeasure = "bray",
  sample_color_by = "Plant",
  sample_colorframe_label = "Plant",
  sample_trajectory = "Date",
  sample_trajectory_group = "Plant"
)
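The two PCoA calls above rest on Bray-Curtis dissimilarities between samples. As a rough illustration of that idea, the following base-R sketch computes Bray-Curtis by hand on invented counts and runs classical multidimensional scaling with `stats::cmdscale()`. The helper `bray_curtis()` and the toy matrix are hypothetical, and ampvis2's own implementation may differ in detail.

```r
# Made-up counts: rows are samples, columns are OTUs.
counts <- matrix(
  c(10, 0,  5,
     8, 2,  5,
     0, 9,  6,
     1, 1, 13),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("S1", "S2", "S3", "S4"), c("OTU_1", "OTU_2", "OTU_3"))
)

# Bray-Curtis dissimilarity between two count vectors (hypothetical helper).
bray_curtis <- function(x, y) sum(abs(x - y)) / sum(x + y)

# Fill the full sample-by-sample dissimilarity matrix.
n  <- nrow(counts)
bc <- matrix(0, n, n, dimnames = list(rownames(counts), rownames(counts)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    bc[i, j] <- bray_curtis(counts[i, ], counts[j, ])
  }
}

# Classical multidimensional scaling (PCoA) on the dissimilarities.
pcoa <- cmdscale(as.dist(bc), k = 2)
pcoa
```

Samples with similar community composition (here S1 and S2) end up close together in the resulting coordinates.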

ordinationresult <- amp_ordinate(MiDASsubset,
  type = "CCA",
  constrain = "Period",
  transform = "Hellinger",
  sample_color_by = "Period",
  sample_shape_by = "Plant",
  sample_colorframe = TRUE,
  sample_colorframe_label = "Period",
  detailed_output = TRUE
)
ordinationresult$plot
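The CCA call applies a Hellinger transformation before ordination. In its usual definition this is the square root of each sample's relative abundances. A minimal base-R sketch on invented counts follows. It assumes the standard definition and is not taken from ampvis2's source.

```r
# Made-up counts: rows are samples, columns are OTUs.
counts <- matrix(
  c(4,  0, 12,
    9, 16,  0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("S1", "S2"), c("OTU_1", "OTU_2", "OTU_3"))
)

# Divide each row by its total, then take the square root.
# (A length-nrow vector recycles down the columns, so each row is
# divided by its own sum.)
hellinger <- sqrt(counts / rowSums(counts))
hellinger

# After the transformation, each sample's squared values sum to 1.
rowSums(hellinger^2)
```

The square root damps the influence of very abundant OTUs, which is one common reason for using this transformation before a constrained ordination.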
