R Workflow for Family-Level Taxonomic Dendrograms (Benakis et al. Figure 1C Style)

A ready-to-run, clean R script that builds a family-level taxonomic dendrogram filtered by abundance (> 1%), following the same logic as Benakis et al. (Figure 1C).
A Methods section formatted for inclusion in a manuscript, describing the same procedure in scientific writing style.

1. Ready-to-Run R Script

(Taxonomic dendrogram of microbial families filtered by relative abundance > 1%)

###############################################################################
# Taxonomic dendrogram of microbial families (abundance > 1%)
# Based on the Benakis et al. Figure 1C approach
# Author: [Your Name]
# Date: [Insert Date]
###############################################################################

# Load libraries
library(phyloseq)
library(dplyr)
library(tibble)
library(igraph)
library(ggraph)

# ------------------------------
# Step 1 — Agglomerate to Family level
# ------------------------------
# Assumes an existing phyloseq object: ps_filt

ps_fam <- tax_glom(ps_filt, taxrank = "Family", NArm = TRUE)
ps_fam <- prune_taxa(taxa_sums(ps_fam) > 0, ps_fam)

# Optional: restrict to selected groups if needed
# ps_fam <- subset_samples(ps_fam, Group %in% c("pre-FMT", "9", "10", "11"))

# ------------------------------
# Step 2 — Calculate relative abundance and filter families >1%
# ------------------------------

# Transform to relative abundance
ps_fam_relab <- transform_sample_counts(ps_fam, function(x) x / sum(x))

# Compute mean relative abundance across all samples
mean_abund <- taxa_sums(ps_fam_relab) / nsamples(ps_fam_relab)

# Keep only families with >1% mean relative abundance
keep_taxa <- names(mean_abund[mean_abund > 0.01])
ps_fam_top <- prune_taxa(keep_taxa, ps_fam_relab)

# ------------------------------
# Step 3 — Build taxonomic hierarchy (Phylum → Class → Order → Family)
# ------------------------------

tax <- as.data.frame(tax_table(ps_fam_top)) %>%
  rownames_to_column("taxa_id") %>%
  select(Phylum, Class, Order, Family) %>%
  mutate(across(everything(), as.character)) %>%
  distinct()

edges <- bind_rows(
  tax %>% transmute(from = Phylum, to = Class),
  tax %>% transmute(from = Class,  to = Order),
  tax %>% transmute(from = Order,  to = Family)
) %>%
  distinct() %>%
  filter(!is.na(from), !is.na(to), from != "", to != "")

# ------------------------------
# Step 4 — Plot dendrogram-style taxonomy tree
# ------------------------------

g <- graph_from_data_frame(edges, directed = TRUE)

p <- ggraph(g, layout = "dendrogram") +
  geom_edge_diagonal(color = "grey60") +
  geom_node_point(size = 2, color = "steelblue") +
  geom_node_text(aes(label = name), hjust = -0.1, size = 3) +
  theme_void() +
  coord_flip() +
  ggtitle("Taxonomic dendrogram of microbial families (>1% abundance)")

print(p)

# ------------------------------
# Optional: collapse remaining rare families as "Other"
# ------------------------------
# This can be done before filtering or directly in the figure annotation
###############################################################################

2. Methods Section (for Manuscript)

Taxonomic Dendrogram Construction

A taxonomic dendrogram illustrating the microbial composition at the family level was generated in R (version 4.3.0) using the phyloseq and ggraph packages. The analysis followed an approach similar to that presented by Benakis et al. (2020, Cell Host \& Microbe, Figure 1C).

First, the filtered amplicon sequence variant (ASV) table was agglomerated to the Family rank using the tax_glom() function from phyloseq. The dataset was then transformed to relative abundances for each sample, and families with a mean relative abundance greater than 1% across all samples were retained for visualization. Families below this threshold were excluded and considered part of the “Other” category for interpretability.

A hierarchical taxonomy table (Phylum → Class → Order → Family) was extracted, and edges representing parent–child relationships were assembled using dplyr. The resulting taxonomic relationships were converted into a directed graph structure using igraph, and visualized as a dendrogram with ggraph (layout = "dendrogram").

This approach provides a qualitative overview of microbial family-level diversity while reducing visual noise from rare taxa, consistent with the display strategy used in the original publication.

Would you like me to adapt the Methods version to a more formal journal submission style (e.g., third person passive voice and APA-compliant citation format)?

Generating a Taxonomic Dendrogram of Significant Microbial Families in R (Based on Benakis et al. Figure 1C)

Concept overview

Yes—this is doable in R, and you can restrict the dendrogram to only significant microbial families. Based on the structure in the Benakis et al. paper:

Figure 1C is a taxonomic dendrogram (Phylum → Class → Order → Family), not a clustering of samples.
The authors filtered families by frequency > 1%, not by statistical significance.

If your PI asks for “only significantly expressed families,” you first need to perform a differential abundance test at the Family level (e.g., using ANCOM-BC2 or DESeq2) before plotting only those families or highlighting them on the tree.

Recommended Workflow

Step 1 — Agglomerate to Family level (phyloseq)

library(phyloseq)

ps_fam <- tax_glom(ps_filt, taxrank = "Family", NArm = TRUE)
ps_fam <- prune_taxa(taxa_sums(ps_fam) > 0, ps_fam)

# Restrict to requested sample groups
ps_fam_sub <- subset_samples(ps_fam, Group %in% c("9", "10", "11", "pre-FMT"))
sample_data(ps_fam_sub)$Group <- factor(sample_data(ps_fam_sub)$Group,
                                       levels = c("pre-FMT", "9", "10", "11"))

Step 2 — Differential abundance to define significant families

Option A: ANCOM-BC2 (recommended for microbiome data)

library(ANCOMBC)

an2 <- ancombc2(data = ps_fam_sub,
                formula = "Group",
                p_adj_method = "BH",
                prv_cut = 0, lib_cut = 0,
                group = "Group",
                global = FALSE)

res <- an2$res
# Inspect results: names(an2$res), matrices such as res$diff_abn and res$q_val

Option B: DESeq2 (alternative)

library(DESeq2)

dds <- phyloseq_to_deseq2(ps_fam_sub, ~ Group)
dds <- DESeq(dds)

# Example contrast: Group 9 vs 10
r_9_10 <- results(dds, contrast = c("Group", "9", "10"))
sig_9_10 <- rownames(r_9_10)[which(r_9_10$padj < 0.05)]
sig_9_10 <- sig_9_10[!is.na(sig_9_10)]

Decide what “significant” means for your plot:

Only families significant for 9 vs 10, or
Families significant in any pairwise comparison among (9, 10, 11, pre-FMT).

Step 3 — Build the taxonomic dendrogram (Phylum → Class → Order → Family)

library(dplyr)
library(tibble)
library(igraph)
library(ggraph)

# Choose your significant taxa list (example from DESeq2)
sig_taxa <- sig_9_10

# Get taxonomy for significant families
tax <- as.data.frame(tax_table(ps_fam_sub)) %>%
  rownames_to_column("taxa_id") %>%
  filter(taxa_id %in% sig_taxa) %>%
  select(Phylum, Class, Order, Family) %>%
  mutate(across(everything(), as.character)) %>%
  distinct()

# Build hierarchical edges
edges <- bind_rows(
  tax %>% transmute(from = Phylum, to = Class),
  tax %>% transmute(from = Class,  to = Order),
  tax %>% transmute(from = Order,  to = Family)
) %>%
  distinct() %>%
  filter(!is.na(from), !is.na(to), from != "", to != "")

# Build graph
g <- graph_from_data_frame(edges, directed = TRUE)

# Plot dendrogram
p <- ggraph(g, layout = "dendrogram") +
  geom_edge_diagonal() +
  geom_node_point(size = 2) +
  geom_node_text(aes(label = name), hjust = -0.1, size = 3) +
  theme_void() +
  coord_flip()

p

Highlighting significance in the plot

To color families by direction (e.g., enriched in Group 9 vs 10), merge the log2FoldChange results from DESeq2 with the node names and use scale_color_gradient2() or similar in ggraph to map significance.

Summary for your report (Q1.5 suggested answer)

Yes—this can be done in R. The template figure in the Benakis paper shows a taxonomy-based dendrogram (Phylum → Class → Order → Family). In that study, families were filtered by abundance (only > 1% frequency) rather than statistical significance.

For our dataset, we can (i) construct the full family-level taxonomy tree and (ii) optionally restrict or highlight only those families that are significantly different between groups based on differential abundance testing (using DESeq2 or ANCOM-BC2 with Benjamini–Hochberg-adjusted p-values).

If you specify which comparison defines “significant” (only 9 vs 10 or across all four groups), the exact filtering and visualization code can be tailored to more closely match Figure 1C, including the “Other” bin and legend style.

M	T	W	T	F	S	S
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28
29	30	31

Microbial bioinformatics

Microbial bioinformatics uses computational tools to analyze genomes, track evolution, and study functions in microorganisms, including bacteria and viruses.