๐งฌ Circular Phylogenetic Tree Pipeline with Resistance Gene Heatmaps
Complete workflow for Figure S1 generation โ suitable for homepage documentation
๐ Pipeline Overview
WGS Reads (100 isolates)
โ
โผ
โโโโโโโโโโโโโโโโโโโโโโโ
โ 1. Species Clusteringโ
โ • E. coli (n=24) โ
โ • K. pneumoniae (n=22)โ
โ • A. baumannii (n=30)โ
โ • P. aeruginosa (n=25)โ
โโโโโโโโโโโโโโโโโโโโโโโ
โ
โผ
โโโโโโโโโโโโโโโโโโโโโโโ
โ 2. Genome Annotationโ
โ • Prokka v1.14.5 โ
โ • TIGRFAMs HMM DB โ
โ • Output: *.gff โ
โโโโโโโโโโโโโโโโโโโโโโโ
โ
โผ
โโโโโโโโโโโโโโโโโโโโโโโ
โ 3. Pangenome Analysisโ
โ • Roary v3.13.0 โ
โ • Core gene alignmentโ
โ • Gene P/A matrix โ
โโโโโโโโโโโโโโโโโโโโโโโ
โ
โผ
โโโโโโโโโโโโโโโโโโโโโโโ
โ 4. Resistance Gene โ
โ Detection โ
โ • Abricate + ResFinderโ
โ • 13 β-lactamase genesโ
โโโโโโโโโโโโโโโโโโโโโโโ
โ
โผ
โโโโโโโโโโโโโโโโโโโโโโโ
โ 5. Phylogenetic Treeโ
โ • RAxML-NG โ
โ • GTR+G model โ
โ • 1000 bootstraps โ
โโโโโโโโโโโโโโโโโโโโโโโ
โ
โผ
โโโโโโโโโโโโโโโโโโโโโโโ
โ 6. Visualization โ
โ • ggtree (R) โ
โ • Circular layout โ
โ • ST-colored tips โ
โ • Gene heatmap ring โ
โโโโโโโโโโโโโโโโโโโโโโโ
๐ง Step-by-Step Commands
1๏ธโฃ Prokka Annotation (per species)
#!/bin/bash
# prokka_run.sh
SPECIES_CONFIG=(
"ecoli:Escherichia:coli"
"kpneumoniae:Klebsiella:pneumoniae"
"abaumannii:Acinetobacter:baumannii"
"paeruginosa:Pseudomonas:aeruginosa"
)
for config in "${SPECIES_CONFIG[@]}"; do
IFS=':' read -r SPECIES_KEY GENUS SPECIES_NAME <<< "$config"
for SAMPLE in $(cat samples_${SPECIES_KEY}.txt); do
prokka --force \
--outdir prokka/${SPECIES_KEY}/${SAMPLE} \
--cpus 8 \
--kingdom Bacteria \
--genus "${GENUS}" \
--species "${SPECIES_NAME}" \
--addgenes --addmrna \
--prefix "${SAMPLE}" \
--locustag "${SAMPLE}" \
--hmm /media/jhuang/Titisee/GAMOLA2/TIGRfam_db/TIGRFAMs_15.0_HMM.LIB \
assemblies/${SAMPLE}.fasta
done
done
2๏ธโฃ Roary Pangenome Analysis
#!/bin/bash
# roary_run.sh
# A. baumannii
roary -f roary/abaumannii -e --mafft -p 40 \
prokka/abaumannii/*/*.gff
# E. coli
roary -f roary/ecoli -e --mafft -p 40 \
prokka/ecoli/*/*.gff
# K. pneumoniae
roary -f roary/kpneumoniae -e --mafft -p 40 \
prokka/kpneumoniae/*/*.gff
# P. aeruginosa
roary -f roary/paeruginosa -e --mafft -p 40 \
prokka/paeruginosa/*/*.gff
3๏ธโฃ Resistance Gene Detection (Abricate + ResFinder)
#!/bin/bash
# abricate_run.sh
# Setup databases (one-time)
abricate --setupdb
# Define target genes
TARGET_GENES="blaCTX-M,blaIMP,blaKPC,blaNDM-1,blaNDM-5,blaOXA-23-like,blaOXA-24-like,blaOXA-48-like,blaOXA-58-like,blaPER-1,blaSHV,blaVEB-1,blaVIM"
for SPECIES in ecoli kpneumoniae abaumannii paeruginosa; do
for SAMPLE in $(cat samples_${SPECIES}.txt); do
# Run against ResFinder
abricate --db resfinder \
--minid 90 --mincov 80 \
assemblies/${SAMPLE}.fasta \
> abricate/${SPECIES}/${SAMPLE}.resfinder.tsv
# Extract target genes to CSV-ready format
awk -F'\t' -v sample="${SAMPLE}" '
NR==1 {next}
$10 ~ /blaCTX-M|blaIMP|blaKPC|blaNDM-1|blaNDM-5|blaOXA-23-like|blaOXA-24-like|blaOXA-48-like|blaOXA-58-like|blaPER-1|blaSHV|blaVEB-1|blaVIM/ {
gene=$10; gsub(/[^a-zA-Z0-9.-]/,"_",gene);
print sample"\t"gene"\t+"
}' abricate/${SPECIES}/${SAMPLE}.resfinder.tsv
done | pivot to wide format > isolate_${SPECIES}.csv
done
4๏ธโฃ Phylogenetic Tree Construction (RAxML-NG)
#!/bin/bash
# raxml_run.sh
for SPECIES in ecoli kpneumoniae abaumannii paeruginosa; do
raxml-ng --all \
--msa roary/${SPECIES}/core_gene_alignment.aln \
--model GTR+G \
--bs-trees 1000 \
--threads 40 \
--seed 12345 \
--prefix ${SPECIES}_core_gene_tree_1000
done
5๏ธโฃ SNP Distance Matrix (Optional for clonality assessment)
#!/bin/bash
# snp_analysis.sh
conda activate bengal3_ac3
for SPECIES in ecoli kpneumoniae abaumannii paeruginosa; do
# Extract SNPs from core alignment
snp-sites -v -o roary/${SPECIES}/core_snps.vcf \
roary/${SPECIES}/core_gene_alignment.aln
# Calculate pairwise SNP distances
snp-dists roary/${SPECIES}/core_gene_alignment.aln \
> results/${SPECIES}_snp_dist.tsv
done
# Convert to Excel for review
~/Tools/csv2xls-0.4/csv_to_xls.py \
results/*_snp_dist.tsv \
-d$'\t' -o results/snp_distances_all_species.xls
๐จ Improved R Code for Publication-Quality Figures
Addresses reviewer feedback: “barely legible” โ high-resolution, readable output
#!/usr/bin/env Rscript
# plotTreeHeatmap_improved.R
# Enhanced version for reviewer revision: larger fonts, better spacing, high-DPI export
suppressPackageStartupMessages({
library(ggtree)
library(ggplot2)
library(dplyr)
library(ape)
library(treeio)
})
# ===== USER CONFIGURATION =====
SPECIES <- "abaumannii" # Options: ecoli, kpneumoniae, abaumannii, paeruginosa
INPUT_DIR <- paste0("/home/jhuang/DATA/Data_Ben_Boruta_Analysis/plotTreeHeatmap_", SPECIES)
OUTPUT_DIR <- INPUT_DIR
TREE_FILE <- paste0("../", SPECIES, "_core_gene_tree_1000.raxml.bestTree")
INFO_FILE <- paste0("isolate_", SPECIES, "_.csv")
# High-resolution output settings
DPI <- 400 # ↑ from 200-300 for print quality
FIG_WIDTH <- 4000 # pixels (≈13.3 inch at 300 DPI)
FIG_HEIGHT <- 3600 # pixels (≈12 inch at 300 DPI)
# Font size multipliers (adjust based on number of tips)
FONT_SCALE <- ifelse(SPECIES %in% c("ecoli", "kpneumoniae"), 1.2, 1.0)
TIP_LABEL_SIZE <- 2.8 * FONT_SCALE # ↑ from 2.2
HEATMAP_FONT_SIZE <- 3.2 * FONT_SCALE # ↑ from 2.2
LEGEND_TEXT_SIZE <- 14 # ↑ from 8-10
LEGEND_TITLE_SIZE <- 16 # ↑ from 10-12
GENE_NAME_ANGLE <- 0 # Horizontal labels for readability
# Color schemes
ST_COLORS <- switch(SPECIES,
"ecoli" = c("10"="cornflowerblue","38"="darkgreen","46"="seagreen3","69"="tan",
"88"="red","131"="navyblue","156"="gold","167"="green",
"216"="orange","405"="pink","410"="purple","1882"="magenta",
"2450"="brown","2851"="darksalmon","3570"="chocolate4","4774"="darkkhaki"),
"kpneumoniae" = c("11"="cornflowerblue","13"="darkgreen","15"="seagreen3","43"="tan",
"54"="red","101"="navyblue","147"="gold","307"="green",
"323"="orange","377"="pink","383"="purple","395"="magenta","716"="brown"),
"abaumannii" = c("1"="cornflowerblue","2"="darkgreen","3"="seagreen3","10"="tan",
"25"="red","46"="navyblue","78"="gold","85"="green",
"158"="orange","164"="pink","318"="purple","492"="magenta","499"="brown"),
"paeruginosa" = c("111"="cornflowerblue","233"="darkgreen","235"="seagreen3","273"="tan",
"357"="red","644"="navyblue","664"="gold","773"="green",
"1242"="orange","1655"="pink","1684"="purple","-"="magenta")
)
HEATMAP_COLORS <- c("+" = "#1b7837", "-" = "#d9d9d9") # Colorblind-friendly palette
# ===== DATA LOADING =====
setwd(INPUT_DIR)
info <- read.csv(INFO_FILE, sep="\t", check.names = FALSE, stringsAsFactors = FALSE)
info$name <- info$Isolate
info$ST <- factor(info$ST) # Ensure discrete coloring
# Load tree
tree <- read.tree(TREE_FILE)
# Prepare heatmap data (resistance genes)
gene_cols <- c("Isolate", "blaCTX-M", "blaIMP", "blaKPC", "blaNDM-1", "blaNDM-5",
"blaOXA-23-like", "blaOXA-24-like", "blaOXA-48-like", "blaOXA-58-like",
"blaPER-1", "blaSHV", "blaVEB-1", "blaVIM")
heatmapData <- info %>%
select(all_of(gene_cols)) %>%
mutate(across(-Isolate, ~ifelse(. %in% c("+", "1", "TRUE", "Present"), "+", "-")))
rownames(heatmapData) <- heatmapData$Isolate
heatmapData$Isolate <- NULL
heatmapData[] <- lapply(heatmapData, factor, levels = c("+", "-")) # Preserve both levels
# ===== TREE CALCULATIONS =====
ht <- max(node.depth.edgelength(tree)) # Tree height for proportional spacing
# Base circular tree with enhanced readability
p <- ggtree(tree, layout = "circular", open.angle = 25, branch.length = "none") %<+% info +
geom_tippoint(aes(color = ST), size = 2.2, alpha = 0.95) +
geom_tiplab2(aes(label = name),
size = TIP_LABEL_SIZE,
offset = 0.08 * ht, # ↑ spacing from tips
color = "black",
fontface = "plain") +
scale_color_manual(values = ST_COLORS, na.value = "grey50") +
theme_tree2(base_size = 14) +
theme(
legend.position = "right",
legend.box = "vertical",
plot.margin = margin(20, 20, 20, 20, "pt")
)
# Reserve outer space for heatmap
offset_gap <- 0.35 * ht # Gap between tree tips and heatmap
heatmap_width <- 35.0 * ht # Wider heatmap tiles for clarity
margin_outer <- 0.30 * ht
p_wide <- p + xlim(0, ht + offset_gap + heatmap_width + margin_outer)
# ===== HEATMAP WITH ENHANCED LEGIBILITY =====
final_plot <- gheatmap(
p_wide,
heatmapData,
offset = offset_gap,
width = heatmap_width,
color = "white", # Tile borders for separation
colnames = TRUE,
colnames_position = "top",
colnames_angle = GENE_NAME_ANGLE, # Horizontal = easier to read
colnames_offset_y = 0.06 * ht,
colnames_offset_x = 0,
hjust = 0.0, # Left-align gene names
font.size = HEATMAP_FONT_SIZE,
show.legend = TRUE
) +
scale_fill_manual(
values = HEATMAP_COLORS,
name = "Gene",
labels = c("Present (+)", "Absent (-)"),
drop = FALSE
) +
guides(
fill = guide_legend(
title = "Resistance Gene",
title.position = "top",
title.hjust = 0.5,
override.aes = list(size = 5),
keyheight = unit(0.8, "cm"),
keywidth = unit(1.2, "cm")
),
color = guide_legend(
title = "Sequence Type (ST)",
override.aes = list(size = 4),
ncol = 2 # Wrap legend if many STs
)
) +
theme(
legend.text = element_text(size = LEGEND_TEXT_SIZE, face = "plain"),
legend.title = element_text(size = LEGEND_TITLE_SIZE, face = "bold"),
legend.key.size = unit(1, "cm"),
legend.spacing.y = unit(0.3, "cm"),
plot.title = element_text(hjust = 0.5, size = 18, face = "bold")
) +
ggtitle(paste0("Figure S1: Phylogenomic analysis of ",
toupper(substr(SPECIES,1,1)), substr(SPECIES,2,nchar(SPECIES)),
" isolates (n=", nrow(info), ")"))
# ===== EXPORT HIGH-RESOLUTION FILES =====
# PNG for manuscript submission
png_file <- file.path(OUTPUT_DIR, paste0("FigS1_", SPECIES, "_highres.png"))
png(png_file,
width = FIG_WIDTH,
height = FIG_HEIGHT,
res = DPI,
type = "cairo") # Cairo for better font rendering
print(final_plot)
dev.off()
message("โ Saved: ", png_file)
# SVG for scalable vector graphics (ideal for journals)
svg_file <- file.path(OUTPUT_DIR, paste0("FigS1_", SPECIES, ".svg"))
svg(svg_file, width = 16, height = 14) # inches
print(final_plot)
dev.off()
message("โ Saved: ", svg_file)
# PDF backup
pdf_file <- file.path(OUTPUT_DIR, paste0("FigS1_", SPECIES, ".pdf"))
pdf(pdf_file, width = 16, height = 14)
print(final_plot)
dev.off()
message("โ Saved: ", pdf_file)
# ===== LEGIBILITY CHECK SUMMARY =====
cat("\n=== Figure Enhancement Summary ===\n")
cat("Species: ", SPECIES, "\n")
cat("Isolates: ", nrow(info), "\n")
cat("Output resolution:", FIG_WIDTH, "×", FIG_HEIGHT, "px @", DPI, "DPI\n")
cat("Tip label size: ", TIP_LABEL_SIZE, "pt\n")
cat("Gene label size: ", HEATMAP_FONT_SIZE, "pt (horizontal)\n")
cat("Legend text: ", LEGEND_TEXT_SIZE, "pt\n")
cat("Color scheme: Colorblind-friendly (green/grey)\n")
cat("Formats exported: PNG, SVG, PDF\n")
cat("=================================\n")
๐ Key Improvements for Reviewer Concerns
| Issue | Original | Improved | Benefit |
|---|---|---|---|
| Tip labels | size=2.2, offset=0.06รht | size=2.8-3.4, offset=0.08รht | Clearer isolate names |
| Gene names | angle=90ยฐ, small font | angle=0ยฐ (horizontal), sizeโ30% | No head-tilting required |
| Heatmap tiles | narrow, no borders | wider (35รht), white borders | Better gene pattern visibility |
| Legend fonts | 8-10 pt | 14-16 pt | Readable at 100% zoom |
| Resolution | 200-300 DPI | 400 DPI + Cairo rendering | Crisp text in print/PDF |
| Color scheme | Basic green/grey | Colorblind-friendly palette | Accessible to all readers |
| Output formats | PNG only | PNG + SVG + PDF | Journal-flexible submission |
๐ Quick Execution
# Run for all 4 species
for sp in ecoli kpneumoniae abaumannii paeruginosa; do
Rscript plotTreeHeatmap_improved.R --species ${sp} &
done
wait
echo "โ
All figures generated in high-resolution formats"
๐ Citation & Reproducibility
@article{your_study_2026,
title = {Comparative evaluation of EUCAST RAST and QuickMIC for rapid susceptibility testing...},
author = {Degel-Brossmann, N. and Huang, J. and others},
journal = {Manuscript in revision},
year = {2026}
}
Reproducibility note: All code, parameters, and software versions are documented above. Raw WGS data deposited under BioProject PRJNA1356847.
๐ก Pro Tip: For journals with strict figure size limits, use the SVG output and adjust
FIG_WIDTH/FIG_HEIGHTproportionallyโthe vector format ensures text remains sharp at any scale.