PhyloViz-MRGN: Reproducible Visualization of Core-Genome Phylogenies and ฮฒ-Lactamase Profiles in E. coli, K. pneumoniae, A. baumannii, and P. aeruginosa

๐Ÿงฌ Circular Phylogenetic Tree Pipeline with Resistance Gene Heatmaps

Complete workflow for Figure S1 generation โ€“ suitable for homepage documentation


๐Ÿ“‹ Pipeline Overview

WGS Reads (100 isolates)
        โ”‚
        โ–ผ
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ 1. Species Clusteringโ”‚
โ”‚ • E. coli (n=24)     โ”‚
โ”‚ • K. pneumoniae (n=22)โ”‚
โ”‚ • A. baumannii (n=30)โ”‚
โ”‚ • P. aeruginosa (n=25)โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
        โ”‚
        โ–ผ
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ 2. Genome Annotationโ”‚
โ”‚ • Prokka v1.14.5    โ”‚
โ”‚ • TIGRFAMs HMM DB   โ”‚
โ”‚ • Output: *.gff     โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
        โ”‚
        โ–ผ
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ 3. Pangenome Analysisโ”‚
โ”‚ • Roary v3.13.0     โ”‚
โ”‚ • Core gene alignmentโ”‚
โ”‚ • Gene P/A matrix   โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
        โ”‚
        โ–ผ
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ 4. Resistance Gene  โ”‚
โ”‚    Detection        โ”‚
โ”‚ • Abricate + ResFinderโ”‚
โ”‚ • 13 β-lactamase genesโ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
        โ”‚
        โ–ผ
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ 5. Phylogenetic Treeโ”‚
โ”‚ • RAxML-NG          โ”‚
โ”‚ • GTR+G model       โ”‚
โ”‚ • 1000 bootstraps   โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
        โ”‚
        โ–ผ
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ 6. Visualization    โ”‚
โ”‚ • ggtree (R)        โ”‚
โ”‚ • Circular layout   โ”‚
โ”‚ • ST-colored tips   โ”‚
โ”‚ • Gene heatmap ring โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

๐Ÿ”ง Step-by-Step Commands

1๏ธโƒฃ Prokka Annotation (per species)

#!/bin/bash
# prokka_run.sh

SPECIES_CONFIG=(
    "ecoli:Escherichia:coli"
    "kpneumoniae:Klebsiella:pneumoniae"
    "abaumannii:Acinetobacter:baumannii"
    "paeruginosa:Pseudomonas:aeruginosa"
)

for config in "${SPECIES_CONFIG[@]}"; do
    IFS=':' read -r SPECIES_KEY GENUS SPECIES_NAME <<< "$config"

    for SAMPLE in $(cat samples_${SPECIES_KEY}.txt); do
        prokka --force \
            --outdir prokka/${SPECIES_KEY}/${SAMPLE} \
            --cpus 8 \
            --kingdom Bacteria \
            --genus "${GENUS}" \
            --species "${SPECIES_NAME}" \
            --addgenes --addmrna \
            --prefix "${SAMPLE}" \
            --locustag "${SAMPLE}" \
            --hmm /media/jhuang/Titisee/GAMOLA2/TIGRfam_db/TIGRFAMs_15.0_HMM.LIB \
            assemblies/${SAMPLE}.fasta
    done
done

2๏ธโƒฃ Roary Pangenome Analysis

#!/bin/bash
# roary_run.sh

# A. baumannii
roary -f roary/abaumannii -e --mafft -p 40 \
    prokka/abaumannii/*/*.gff

# E. coli
roary -f roary/ecoli -e --mafft -p 40 \
    prokka/ecoli/*/*.gff

# K. pneumoniae
roary -f roary/kpneumoniae -e --mafft -p 40 \
    prokka/kpneumoniae/*/*.gff

# P. aeruginosa
roary -f roary/paeruginosa -e --mafft -p 40 \
    prokka/paeruginosa/*/*.gff

3๏ธโƒฃ Resistance Gene Detection (Abricate + ResFinder)

#!/bin/bash
# abricate_run.sh

# Setup databases (one-time)
abricate --setupdb

# Define target genes
TARGET_GENES="blaCTX-M,blaIMP,blaKPC,blaNDM-1,blaNDM-5,blaOXA-23-like,blaOXA-24-like,blaOXA-48-like,blaOXA-58-like,blaPER-1,blaSHV,blaVEB-1,blaVIM"

for SPECIES in ecoli kpneumoniae abaumannii paeruginosa; do
    for SAMPLE in $(cat samples_${SPECIES}.txt); do
        # Run against ResFinder
        abricate --db resfinder \
            --minid 90 --mincov 80 \
            assemblies/${SAMPLE}.fasta \
            > abricate/${SPECIES}/${SAMPLE}.resfinder.tsv

        # Extract target genes to CSV-ready format
        awk -F'\t' -v sample="${SAMPLE}" '
            NR==1 {next}
            $10 ~ /blaCTX-M|blaIMP|blaKPC|blaNDM-1|blaNDM-5|blaOXA-23-like|blaOXA-24-like|blaOXA-48-like|blaOXA-58-like|blaPER-1|blaSHV|blaVEB-1|blaVIM/ {
                gene=$10; gsub(/[^a-zA-Z0-9.-]/,"_",gene);
                print sample"\t"gene"\t+"
            }' abricate/${SPECIES}/${SAMPLE}.resfinder.tsv
    done | pivot to wide format > isolate_${SPECIES}.csv
done

4๏ธโƒฃ Phylogenetic Tree Construction (RAxML-NG)

#!/bin/bash
# raxml_run.sh

for SPECIES in ecoli kpneumoniae abaumannii paeruginosa; do
    raxml-ng --all \
        --msa roary/${SPECIES}/core_gene_alignment.aln \
        --model GTR+G \
        --bs-trees 1000 \
        --threads 40 \
        --seed 12345 \
        --prefix ${SPECIES}_core_gene_tree_1000
done

5๏ธโƒฃ SNP Distance Matrix (Optional for clonality assessment)

#!/bin/bash
# snp_analysis.sh

conda activate bengal3_ac3

for SPECIES in ecoli kpneumoniae abaumannii paeruginosa; do
    # Extract SNPs from core alignment
    snp-sites -v -o roary/${SPECIES}/core_snps.vcf \
        roary/${SPECIES}/core_gene_alignment.aln

    # Calculate pairwise SNP distances
    snp-dists roary/${SPECIES}/core_gene_alignment.aln \
        > results/${SPECIES}_snp_dist.tsv
done

# Convert to Excel for review
~/Tools/csv2xls-0.4/csv_to_xls.py \
    results/*_snp_dist.tsv \
    -d$'\t' -o results/snp_distances_all_species.xls

๐ŸŽจ Improved R Code for Publication-Quality Figures

Addresses reviewer feedback: “barely legible” โ†’ high-resolution, readable output

#!/usr/bin/env Rscript
# plotTreeHeatmap_improved.R
# Enhanced version for reviewer revision: larger fonts, better spacing, high-DPI export

suppressPackageStartupMessages({
  library(ggtree)
  library(ggplot2)
  library(dplyr)
  library(ape)
  library(treeio)
})

# ===== USER CONFIGURATION =====
SPECIES <- "abaumannii"  # Options: ecoli, kpneumoniae, abaumannii, paeruginosa
INPUT_DIR <- paste0("/home/jhuang/DATA/Data_Ben_Boruta_Analysis/plotTreeHeatmap_", SPECIES)
OUTPUT_DIR <- INPUT_DIR
TREE_FILE <- paste0("../", SPECIES, "_core_gene_tree_1000.raxml.bestTree")
INFO_FILE <- paste0("isolate_", SPECIES, "_.csv")

# High-resolution output settings
DPI <- 400                    # ↑ from 200-300 for print quality
FIG_WIDTH <- 4000             # pixels (≈13.3 inch at 300 DPI)
FIG_HEIGHT <- 3600            # pixels (≈12 inch at 300 DPI)

# Font size multipliers (adjust based on number of tips)
FONT_SCALE <- ifelse(SPECIES %in% c("ecoli", "kpneumoniae"), 1.2, 1.0)
TIP_LABEL_SIZE <- 2.8 * FONT_SCALE      # ↑ from 2.2
HEATMAP_FONT_SIZE <- 3.2 * FONT_SCALE   # ↑ from 2.2
LEGEND_TEXT_SIZE <- 14                  # ↑ from 8-10
LEGEND_TITLE_SIZE <- 16                 # ↑ from 10-12
GENE_NAME_ANGLE <- 0                    # Horizontal labels for readability

# Color schemes
ST_COLORS <- switch(SPECIES,
  "ecoli" = c("10"="cornflowerblue","38"="darkgreen","46"="seagreen3","69"="tan",
              "88"="red","131"="navyblue","156"="gold","167"="green",
              "216"="orange","405"="pink","410"="purple","1882"="magenta",
              "2450"="brown","2851"="darksalmon","3570"="chocolate4","4774"="darkkhaki"),
  "kpneumoniae" = c("11"="cornflowerblue","13"="darkgreen","15"="seagreen3","43"="tan",
                    "54"="red","101"="navyblue","147"="gold","307"="green",
                    "323"="orange","377"="pink","383"="purple","395"="magenta","716"="brown"),
  "abaumannii" = c("1"="cornflowerblue","2"="darkgreen","3"="seagreen3","10"="tan",
                   "25"="red","46"="navyblue","78"="gold","85"="green",
                   "158"="orange","164"="pink","318"="purple","492"="magenta","499"="brown"),
  "paeruginosa" = c("111"="cornflowerblue","233"="darkgreen","235"="seagreen3","273"="tan",
                    "357"="red","644"="navyblue","664"="gold","773"="green",
                    "1242"="orange","1655"="pink","1684"="purple","-"="magenta")
)

HEATMAP_COLORS <- c("+" = "#1b7837", "-" = "#d9d9d9")  # Colorblind-friendly palette

# ===== DATA LOADING =====
setwd(INPUT_DIR)
info <- read.csv(INFO_FILE, sep="\t", check.names = FALSE, stringsAsFactors = FALSE)
info$name <- info$Isolate
info$ST <- factor(info$ST)  # Ensure discrete coloring

# Load tree
tree <- read.tree(TREE_FILE)

# Prepare heatmap data (resistance genes)
gene_cols <- c("Isolate", "blaCTX-M", "blaIMP", "blaKPC", "blaNDM-1", "blaNDM-5",
               "blaOXA-23-like", "blaOXA-24-like", "blaOXA-48-like", "blaOXA-58-like",
               "blaPER-1", "blaSHV", "blaVEB-1", "blaVIM")

heatmapData <- info %>% 
  select(all_of(gene_cols)) %>%
  mutate(across(-Isolate, ~ifelse(. %in% c("+", "1", "TRUE", "Present"), "+", "-")))

rownames(heatmapData) <- heatmapData$Isolate
heatmapData$Isolate <- NULL
heatmapData[] <- lapply(heatmapData, factor, levels = c("+", "-"))  # Preserve both levels

# ===== TREE CALCULATIONS =====
ht <- max(node.depth.edgelength(tree))  # Tree height for proportional spacing

# Base circular tree with enhanced readability
p <- ggtree(tree, layout = "circular", open.angle = 25, branch.length = "none") %<+% info +
  geom_tippoint(aes(color = ST), size = 2.2, alpha = 0.95) +
  geom_tiplab2(aes(label = name), 
               size = TIP_LABEL_SIZE, 
               offset = 0.08 * ht,      # ↑ spacing from tips
               color = "black",
               fontface = "plain") +
  scale_color_manual(values = ST_COLORS, na.value = "grey50") +
  theme_tree2(base_size = 14) +
  theme(
    legend.position = "right",
    legend.box = "vertical",
    plot.margin = margin(20, 20, 20, 20, "pt")
  )

# Reserve outer space for heatmap
offset_gap <- 0.35 * ht    # Gap between tree tips and heatmap
heatmap_width <- 35.0 * ht # Wider heatmap tiles for clarity
margin_outer <- 0.30 * ht

p_wide <- p + xlim(0, ht + offset_gap + heatmap_width + margin_outer)

# ===== HEATMAP WITH ENHANCED LEGIBILITY =====
final_plot <- gheatmap(
  p_wide, 
  heatmapData,
  offset = offset_gap,
  width = heatmap_width,
  color = "white",              # Tile borders for separation
  colnames = TRUE,
  colnames_position = "top",
  colnames_angle = GENE_NAME_ANGLE,  # Horizontal = easier to read
  colnames_offset_y = 0.06 * ht,
  colnames_offset_x = 0,
  hjust = 0.0,                  # Left-align gene names
  font.size = HEATMAP_FONT_SIZE,
  show.legend = TRUE
) +
  scale_fill_manual(
    values = HEATMAP_COLORS, 
    name = "Gene", 
    labels = c("Present (+)", "Absent (-)"),
    drop = FALSE
  ) +
  guides(
    fill = guide_legend(
      title = "Resistance Gene",
      title.position = "top",
      title.hjust = 0.5,
      override.aes = list(size = 5),
      keyheight = unit(0.8, "cm"),
      keywidth = unit(1.2, "cm")
    ),
    color = guide_legend(
      title = "Sequence Type (ST)",
      override.aes = list(size = 4),
      ncol = 2  # Wrap legend if many STs
    )
  ) +
  theme(
    legend.text = element_text(size = LEGEND_TEXT_SIZE, face = "plain"),
    legend.title = element_text(size = LEGEND_TITLE_SIZE, face = "bold"),
    legend.key.size = unit(1, "cm"),
    legend.spacing.y = unit(0.3, "cm"),
    plot.title = element_text(hjust = 0.5, size = 18, face = "bold")
  ) +
  ggtitle(paste0("Figure S1: Phylogenomic analysis of ", 
                 toupper(substr(SPECIES,1,1)), substr(SPECIES,2,nchar(SPECIES)), 
                 " isolates (n=", nrow(info), ")"))

# ===== EXPORT HIGH-RESOLUTION FILES =====
# PNG for manuscript submission
png_file <- file.path(OUTPUT_DIR, paste0("FigS1_", SPECIES, "_highres.png"))
png(png_file, 
    width = FIG_WIDTH, 
    height = FIG_HEIGHT, 
    res = DPI, 
    type = "cairo")  # Cairo for better font rendering
print(final_plot)
dev.off()
message("โœ“ Saved: ", png_file)

# SVG for scalable vector graphics (ideal for journals)
svg_file <- file.path(OUTPUT_DIR, paste0("FigS1_", SPECIES, ".svg"))
svg(svg_file, width = 16, height = 14)  # inches
print(final_plot)
dev.off()
message("โœ“ Saved: ", svg_file)

# PDF backup
pdf_file <- file.path(OUTPUT_DIR, paste0("FigS1_", SPECIES, ".pdf"))
pdf(pdf_file, width = 16, height = 14)
print(final_plot)
dev.off()
message("โœ“ Saved: ", pdf_file)

# ===== LEGIBILITY CHECK SUMMARY =====
cat("\n=== Figure Enhancement Summary ===\n")
cat("Species:          ", SPECIES, "\n")
cat("Isolates:         ", nrow(info), "\n")
cat("Output resolution:", FIG_WIDTH, "×", FIG_HEIGHT, "px @", DPI, "DPI\n")
cat("Tip label size:   ", TIP_LABEL_SIZE, "pt\n")
cat("Gene label size:  ", HEATMAP_FONT_SIZE, "pt (horizontal)\n")
cat("Legend text:      ", LEGEND_TEXT_SIZE, "pt\n")
cat("Color scheme:     Colorblind-friendly (green/grey)\n")
cat("Formats exported: PNG, SVG, PDF\n")
cat("=================================\n")

๐Ÿ“ Key Improvements for Reviewer Concerns

Issue Original Improved Benefit
Tip labels size=2.2, offset=0.06ร—ht size=2.8-3.4, offset=0.08ร—ht Clearer isolate names
Gene names angle=90ยฐ, small font angle=0ยฐ (horizontal), sizeโ†‘30% No head-tilting required
Heatmap tiles narrow, no borders wider (35ร—ht), white borders Better gene pattern visibility
Legend fonts 8-10 pt 14-16 pt Readable at 100% zoom
Resolution 200-300 DPI 400 DPI + Cairo rendering Crisp text in print/PDF
Color scheme Basic green/grey Colorblind-friendly palette Accessible to all readers
Output formats PNG only PNG + SVG + PDF Journal-flexible submission

๐Ÿ”„ Quick Execution

# Run for all 4 species
for sp in ecoli kpneumoniae abaumannii paeruginosa; do
  Rscript plotTreeHeatmap_improved.R --species ${sp} &
done
wait
echo "โœ… All figures generated in high-resolution formats"

๐Ÿ“š Citation & Reproducibility

@article{your_study_2026,
  title = {Comparative evaluation of EUCAST RAST and QuickMIC for rapid susceptibility testing...},
  author = {Degel-Brossmann, N. and Huang, J. and others},
  journal = {Manuscript in revision},
  year = {2026}
}

Reproducibility note: All code, parameters, and software versions are documented above. Raw WGS data deposited under BioProject PRJNA1356847.


๐Ÿ’ก Pro Tip: For journals with strict figure size limits, use the SVG output and adjust FIG_WIDTH/FIG_HEIGHT proportionallyโ€”the vector format ensures text remains sharp at any scale.

Leave a Reply

Your email address will not be published. Required fields are marked *