Daily Archives: 2026年6月22日

Generating report using R (Data_Tam_Metagenomics_2026_Wastewater)

To analyze your metagenomics data with greater control, you can use the R packages phyloseq, ggplot2, vegan, and MaAsLin2. The biobakery workflow has already generated the merged tables you need in the results/ directory.

Below is the complete R code to:

  1. Load the data and reproduce the figures from wmgx_report.pdf (PCoA, Heatmap, Stacked Barplot).
  2. Perform differential abundance analysis between Pre-treatment and Post-treatment.
  3. Statistically test if the differences between time points are significant.

Prerequisites

Make sure you have the following R packages installed:

install.packages(c("phyloseq", "ggplot2", "vegan", "dplyr", "tidyr", "pheatmap"))
# MaAsLin2 is installed via Bioconductor
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("MaAsLin2")

Step 1: Load Data and Create Metadata

The biobakery workflow merged your samples into single tables. We will load the MetaPhlAn species table and HUMAnN pathway table, and map your experimental design.

library(phyloseq)
library(ggplot2)
library(dplyr)
library(tidyr)

# 1. Load MetaPhlAn taxonomic profiles
mpa_file <- "results/metaphlan/merged/metaphlan_taxonomic_profiles.tsv"
mpa_raw <- read.table(mpa_file, header=TRUE, sep="\t", comment.char="", check.names=FALSE)
colnames(mpa_raw)[1] <- "clade_name"
rownames(mpa_raw) <- mpa_raw$clade_name
mpa_otu <- mpa_raw[, -1]

# Filter for species level (s__) and clean names
species_idx <- grep("s__", rownames(mpa_otu))
species_otu <- mpa_otu[species_idx, ]
species_names <- gsub(".*s__", "", rownames(species_otu))
species_names <- gsub("\\|.*", "", species_names)
species_names <- gsub("_", " ", species_names) # Replace underscores with spaces
rownames(species_otu) <- species_names

# 2. Create Metadata based on your file names
sample_names <- colnames(species_otu)
metadata <- data.frame(
  row.names = sample_names,
  TimePoint = case_when(
    grepl("Nov", sample_names) ~ "Nov",
    grepl("Jan", sample_names) ~ "Jan",
    grepl("Mar", sample_names) ~ "Mar",
    grepl("May", sample_names) ~ "May"
  ),
  Treatment = ifelse(grepl("Pre", sample_names), "Pre", "Post")
)

# 3. Create phyloseq object for Species
tax_mat <- matrix(rownames(species_otu), ncol=1, dimnames=list(rownames(species_otu), "Species"))
species_ps <- phyloseq(
  otu_table(species_otu, taxa_are_rows=TRUE),
  sample_data(metadata),
  tax_table(tax_mat)
)

# (Optional) Load HUMAnN pathways using the same logic
# path_file <- "results/humann/merged/pathabundance_relab.tsv"
# ... (similar loading steps as above) ...

Step 2: Reproduce Figures from wmgx_report.pdf

1. Ordination (PCoA)

Reproduces the Bray-Curtis Principal Coordinate Analysis.

library(vegan)

# Calculate Bray-Curtis distance and perform PCoA
ord <- ordinate(species_ps, method="PCoA", distance="bray")

# Plot
plot_ordination(species_ps, ord, color="Treatment", shape="TimePoint") +
  geom_point(size=4) +
  theme_minimal() +
  labs(title="PCoA of Species Profiles (Bray-Curtis)", 
       subtitle="Colored by Treatment, Shaped by Time Point") +
  theme(legend.position = "right")

2. Heatmap (Top 25 Species)

Reproduces the hierarchical clustering heatmap using Euclidean distance.

# Get top 25 most abundant species
top25 <- names(sort(taxa_sums(species_ps), decreasing=TRUE))[1:25]
ps_top25 <- prune_taxa(top25, species_ps)

# Plot heatmap (method=NULL defaults to hierarchical clustering)
plot_heatmap(ps_top25, method=NULL, distance="euclidean", 
             taxa.labels=TRUE, label.size=3, na.value=NA) +
  scale_fill_viridis_c() +
  labs(title="Heatmap of Top 25 Most Abundant Species")

3. Stacked Barplot (Top 15 Species)

Reproduces the stacked barplot, sorted by total abundance.

# Get top 15 species
top15 <- names(sort(taxa_sums(species_ps), decreasing=TRUE))[1:15]
ps_top15 <- prune_taxa(top15, species_ps)

# Melt data for ggplot2
df <- psmelt(ps_top15)

# Sort samples by the total abundance of these top 15 species (decreasing order)
sample_order <- df %>%
  group_by(Sample) %>%
  summarise(Total_Ab = sum(Abundance)) %>%
  arrange(desc(Total_Ab)) %>%
  pull(Sample)

df$Sample <- factor(df$Sample, levels=sample_order)

# Plot
ggplot(df, aes(x=Sample, y=Abundance, fill=Species)) +
  geom_bar(stat="identity", position="stack") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle=90, hjust=1, vjust=0.5)) +
  labs(title="Relative Abundance of Top 15 Most Abundant Species", 
       y="Relative Abundance") +
  scale_fill_discrete(name = "Species")

Step 3: Differential Abundance Analysis (Pre vs. Post)

To find which specific species or pathways are significantly different between Pre- and Post-treatment, we use MaAsLin2 (the official biobakery recommendation). It allows us to test the effect of Treatment while adjusting for TimePoint as a covariate.

library(MaAsLin2)

# MaAsLin2 requires samples as rows and features as columns
species_t <- t(otu_table(species_ps))
species_df <- as.data.frame(species_t)
species_df$sample_id <- rownames(species_df)

metadata_df <- data.frame(
  sample_id = rownames(metadata),
  metadata
)

# Run MaAsLin2
# We set "Pre" and "Nov" as reference groups. 
# The output coefficients will represent the difference of Post vs Pre.
fit_results <- MaAsLin2(
  input_data = species_df,
  input_metadata = metadata_df,
  output = "maaslin2_species_output",
  fixed_effects = c("Treatment", "TimePoint"),
  reference = c("Treatment,Pre", "TimePoint,Nov"), 
  random_effects = c(),
  correction = "FDR",
  standardize = FALSE,
  min_abundance = 0.001, 
  min_prevalence = 0.1   # Feature must be present in at least 10% of samples
)

# View significant results (q-value < 0.05)
sig_results <- subset(fit_results$results, qval < 0.05 & metadata == "Treatment")
if(nrow(sig_results) > 0) {
  print(sig_results[, c("feature", "coef", "qval")] %>% arrange(qval))
} else {
  print("No significantly different species found at FDR < 0.05.")
}

Note: You can run the exact same code for pathways by replacing species_df with your HUMAnN pathway data frame.


Step 4: Is the difference between time points significant?

To determine if the microbial community composition changes significantly across the different time points (Nov, Jan, Mar, May), we use PERMANOVA (via the adonis2 function in the vegan package). This tests the overall community structure rather than individual features.

library(vegan)

# Calculate Bray-Curtis distance matrix
dist_bc <- vegdist(t(otu_table(species_ps)), method="bray")

# Run PERMANOVA
# This tests the main effects of Treatment and TimePoint, and their interaction
adonis_res <- adonis2(dist_bc ~ Treatment * TimePoint, data=metadata, permutations=999)
print(adonis_res)

How to interpret the adonis2 output:

  1. Treatment row: Look at the Pr(>F) (p-value). If it is < 0.05, it means there is a statistically significant difference in the overall microbiome between Pre-treatment and Post-treatment.
  2. TimePoint row: Look at the Pr(>F). If it is < 0.05, it means the microbial community composition differs significantly across the months (Nov, Jan, Mar, May).
  3. Treatment:TimePoint row: If this interaction is significant (< 0.05), it means the effect of the treatment depends on the time of year (e.g., the treatment works differently in November compared to May).

Important Note on Statistical Power: Your dataset has 11 samples total. In the PERMANOVA model, this leaves only 3 residual degrees of freedom. While this mathematically allows for testing, the statistical power is low. Significant results are highly trustworthy, but non-significant results might simply be due to the small sample size.

Processing RNAseq (Data_Tam_RNAseq_2026_Dicl_Mero_Azith_Rifa_on_AYE)

  1. Preparing raw data

     mkdir raw_data; cd raw_data
    
     # control samples (8)
     ln -s ../X101SC26025981-Z01-J001/01.RawData/1/1_1.fq.gz AYE-WT_ctr_r1_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/1/1_2.fq.gz AYE-WT_ctr_r1_R2.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/2/2_1.fq.gz AYE-WT_ctr_r2_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/2/2_2.fq.gz AYE-WT_ctr_r2_R2.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/3/3_1.fq.gz AYE-WT_ctr_r3_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/3/3_2.fq.gz AYE-WT_ctr_r3_R2.fastq.gz
    
     ln -s ../X101SC26025981-Z01-J001/01.RawData/4/4_1.fq.gz AYE-T_ctr_r1_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/4/4_2.fq.gz AYE-T_ctr_r1_R2.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/5/5_1.fq.gz AYE-T_ctr_r2_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/5/5_2.fq.gz AYE-T_ctr_r2_R2.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/6/6_1.fq.gz AYE-T_ctr_r3_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/6/6_2.fq.gz AYE-T_ctr_r3_R2.fastq.gz
    
     ln -s ../X101SC26025981-Z01-J001/01.RawData/7/7_1.fq.gz AYE-O_ctr_r1_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/7/7_2.fq.gz AYE-O_ctr_r1_R2.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/8/8_1.fq.gz AYE-O_ctr_r2_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/8/8_2.fq.gz AYE-O_ctr_r2_R2.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/9/9_1.fq.gz AYE-O_ctr_r3_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/9/9_2.fq.gz AYE-O_ctr_r3_R2.fastq.gz
    
     ln -s ../X101SC26025981-Z01-J001/01.RawData/10/10_1.fq.gz O-Trans_ctr_r1_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/10/10_2.fq.gz O-Trans_ctr_r1_R2.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/11/11_1.fq.gz O-Trans_ctr_r2_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/11/11_2.fq.gz O-Trans_ctr_r2_R2.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/12/12_1.fq.gz O-Trans_ctr_r3_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/12/12_2.fq.gz O-Trans_ctr_r3_R2.fastq.gz
    
     ln -s ../X101SC26025981-Z01-J001/01.RawData/1new/1new_1.fq.gz WT-Trans_ctr_r1_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/1new/1new_2.fq.gz WT-Trans_ctr_r1_R2.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/2new/2new_1.fq.gz WT-Trans_ctr_r2_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/2new/2new_2.fq.gz WT-Trans_ctr_r2_R2.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/3new/3new_1.fq.gz WT-Trans_ctr_r3_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/3new/3new_2.fq.gz WT-Trans_ctr_r3_R2.fastq.gz
    
     ln -s ../X101SC26025981-Z01-J001/01.RawData/49/49_1.fq.gz AYE-WT_ctr_solid_r1_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/49/49_2.fq.gz AYE-WT_ctr_solid_r1_R2.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/50/50_1.fq.gz AYE-WT_ctr_solid_r2_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/50/50_2.fq.gz AYE-WT_ctr_solid_r2_R2.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/51/51_1.fq.gz AYE-WT_ctr_solid_r3_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/51/51_2.fq.gz AYE-WT_ctr_solid_r3_R2.fastq.gz
    
     ln -s ../X101SC26025981-Z01-J001/01.RawData/52/52_1.fq.gz AYE-O_ctr_solid_r1_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/52/52_2.fq.gz AYE-O_ctr_solid_r1_R2.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/53/53_1.fq.gz AYE-O_ctr_solid_r2_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/53/53_2.fq.gz AYE-O_ctr_solid_r2_R2.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/54/54_1.fq.gz AYE-O_ctr_solid_r3_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/54/54_2.fq.gz AYE-O_ctr_solid_r3_R2.fastq.gz
    
     ln -s ../X101SC26025981-Z01-J001/01.RawData/55/55_1.fq.gz AYE-T_ctr_solid_r1_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/55/55_2.fq.gz AYE-T_ctr_solid_r1_R2.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/56/56_1.fq.gz AYE-T_ctr_solid_r2_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/56/56_2.fq.gz AYE-T_ctr_solid_r2_R2.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/57/57_1.fq.gz AYE-T_ctr_solid_r3_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/57/57_2.fq.gz AYE-T_ctr_solid_r3_R2.fastq.gz
    
     # Diclofenac(双氯芬酸)treatment (6)
     ln -s ../X101SC26025981-Z01-J001/01.RawData/25/25_1.fq.gz AYE-WT_Diclo750_r1_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/25/25_2.fq.gz AYE-WT_Diclo750_r1_R2.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/26/26_1.fq.gz AYE-WT_Diclo750_r2_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/26/26_2.fq.gz AYE-WT_Diclo750_r2_R2.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/27/27_1.fq.gz AYE-WT_Diclo750_r3_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/27/27_2.fq.gz AYE-WT_Diclo750_r3_R2.fastq.gz
    
     ln -s ../X101SC26025981-Z01-J001/01.RawData/28/28_1.fq.gz AYE-T_Diclo375_r1_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/28/28_2.fq.gz AYE-T_Diclo375_r1_R2.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/29/29_1.fq.gz AYE-T_Diclo375_r2_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/29/29_2.fq.gz AYE-T_Diclo375_r2_R2.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/30/30_1.fq.gz AYE-T_Diclo375_r3_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/30/30_2.fq.gz AYE-T_Diclo375_r3_R2.fastq.gz
    
     ln -s ../X101SC26025981-Z01-J001/01.RawData/31/31_1.fq.gz AYE-O_Diclo375_r1_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/31/31_2.fq.gz AYE-O_Diclo375_r1_R2.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/32/32_1.fq.gz AYE-O_Diclo375_r2_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/32/32_2.fq.gz AYE-O_Diclo375_r2_R2.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/33/33_1.fq.gz AYE-O_Diclo375_r3_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/33/33_2.fq.gz AYE-O_Diclo375_r3_R2.fastq.gz
    
     ln -s ../X101SC26025981-Z01-J001/01.RawData/34/34_1.fq.gz O-Trans_Diclo375_r1_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/34/34_2.fq.gz O-Trans_Diclo375_r1_R2.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/35/35_1.fq.gz O-Trans_Diclo375_r2_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/35/35_2.fq.gz O-Trans_Diclo375_r2_R2.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/36/36_1.fq.gz O-Trans_Diclo375_r3_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/36/36_2.fq.gz O-Trans_Diclo375_r3_R2.fastq.gz
    
     ln -s ../X101SC26025981-Z01-J001/01.RawData/4new/4new_1.fq.gz WT-Trans_Diclo750_r1_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/4new/4new_2.fq.gz WT-Trans_Diclo750_r1_R2.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/5new/5new_1.fq.gz WT-Trans_Diclo750_r2_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/5new/5new_2.fq.gz WT-Trans_Diclo750_r2_R2.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/6new/6new_1.fq.gz WT-Trans_Diclo750_r3_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/6new/6new_2.fq.gz WT-Trans_Diclo750_r3_R2.fastq.gz
    
     ln -s ../X101SC26025981-Z01-J001/01.RawData/73/73_1.fq.gz AYE-WT_Diclo1250_solid_r1_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/73/73_2.fq.gz AYE-WT_Diclo1250_solid_r1_R2.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/74/74_1.fq.gz AYE-WT_Diclo1250_solid_r2_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/74/74_2.fq.gz AYE-WT_Diclo1250_solid_r2_R2.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/75/75_1.fq.gz AYE-WT_Diclo1250_solid_r3_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/75/75_2.fq.gz AYE-WT_Diclo1250_solid_r3_R2.fastq.gz
    
     # Rifampicin(利福平)treatment (4)
     ln -s ../X101SC26025981-Z01-J001/01.RawData/13/13_1.fq.gz AYE-WT_Rifampicin1.5_r1_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/13/13_2.fq.gz AYE-WT_Rifampicin1.5_r1_R2.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/14/14_1.fq.gz AYE-WT_Rifampicin1.5_r2_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/14/14_2.fq.gz AYE-WT_Rifampicin1.5_r2_R2.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/15/15_1.fq.gz AYE-WT_Rifampicin1.5_r3_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/15/15_2.fq.gz AYE-WT_Rifampicin1.5_r3_R2.fastq.gz
    
     ln -s ../X101SC26025981-Z01-J001/01.RawData/16/16_1.fq.gz AYE-T_Rifampicin2_r1_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/16/16_2.fq.gz AYE-T_Rifampicin2_r1_R2.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/17/17_1.fq.gz AYE-T_Rifampicin2_r2_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/17/17_2.fq.gz AYE-T_Rifampicin2_r2_R2.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/18/18_1.fq.gz AYE-T_Rifampicin2_r3_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/18/18_2.fq.gz AYE-T_Rifampicin2_r3_R2.fastq.gz
    
     ln -s ../X101SC26025981-Z01-J001/01.RawData/19/19_1.fq.gz AYE-O_Rifampicin2_r1_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/19/19_2.fq.gz AYE-O_Rifampicin2_r1_R2.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/20/20_1.fq.gz AYE-O_Rifampicin2_r2_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/20/20_2.fq.gz AYE-O_Rifampicin2_r2_R2.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/21/21_1.fq.gz AYE-O_Rifampicin2_r3_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/21/21_2.fq.gz AYE-O_Rifampicin2_r3_R2.fastq.gz
    
     ln -s ../X101SC26025981-Z01-J001/01.RawData/22/22_1.fq.gz O-Trans_Rifampicin2_r1_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/22/22_2.fq.gz O-Trans_Rifampicin2_r1_R2.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/23/23_1.fq.gz O-Trans_Rifampicin2_r2_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/23/23_2.fq.gz O-Trans_Rifampicin2_r2_R2.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/24/24_1.fq.gz O-Trans_Rifampicin2_r3_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/24/24_2.fq.gz O-Trans_Rifampicin2_r3_R2.fastq.gz
    
     # Meropenem(美罗培南)treatment (4)
     ln -s ../X101SC26025981-Z01-J001/01.RawData/37/37_1.fq.gz AYE-WT_Mero0.35-0.5_r1_R1.fastq.gz  #AYE-WT_Mero0.5_r1
     ln -s ../X101SC26025981-Z01-J001/01.RawData/37/37_2.fq.gz AYE-WT_Mero0.35-0.5_r1_R2.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/38/38_1.fq.gz AYE-WT_Mero0.35-0.5_r2_R1.fastq.gz  #AYE-WT_YX_Mero0.35_r2
     ln -s ../X101SC26025981-Z01-J001/01.RawData/38/38_2.fq.gz AYE-WT_Mero0.35-0.5_r2_R2.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/39/39_1.fq.gz AYE-WT_Mero0.35-0.5_r3_R1.fastq.gz  #AYE-WT_public_Mero0.35_r3
     ln -s ../X101SC26025981-Z01-J001/01.RawData/39/39_2.fq.gz AYE-WT_Mero0.35-0.5_r3_R2.fastq.gz
    
     ln -s ../X101SC26025981-Z01-J001/01.RawData/40/40_1.fq.gz AYE-T_Mero0.15_r1_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/40/40_2.fq.gz AYE-T_Mero0.15_r1_R2.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/41/41_1.fq.gz AYE-T_Mero0.15_r2_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/41/41_2.fq.gz AYE-T_Mero0.15_r2_R2.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/42/42_1.fq.gz AYE-T_Mero0.15_r3_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/42/42_2.fq.gz AYE-T_Mero0.15_r3_R2.fastq.gz
    
     ln -s ../X101SC26025981-Z01-J001/01.RawData/43/43_1.fq.gz AYE-O_Mero0.5_r1_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/43/43_2.fq.gz AYE-O_Mero0.5_r1_R2.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/44/44_1.fq.gz AYE-O_Mero0.5_r2_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/44/44_2.fq.gz AYE-O_Mero0.5_r2_R2.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/45/45_1.fq.gz AYE-O_Mero0.5_r3_R1.fastq.gz  #Mero0.45
     ln -s ../X101SC26025981-Z01-J001/01.RawData/45/45_2.fq.gz AYE-O_Mero0.5_r3_R2.fastq.gz
    
     ln -s ../X101SC26025981-Z01-J001/01.RawData/46/46_1.fq.gz O-Trans_Mero0.25_r1_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/46/46_2.fq.gz O-Trans_Mero0.25_r1_R2.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/47/47_1.fq.gz O-Trans_Mero0.25_r2_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/47/47_2.fq.gz O-Trans_Mero0.25_r2_R2.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/48/48_1.fq.gz O-Trans_Mero0.25_r3_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/48/48_2.fq.gz O-Trans_Mero0.25_r3_R2.fastq.gz
    
     # Azithromycin(阿奇霉素)treatment (5), among them, F_ctr_solid is clinical isolate.
     ln -s ../X101SC26025981-Z01-J001/01.RawData/58/58_1.fq.gz F_ctr_solid_r1_R1.fastq.gz  #clinical
     ln -s ../X101SC26025981-Z01-J001/01.RawData/58/58_2.fq.gz F_ctr_solid_r1_R2.fastq.gz  #clinical
     ln -s ../X101SC26025981-Z01-J001/01.RawData/59/59_1.fq.gz F_ctr_solid_r2_R1.fastq.gz  #clinical
     ln -s ../X101SC26025981-Z01-J001/01.RawData/59/59_2.fq.gz F_ctr_solid_r2_R2.fastq.gz  #clinical
     ln -s ../X101SC26025981-Z01-J001/01.RawData/60/60_1.fq.gz F_ctr_solid_r3_R1.fastq.gz  #clinical
     ln -s ../X101SC26025981-Z01-J001/01.RawData/60/60_2.fq.gz F_ctr_solid_r3_R2.fastq.gz  #clinical
    
     ln -s ../X101SC26025981-Z01-J001/01.RawData/61/61_1.fq.gz AYE-WT_Azi20_solid_r1_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/61/61_2.fq.gz AYE-WT_Azi20_solid_r1_R2.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/62/62_1.fq.gz AYE-WT_Azi20_solid_r2_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/62/62_2.fq.gz AYE-WT_Azi20_solid_r2_R2.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/63/63_1.fq.gz AYE-WT_Azi20_solid_r3_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/63/63_2.fq.gz AYE-WT_Azi20_solid_r3_R2.fastq.gz
    
     ln -s ../X101SC26025981-Z01-J001/01.RawData/67/67_1.fq.gz AYE-T_Azi20_solid_r1_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/67/67_2.fq.gz AYE-T_Azi20_solid_r1_R2.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/68/68_1.fq.gz AYE-T_Azi20_solid_r2_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/68/68_2.fq.gz AYE-T_Azi20_solid_r2_R2.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/69/69_1.fq.gz AYE-T_Azi20_solid_r3_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/69/69_2.fq.gz AYE-T_Azi20_solid_r3_R2.fastq.gz
    
     ln -s ../X101SC26025981-Z01-J001/01.RawData/64/64_1.fq.gz AYE-O_Azi20_solid_r1_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/64/64_2.fq.gz AYE-O_Azi20_solid_r1_R2.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/65/65_1.fq.gz AYE-O_Azi20_solid_r2_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/65/65_2.fq.gz AYE-O_Azi20_solid_r2_R2.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/66/66_1.fq.gz AYE-O_Azi20_solid_r3_R1.fastq.gz
     ln -s ../X101SC26025981-Z01-J001/01.RawData/66/66_2.fq.gz AYE-O_Azi20_solid_r3_R2.fastq.gz
    
     ln -s ../X101SC26025981-Z01-J001/01.RawData/70/70_1.fq.gz F_Azi20_solid_r1_R1.fastq.gz  #clinical
     ln -s ../X101SC26025981-Z01-J001/01.RawData/70/70_2.fq.gz F_Azi20_solid_r1_R2.fastq.gz  #clinical
     ln -s ../X101SC26025981-Z01-J001/01.RawData/71/71_1.fq.gz F_Azi20_solid_r2_R1.fastq.gz  #clinical
     ln -s ../X101SC26025981-Z01-J001/01.RawData/71/71_2.fq.gz F_Azi20_solid_r2_R2.fastq.gz  #clinical
     ln -s ../X101SC26025981-Z01-J001/01.RawData/72/72_1.fq.gz F_Azi20_solid_r3_R1.fastq.gz  #clinical
     ln -s ../X101SC26025981-Z01-J001/01.RawData/72/72_2.fq.gz F_Azi20_solid_r3_R2.fastq.gz  #clinical
  2. Preparing the directory trimmed

     mkdir trimmed trimmed_unpaired;
     for sample_id in AYE-O_Azi20_solid_r1 AYE-O_Azi20_solid_r2 AYE-O_Azi20_solid_r3 AYE-O_ctr_r1 AYE-O_ctr_r2 AYE-O_ctr_r3 AYE-O_ctr_solid_r1 AYE-O_ctr_solid_r2 AYE-O_ctr_solid_r3 AYE-O_Diclo375_r1 AYE-O_Diclo375_r2 AYE-O_Diclo375_r3 AYE-O_Mero0.5_r1 AYE-O_Mero0.5_r2 AYE-O_Mero0.5_r3 AYE-O_Rifampicin2_r1 AYE-O_Rifampicin2_r2 AYE-O_Rifampicin2_r3 AYE-T_Azi20_solid_r1 AYE-T_Azi20_solid_r2 AYE-T_Azi20_solid_r3 AYE-T_ctr_r1 AYE-T_ctr_r2 AYE-T_ctr_r3 AYE-T_ctr_solid_r1 AYE-T_ctr_solid_r2 AYE-T_ctr_solid_r3 AYE-T_Diclo375_r1 AYE-T_Diclo375_r2 AYE-T_Diclo375_r3 AYE-T_Mero0.15_r1 AYE-T_Mero0.15_r2 AYE-T_Mero0.15_r3 AYE-T_Rifampicin2_r1 AYE-T_Rifampicin2_r2 AYE-T_Rifampicin2_r3 AYE-WT_Azi20_solid_r1 AYE-WT_Azi20_solid_r2 AYE-WT_Azi20_solid_r3 AYE-WT_ctr_r1 AYE-WT_ctr_r2 AYE-WT_ctr_r3 AYE-WT_ctr_solid_r1 AYE-WT_ctr_solid_r2 AYE-WT_ctr_solid_r3 AYE-WT_Diclo1250_solid_r1 AYE-WT_Diclo1250_solid_r2 AYE-WT_Diclo1250_solid_r3 AYE-WT_Diclo750_r1 AYE-WT_Diclo750_r2 AYE-WT_Diclo750_r3 AYE-WT_Mero0.35-0.5_r1 AYE-WT_Mero0.35-0.5_r2 AYE-WT_Mero0.35-0.5_r3 AYE-WT_Rifampicin1.5_r1 AYE-WT_Rifampicin1.5_r2 AYE-WT_Rifampicin1.5_r3 F_Azi20_solid_r1 F_Azi20_solid_r2 F_Azi20_solid_r3 F_ctr_solid_r1 F_ctr_solid_r2 F_ctr_solid_r3 O-Trans_ctr_r1 O-Trans_ctr_r2 O-Trans_ctr_r3 O-Trans_Diclo375_r1 O-Trans_Diclo375_r2 O-Trans_Diclo375_r3 O-Trans_Mero0.25_r1 O-Trans_Mero0.25_r2 O-Trans_Mero0.25_r3 O-Trans_Rifampicin2_r1 O-Trans_Rifampicin2_r2 O-Trans_Rifampicin2_r3 WT-Trans_ctr_r1 WT-Trans_ctr_r2 WT-Trans_ctr_r3 WT-Trans_Diclo750_r1 WT-Trans_Diclo750_r2 WT-Trans_Diclo750_r3; do \
     for sample_id in AYE-T_Diclo375_r2; do \
             java -jar /home/jhuang/Tools/Trimmomatic-0.36/trimmomatic-0.36.jar PE -threads 100 raw_data/${sample_id}_R1.fastq.gz raw_data/${sample_id}_R2.fastq.gz trimmed/${sample_id}_R1.fastq.gz trimmed_unpaired/${sample_id}_R1.fastq.gz trimmed/${sample_id}_R2.fastq.gz trimmed_unpaired/${sample_id}_R2.fastq.gz ILLUMINACLIP:/home/jhuang/Tools/Trimmomatic-0.36/adapters/TruSeq3-PE-2.fa:2:30:10:8:TRUE LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 AVGQUAL:20; done 2> trimmomatic_pe.log;
     done
  3. (Optional) using trinity to find the most closely reference

     #Trinity --seqType fq --max_memory 50G --left trimmed/wt_r1_R1.fastq.gz  --right trimmed/wt_r1_R2.fastq.gz --CPU 12
    
     #https://www.genome.jp/kegg/tables/br08606.html#prok
     acb     KGB     Acinetobacter baumannii ATCC 17978  2007    GenBank
     abm     KGB     Acinetobacter baumannii SDF     2008    GenBank
     aby     KGB     Acinetobacter baumannii AYE     2008    GenBank --> *
     abc     KGB     Acinetobacter baumannii ACICU   2008    GenBank
     abn     KGB     Acinetobacter baumannii AB0057  2008    GenBank
     abb     KGB     Acinetobacter baumannii AB307-0294  2008    GenBank
     abx     KGB     Acinetobacter baumannii 1656-2  2012    GenBank
     abz     KGB     Acinetobacter baumannii MDR-ZJ06    2012    GenBank
     abr     KGB     Acinetobacter baumannii MDR-TJ  2012    GenBank
     abd     KGB     Acinetobacter baumannii TCDC-AB0715     2012    GenBank
     abh     KGB     Acinetobacter baumannii TYTH-1  2012    GenBank
     abad    KGB     Acinetobacter baumannii D1279779    2013    GenBank
     abj     KGB     Acinetobacter baumannii BJAB07104   2013    GenBank
     abab    KGB     Acinetobacter baumannii BJAB0715    2013    GenBank
     abaj    KGB     Acinetobacter baumannii BJAB0868    2013    GenBank
     abaz    KGB     Acinetobacter baumannii ZW85-1  2013    GenBank
     abk     KGB     Acinetobacter baumannii AbH12O-A2   2014    GenBank
     abau    KGB     Acinetobacter baumannii AB030   2014    GenBank
     abaa    KGB     Acinetobacter baumannii AB031   2014    GenBank
     abw     KGB     Acinetobacter baumannii AC29    2014    GenBank
     abal    KGB     Acinetobacter baumannii LAC-4   2015    GenBank
     #Note that the Acinetobacter baumannii strain ATCC 19606 chromosome, complete genome (GenBank: CU459141.1) was choosen as reference!
  4. Preparing samplesheet.csv

     sample,fastq_1,fastq_2,strandedness
     Urine_r1,Urine_r1_R1.fq.gz,Urine_r1_R2.fq.gz,auto
     ...
  5. Downloading CU459141.fasta and CU459141.gff from GenBank and preparing CU459141_m.gff

     #Example1: http://xgenes.com/article/article-content/157/prepare-virus-gtf-for-nextflow-run/
     #Default NOT_WORKING: --gtf_group_features 'gene_id'  --gtf_extra_attributes 'gene_name' --featurecounts_group_type 'gene_biotype' --featurecounts_feature_type 'exon'
     #(host_env) !NOT_WORKING! jhuang@WS-2290C:~/DATA/Data_Tam_RNAseq_2024$ /usr/local/bin/nextflow run rnaseq/main.nf --input samplesheet.csv --outdir results    --fasta "/home/jhuang/DATA/Data_Tam_RNAseq_2024/CU459141.fasta" --gff "/home/jhuang/DATA/Data_Tam_RNAseq_2024/CU459141.gff"        -profile docker -resume  --max_cpus 55 --max_memory 512.GB --max_time 2400.h    --save_align_intermeds --save_unaligned --save_reference    --aligner 'star_salmon'    --gtf_group_features 'gene_id'  --gtf_extra_attributes 'gene_name' --featurecounts_group_type 'gene_biotype' --featurecounts_feature_type 'transcript'
    
     # -- DEBUG_1 (CDS --> exon in CP059040.gff) --
     #Checking the record (see below) in results/genome/CP059040.gtf
     #In ./results/genome/CP059040.gtf e.g. "CP059040.1      Genbank transcript      1       1398    .       +       .       transcript_id "gene-H0N29_00005"; gene_id "gene-H0N29_00005"; gene_name "dnaA"; Name "dnaA"; gbkey "Gene"; gene "dnaA"; gene_biotype "protein_coding"; locus_tag "H0N29_00005";"
     #--featurecounts_feature_type 'transcript' returns only the tRNA results
     #Since the tRNA records have "transcript and exon". In gene records, we have "transcript and CDS". replace the CDS with exon
    
     grep -P "\texon\t" CP059040.gff | sort | wc -l    #96
     grep -P "cmsearch\texon\t" CP059040.gff | wc -l    #=10  ignal recognition particle sRNA small typ, transfer-messenger RNA, 5S ribosomal RNA
     grep -P "Genbank\texon\t" CP059040.gff | wc -l    #=12  16S and 23S ribosomal RNA
     grep -P "tRNAscan-SE\texon\t" CP059040.gff | wc -l    #tRNA 74
     wc -l star_salmon/AUM_r3/quant.genes.sf  #--featurecounts_feature_type 'transcript' results in 96 records!
    
     grep -P "\tCDS\t" CU459141.gff3 | wc -l  #3659
     sed 's/\tCDS\t/\texon\t/g' CU459141.gff3 > CU459141_m.gff
     grep -P "\texon\t" CU459141_m.gff | sort | wc -l  #3760
    
     # -- DEBUG_2: combination of 'CU459141_m.gff' and 'exon' results in ERROR, using 'transcript' instead!
     --gff "/home/jhuang/DATA/Data_Tam_RNAseq_2026_on_AYE/CU459141_m.gff" --featurecounts_feature_type 'transcript'
    
     # -- DEBUG_3: make sure the header of fasta is the same to the *_m.gff file
  6. nextflow run

     # ---- SUCCESSFUL with directly downloaded gff3 and fasta from NCBI using docker after replacing 'CDS' with 'exon' ----
     (host_env) mv trimmed/*.fastq.gz .
     (host_env) nextflow run nf-core/rnaseq -r 3.14.0 -profile docker \
     --input samplesheet.csv --outdir results    --fasta "/home/jhuang/DATA/Data_Tam_RNAseq_2026_on_AYE/CU459141.fasta" --gff "/home/jhuang/DATA/Data_Tam_RNAseq_2026_on_AYE/CU459141_m.gff"        -resume  --max_cpus 90 --max_memory 900.GB --max_time 2400.h    --save_align_intermeds --save_unaligned --save_reference    --aligner 'star_salmon'    --gtf_group_features 'gene_id'  --gtf_extra_attributes 'gene_name' --featurecounts_group_type 'gene_biotype' --featurecounts_feature_type 'transcript'
  7. (OPTIONAL, since the R-script complete_deg_pipeline_custom_cutoff.R runs completely) Generate PCA-plot

     #mamba activate r_env
    
     #install.packages("ggfun")
     # Import the required libraries
     library("AnnotationDbi")
     library("clusterProfiler")
     library("ReactomePA")
     library(gplots)
     library(tximport)
     library(DESeq2)
     #library("org.Hs.eg.db")
     library(dplyr)
     library(tidyverse)
     #install.packages("devtools")
     #devtools::install_version("gtable", version = "0.3.0")
     library(gplots)
     library("RColorBrewer")
     #install.packages("ggrepel")
     library("ggrepel")
     # install.packages("openxlsx")
     library(openxlsx)
     library(EnhancedVolcano)
     library(DESeq2)
     library(edgeR)
    
     setwd("~/DATA/Data_Tam_RNAseq_2026_Dicl_Mero_Azith_Rifa_on_AYE/results/star_salmon")
     # Define paths to your Salmon output quantification files
    
     # Store sample names in a character vector
     samples <- c(
         "AYE-O_Azi20_solid_r1", "AYE-O_Azi20_solid_r2", "AYE-O_Azi20_solid_r3", "AYE-O_ctr_r1", "AYE-O_ctr_r2",
         "AYE-O_ctr_r3", "AYE-O_ctr_solid_r1", "AYE-O_ctr_solid_r2", "AYE-O_ctr_solid_r3",
         "AYE-O_Diclo375_r1", "AYE-O_Diclo375_r2", "AYE-O_Diclo375_r3", "AYE-O_Mero0.5_r1",
         "AYE-O_Mero0.5_r2", "AYE-O_Mero0.5_r3", "AYE-O_Rifampicin2_r1", "AYE-O_Rifampicin2_r2",
         "AYE-O_Rifampicin2_r3", "AYE-T_Azi20_solid_r1", "AYE-T_Azi20_solid_r2", "AYE-T_Azi20_solid_r3",
         "AYE-T_ctr_r1", "AYE-T_ctr_r2", "AYE-T_ctr_r3", "AYE-T_ctr_solid_r1", "AYE-T_ctr_solid_r2",
         "AYE-T_ctr_solid_r3", "AYE-T_Diclo375_r1", "AYE-T_Diclo375_r2", "AYE-T_Diclo375_r3",
         "AYE-T_Mero0.15_r1", "AYE-T_Mero0.15_r2", "AYE-T_Mero0.15_r3", "AYE-T_Rifampicin2_r1",
         "AYE-T_Rifampicin2_r2", "AYE-T_Rifampicin2_r3", "AYE-WT_Azi20_solid_r1", "AYE-WT_Azi20_solid_r2",
         "AYE-WT_Azi20_solid_r3", "AYE-WT_ctr_r1", "AYE-WT_ctr_r2", "AYE-WT_ctr_r3", "AYE-WT_ctr_solid_r1",
         "AYE-WT_ctr_solid_r2", "AYE-WT_ctr_solid_r3", "AYE-WT_Diclo1250_solid_r1", "AYE-WT_Diclo1250_solid_r2",
         "AYE-WT_Diclo1250_solid_r3", "AYE-WT_Diclo750_r1", "AYE-WT_Diclo750_r2", "AYE-WT_Diclo750_r3",
         "AYE-WT_Mero0.35-0.5_r1", "AYE-WT_Mero0.35-0.5_r2", "AYE-WT_Mero0.35-0.5_r3", "AYE-WT_Rifampicin1.5_r1",
         "AYE-WT_Rifampicin1.5_r2", "AYE-WT_Rifampicin1.5_r3", "F_Azi20_solid_r1", "F_Azi20_solid_r2",
         "F_Azi20_solid_r3", "F_ctr_solid_r1", "F_ctr_solid_r2", "F_ctr_solid_r3",
         "O-Trans_ctr_r1","O-Trans_ctr_r2","O-Trans_ctr_r3",  "O-Trans_Diclo375_r1","O-Trans_Diclo375_r2","O-Trans_Diclo375_r3",
         "O-Trans_Mero0.25_r1", "O-Trans_Mero0.25_r2", "O-Trans_Mero0.25_r3", "O-Trans_Rifampicin2_r1",
         "O-Trans_Rifampicin2_r2", "O-Trans_Rifampicin2_r3", "WT-Trans_ctr_r1", "WT-Trans_ctr_r2",
         "WT-Trans_ctr_r3", "WT-Trans_Diclo750_r1", "WT-Trans_Diclo750_r2", "WT-Trans_Diclo750_r3"
     )
    
     # Automatically generate the named vector
     files <- setNames(paste0("./", samples, "/quant.sf"), samples)
     # Import the transcript abundance data with tximport
     txi <- tximport(files, type = "salmon", txIn = TRUE, txOut = TRUE)
    
     # -----------------------------------------------------------------
     # ---- Step 1: Create Detailed Metadata from Your Sample Names ----
    
     # Extract metadata from sample names
     samples <- names(files)
    
     # Parse the complex sample names
     metadata <- data.frame(
     sample = samples,
     stringsAsFactors = FALSE
     )
    
     # Extract strain (everything before first underscore or hyphen treatment)
     metadata$strain <- sapply(strsplit(samples, "[-_]"), function(x) {
     if(x[1] %in% c("AYE", "O", "WT", "F")) {
         if(x[1] == "AYE" && length(x) > 1 && x[2] %in% c("WT", "T", "O")) {
         paste(x[1:2], collapse = "-")
         } else if(x[1] %in% c("O", "WT") && x[2] == "Trans") {
         paste(x[1:2], collapse = "-")
         } else {
         x[1]
         }
     } else {
         x[1]
     }
     })
    
     # Extract treatment type
     metadata$treatment <- sapply(samples, function(x) {
         if(grepl("_ctr", x)) return("ctrl")
         if(grepl("Diclo", x)) return("Diclo")
         if(grepl("Mero", x)) return("Mero")
         if(grepl("Azi", x)) return("Azi")
         if(grepl("Rifampicin", x)) return("Rifampicin")
         return("ctrl")
     })
    
     # Extract concentration
     metadata$concentration <- sapply(samples, function(x) {
         if(grepl("Diclo1250", x)) return("1250")
         if(grepl("Diclo750", x)) return("750")
         if(grepl("Diclo375", x)) return("375")
         if(grepl("Mero0.5", x)) return("0.5")
         if(grepl("Mero0.35", x)) return("0.35")
         if(grepl("Mero0.25", x)) return("0.25")
         if(grepl("Mero0.15", x)) return("0.15")
         if(grepl("Azi20", x)) return("20")
         if(grepl("Rifampicin2", x)) return("2")
         if(grepl("Rifampicin1.5", x)) return("1.5")
         return("0")
     })
    
     # Extract condition (solid vs liquid)
     metadata$condition <- ifelse(grepl("_solid", samples), "solid", "liquid")
    
     # Extract replicate
     metadata$replicate <- sapply(strsplit(samples, "_"), function(x) {
     rep_part <- x[length(x)]
     gsub("r", "", rep_part)
     })
    
     # Create combined group for easy comparisons
     metadata$group <- paste(metadata$strain, metadata$treatment, metadata$concentration, sep = "_")
    
     # Set row names
     rownames(metadata) <- metadata$sample
    
     # Reorder to match txi columns
     metadata <- metadata[colnames(txi$counts), ]
    
     # ---------------------------------------------
     # ---- Step 2: Choose Your Design Strategy ----
    
     # Strategy A: Full Factorial Design (if balanced)
     #dds <- DESeqDataSetFromTximport(txi, metadata,
     #                         design = ~ strain + treatment + condition)
    
     # --> Strategy B: Combined Group Factor ⭐ RECOMMENDED
     metadata$group <- factor(paste(metadata$strain,
                                     metadata$treatment,
                                     metadata$concentration,
                                     metadata$condition,
                                     sep = "_"))
    
     dds <- DESeqDataSetFromTximport(txi, metadata,
                                     design = ~ group)
     dds <- DESeq(dds)
    
     # See all available comparisons
     resultsNames(dds)
    
     # -------------------------------------------------------------
     # ---- Step 3: Set Up Specific Comparisons from Your Notes ----
     # ==========================================
     # 1. Define Exact Comparisons from Your Notes
     # ==========================================
     planned_comparisons <- list(
     # --- Baseline / Strain Controls ---
     AYE_T_ctr_vs_AYE_WT_ctr            = list(treat = "AYE-T_ctrl_0_liquid",   ctrl = "AYE-WT_ctrl_0_liquid"),
     AYE_O_ctr_vs_AYE_WT_ctr            = list(treat = "AYE-O_ctrl_0_liquid",   ctrl = "AYE-WT_ctrl_0_liquid"),
     O_Trans_ctr_vs_AYE_WT_ctr          = list(treat = "O-Trans_ctrl_0_liquid", ctrl = "AYE-WT_ctrl_0_liquid"),
     WT_Trans_ctr_vs_AYE_WT_ctr         = list(treat = "WT-Trans_ctrl_0_liquid",ctrl = "AYE-WT_ctrl_0_liquid"),
     AYE_O_ctr_vs_AYE_T                 = list(treat = "AYE-O_ctrl_0_liquid",   ctrl = "AYE-T_ctrl_0_liquid"),
     O_Trans_ctr_vs_AYE_T               = list(treat = "O-Trans_ctrl_0_liquid", ctrl = "AYE-T_ctrl_0_liquid"),
     WT_Trans_ctr_vs_AYE_T              = list(treat = "WT-Trans_ctrl_0_liquid",ctrl = "AYE-T_ctrl_0_liquid"),
    
     # --- Condition Effects (Solid vs Liquid) ---
     AYE_WT_ctr_solid_vs_AYE_WT_ctr     = list(treat = "AYE-WT_ctrl_0_solid",   ctrl = "AYE-WT_ctrl_0_liquid"),
     AYE_O_ctr_solid_vs_AYE_O_ctr       = list(treat = "AYE-O_ctrl_0_solid",    ctrl = "AYE-O_ctrl_0_liquid"),
     AYE_T_ctr_solid_vs_AYE_T_ctr       = list(treat = "AYE-T_ctrl_0_solid",    ctrl = "AYE-T_ctrl_0_liquid"),
     AYE_O_ctr_solid_vs_AYE_WT_ctr_solid= list(treat = "AYE-O_ctrl_0_solid",    ctrl = "AYE-WT_ctrl_0_solid"),
     AYE_T_ctr_solid_vs_AYE_WT_ctr_solid= list(treat = "AYE-T_ctrl_0_solid",    ctrl = "AYE-WT_ctrl_0_solid"),
    
     # --- Diclofenac ---
     AYE_WT_Diclo750_vs_AYE_WT_ctr      = list(treat = "AYE-WT_Diclo_750_liquid",   ctrl = "AYE-WT_ctrl_0_liquid"),
     AYE_T_Diclo375_vs_AYE_WT_ctr       = list(treat = "AYE-T_Diclo_375_liquid",    ctrl = "AYE-WT_ctrl_0_liquid"),
     AYE_O_Diclo375_vs_AYE_WT_ctr       = list(treat = "AYE-O_Diclo_375_liquid",    ctrl = "AYE-WT_ctrl_0_liquid"),
     O_Trans_Diclo375_vs_AYE_WT_ctr     = list(treat = "O-Trans_Diclo_375_liquid",  ctrl = "AYE-WT_ctrl_0_liquid"),
     WT_Trans_Diclo750_vs_AYE_WT_ctr    = list(treat = "WT-Trans_Diclo_750_liquid", ctrl = "AYE-WT_ctrl_0_liquid"),
     Diclo_AYE_WT_1250_solid_vs_solid_ctr = list(treat = "AYE-WT_Diclo_1250_solid", ctrl = "AYE-WT_ctrl_0_solid"),
    
     # --- Meropenem ---
     AYE_WT_Mero_vs_AYE_WT_ctr          = list(treat = "AYE-WT_Mero_0.35_liquid", ctrl = "AYE-WT_ctrl_0_liquid"),
     AYE_T_Mero_vs_AYE_WT_ctr           = list(treat = "AYE-T_Mero_0.15_liquid",      ctrl = "AYE-WT_ctrl_0_liquid"),
     AYE_O_Mero_vs_AYE_WT_ctr           = list(treat = "AYE-O_Mero_0.5_liquid",       ctrl = "AYE-WT_ctrl_0_liquid"),
     O_Trans_Mero_vs_AYE_WT_ctr         = list(treat = "O-Trans_Mero_0.25_liquid",    ctrl = "AYE-WT_ctrl_0_liquid"),
     AYE_T_Mero_vs_AYE_T_ctr            = list(treat = "AYE-T_Mero_0.15_liquid",      ctrl = "AYE-T_ctrl_0_liquid"),
    
     # --- Azithromycin (Solid) ---
     AYE_WT_Azi_vs_solid_ctr            = list(treat = "AYE-WT_Azi_20_solid", ctrl = "AYE-WT_ctrl_0_solid"),
     AYE_T_Azi_vs_solid_ctr             = list(treat = "AYE-T_Azi_20_solid",  ctrl = "AYE-T_ctrl_0_solid"),
     AYE_O_Azi_vs_solid_ctr             = list(treat = "AYE-O_Azi_20_solid",  ctrl = "AYE-O_ctrl_0_solid"),
     F_Azi_vs_F_solid_ctr               = list(treat = "F_Azi_20_solid",      ctrl = "F_ctrl_0_solid"),
    
     # --- Rifampicin ---
     AYE_WT_Rif_vs_AYE_WT_ctr           = list(treat = "AYE-WT_Rifampicin_1.5_liquid", ctrl = "AYE-WT_ctrl_0_liquid"),
     AYE_T_Rif_vs_AYE_T_ctr             = list(treat = "AYE-T_Rifampicin_2_liquid",    ctrl = "AYE-T_ctrl_0_liquid"),
     AYE_O_Rif_vs_AYE_O_ctr             = list(treat = "AYE-O_Rifampicin_2_liquid",    ctrl = "AYE-O_ctrl_0_liquid"),
     O_Trans_Rif_vs_O_Trans_ctr         = list(treat = "O-Trans_Rifampicin_2_liquid",  ctrl = "O-Trans_ctrl_0_liquid")
     )
    
     # Additional Analyses
     planned_comparisons <- list(
     # --- Diclofenac_2 ---
     # * AYE-T_Diclo_375_liquid vs AYE-T_ctrl_0_liquid
     # * AYE-O_Diclo_375_liquid vs AYE-O_ctrl_0_liquid
     # * O-Trans_Diclo_375_liquid vs AYE-O-Trans_ctrl_0_liquid
     # * WT-Trans_Diclo_750_liquid vs AYE-WT-Trans_ctrl_0_liquid
     AYE_T_Diclo375_vs_AYE_T_ctr        = list(treat = "AYE-T_Diclo_375_liquid",        ctrl = "AYE-T_ctrl_0_liquid"),
     AYE_O_Diclo375_vs_AYE_O_ctr        = list(treat = "AYE-O_Diclo_375_liquid",        ctrl = "AYE-O_ctrl_0_liquid"),
     O_Trans_Diclo375_vs_O_Trans_ctr    = list(treat = "O-Trans_Diclo_375_liquid",  ctrl = "O-Trans_ctrl_0_liquid"),
     WT_Trans_Diclo750_vs_WT_Trans_ctr    = list(treat = "WT-Trans_Diclo_750_liquid", ctrl = "WT-Trans_ctrl_0_liquid"),
    
     # --- Meropenem_2 ---
     # * AYE-T_Mero_0.15_liquid vs AYE-T_ctrl_0_liquid
     # * AYE-O_Mero_0.5_liquid vs AYE-O_ctrl_0_liquid
     # * O-Trans_Mero_0.25_liquid vs AYE-O-Trans_ctrl_0_liquid
     AYE_T_Mero_vs_AYE_T_ctr        = list(treat = "AYE-T_Mero_0.15_liquid",    ctrl = "AYE-T_ctrl_0_liquid"),
     AYE_O_Mero_vs_AYE_O_ctr        = list(treat = "AYE-O_Mero_0.5_liquid",     ctrl = "AYE-O_ctrl_0_liquid"),
     O_Trans_Mero_vs_O_Trans_ctr    = list(treat = "O-Trans_Mero_0.25_liquid",  ctrl = "O-Trans_ctrl_0_liquid")
     )
    
     # ==========================================
     # 2. Verification & Validation Script
     # ==========================================
     # Identify which column in colData holds your group names
     group_col <- if("group" %in% colnames(colData(dds))) "group" else
                 if("treatment" %in% colnames(colData(dds))) "treatment" else
                 stop("❌ Please specify the correct colData column containing group names.")
    
     actual_groups <- unique(colData(dds)[[group_col]])
    
     cat("\n", paste(rep("=", 85), collapse=""), "\n")
     cat("📋 VERIFICATION OF NOTE-DERIVED COMPARISONS\n")
     cat(paste(rep("=", 85), collapse=""), "\n\n")
    
     validation_results <- data.frame(
     Comparison_Name = character(),
     Treatment_String = character(),
     Control_String = character(),
     Status = character(),
     Suggested_Contrast = character(),
     stringsAsFactors = FALSE
     )
    
     for(name in names(planned_comparisons)) {
     trt <- planned_comparisons[[name]]$treat
     ctl <- planned_comparisons[[name]]$ctrl
    
     # Find closest matches in actual data
     trt_match <- actual_groups[grepl(trt, actual_groups, fixed = TRUE)]
     ctl_match <- actual_groups[grepl(ctl, actual_groups, fixed = TRUE)]
    
     status <- if(length(trt_match) > 0 && length(ctl_match) > 0) "✅ VALID" else "⚠️  CHECK"
     contrast_str <- if(status == "✅ VALID")
         paste0('c("', group_col, '", "', trt_match[1], '", "', ctl_match[1], '")') else "N/A"
    
     validation_results <- rbind(validation_results, data.frame(
         Comparison_Name = name,
         Treatment_String = trt,
         Control_String = ctl,
         Status = status,
         Suggested_Contrast = contrast_str,
         stringsAsFactors = FALSE
     ))
    
     cat(sprintf("%-45s | T:%-25s C:%-20s | %s\n", name, trt, ctl, status))
     if(status == "⚠️  CHECK") {
         if(length(trt_match) == 0) cat("   🔍 Treat not found. Closest: ", paste(head(actual_groups[grepl(strsplit(trt, "_")[[1]][1], actual_groups)], 3), collapse=", "), "\n")
         if(length(ctl_match) == 0) cat("   🔍 Ctrl not found.  Closest: ", paste(head(actual_groups[grepl(strsplit(ctl, "_")[[1]][1], actual_groups)], 3), collapse=", "), "\n")
     }
     }
    
     # ==========================================
     # 3. Auto-Generate DESeq2 results() Calls (Optional)
     # ==========================================
     valid_comparisons <- validation_results[validation_results$Status == "✅ VALID", ]
     if(nrow(valid_comparisons) > 0) {
     cat("\n📜 READY-TO-RUN DESeq2 CONTRASTS:\n")
     cat(paste(rep("-", 60), collapse=""), "\n")
     for(i in seq_len(nrow(valid_comparisons))) {
         cat(sprintf('res_%s <- results(dds, contrast = %s)\n',
                     gsub("[^A-Za-z0-9]", "_", valid_comparisons$Comparison_Name[i]),
                     valid_comparisons$Suggested_Contrast[i]))
     }
     } else {
     cat("\n⚠️  No exact matches found. Check your colData group naming convention.\n")
     }
    
     # -----------------------------
     # ---- Step 4: PCA figures ----
    
     # 🔍 What each figure shows:
     #
     #    01_PCA_by_Strain.png → Tests if genetic background (AYE-WT, AYE-T, AYE-O, Trans, F) is the dominant source of variation.
     #    02_PCA_by_Treatment.png → Shows clustering by antibiotic/drug exposure (ctrl, Diclo, Mero, Azi, Rifampicin).
     #    03_PCA_by_Condition.png → Reveals batch/growth media effects (solid vs liquid).
     #    04_PCA_CombinedGroups.png → Full experimental grouping with labeled sample names for quick outlier detection.
     #    05_PCA_Ellipses.png → Adds 95% confidence boundaries per strain to visualize group spread and overlap.
     #
     # ⚠️ Quick Checklist Before Running:
     #
     #    Ensure metadata columns (strain, treatment, condition, group) are attached to colData(dds).
     #    If ggrepel is missing, run install.packages("ggrepel").
     #    All PNGs will save to your current working directory (getwd()).
    
     # Install if missing: install.packages(c("ggplot2", "ggrepel"))
     library(DESeq2)
     library(ggplot2)
     library(ggrepel)
    
     # 1. Variance Stabilizing Transformation & Extract PCA Data
     vsd <- vst(dds, blind = FALSE)
     pca_data <- plotPCA(vsd, intgroup = c("strain", "treatment", "condition", "group"), returnData = TRUE)
     percent_var <- round(100 * attr(pca_data, "percentVar"))
    
     # Consistent theme for all plots
     base_theme <- theme_bw(base_size = 12) +
     theme(plot.title = element_text(hjust = 0.5, face = "bold", size = 13),
             legend.position = "right",
             legend.title = element_text(face = "bold"),
             panel.grid.major = element_line(color = "grey90"),
             panel.grid.minor = element_blank())
    
     # --- Plot 1: Colored by Strain ---
     p1 <- ggplot(pca_data, aes(x = PC1, y = PC2, color = strain, shape = condition)) +
     geom_point(size = 3, alpha = 0.8) +
     geom_text_repel(aes(label = name), size = 2.5, max.overlaps = 20, show.legend = FALSE) +
     labs(x = paste0("PC1: ", percent_var[1], "% variance"),
         y = paste0("PC2: ", percent_var[2], "% variance"),
         title = "PCA: Samples Colored by Strain",
         color = "Strain", shape = "Condition") +
     base_theme
     ggsave("01_PCA_by_Strain.png", p1, width = 8, height = 6, dpi = 300)
    
     # --- Plot 2: Colored by Treatment ---
     p2 <- ggplot(pca_data, aes(x = PC1, y = PC2, color = treatment, shape = condition)) +
     geom_point(size = 3, alpha = 0.8) +
     labs(x = paste0("PC1: ", percent_var[1], "% variance"),
         y = paste0("PC2: ", percent_var[2], "% variance"),
         title = "PCA: Samples Colored by Treatment",
         color = "Treatment", shape = "Condition") +
     base_theme
     ggsave("02_PCA_by_Treatment.png", p2, width = 8, height = 6, dpi = 300)
    
     # --- Plot 3: Colored by Condition (Solid vs Liquid) ---
     p3 <- ggplot(pca_data, aes(x = PC1, y = PC2, color = condition, shape = strain)) +
     geom_point(size = 3, alpha = 0.8) +
     labs(x = paste0("PC1: ", percent_var[1], "% variance"),
         y = paste0("PC2: ", percent_var[2], "% variance"),
         title = "PCA: Samples Colored by Growth Condition",
         color = "Condition", shape = "Strain") +
     base_theme
     ggsave("03_PCA_by_Condition.png", p3, width = 8, height = 6, dpi = 300)
    
     # --- Plot 4: Combined Groups with Sample Labels ---
     p4 <- ggplot(pca_data, aes(x = PC1, y = PC2, color = group)) +
     geom_point(size = 3, alpha = 0.8) +
     geom_text_repel(aes(label = name), size = 2, max.overlaps = 30, box.padding = 0.3) +
     labs(x = paste0("PC1: ", percent_var[1], "% variance"),
         y = paste0("PC2: ", percent_var[2], "% variance"),
         title = "PCA: Combined Experimental Groups",
         color = "Group") +
     base_theme + theme(legend.position = "none")
     ggsave("04_PCA_CombinedGroups.png", p4, width = 9, height = 7, dpi = 300)
    
     # --- Plot 5: 95% Confidence Ellipses (by Strain) ---
     p5 <- ggplot(pca_data, aes(x = PC1, y = PC2, color = strain, fill = strain)) +
     geom_point(size = 3, alpha = 0.7) +
     stat_ellipse(level = 0.95, alpha = 0.2, geom = "polygon", show.legend = FALSE) +
     labs(x = paste0("PC1: ", percent_var[1], "% variance"),
         y = paste0("PC2: ", percent_var[2], "% variance"),
         title = "PCA: 95% Confidence Ellipses by Strain",
         color = "Strain", fill = "Strain") +
     base_theme
     ggsave("05_PCA_Ellipses.png", p5, width = 8, height = 6, dpi = 300)
    
     message("✅ All 5 PCA plots saved to working directory!")
  8. Run Differential Expression & PCA Analysis Complete

     (r_env) cd ~/DATA/Data_Tam_RNAseq_2026_Dicl_Mero_Azith_Rifa_on_AYE/results/star_salmon/
     #(r_env) Rscript complete_deg_pipeline.R  #For standard cutoff in the project
    
     # Adapted the script to the following requests:
     # (a) Rifampicin: use genes with a cutoff of log2 fold change > 1.2 and < -1.2 for the KEGG and GO analyses.
     # (b) Baseline / Strain Controls: use genes with a cutoff of log2 fold change > 1.4 and < -1.4 for the KEGG and GO analyses.
     # (c) All other comparisons: please retain the same selection criteria as in the previous analysis you sent to me.
    
     # How it works:
     #   * Rifampicin: The script looks for "Rif" in the comparison name (e.g., 28_AYE_WT_Rif_vs_Ctrl) and applies |log2FC| >= 1.2.
     #   * Baseline/Strain Controls: The script looks for "_ctr_vs_" in the comparison name (e.g., 01_AYE_T_ctr_vs_AYE_WT_ctr) and applies |log2FC| >= 1.4.
     #   * All Others: Falls back to the original 2.0 cutoff.
     #   * The console output will now explicitly print which cutoff is being used for each specific comparison.
    
     (r_env) Rscript complete_deg_pipeline_custom_cutoff.R
  9. LOG of Rscript complete_deg_pipeline_custom_cutoff.R

     -rw-rw-r-- 1 jhuang jhuang 355609 Jun 22 12:53 DEG_29_AYE_T_Rif_vs_Ctrl.xlsx
     -rw-rw-r-- 1 jhuang jhuang 253271 Jun 22 12:53 Volcano_29_AYE_T_Rif_vs_Ctrl.png
     -rw-rw-r-- 1 jhuang jhuang 368843 Jun 22 12:53 DEG_30_AYE_O_Rif_vs_Ctrl.xlsx
     -rw-rw-r-- 1 jhuang jhuang 358124 Jun 22 12:53 Volcano_30_AYE_O_Rif_vs_Ctrl.png
     -rw-rw-r-- 1 jhuang jhuang 347126 Jun 22 12:53 DEG_31_O_Trans_Rif_vs_Ctrl.xlsx
     -rw-rw-r-- 1 jhuang jhuang 283473 Jun 22 12:53 Volcano_31_O_Trans_Rif_vs_Ctrl.png
    
     -rw-rw-r-- 1 jhuang jhuang 352375 Jun 22 12:53 DEG_32_AYE_T_Diclo375_vs_AYE_T_ctr.xlsx
     -rw-rw-r-- 1 jhuang jhuang 257075 Jun 22 12:53 Volcano_32_AYE_T_Diclo375_vs_AYE_T_ctr.png
    
     -rw-rw-r-- 1 jhuang jhuang  355610 Jun  5 16:26 DEG_29_AYE_T_Rif_vs_Ctrl.xlsx
     -rw-rw-r-- 1 jhuang jhuang  254804 Jun  5 16:26 Volcano_29_AYE_T_Rif_vs_Ctrl.png
     -rw-rw-r-- 1 jhuang jhuang  368843 Jun  5 16:26 DEG_30_AYE_O_Rif_vs_Ctrl.xlsx
     -rw-rw-r-- 1 jhuang jhuang  358417 Jun  5 16:26 Volcano_30_AYE_O_Rif_vs_Ctrl.png
     -rw-rw-r-- 1 jhuang jhuang  347127 Jun  5 16:26 DEG_31_O_Trans_Rif_vs_Ctrl.xlsx
     -rw-rw-r-- 1 jhuang jhuang  284084 Jun  5 16:26 Volcano_31_O_Trans_Rif_vs_Ctrl.png
     -rw-rw-r-- 1 jhuang jhuang    1579 Jun  5 16:26 DEG_Summary_All_31.csv
    
     (r_env) jhuang@WS-2290C:/mnt/md1/DATA/Data_Tam_RNAseq_2026_Dicl_Mero_Azith_Rifa_on_AYE/results/star_salmon$ Rscript complete_deg_pipeline_custom_cutoff.R
     There were 22 warnings (use warnings() to see them)
     📖 Parsing annotation file to build tx2gene mapping...
     ✅ Created tx2gene mapping with 7520 entries.
     📥 Importing Salmon quantifications...
     reading in files with read_tsv
     1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81
     removing duplicated transcript rows from tx2gene
     summarizing abundance
     summarizing counts
     summarizing length
       Note: levels of factors in the design contain characters other than
       letters, numbers, '_' and '.'. It is recommended (but not required) to use
       only letters, numbers, and delimiters '_' or '.', as these are safe characters
       for column names in R. [This is a message, not a warning or an error]
     using counts and average transcript lengths from tximport
     🚀 Running DESeq2 pipeline...
     estimating size factors
       Note: levels of factors in the design contain characters other than
       letters, numbers, '_' and '.'. It is recommended (but not required) to use
       only letters, numbers, and delimiters '_' or '.', as these are safe characters
       for column names in R. [This is a message, not a warning or an error]
     using 'avgTxLength' from assays(dds), correcting for library size
     estimating dispersions
     gene-wise dispersion estimates
     mean-dispersion relationship
       Note: levels of factors in the design contain characters other than
       letters, numbers, '_' and '.'. It is recommended (but not required) to use
       only letters, numbers, and delimiters '_' or '.', as these are safe characters
       for column names in R. [This is a message, not a warning or an error]
     final dispersion estimates
     fitting model and testing
     Available groups in dds:
     [1] "AYE-O_Azi_20_solid"           "AYE-O_ctrl_0_liquid"
     [3] "AYE-O_ctrl_0_solid"           "AYE-O_Diclo_375_liquid"
     [5] "AYE-O_Mero_0.5_liquid"        "AYE-O_Rifampicin_2_liquid"
     [7] "AYE-T_Azi_20_solid"           "AYE-T_ctrl_0_liquid"
     [9] "AYE-T_ctrl_0_solid"           "AYE-T_Diclo_375_liquid"
     [11] "AYE-T_Mero_0.15_liquid"       "AYE-T_Rifampicin_2_liquid"
     [13] "AYE-WT_Azi_20_solid"          "AYE-WT_ctrl_0_liquid"
     [15] "AYE-WT_ctrl_0_solid"          "AYE-WT_Diclo_1250_solid"
     [17] "AYE-WT_Diclo_750_liquid"      "AYE-WT_Mero_0.35-0.5_liquid"
     [19] "AYE-WT_Rifampicin_1.5_liquid" "F_Azi_20_solid"
     [21] "F_ctrl_0_solid"               "O-Trans_ctrl_0_liquid"
     [23] "O-Trans_Diclo_375_liquid"     "O-Trans_Mero_0.25_liquid"
     [25] "O-Trans_Rifampicin_2_liquid"  "WT-Trans_ctrl_0_liquid"
     [27] "WT-Trans_Diclo_750_liquid"
     📖 Parsing annotation file for gene names...
    
     🚀 PROCESSING  38  COMPARISONS
     Base Thresholds: padj <= 0.05, |log2FC| >= 2 (Dynamic adjustments apply per comparison)
     ================================================================================
    
     [01/38] 01_AYE_T_ctr_vs_AYE_WT_ctr
       -> Using dynamic LFC cutoff: |log2FC| >= 1.4
    
     🔍 DEBUG: Columns in res_df  : baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean
     🔍 DEBUG: Columns in anno_df : gene_id, gene_id_clean, gene_name
     🔍 DEBUG: Columns after join:  baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean, gene_id, gene_name
       ✅ Annotation step completed successfully.
       ✅ Excel + Volcano saved (Up: 16, Down: 151)
    
     [02/38] 02_AYE_O_ctr_vs_AYE_WT_ctr
       -> Using dynamic LFC cutoff: |log2FC| >= 1.4
    
     🔍 DEBUG: Columns in res_df  : baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean
     🔍 DEBUG: Columns in anno_df : gene_id, gene_id_clean, gene_name
     🔍 DEBUG: Columns after join:  baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean, gene_id, gene_name
       ✅ Annotation step completed successfully.
       ✅ Excel + Volcano saved (Up: 9, Down: 9)
    
     [03/38] 03_O_Trans_ctr_vs_AYE_WT_ctr
       -> Using dynamic LFC cutoff: |log2FC| >= 1.4
    
     🔍 DEBUG: Columns in res_df  : baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean
     🔍 DEBUG: Columns in anno_df : gene_id, gene_id_clean, gene_name
     🔍 DEBUG: Columns after join:  baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean, gene_id, gene_name
       ✅ Annotation step completed successfully.
       ✅ Excel + Volcano saved (Up: 10, Down: 44)
    
     [04/38] 04_WT_Trans_ctr_vs_AYE_WT_ctr
       -> Using dynamic LFC cutoff: |log2FC| >= 1.4
    
     🔍 DEBUG: Columns in res_df  : baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean
     🔍 DEBUG: Columns in anno_df : gene_id, gene_id_clean, gene_name
     🔍 DEBUG: Columns after join:  baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean, gene_id, gene_name
       ✅ Annotation step completed successfully.
       ✅ Excel + Volcano saved (Up: 15, Down: 153)
    
     [05/38] 05_AYE_O_ctr_vs_AYE_T_ctr
       -> Using dynamic LFC cutoff: |log2FC| >= 1.4
    
     🔍 DEBUG: Columns in res_df  : baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean
     🔍 DEBUG: Columns in anno_df : gene_id, gene_id_clean, gene_name
     🔍 DEBUG: Columns after join:  baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean, gene_id, gene_name
       ✅ Annotation step completed successfully.
       ✅ Excel + Volcano saved (Up: 28, Down: 3)
    
     [06/38] 06_O_Trans_ctr_vs_AYE_T_ctr
       -> Using dynamic LFC cutoff: |log2FC| >= 1.4
    
     🔍 DEBUG: Columns in res_df  : baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean
     🔍 DEBUG: Columns in anno_df : gene_id, gene_id_clean, gene_name
     🔍 DEBUG: Columns after join:  baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean, gene_id, gene_name
       ✅ Annotation step completed successfully.
       ✅ Excel + Volcano saved (Up: 11, Down: 10)
    
     [07/38] 07_WT_Trans_ctr_vs_AYE_T_ctr
       -> Using dynamic LFC cutoff: |log2FC| >= 1.4
    
     🔍 DEBUG: Columns in res_df  : baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean
     🔍 DEBUG: Columns in anno_df : gene_id, gene_id_clean, gene_name
     🔍 DEBUG: Columns after join:  baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean, gene_id, gene_name
       ✅ Annotation step completed successfully.
       ✅ Excel + Volcano saved (Up: 4, Down: 14)
    
     [08/38] 08_AYE_WT_ctr_solid_vs_liquid
       -> Using dynamic LFC cutoff: |log2FC| >= 2
    
     🔍 DEBUG: Columns in res_df  : baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean
     🔍 DEBUG: Columns in anno_df : gene_id, gene_id_clean, gene_name
     🔍 DEBUG: Columns after join:  baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean, gene_id, gene_name
       ✅ Annotation step completed successfully.
       ✅ Excel + Volcano saved (Up: 485, Down: 324)
    
     [09/38] 09_AYE_O_ctr_solid_vs_liquid
       -> Using dynamic LFC cutoff: |log2FC| >= 2
    
     🔍 DEBUG: Columns in res_df  : baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean
     🔍 DEBUG: Columns in anno_df : gene_id, gene_id_clean, gene_name
     🔍 DEBUG: Columns after join:  baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean, gene_id, gene_name
       ✅ Annotation step completed successfully.
       ✅ Excel + Volcano saved (Up: 475, Down: 309)
    
     [10/38] 10_AYE_T_ctr_solid_vs_liquid
       -> Using dynamic LFC cutoff: |log2FC| >= 2
    
     🔍 DEBUG: Columns in res_df  : baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean
     🔍 DEBUG: Columns in anno_df : gene_id, gene_id_clean, gene_name
     🔍 DEBUG: Columns after join:  baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean, gene_id, gene_name
       ✅ Annotation step completed successfully.
       ✅ Excel + Volcano saved (Up: 501, Down: 332)
    
     [11/38] 11_AYE_O_ctr_solid_vs_AYE_WT_solid
       -> Using dynamic LFC cutoff: |log2FC| >= 2
    
     🔍 DEBUG: Columns in res_df  : baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean
     🔍 DEBUG: Columns in anno_df : gene_id, gene_id_clean, gene_name
     🔍 DEBUG: Columns after join:  baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean, gene_id, gene_name
       ✅ Annotation step completed successfully.
       ✅ Excel + Volcano saved (Up: 12, Down: 5)
    
     [12/38] 12_AYE_T_ctr_solid_vs_AYE_WT_solid
       -> Using dynamic LFC cutoff: |log2FC| >= 2
    
     🔍 DEBUG: Columns in res_df  : baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean
     🔍 DEBUG: Columns in anno_df : gene_id, gene_id_clean, gene_name
     🔍 DEBUG: Columns after join:  baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean, gene_id, gene_name
       ✅ Annotation step completed successfully.
       ✅ Excel + Volcano saved (Up: 5, Down: 8)
    
     [13/38] 13_AYE_WT_Diclo750_vs_Ctrl
       -> Using dynamic LFC cutoff: |log2FC| >= 2
    
     🔍 DEBUG: Columns in res_df  : baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean
     🔍 DEBUG: Columns in anno_df : gene_id, gene_id_clean, gene_name
     🔍 DEBUG: Columns after join:  baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean, gene_id, gene_name
       ✅ Annotation step completed successfully.
       ✅ Excel + Volcano saved (Up: 12, Down: 45)
    
     [14/38] 14_AYE_T_Diclo375_vs_Ctrl
       -> Using dynamic LFC cutoff: |log2FC| >= 2
    
     🔍 DEBUG: Columns in res_df  : baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean
     🔍 DEBUG: Columns in anno_df : gene_id, gene_id_clean, gene_name
     🔍 DEBUG: Columns after join:  baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean, gene_id, gene_name
       ✅ Annotation step completed successfully.
       ✅ Excel + Volcano saved (Up: 75, Down: 181)
    
     [15/38] 15_AYE_O_Diclo375_vs_Ctrl
       -> Using dynamic LFC cutoff: |log2FC| >= 2
    
     🔍 DEBUG: Columns in res_df  : baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean
     🔍 DEBUG: Columns in anno_df : gene_id, gene_id_clean, gene_name
     🔍 DEBUG: Columns after join:  baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean, gene_id, gene_name
       ✅ Annotation step completed successfully.
       ✅ Excel + Volcano saved (Up: 26, Down: 240)
    
     [16/38] 16_O_Trans_Diclo375_vs_Ctrl
       -> Using dynamic LFC cutoff: |log2FC| >= 2
    
     🔍 DEBUG: Columns in res_df  : baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean
     🔍 DEBUG: Columns in anno_df : gene_id, gene_id_clean, gene_name
     🔍 DEBUG: Columns after join:  baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean, gene_id, gene_name
       ✅ Annotation step completed successfully.
       ✅ Excel + Volcano saved (Up: 17, Down: 82)
    
     [17/38] 17_WT_Trans_Diclo750_vs_Ctrl
       -> Using dynamic LFC cutoff: |log2FC| >= 2
    
     🔍 DEBUG: Columns in res_df  : baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean
     🔍 DEBUG: Columns in anno_df : gene_id, gene_id_clean, gene_name
     🔍 DEBUG: Columns after join:  baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean, gene_id, gene_name
       ✅ Annotation step completed successfully.
       ✅ Excel + Volcano saved (Up: 25, Down: 230)
    
     [18/38] 18_AYE_WT_Diclo1250_solid_vs_Ctrl_solid
       -> Using dynamic LFC cutoff: |log2FC| >= 2
    
     🔍 DEBUG: Columns in res_df  : baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean
     🔍 DEBUG: Columns in anno_df : gene_id, gene_id_clean, gene_name
     🔍 DEBUG: Columns after join:  baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean, gene_id, gene_name
       ✅ Annotation step completed successfully.
       ✅ Excel + Volcano saved (Up: 66, Down: 26)
    
     [19/38] 19_AYE_WT_Mero_vs_Ctrl
       -> Using dynamic LFC cutoff: |log2FC| >= 2
    
     🔍 DEBUG: Columns in res_df  : baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean
     🔍 DEBUG: Columns in anno_df : gene_id, gene_id_clean, gene_name
     🔍 DEBUG: Columns after join:  baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean, gene_id, gene_name
       ✅ Annotation step completed successfully.
       ✅ Excel + Volcano saved (Up: 359, Down: 175)
    
     [20/38] 20_AYE_T_Mero_vs_Ctrl
       -> Using dynamic LFC cutoff: |log2FC| >= 2
    
     🔍 DEBUG: Columns in res_df  : baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean
     🔍 DEBUG: Columns in anno_df : gene_id, gene_id_clean, gene_name
     🔍 DEBUG: Columns after join:  baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean, gene_id, gene_name
       ✅ Annotation step completed successfully.
       ✅ Excel + Volcano saved (Up: 93, Down: 110)
    
     [21/38] 21_AYE_O_Mero_vs_Ctrl
       -> Using dynamic LFC cutoff: |log2FC| >= 2
    
     🔍 DEBUG: Columns in res_df  : baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean
     🔍 DEBUG: Columns in anno_df : gene_id, gene_id_clean, gene_name
     🔍 DEBUG: Columns after join:  baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean, gene_id, gene_name
       ✅ Annotation step completed successfully.
       ✅ Excel + Volcano saved (Up: 267, Down: 244)
    
     [22/38] 22_O_Trans_Mero_vs_Ctrl
       -> Using dynamic LFC cutoff: |log2FC| >= 2
    
     🔍 DEBUG: Columns in res_df  : baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean
     🔍 DEBUG: Columns in anno_df : gene_id, gene_id_clean, gene_name
     🔍 DEBUG: Columns after join:  baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean, gene_id, gene_name
       ✅ Annotation step completed successfully.
       ✅ Excel + Volcano saved (Up: 183, Down: 153)
    
     [23/38] 23_AYE_T_Mero_vs_AYE_T_Ctrl
       -> Using dynamic LFC cutoff: |log2FC| >= 2
    
     🔍 DEBUG: Columns in res_df  : baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean
     🔍 DEBUG: Columns in anno_df : gene_id, gene_id_clean, gene_name
     🔍 DEBUG: Columns after join:  baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean, gene_id, gene_name
       ✅ Annotation step completed successfully.
       ✅ Excel + Volcano saved (Up: 91, Down: 27)
    
     [24/38] 24_AYE_WT_Azi_solid_vs_Ctrl_solid
       -> Using dynamic LFC cutoff: |log2FC| >= 2
    
     🔍 DEBUG: Columns in res_df  : baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean
     🔍 DEBUG: Columns in anno_df : gene_id, gene_id_clean, gene_name
     🔍 DEBUG: Columns after join:  baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean, gene_id, gene_name
       ✅ Annotation step completed successfully.
       ✅ Excel + Volcano saved (Up: 341, Down: 329)
    
     [25/38] 25_AYE_T_Azi_solid_vs_Ctrl_solid
       -> Using dynamic LFC cutoff: |log2FC| >= 2
    
     🔍 DEBUG: Columns in res_df  : baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean
     🔍 DEBUG: Columns in anno_df : gene_id, gene_id_clean, gene_name
     🔍 DEBUG: Columns after join:  baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean, gene_id, gene_name
       ✅ Annotation step completed successfully.
       ✅ Excel + Volcano saved (Up: 510, Down: 433)
    
     [26/38] 26_AYE_O_Azi_solid_vs_Ctrl_solid
       -> Using dynamic LFC cutoff: |log2FC| >= 2
    
     🔍 DEBUG: Columns in res_df  : baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean
     🔍 DEBUG: Columns in anno_df : gene_id, gene_id_clean, gene_name
     🔍 DEBUG: Columns after join:  baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean, gene_id, gene_name
       ✅ Annotation step completed successfully.
       ✅ Excel + Volcano saved (Up: 484, Down: 443)
    
     [27/38] 27_F_Azi_solid_vs_Ctrl_solid
       -> Using dynamic LFC cutoff: |log2FC| >= 2
    
     🔍 DEBUG: Columns in res_df  : baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean
     🔍 DEBUG: Columns in anno_df : gene_id, gene_id_clean, gene_name
     🔍 DEBUG: Columns after join:  baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean, gene_id, gene_name
       ✅ Annotation step completed successfully.
       ✅ Excel + Volcano saved (Up: 147, Down: 271)
    
     [28/38] 28_AYE_WT_Rif_vs_Ctrl
       -> Using dynamic LFC cutoff: |log2FC| >= 1.2
    
     🔍 DEBUG: Columns in res_df  : baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean
     🔍 DEBUG: Columns in anno_df : gene_id, gene_id_clean, gene_name
     🔍 DEBUG: Columns after join:  baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean, gene_id, gene_name
       ✅ Annotation step completed successfully.
       ✅ Excel + Volcano saved (Up: 8, Down: 11)
    
     [29/38] 29_AYE_T_Rif_vs_Ctrl
       -> Using dynamic LFC cutoff: |log2FC| >= 1.2
    
     🔍 DEBUG: Columns in res_df  : baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean
     🔍 DEBUG: Columns in anno_df : gene_id, gene_id_clean, gene_name
     🔍 DEBUG: Columns after join:  baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean, gene_id, gene_name
       ✅ Annotation step completed successfully.
       ✅ Excel + Volcano saved (Up: 34, Down: 41)
    
     [30/38] 30_AYE_O_Rif_vs_Ctrl
       -> Using dynamic LFC cutoff: |log2FC| >= 1.2
    
     🔍 DEBUG: Columns in res_df  : baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean
     🔍 DEBUG: Columns in anno_df : gene_id, gene_id_clean, gene_name
     🔍 DEBUG: Columns after join:  baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean, gene_id, gene_name
       ✅ Annotation step completed successfully.
       ✅ Excel + Volcano saved (Up: 71, Down: 100)
    
     [31/38] 31_O_Trans_Rif_vs_Ctrl
       -> Using dynamic LFC cutoff: |log2FC| >= 1.2
    
     🔍 DEBUG: Columns in res_df  : baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean
     🔍 DEBUG: Columns in anno_df : gene_id, gene_id_clean, gene_name
     🔍 DEBUG: Columns after join:  baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean, gene_id, gene_name
       ✅ Annotation step completed successfully.
       ✅ Excel + Volcano saved (Up: 6, Down: 10)
    
     [32/38] 32_AYE_T_Diclo375_vs_AYE_T_ctr
       -> Using dynamic LFC cutoff: |log2FC| >= 2
    
     🔍 DEBUG: Columns in res_df  : baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean
     🔍 DEBUG: Columns in anno_df : gene_id, gene_id_clean, gene_name
     🔍 DEBUG: Columns after join:  baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean, gene_id, gene_name
       ✅ Annotation step completed successfully.
       ✅ Excel + Volcano saved (Up: 47, Down: 6)
    
     [33/38] 33_AYE_O_Diclo375_vs_AYE_O_ctr
       -> Using dynamic LFC cutoff: |log2FC| >= 2
    
     🔍 DEBUG: Columns in res_df  : baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean
     🔍 DEBUG: Columns in anno_df : gene_id, gene_id_clean, gene_name
     🔍 DEBUG: Columns after join:  baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean, gene_id, gene_name
       ✅ Annotation step completed successfully.
       ✅ Excel + Volcano saved (Up: 8, Down: 73)
    
     [34/38] 34_O_Trans_Diclo375_vs_O_Trans_ctr
       -> Using dynamic LFC cutoff: |log2FC| >= 2
    
     🔍 DEBUG: Columns in res_df  : baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean
     🔍 DEBUG: Columns in anno_df : gene_id, gene_id_clean, gene_name
     🔍 DEBUG: Columns after join:  baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean, gene_id, gene_name
       ✅ Annotation step completed successfully.
       ✅ Excel + Volcano saved (Up: 2, Down: 17)
    
     [35/38] 35_WT_Trans_Diclo750_vs_WT_Trans_ctr
       -> Using dynamic LFC cutoff: |log2FC| >= 2
    
     🔍 DEBUG: Columns in res_df  : baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean
     🔍 DEBUG: Columns in anno_df : gene_id, gene_id_clean, gene_name
     🔍 DEBUG: Columns after join:  baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean, gene_id, gene_name
       ✅ Annotation step completed successfully.
       ✅ Excel + Volcano saved (Up: 10, Down: 27)
    
     [36/38] 36_AYE_T_Mero_vs_AYE_T_ctr
       -> Using dynamic LFC cutoff: |log2FC| >= 2
    
     🔍 DEBUG: Columns in res_df  : baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean
     🔍 DEBUG: Columns in anno_df : gene_id, gene_id_clean, gene_name
     🔍 DEBUG: Columns after join:  baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean, gene_id, gene_name
       ✅ Annotation step completed successfully.
       ✅ Excel + Volcano saved (Up: 91, Down: 27)
    
     [37/38] 37_AYE_O_Mero_vs_AYE_O_ctr
       -> Using dynamic LFC cutoff: |log2FC| >= 2
    
     🔍 DEBUG: Columns in res_df  : baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean
     🔍 DEBUG: Columns in anno_df : gene_id, gene_id_clean, gene_name
     🔍 DEBUG: Columns after join:  baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean, gene_id, gene_name
       ✅ Annotation step completed successfully.
       ✅ Excel + Volcano saved (Up: 242, Down: 135)
    
     [38/38] 38_O_Trans_Mero_vs_O_Trans_ctr
       -> Using dynamic LFC cutoff: |log2FC| >= 2
    
     🔍 DEBUG: Columns in res_df  : baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean
     🔍 DEBUG: Columns in anno_df : gene_id, gene_id_clean, gene_name
     🔍 DEBUG: Columns after join:  baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, GeneID, GeneID_clean, gene_id, gene_name
       ✅ Annotation step completed successfully.
       ✅ Excel + Volcano saved (Up: 177, Down: 50)
    
     ================================================================================
     📊 FINAL SUMMARY OF ALL 38 COMPARISONS
                                         name total  up down sig_total pct_sig
               # |log2FC| >= 1.4
                   01_AYE_T_ctr_vs_AYE_WT_ctr  3609  16  151       167     4.6
                   02_AYE_O_ctr_vs_AYE_WT_ctr  3609   9    9        18     0.5
                  03_O_Trans_ctr_vs_AYE_WT_ctr  3609  10   44        54     1.5
               04_WT_Trans_ctr_vs_AYE_WT_ctr  3609  15  153       168     4.7
                   05_AYE_O_ctr_vs_AYE_T_ctr  3609  28    3        31     0.9
                 06_O_Trans_ctr_vs_AYE_T_ctr  3609  11   10        21     0.6
                 07_WT_Trans_ctr_vs_AYE_T_ctr  3609   4   14        18     0.5
    
               # |log2FC| >= 2
               08_AYE_WT_ctr_solid_vs_liquid  3609 485  324       809    22.4
                 09_AYE_O_ctr_solid_vs_liquid  3609 475  309       784    21.7
                 10_AYE_T_ctr_solid_vs_liquid  3609 501  332       833    23.1
           11_AYE_O_ctr_solid_vs_AYE_WT_solid  3609  12    5        17     0.5
           12_AYE_T_ctr_solid_vs_AYE_WT_solid  3609   5    8        13     0.4
                   13_AYE_WT_Diclo750_vs_Ctrl  3609  12   45        57     1.6
                   14_AYE_T_Diclo375_vs_Ctrl  3609  75  181       256     7.1
                   15_AYE_O_Diclo375_vs_Ctrl  3609  26  240       266     7.4
                 16_O_Trans_Diclo375_vs_Ctrl  3609  17   82        99     2.7
                 17_WT_Trans_Diclo750_vs_Ctrl  3609  25  230       255     7.1
     18_AYE_WT_Diclo1250_solid_vs_Ctrl_solid  3609  66   26        92     2.5
                       19_AYE_WT_Mero_vs_Ctrl  3609 359  175       534    14.8
                       20_AYE_T_Mero_vs_Ctrl  3609  93  110       203     5.6
                       21_AYE_O_Mero_vs_Ctrl  3609 267  244       511    14.2
                     22_O_Trans_Mero_vs_Ctrl  3609 183  153       336     9.3
                 23_AYE_T_Mero_vs_AYE_T_Ctrl  3609  91   27       118     3.3
           24_AYE_WT_Azi_solid_vs_Ctrl_solid  3609 341  329       670    18.6
             25_AYE_T_Azi_solid_vs_Ctrl_solid  3609 510  433       943    26.1
             26_AYE_O_Azi_solid_vs_Ctrl_solid  3609 484  443       927    25.7
                 27_F_Azi_solid_vs_Ctrl_solid  3609 147  271       418    11.6
    
               # |log2FC| >= 1.2
                       28_AYE_WT_Rif_vs_Ctrl  3609   8   11        19     0.5
                         29_AYE_T_Rif_vs_Ctrl  3609  34   41        75     2.1
                         30_AYE_O_Rif_vs_Ctrl  3609  71  100       171     4.7
                       31_O_Trans_Rif_vs_Ctrl  3609   6   10        16     0.4
    
               # |log2FC| >= 2
                                         name total  up down sig_total
               32_AYE_T_Diclo375_vs_AYE_T_ctr  3609  46    6        52
               33_AYE_O_Diclo375_vs_AYE_O_ctr  3609   8   73        81     2.2
           34_O_Trans_Diclo375_vs_O_Trans_ctr  3609   2   17        19     0.5
         35_WT_Trans_Diclo750_vs_WT_Trans_ctr  3609  10   27        37     1.0
                   36_AYE_T_Mero_vs_AYE_T_ctr  3609  91   27       118     3.3
                   37_AYE_O_Mero_vs_AYE_O_ctr  3609 242  135       377    10.4
               38_O_Trans_Mero_vs_O_Trans_ctr  3609 177   50       227     6.3
    
     ✨ All files saved to: /mnt/md1/DATA/Data_Tam_RNAseq_2026_Dicl_Mero_Azith_Rifa_on_AYE/results/star_salmon/DEG_Results_Complete
     📁 Each comparison contains:
         1️ DEG_
    .xlsx (3 sheets: Complete_Results, Up_Regulated, Down_Regulated) 2️ Volcano_ .png (GeneName labels for top significant genes) 📤 EXPORTING COMPLETE RESULTS TO CSV FOR KEGG/GO… ✅ Successfully exported 38 CSV files to DEG_Results_Complete
  10. KEGG and GO annotations in non-model organisms

    (a) Rifampicin: use genes with a cutoff of log2 fold change > 1.2 and 1.4 and < -1.4 for the KEGG and GO analyses. (c) All other comparisons: please retain the same selection criteria as in the previous analysis you sent to me.

https://www.biobam.com/functional-analysis/

10.1. Assign KEGG and GO Terms (see diagram above)

Since your organism is non-model, standard R databases (org.Hs.eg.db, etc.) won’t work. You’ll need to manually retrieve KEGG and GO annotations.

* Preparing file 1 eggnog_out.emapper.annotations.txt for the R-code below: (KEGG Terms): EggNog based on orthology and phylogenies

    EggNOG-mapper assigns both KEGG Orthology (KO) IDs and GO terms.

    Install EggNOG-mapper:

        mamba create -n eggnog_env python=3.8 eggnog-mapper -c conda-forge -c bioconda  #eggnog-mapper_2.1.12
        mamba activate eggnog_env

    Run annotation:

        #diamond makedb --in eggnog6.prots.faa -d eggnog_proteins.dmnd
        mkdir /home/jhuang/mambaforge/envs/eggnog_env/lib/python3.8/site-packages/data/
        download_eggnog_data.py --dbname eggnog.db -y --data_dir /home/jhuang/mambaforge/envs/eggnog_env/lib/python3.8/site-packages/data/
        #NOT_WORKING: emapper.py -i CP059040_gene.fasta -o eggnog_dmnd_out --cpu 60 -m diamond[hmmer,mmseqs] --dmnd_db /home/jhuang/REFs/eggnog_data/data/eggnog_proteins.dmnd
        #Download CU459141_protein_.fasta from NCBI
        python ~/Scripts/update_fasta_header.py CU459141_protein_.fasta CU459141_protein.fasta
        emapper.py -i CU459141_protein.fasta -o eggnog_out --cpu 60 --resume
        #----> result annotations.tsv: Contains KEGG, GO, and other functional annotations.
        #---->  470.IX87_14445:
            * 470 likely refers to the organism or strain (e.g., Acinetobacter baumannii ATCC 19606 or another related strain).
            * IX87_14445 would refer to a specific gene or protein within that genome.

    Extract KEGG KO IDs from annotations.emapper.annotations.

* Preparing file 2 blast2go_annot.annot2_ for the R-code below:

  - Basic (GO Terms from 'Blast2GO 5 Basic', saved in blast2go_annot.annot): Using Blast/Diamond + Blast2GO_GUI based on sequence alignment + GO mapping

    * 'Load protein sequences' (Tags: NONE, generated columns: Nr, SeqName) -->
    * Buttons 'blast' (Tags: BLASTED, generated columns: Description, Length, #Hits, e-Value, sim mean),
    * Button 'mapping' (Tags: MAPPED, generated columns: #GO, GO IDs, GO Names), "Mapping finished - Please proceed now to annotation."
    * Button 'annot' (Tags: ANNOTATED, generated columns: Enzyme Codes, Enzyme Names), "Annotation finished."
            * Used parameter 'Annotation CutOff': The Blast2GO Annotation Rule seeks to find the most specific GO annotations with a certain level of reliability. An annotation score is calculated for each candidate GO which is composed by the sequence similarity of the Blast Hit, the evidence code of the source GO and the position of the particular GO in the Gene Ontology hierarchy. This annotation score cutoff select the most specific GO term for a given GO branch which lies above this value.
            * Used parameter 'GO Weight' is a value which is added to Annotation Score of a more general/abstract Gene Ontology term for each of its more specific, original source GO terms. In this case, more general GO terms which summarise many original source terms (those ones directly associated to the Blast Hits) will have a higher Annotation Score.

  - Advanced (GO Terms from 'Blast2GO 5 Basic'): Interpro based protein families / domains --> Button interpro

    * Button 'interpro' (Tags: INTERPRO, generated columns: InterPro IDs, InterPro GO IDs, InterPro GO Names) --> "InterProScan Finished - You can now merge the obtained GO Annotations."

  - MERGE the results of InterPro GO IDs (advanced) to GO IDs (basic) and generate final GO IDs, saved in blast2go_annot.annot2

    * Button 'interpro'/'Merge InterProScan GOs to Annotation' --> "Merge (add and validate) all GO terms retrieved via InterProScan to the already existing GO annotation." --> "Finished merging GO terms from InterPro with annotations. Maybe you want to run ANNEX (Annotation Augmentation)."
    * (NOT_USED) Button 'annot'/'ANNEX' --> "ANNEX finished. Maybe you want to do the next step: Enzyme Code Mapping."

  - PREPARING go_terms and ec_terms: annot_* file (NOTE that blast2go_annot.annot2 is after merging InterPro_GO_IDs and GO_IDs):

    cut -f1-2 -d$'\t' blast2go_annot.annot2 > blast2go_annot.annot2_

10.2. Perform KEGG and GO Enrichment in R

      (r_env) cd /mnt/md1/DATA/Data_Tam_RNAseq_2026_Dicl_Mero_Azith_Rifa_on_AYE/results/star_salmon/DEG_Results_Complete

      #For |deg_cutoff_log_foldchange| >=1.4
      sed "s/01_AYE_T_ctr_vs_AYE_WT_ctr/02_AYE_O_ctr_vs_AYE_WT_ctr/g" 1.R > 2.R
      ...

      #For |deg_cutoff_log_foldchange| >=2.0
      sed "s/08_AYE_WT_ctr_solid_vs_liquid/09_AYE_O_ctr_solid_vs_liquid/g" 8.R > 9.R
      sed "s/08_AYE_WT_ctr_solid_vs_liquid/10_AYE_T_ctr_solid_vs_liquid/g" 8.R > 10.R
      sed "s/08_AYE_WT_ctr_solid_vs_liquid/11_AYE_O_ctr_solid_vs_AYE_WT_solid/g" 8.R > 11.R
      sed "s/08_AYE_WT_ctr_solid_vs_liquid/12_AYE_T_ctr_solid_vs_AYE_WT_solid/g" 8.R > 12.R
      sed "s/08_AYE_WT_ctr_solid_vs_liquid/13_AYE_WT_Diclo750_vs_Ctrl/g" 8.R > 13.R
      sed "s/08_AYE_WT_ctr_solid_vs_liquid/14_AYE_T_Diclo375_vs_Ctrl/g" 8.R > 14.R
      sed "s/08_AYE_WT_ctr_solid_vs_liquid/15_AYE_O_Diclo375_vs_Ctrl/g" 8.R > 15.R
      sed "s/08_AYE_WT_ctr_solid_vs_liquid/16_O_Trans_Diclo375_vs_Ctrl/g" 8.R > 16.R
      sed "s/08_AYE_WT_ctr_solid_vs_liquid/17_WT_Trans_Diclo750_vs_Ctrl/g" 8.R > 17.R
      sed "s/08_AYE_WT_ctr_solid_vs_liquid/18_AYE_WT_Diclo1250_solid_vs_Ctrl_solid/g" 8.R > 18.R
      sed "s/08_AYE_WT_ctr_solid_vs_liquid/19_AYE_WT_Mero_vs_Ctrl/g" 8.R > 19.R
      sed "s/08_AYE_WT_ctr_solid_vs_liquid/20_AYE_T_Mero_vs_Ctrl/g" 8.R > 20.R
      sed "s/08_AYE_WT_ctr_solid_vs_liquid/21_AYE_O_Mero_vs_Ctrl/g" 8.R > 21.R
      sed "s/08_AYE_WT_ctr_solid_vs_liquid/22_O_Trans_Mero_vs_Ctrl/g" 8.R > 22.R
      sed "s/08_AYE_WT_ctr_solid_vs_liquid/23_AYE_T_Mero_vs_AYE_T_Ctrl/g" 8.R > 23.R
      sed "s/08_AYE_WT_ctr_solid_vs_liquid/24_AYE_WT_Azi_solid_vs_Ctrl_solid/g" 8.R > 24.R
      sed "s/08_AYE_WT_ctr_solid_vs_liquid/25_AYE_T_Azi_solid_vs_Ctrl_solid/g" 8.R > 25.R
      sed "s/08_AYE_WT_ctr_solid_vs_liquid/26_AYE_O_Azi_solid_vs_Ctrl_solid/g" 8.R > 26.R
      sed "s/08_AYE_WT_ctr_solid_vs_liquid/27_F_Azi_solid_vs_Ctrl_solid/g" 8.R > 27.R

      #For |deg_cutoff_log_foldchange| >=1.2
      sed "s/28_AYE_WT_Rif_vs_Ctrl/29_AYE_T_Rif_vs_Ctrl/g" 28.R > 29.R
      sed "s/28_AYE_WT_Rif_vs_Ctrl/30_AYE_O_Rif_vs_Ctrl/g" 28.R > 30.R
      sed "s/28_AYE_WT_Rif_vs_Ctrl/31_O_Trans_Rif_vs_Ctrl/g" 28.R > 31.R

      (r_env) jhuang@WS-2290C:/mnt/md1/DATA/Data_Tam_RNAseq_2026_Dicl_Mero_Azith_Rifa_on_AYE/results/star_salmon/DEG_Results_Complete$ Rscript 1.R
      #=== SUMMARY ===
      #Up-regulated genes: 16
      #  Valid KEGG IDs: 4
      #  Enriched pathways: 0
      #Down-regulated genes: 151
      #  Valid KEGG IDs: 50
      #  Enriched pathways: 4
      #'select()' returned 1:1 mapping between keys and columns
      #'select()' returned 1:1 mapping between keys and columns
      #'select()' returned 1:1 mapping between keys and columns
      #=== SUMMARY ===
      #Up-regulated genes: 16
      #  Valid GO IDs: 16
      #  Enriched GO-terms: 0
      #Down-regulated genes: 151
      #  Valid KEGG IDs: 151
      #  Enriched GO-terms: 3
      #...

10.3. Finalizing the KEGG and GO Enrichment table

      1. NOTE (Already realized in the code): geneIDs in KEGG_Enrichment have been already translated from ko to geneID in H0N29_*-format; If not, nachmachen using eggnog-res, 因为 eggnog里有1-1-mspping Info between ko-Name and GeneID.
      2. NEED_MANUAL_DELETION (Already setting the cutoff in the code): p.adjust values have been calculated, we have to filter all records in GO_Enrichment-results by |p.adjust|<=0.05. DON'T_NEED_ANY_MORE, since pvalueCutoff = 0.05 settings in enricher. Alternative using pvalueCutoff=1.0, marked the color as yellow if the p.adjusted <= 0.05 in GO_enrichment.
      3. NOTE (Not occuring in the new dataset): In rare case, the description is missing for some IDs, e.g. GO term: GO:0006807: replace GO:0006807    obsolete nitrogen compound metabolic process;  ko00975: Metabolism, Biosynthesis of other secondary metabolites
  1. KEGG and GO enrichments via 1.R, 2.R, …

      #Update "01_AYE_T_ctr_vs_AYE_WT_ctr" with "02_AYE_O_ctr_vs_AYE_WT_ctr" in 1.R and save the updated script as 2.R.
      (r_env) jhuang@WS-2290C:/mnt/md1/DATA/Data_Tam_RNAseq_2026_Dicl_Mero_Azith_Rifa_on_AYE/results/star_salmon/DEG_Results_Complete$ Rscript 2.R
      === SUMMARY ===
      Up-regulated genes: 9
        Valid KEGG IDs: 1
        Enriched pathways: 0
      Down-regulated genes: 9
        Valid KEGG IDs: 6
        Enriched pathways: 2
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      === SUMMARY ===
      Up-regulated genes: 9
        Valid GO IDs: 9
        Enriched GO-terms: 0
      Down-regulated genes: 9
        Valid KEGG IDs: 9
        Enriched GO-terms: 2
    
      Rscript 3.R
      === SUMMARY ===
      Up-regulated genes: 10
        Valid KEGG IDs: 2
        Enriched pathways: 0
      Down-regulated genes: 44
        Valid KEGG IDs: 20
        Enriched pathways: 6
      === SUMMARY ===
      Up-regulated genes: 10
        Valid GO IDs: 10
        Enriched GO-terms: 0
      Down-regulated genes: 44
        Valid KEGG IDs: 44
        Enriched GO-terms: 0
    
      Rscript 4.R
      === SUMMARY ===
      Up-regulated genes: 15
        Valid KEGG IDs: 7
        Enriched pathways: 6
      Down-regulated genes: 152
        Valid KEGG IDs: 43
        Enriched pathways: 11
      === SUMMARY ===
      Up-regulated genes: 15
        Valid GO IDs: 15
        Enriched GO-terms: 0
      Down-regulated genes: 152
        Valid KEGG IDs: 152
        Enriched GO-terms: 0
    
      Rscript 5.R
      === SUMMARY ===
      Up-regulated genes: 28
        Valid KEGG IDs: 8
        Enriched pathways: 4
      Down-regulated genes: 3
        Valid KEGG IDs: 2
        Enriched pathways: 2
      'select()' returned 1:1 mapping between keys and columns
      === SUMMARY ===
      Up-regulated genes: 28
        Valid GO IDs: 28
        Enriched GO-terms: 0
      Down-regulated genes: 3
        Valid KEGG IDs: 3
        Enriched GO-terms: 1
    
      Rscript 6.R
      === SUMMARY ===
      Up-regulated genes: 28
        Valid KEGG IDs: 8
        Enriched pathways: 4
      Down-regulated genes: 3
        Valid KEGG IDs: 2
        Enriched pathways: 2
      'select()' returned 1:1 mapping between keys and columns
      === SUMMARY ===
      Up-regulated genes: 28
        Valid GO IDs: 28
        Enriched GO-terms: 0
      Down-regulated genes: 3
        Valid KEGG IDs: 3
        Enriched GO-terms: 1
    
      Rscript 7.R
      === SUMMARY ===
      Up-regulated genes: 4
        Valid KEGG IDs: 0
        Enriched pathways: 0
      Down-regulated genes: 14
        Valid KEGG IDs: 10
        Enriched pathways: 1
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      === SUMMARY ===
      Up-regulated genes: 4
        Valid GO IDs: 4
        Enriched GO-terms: 1
      Down-regulated genes: 14
        Valid KEGG IDs: 14
        Enriched GO-terms: 3
    
      Rscript 8.R
      === SUMMARY ===
      Up-regulated genes: 479
        Valid KEGG IDs: 246
        Enriched pathways: 30
      Down-regulated genes: 322
        Valid KEGG IDs: 186
        Enriched pathways: 13
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      === SUMMARY ===
      Up-regulated genes: 479
        Valid GO IDs: 479
        Enriched GO-terms: 4
      Down-regulated genes: 322
        Valid KEGG IDs: 322
        Enriched GO-terms: 6
    
      Rscript 9.R
      === SUMMARY ===
      Up-regulated genes: 472
        Valid KEGG IDs: 230
        Enriched pathways: 25
      Down-regulated genes: 306
        Valid KEGG IDs: 217
        Enriched pathways: 19
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      === SUMMARY ===
      Up-regulated genes: 472
        Valid GO IDs: 472
        Enriched GO-terms: 3
      Down-regulated genes: 306
        Valid KEGG IDs: 306
        Enriched GO-terms: 6
    
      Rscript 10.R
      === SUMMARY ===
      Up-regulated genes: 496
        Valid KEGG IDs: 236
        Enriched pathways: 28
      Down-regulated genes: 330
        Valid KEGG IDs: 250
        Enriched pathways: 26
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      === SUMMARY ===
      Up-regulated genes: 496
        Valid GO IDs: 496
        Enriched GO-terms: 2
      Down-regulated genes: 330
        Valid KEGG IDs: 330
        Enriched GO-terms: 6
    
      Rscript 11.R
      === SUMMARY ===
      Up-regulated genes: 12
        Valid KEGG IDs: 2
        Enriched pathways: 5
      Down-regulated genes: 5
        Valid KEGG IDs: 2
        Enriched pathways: 2
      === SUMMARY ===
      Up-regulated genes: 12
        Valid GO IDs: 12
        Enriched GO-terms: 0
      Down-regulated genes: 5
        Valid KEGG IDs: 5
        Enriched GO-terms: 0
    
      Rscript 12.R
      === SUMMARY ===
      Up-regulated genes: 5
        Valid KEGG IDs: 1
        Enriched pathways: 3
      Down-regulated genes: 8
        Valid KEGG IDs: 3
        Enriched pathways: 2
      === SUMMARY ===
      Up-regulated genes: 5
        Valid GO IDs: 5
        Enriched GO-terms: 0
      Down-regulated genes: 8
        Valid KEGG IDs: 8
        Enriched GO-terms: 0
    
      Rscript 13.R
      === SUMMARY ===
      Up-regulated genes: 12
        Valid KEGG IDs: 8
        Enriched pathways: 4
      Down-regulated genes: 45
        Valid KEGG IDs: 13
        Enriched pathways: 4
      'select()' returned 1:1 mapping between keys and columns
      === SUMMARY ===
      Up-regulated genes: 12
        Valid GO IDs: 12
        Enriched GO-terms: 1
      Down-regulated genes: 45
        Valid KEGG IDs: 45
        Enriched GO-terms: 0
    
      Rscript 14.R
      === SUMMARY ===
      Up-regulated genes: 73
        Valid KEGG IDs: 37
        Enriched pathways: 5
      Down-regulated genes: 180
        Valid KEGG IDs: 29
        Enriched pathways: 3
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      === SUMMARY ===
      Up-regulated genes: 73
        Valid GO IDs: 73
        Enriched GO-terms: 5
      Down-regulated genes: 180
        Valid KEGG IDs: 180
        Enriched GO-terms: 0
    
      Rscript 15.R
      === SUMMARY ===
      Up-regulated genes: 26
        Valid KEGG IDs: 14
        Enriched pathways: 10
      Down-regulated genes: 239
        Valid KEGG IDs: 38
        Enriched pathways: 3
      === SUMMARY ===
      Up-regulated genes: 26
        Valid GO IDs: 26
        Enriched GO-terms: 0
      Down-regulated genes: 239
        Valid KEGG IDs: 239
        Enriched GO-terms: 0
    
      Rscript 16.R
      === SUMMARY ===
      Up-regulated genes: 17
        Valid KEGG IDs: 8
        Enriched pathways: 11
      Down-regulated genes: 82
        Valid KEGG IDs: 14
        Enriched pathways: 3
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      === SUMMARY ===
      Up-regulated genes: 17
        Valid GO IDs: 17
        Enriched GO-terms: 0
      Down-regulated genes: 82
        Valid KEGG IDs: 82
        Enriched GO-terms: 2
    
      Rscript 17.R
      === SUMMARY ===
      Up-regulated genes: 25
        Valid KEGG IDs: 15
        Enriched pathways: 3
      Down-regulated genes: 229
        Valid KEGG IDs: 46
        Enriched pathways: 3
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      === SUMMARY ===
      Up-regulated genes: 25
        Valid GO IDs: 25
        Enriched GO-terms: 2
      Down-regulated genes: 229
        Valid KEGG IDs: 229
        Enriched GO-terms: 1
    
      Rscript 18.R
      === SUMMARY ===
      Up-regulated genes: 66
        Valid KEGG IDs: 18
        Enriched pathways: 8
      Down-regulated genes: 26
        Valid KEGG IDs: 19
        Enriched pathways: 4
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      === SUMMARY ===
      Up-regulated genes: 66
        Valid GO IDs: 66
        Enriched GO-terms: 5
      Down-regulated genes: 26
        Valid KEGG IDs: 26
        Enriched GO-terms: 3
    
      Rscript 19.R
      === SUMMARY ===
      Up-regulated genes: 355
        Valid KEGG IDs: 200
        Enriched pathways: 25
      Down-regulated genes: 175
        Valid KEGG IDs: 70
        Enriched pathways: 3
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      === SUMMARY ===
      Up-regulated genes: 355
        Valid GO IDs: 355
        Enriched GO-terms: 6
      Down-regulated genes: 175
        Valid KEGG IDs: 175
        Enriched GO-terms: 2
    
      Rscript 20.R
      === SUMMARY ===
      Up-regulated genes: 93
        Valid KEGG IDs: 48
        Enriched pathways: 21
      Down-regulated genes: 110
        Valid KEGG IDs: 32
        Enriched pathways: 4
      === SUMMARY ===
      Up-regulated genes: 93
        Valid GO IDs: 93
        Enriched GO-terms: 0
      Down-regulated genes: 110
        Valid KEGG IDs: 110
        Enriched GO-terms: 0
    
      Rscript 21.R
      === SUMMARY ===
      Up-regulated genes: 263
        Valid KEGG IDs: 154
        Enriched pathways: 26
      Down-regulated genes: 244
        Valid KEGG IDs: 72
        Enriched pathways: 3
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      === SUMMARY ===
      Up-regulated genes: 263
        Valid GO IDs: 263
        Enriched GO-terms: 8
      Down-regulated genes: 244
        Valid KEGG IDs: 244
        Enriched GO-terms: 5
    
      Rscript 22.R
      === SUMMARY ===
      Up-regulated genes: 182
        Valid KEGG IDs: 115
        Enriched pathways: 23
      Down-regulated genes: 153
        Valid KEGG IDs: 40
        Enriched pathways: 3
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      === SUMMARY ===
      Up-regulated genes: 182
        Valid GO IDs: 182
        Enriched GO-terms: 3
      Down-regulated genes: 153
        Valid KEGG IDs: 153
        Enriched GO-terms: 2
    
      Rscript 23.R
      === SUMMARY ===
      Up-regulated genes: 91
        Valid KEGG IDs: 37
        Enriched pathways: 17
      Down-regulated genes: 27
        Valid KEGG IDs: 14
        Enriched pathways: 1
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      === SUMMARY ===
      Up-regulated genes: 91
        Valid GO IDs: 91
        Enriched GO-terms: 1
      Down-regulated genes: 27
        Valid KEGG IDs: 27
        Enriched GO-terms: 4
    
      Rscript 24.R
      === SUMMARY ===
      Up-regulated genes: 333
        Valid KEGG IDs: 193
        Enriched pathways: 12
      Down-regulated genes: 328
        Valid KEGG IDs: 175
        Enriched pathways: 24
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      === SUMMARY ===
      Up-regulated genes: 333
        Valid GO IDs: 333
        Enriched GO-terms: 4
      Down-regulated genes: 328
        Valid KEGG IDs: 328
        Enriched GO-terms: 4
    
      Rscript 25.R
      === SUMMARY ===
      Up-regulated genes: 499
        Valid KEGG IDs: 264
        Enriched pathways: 19
      Down-regulated genes: 430
        Valid KEGG IDs: 239
        Enriched pathways: 36
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      === SUMMARY ===
      Up-regulated genes: 499
        Valid GO IDs: 499
        Enriched GO-terms: 2
      Down-regulated genes: 430
        Valid KEGG IDs: 430
        Enriched GO-terms: 0
    
      Rscript 26.R
      === SUMMARY ===
      Up-regulated genes: 474
        Valid KEGG IDs: 267
        Enriched pathways: 19
      Down-regulated genes: 440
        Valid KEGG IDs: 216
        Enriched pathways: 32
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      === SUMMARY ===
      Up-regulated genes: 474
        Valid GO IDs: 474
        Enriched GO-terms: 2
      Down-regulated genes: 440
        Valid KEGG IDs: 440
        Enriched GO-terms: 1
    
      Rscript 27.R
      === SUMMARY ===
      Up-regulated genes: 142
        Valid KEGG IDs: 69
        Enriched pathways: 1
      Down-regulated genes: 269
        Valid KEGG IDs: 148
        Enriched pathways: 27
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      === SUMMARY ===
      Up-regulated genes: 142
        Valid GO IDs: 142
        Enriched GO-terms: 5
      Down-regulated genes: 269
        Valid KEGG IDs: 269
        Enriched GO-terms: 3
    
      Rscript 28.R
      === SUMMARY ===
      Up-regulated genes: 8
        Valid KEGG IDs: 6
        Enriched pathways: 2
      Down-regulated genes: 11
        Valid KEGG IDs: 3
        Enriched pathways: 0
      === SUMMARY ===
      Up-regulated genes: 8
        Valid GO IDs: 8
        Enriched GO-terms: 0
      Down-regulated genes: 11
        Valid KEGG IDs: 11
        Enriched GO-terms: 0
    
      Rscript 29.R
      === SUMMARY ===
      Up-regulated genes: 34
        Valid KEGG IDs: 11
        Enriched pathways: 6
      Down-regulated genes: 41
        Valid KEGG IDs: 22
        Enriched pathways: 1
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      === SUMMARY ===
      Up-regulated genes: 34
        Valid GO IDs: 34
        Enriched GO-terms: 0
      Down-regulated genes: 41
        Valid KEGG IDs: 41
    
      Rscript 30.R
      === SUMMARY ===
      Up-regulated genes: 70
        Valid KEGG IDs: 47
        Enriched pathways: 16
      Down-regulated genes: 99
        Valid KEGG IDs: 40
        Enriched pathways: 3
      === SUMMARY ===
      Up-regulated genes: 70
        Valid GO IDs: 70
        Enriched GO-terms: 0
      Down-regulated genes: 99
        Valid KEGG IDs: 99
        Enriched GO-terms: 0
    
      Rscript 31.R
      === SUMMARY ===
      Up-regulated genes: 6
        Valid KEGG IDs: 3
        Enriched pathways: 0
      Down-regulated genes: 10
        Valid KEGG IDs: 5
        Enriched pathways: 1
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      === SUMMARY ===
      Up-regulated genes: 6
        Valid GO IDs: 6
        Enriched GO-terms: 0
      Down-regulated genes: 10
        Valid KEGG IDs: 10
        Enriched GO-terms: 5
    
      Rscript 32.R
      === SUMMARY ===
      Up-regulated genes: 46
        Valid KEGG IDs: 18
        Enriched pathways: 1
      Down-regulated genes: 6
        Valid KEGG IDs: 0
        Enriched pathways: 0
      No gene sets have size between 10 and 500 ...
      --> return NULL...
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
    
      === SUMMARY ===
      Up-regulated genes: 46
        Valid GO IDs: 46
        Enriched GO-terms: 4
      Down-regulated genes: 6
        Valid KEGG IDs: 6
        Enriched GO-terms: 0
    
      Rscript 33.R
      === SUMMARY ===
      Up-regulated genes: 8
        Valid KEGG IDs: 9
        Enriched pathways: 8
      Down-regulated genes: 72
        Valid KEGG IDs: 7
        Enriched pathways: 3
    
      === SUMMARY ===
      Up-regulated genes: 8
        Valid GO IDs: 8
        Enriched GO-terms: 0
      Down-regulated genes: 72
        Valid KEGG IDs: 72
        Enriched GO-terms: 0
    
      Rscript 34.R
      === SUMMARY ===
      Up-regulated genes: 2
        Valid KEGG IDs: 2
        Enriched pathways: 3
      Down-regulated genes: 17
        Valid KEGG IDs: 3
        Enriched pathways: 0
    
      === SUMMARY ===
      Up-regulated genes: 2
        Valid GO IDs: 2
        Enriched GO-terms: 0
      Down-regulated genes: 17
        Valid KEGG IDs: 17
        Enriched GO-terms: 0
    
      Rscript 35.R
      === SUMMARY ===
      Up-regulated genes: 10
        Valid KEGG IDs: 5
        Enriched pathways: 5
      Down-regulated genes: 27
        Valid KEGG IDs: 3
        Enriched pathways: 0
    
      === SUMMARY ===
      Up-regulated genes: 10
        Valid GO IDs: 10
        Enriched GO-terms: 0
      Down-regulated genes: 27
        Valid KEGG IDs: 27
        Enriched GO-terms: 0
    
      Rscript 36.R
      === SUMMARY ===
      Up-regulated genes: 91
        Valid KEGG IDs: 37
        Enriched pathways: 17
      Down-regulated genes: 27
        Valid KEGG IDs: 14
        Enriched pathways: 1
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
    
      === SUMMARY ===
      Up-regulated genes: 91
        Valid GO IDs: 91
        Enriched GO-terms: 1
      Down-regulated genes: 27
        Valid KEGG IDs: 27
        Enriched GO-terms: 4
    
      Rscript 37.R
      === SUMMARY ===
      Up-regulated genes: 238
        Valid KEGG IDs: 141
        Enriched pathways: 28
      Down-regulated genes: 134
        Valid KEGG IDs: 35
        Enriched pathways: 2
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
    
      === SUMMARY ===
      Up-regulated genes: 238
        Valid GO IDs: 238
        Enriched GO-terms: 2
      Down-regulated genes: 134
        Valid KEGG IDs: 134
        Enriched GO-terms: 4
    
      Rscript 38.R
      === SUMMARY ===
      Up-regulated genes: 177
        Valid KEGG IDs: 104
        Enriched pathways: 22
      Down-regulated genes: 50
        Valid KEGG IDs: 13
        Enriched pathways: 4
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
      'select()' returned 1:1 mapping between keys and columns
    
      === SUMMARY ===
      Up-regulated genes: 177
        Valid GO IDs: 177
        Enriched GO-terms: 2
      Down-regulated genes: 50
        Valid KEGG IDs: 50
        Enriched GO-terms: 4
  2. (DEPRECATED due to be WRONG and TIME-CONSUMING) KEGG and GO enrichments by replacing the sample name in the R-code, and by copying and pasting the R-code directly on console

      # For your reference, here is the exact list of the 31 comparisons and their assigned groupings:
    
      # Baseline / Strain Controls (|log2FC| >= 1.4)
      "01_AYE_T_ctr_vs_AYE_WT_ctr"       = c("group", "AYE-T_ctrl_0_liquid",   "AYE-WT_ctrl_0_liquid"),
      "02_AYE_O_ctr_vs_AYE_WT_ctr"       = c("group", "AYE-O_ctrl_0_liquid",   "AYE-WT_ctrl_0_liquid"),
      "03_O_Trans_ctr_vs_AYE_WT_ctr"     = c("group", "O-Trans_ctrl_0_liquid", "AYE-WT_ctrl_0_liquid"),
      "04_WT_Trans_ctr_vs_AYE_WT_ctr"    = c("group", "WT-Trans_ctrl_0_liquid","AYE-WT_ctrl_0_liquid"),
      "05_AYE_O_ctr_vs_AYE_T_ctr"        = c("group", "AYE-O_ctrl_0_liquid",   "AYE-T_ctrl_0_liquid"),
      "06_O_Trans_ctr_vs_AYE_T_ctr"      = c("group", "O-Trans_ctrl_0_liquid", "AYE-T_ctrl_0_liquid"),
      "07_WT_Trans_ctr_vs_AYE_T_ctr"     = c("group", "WT-Trans_ctrl_0_liquid","AYE-T_ctrl_0_liquid"),
    
      # Condition Effects (|log2FC| >= 2.0)
      "08_AYE_WT_ctr_solid_vs_liquid"    = c("group", "AYE-WT_ctrl_0_solid",   "AYE-WT_ctrl_0_liquid"),
      "09_AYE_O_ctr_solid_vs_liquid"     = c("group", "AYE-O_ctrl_0_solid",    "AYE-O_ctrl_0_liquid"),
      "10_AYE_T_ctr_solid_vs_liquid"     = c("group", "AYE-T_ctrl_0_solid",    "AYE-T_ctrl_0_liquid"),
      "11_AYE_O_ctr_solid_vs_AYE_WT_solid"=c("group", "AYE-O_ctrl_0_solid",    "AYE-WT_ctrl_0_solid"),
      "12_AYE_T_ctr_solid_vs_AYE_WT_solid"=c("group", "AYE-T_ctrl_0_solid",    "AYE-WT_ctrl_0_solid"),
    
      # Diclofenac (|log2FC| >= 2.0)
      "13_AYE_WT_Diclo750_vs_Ctrl"       = c("group", "AYE-WT_Diclo_750_liquid",   "AYE-WT_ctrl_0_liquid"),
      "14_AYE_T_Diclo375_vs_Ctrl"        = c("group", "AYE-T_Diclo_375_liquid",    "AYE-WT_ctrl_0_liquid"),
      "15_AYE_O_Diclo375_vs_Ctrl"        = c("group", "AYE-O_Diclo_375_liquid",    "AYE-WT_ctrl_0_liquid"),
      "16_O_Trans_Diclo375_vs_Ctrl"      = c("group", "O-Trans_Diclo_375_liquid",  "AYE-WT_ctrl_0_liquid"),
      "17_WT_Trans_Diclo750_vs_Ctrl"     = c("group", "WT-Trans_Diclo_750_liquid", "AYE-WT_ctrl_0_liquid"),
      "18_AYE_WT_Diclo1250_solid_vs_Ctrl_solid" = c("group", "AYE-WT_Diclo_1250_solid", "AYE-WT_ctrl_0_solid"),
    
      # Meropenem (|log2FC| >= 2.0)
      "19_AYE_WT_Mero_vs_Ctrl"           = c("group", "AYE-WT_Mero_0.35-0.5_liquid", "AYE-WT_ctrl_0_liquid"),
      "20_AYE_T_Mero_vs_Ctrl"            = c("group", "AYE-T_Mero_0.15_liquid",      "AYE-WT_ctrl_0_liquid"),
      "21_AYE_O_Mero_vs_Ctrl"            = c("group", "AYE-O_Mero_0.5_liquid",       "AYE-WT_ctrl_0_liquid"),
      "22_O_Trans_Mero_vs_Ctrl"          = c("group", "O-Trans_Mero_0.25_liquid",    "AYE-WT_ctrl_0_liquid"),
      "23_AYE_T_Mero_vs_AYE_T_Ctrl"      = c("group", "AYE-T_Mero_0.15_liquid",      "AYE-T_ctrl_0_liquid"),
    
      # Azithromycin (Solid) (|log2FC| >= 2.0)
      "24_AYE_WT_Azi_solid_vs_Ctrl_solid"= c("group", "AYE-WT_Azi_20_solid", "AYE-WT_ctrl_0_solid"),
      "25_AYE_T_Azi_solid_vs_Ctrl_solid" = c("group", "AYE-T_Azi_20_solid",  "AYE-T_ctrl_0_solid"),
      "26_AYE_O_Azi_solid_vs_Ctrl_solid" = c("group", "AYE-O_Azi_20_solid",  "AYE-O_ctrl_0_solid"),
      "27_F_Azi_solid_vs_Ctrl_solid"     = c("group", "F_Azi_20_solid",      "F_ctrl_0_solid"),
    
      # Rifampicin (|log2FC| >= 1.2)
      "28_AYE_WT_Rif_vs_Ctrl"            = c("group", "AYE-WT_Rifampicin_1.5_liquid", "AYE-WT_ctrl_0_liquid"),
      "29_AYE_T_Rif_vs_Ctrl"             = c("group", "AYE-T_Rifampicin_2_liquid",    "AYE-T_ctrl_0_liquid"),
      "30_AYE_O_Rif_vs_Ctrl"             = c("group", "AYE-O_Rifampicin_2_liquid",    "AYE-O_ctrl_0_liquid"),
      "31_O_Trans_Rif_vs_Ctrl"           = c("group", "O-Trans_Rifampicin_2_liquid",  "O-Trans_ctrl_0_liquid")
    
      ./DEG_01_AYE_T_ctr_vs_AYE_WT_ctr.csv
      ./DEG_02_AYE_O_ctr_vs_AYE_WT_ctr.csv
      ./DEG_03_O_Trans_ctr_vs_AYE_WT_ctr.csv
      ./DEG_04_WT_Trans_ctr_vs_AYE_WT_ctr.csv
      ./DEG_05_AYE_O_ctr_vs_AYE_T_ctr.csv
      ./DEG_06_O_Trans_ctr_vs_AYE_T_ctr.csv
      ./DEG_07_WT_Trans_ctr_vs_AYE_T_ctr.csv
      ./DEG_08_AYE_WT_ctr_solid_vs_liquid.csv
      ./DEG_09_AYE_O_ctr_solid_vs_liquid.csv
      ./DEG_10_AYE_T_ctr_solid_vs_liquid.csv
      ./DEG_11_AYE_O_ctr_solid_vs_AYE_WT_solid.csv
      ./DEG_12_AYE_T_ctr_solid_vs_AYE_WT_solid.csv
      ./DEG_13_AYE_WT_Diclo750_vs_Ctrl.csv
      ./DEG_14_AYE_T_Diclo375_vs_Ctrl.csv
      ./DEG_15_AYE_O_Diclo375_vs_Ctrl.csv
      ./DEG_16_O_Trans_Diclo375_vs_Ctrl.csv
      ./DEG_17_WT_Trans_Diclo750_vs_Ctrl.csv
      ./DEG_18_AYE_WT_Diclo1250_solid_vs_Ctrl_solid.csv
      ./DEG_19_AYE_WT_Mero_vs_Ctrl.csv
      ./DEG_20_AYE_T_Mero_vs_Ctrl.csv
      ./DEG_21_AYE_O_Mero_vs_Ctrl.csv
      ./DEG_22_O_Trans_Mero_vs_Ctrl.csv
      ./DEG_23_AYE_T_Mero_vs_AYE_T_Ctrl.csv
      ./DEG_24_AYE_WT_Azi_solid_vs_Ctrl_solid.csv
      ./DEG_25_AYE_T_Azi_solid_vs_Ctrl_solid.csv
      ./DEG_26_AYE_O_Azi_solid_vs_Ctrl_solid.csv
      ./DEG_27_F_Azi_solid_vs_Ctrl_solid.csv
      ./DEG_28_AYE_WT_Rif_vs_Ctrl.csv
      ./DEG_29_AYE_T_Rif_vs_Ctrl.csv
      ./DEG_30_AYE_O_Rif_vs_Ctrl.csv
      ./DEG_31_O_Trans_Rif_vs_Ctrl.csv
    
        #BiocManager::install("GO.db")
        #BiocManager::install("AnnotationDbi")
    
        # Load required libraries
        library(openxlsx)  # For Excel file handling
        library(dplyr)     # For data manipulation
        library(tidyr)
        library(stringr)
        library(clusterProfiler)  # For KEGG and GO enrichment analysis
        #library(org.Hs.eg.db)  # Replace with appropriate organism database
        library(GO.db)
        library(AnnotationDbi)
    
        setwd("~/DATA/Data_Tam_RNAseq_2026_Dicl_Mero_Azith_Rifa_on_AYE/results/star_salmon/DEG_Results_Complete_manually")
    
        # 1. Blast2GO: Extract GO & EC terms (Primary source)
        annot_df <- read.table("/home/jhuang/b2gWorkspace_Tam_RNAseq_AYE/blast2go_annot.annot2_",
                            header = FALSE, sep = "\t", stringsAsFactors = FALSE, fill = TRUE)
        colnames(annot_df) <- c("GeneID", "Term")
    
        go_terms <- annot_df %>%
        filter(grepl("^GO:", Term)) %>%
        group_by(GeneID) %>%
        summarize(GOs = paste(Term, collapse = ","), .groups = "drop")
    
        ec_terms <- annot_df %>%
        filter(grepl("^EC:", Term)) %>%
        group_by(GeneID) %>%
        summarize(EC = paste(Term, collapse = ","), .groups = "drop")
    
        # Load the results
        res <- read.csv("DEG_01_AYE_T_ctr_vs_AYE_WT_ctr.csv")
    
        # Replace empty GeneName with modified GeneID
        res$GeneName <- ifelse(
            res$GeneName == "" | is.na(res$GeneName),
            gsub("gene-", "", res$GeneID),
            res$GeneName
        )
    
        # Remove duplicated genes by selecting the gene with the smallest padj
        duplicated_genes <- res[duplicated(res$GeneName), "GeneName"]
    
        res <- res %>%
        group_by(GeneName) %>%
        slice_min(padj, with_ties = FALSE) %>%
        ungroup()
    
        res <- as.data.frame(res)
        # Sort res first by padj (ascending) and then by log2FoldChange (descending)
        res <- res[order(res$padj, -res$log2FoldChange), ]
        # Read eggnog annotations
        eggnog_data <- read.delim("~/DATA/Data_Tam_RNAseq_2026_Dicl_Mero_Azith_Rifa_on_AYE/eggnog_out.emapper.annotations.txt", header = TRUE, sep = "\t")
        # Remove the "gene-" prefix from GeneID in res to match eggnog 'query' format
        res$GeneID <- gsub("gene-", "", res$GeneID)
        # Merge eggnog data with res based on GeneID
        res <- res %>%
        left_join(eggnog_data, by = c("GeneID" = "query"))
    
        # Merge with the res dataframe
        # Perform the left joins and rename columns
        res_updated <- res %>%
        left_join(go_terms, by = "GeneID") %>%
        left_join(ec_terms, by = "GeneID") %>% dplyr::select(-EC.x, -GOs.x) %>% dplyr::rename(EC = EC.y, GOs = GOs.y)
    
        # Filter up- and down-regulated genes (NEEDS_TO_BE_ADAPTED_BY_1.4_or_2.0_or_1.2)
        up_regulated <- res_updated[res_updated$log2FoldChange >= 1.4 & res_updated$padj <= 0.05, ]
        down_regulated <- res_updated[res_updated$log2FoldChange <= -1.4 & res_updated$padj <= 0.05, ]
    
        # Create a new workbook
        wb <- createWorkbook()
        addWorksheet(wb, "Complete_Data")
        writeData(wb, "Complete_Data", res_updated)
        addWorksheet(wb, "Up_Regulated")
        writeData(wb, "Up_Regulated", up_regulated)
        addWorksheet(wb, "Down_Regulated")
        writeData(wb, "Down_Regulated", down_regulated)
        saveWorkbook(wb, "Gene_Expression_with_Annotations_01_AYE_T_ctr_vs_AYE_WT_ctr.xlsx", overwrite = TRUE)
    
        # Set GeneName as row names after the join
        rownames(res_updated) <- res_updated$GeneName
        res_updated <- res_updated %>% dplyr::select(-GeneName)
    
        # ---------------------------------------------------------
        # ---- Perform KEGG enrichment analysis (up_regulated) ----
        gene_list_kegg_up <- up_regulated$KEGG_ko
        gene_list_kegg_up <- gsub("ko:", "", gene_list_kegg_up)
        kegg_enrichment_up <- enrichKEGG(gene = gene_list_kegg_up, organism = 'ko')
    
        # -- convert the GeneID (Kxxxxxx) to the true GeneID --
        # Step 0: Create KEGG to GeneID mapping
        kegg_to_geneid_up <- up_regulated %>%
        dplyr::select(KEGG_ko, GeneID) %>%
        filter(!is.na(KEGG_ko)) %>%  # Remove missing KEGG KO entries
        mutate(KEGG_ko = str_remove(KEGG_ko, "ko:"))  # Remove 'ko:' prefix if present
    
        # Step 1: Clean KEGG_ko values (separate multiple KEGG IDs)
        kegg_to_geneid_clean <- kegg_to_geneid_up %>%
        mutate(KEGG_ko = str_remove_all(KEGG_ko, "ko:")) %>%  # Remove 'ko:' prefixes
        separate_rows(KEGG_ko, sep = ",") %>%  # Ensure each KEGG ID is on its own row
        filter(KEGG_ko != "-") %>%  # Remove invalid KEGG IDs ("-")
        distinct()  # Remove any duplicate mappings
    
        # Step 2.1: Expand geneID column in kegg_enrichment_up
        expanded_kegg <- kegg_enrichment_up %>%
        as.data.frame() %>%
        separate_rows(geneID, sep = "/") %>%  # Split multiple KEGG IDs (Kxxxxx)
        left_join(kegg_to_geneid_clean, by = c("geneID" = "KEGG_ko"), relationship = "many-to-many") %>%  # Explicitly handle many-to-many
        distinct() %>%  # Remove duplicate matches
        group_by(ID) %>%
        summarise(across(everything(), ~ paste(unique(na.omit(.)), collapse = "/")), .groups = "drop")  # Re-collapse results
        #dplyr::glimpse(expanded_kegg)
    
        # Step 3.1: Replace geneID column in the original dataframe
        kegg_enrichment_up_df <- as.data.frame(kegg_enrichment_up)
        # Remove old geneID column and merge new one
        kegg_enrichment_up_df <- kegg_enrichment_up_df %>%
        dplyr::select(-geneID) %>%  # Remove old geneID column
        left_join(expanded_kegg %>% dplyr::select(ID, GeneID), by = "ID") %>%  # Merge new GeneID column
        dplyr::rename(geneID = GeneID)  # Rename column back to geneID
    
        # -----------------------------------------------------------
        # ---- Perform KEGG enrichment analysis (down_regulated) ----
        # Step 1: Extract KEGG KO terms from down-regulated genes
        gene_list_kegg_down <- down_regulated$KEGG_ko
        gene_list_kegg_down <- gsub("ko:", "", gene_list_kegg_down)
        # Step 2: Perform KEGG enrichment analysis
        kegg_enrichment_down <- enrichKEGG(gene = gene_list_kegg_down, organism = 'ko')
        # --- Convert KEGG gene IDs (Kxxxxxx) to actual GeneIDs ---
        # Step 3: Create KEGG to GeneID mapping from down_regulated dataset
        kegg_to_geneid_down <- down_regulated %>%
        dplyr::select(KEGG_ko, GeneID) %>%
        filter(!is.na(KEGG_ko)) %>%  # Remove missing KEGG KO entries
        mutate(KEGG_ko = str_remove(KEGG_ko, "ko:"))  # Remove 'ko:' prefix if present
        # Step 4: Clean KEGG_ko values (handle multiple KEGG IDs)
        kegg_to_geneid_down_clean <- kegg_to_geneid_down %>%
        mutate(KEGG_ko = str_remove_all(KEGG_ko, "ko:")) %>%  # Remove 'ko:' prefixes
        separate_rows(KEGG_ko, sep = ",") %>%  # Ensure each KEGG ID is on its own row
        filter(KEGG_ko != "-") %>%  # Remove invalid KEGG IDs ("-")
        distinct()  # Remove duplicate mappings
        # Step 5: Expand geneID column in kegg_enrichment_down
        expanded_kegg_down <- kegg_enrichment_down %>%
        as.data.frame() %>%
        separate_rows(geneID, sep = "/") %>%  # Split multiple KEGG IDs (Kxxxxx)
        left_join(kegg_to_geneid_down_clean, by = c("geneID" = "KEGG_ko"), relationship = "many-to-many") %>%  # Handle many-to-many mappings
        distinct() %>%  # Remove duplicate matches
        group_by(ID) %>%
        summarise(across(everything(), ~ paste(unique(na.omit(.)), collapse = "/")), .groups = "drop")  # Re-collapse results
        # Step 6: Replace geneID column in the original kegg_enrichment_down dataframe
        kegg_enrichment_down_df <- as.data.frame(kegg_enrichment_down) %>%
        dplyr::select(-geneID) %>%  # Remove old geneID column
        left_join(expanded_kegg_down %>% dplyr::select(ID, GeneID), by = "ID") %>%  # Merge new GeneID column
        dplyr::rename(geneID = GeneID)  # Rename column back to geneID
        # View the updated dataframe
        head(kegg_enrichment_down_df)
    
        # Create a new workbook
        wb <- createWorkbook()
        # Save enrichment results to the workbook
        addWorksheet(wb, "KEGG_Enrichment_Up")
        writeData(wb, "KEGG_Enrichment_Up", as.data.frame(kegg_enrichment_up_df))
        # Save enrichment results to the workbook
        addWorksheet(wb, "KEGG_Enrichment_Down")
        writeData(wb, "KEGG_Enrichment_Down", as.data.frame(kegg_enrichment_down_df))
        #saveWorkbook(wb, "KEGG_Enrichment.xlsx", overwrite = TRUE)
    
        # ----------------------------------------
        # ---- Perform GO enrichment analysis ----
    
        # Define gene list (up-regulated genes)
        gene_list_go_up <- up_regulated$GeneID  # Extract the 149 up-regulated genes
        gene_list_go_down <- down_regulated$GeneID  # Extract the 65 down-regulated genes
    
        # Define background gene set (all genes in res)
        background_genes <- res_updated$GeneID  # Extract the 3646 background genes
    
        # Prepare GO annotation data from res
        go_annotation <- res_updated[, c("GOs","GeneID")]  # Extract relevant columns
        go_annotation <- go_annotation %>%
        tidyr::separate_rows(GOs, sep = ",")  # Split multiple GO terms into separate rows
    
        # Perform GO enrichment analysis, where pAdjustMethod is one of "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none"
        go_enrichment_up <- enricher(
            gene = gene_list_go_up,                # Up-regulated genes
            TERM2GENE = go_annotation,       # Custom GO annotation
            pvalueCutoff = 0.05,             # Significance threshold
            pAdjustMethod = "BH",
            universe = background_genes      # Define the background gene set
        )
        go_enrichment_up <- as.data.frame(go_enrichment_up)
    
        go_enrichment_down <- enricher(
            gene = gene_list_go_down,                # Up-regulated genes
            TERM2GENE = go_annotation,       # Custom GO annotation
            pvalueCutoff = 0.05,             # Significance threshold
            pAdjustMethod = "BH",
            universe = background_genes      # Define the background gene set
        )
        go_enrichment_down <- as.data.frame(go_enrichment_down)
    
        ## Remove the 'p.adjust' column since no adjusted methods have been applied!
        #go_enrichment_up <- go_enrichment_up[, !names(go_enrichment_up) %in% "p.adjust"]
    
        # Update the Description column with the term descriptions
        go_enrichment_up$Description <- sapply(go_enrichment_up$ID, function(go_id) {
        # Using select to get the term description
        term <- tryCatch({
            AnnotationDbi::select(GO.db, keys = go_id, columns = "TERM", keytype = "GOID")
        }, error = function(e) {
            message(paste("Error for GO term:", go_id))  # Print which GO ID caused the error
            return(data.frame(TERM = NA))  # In case of error, return NA
        })
    
        if (nrow(term) > 0) {
            return(term$TERM)
        } else {
            return(NA)  # If no description found, return NA
        }
        })
        ## Print the updated data frame
        #print(go_enrichment_up)
    
        ## Remove the 'p.adjust' column since no adjusted methods have been applied!
        #go_enrichment_down <- go_enrichment_down[, !names(go_enrichment_down) %in% "p.adjust"]
    
        # Update the Description column with the term descriptions
        go_enrichment_down$Description <- sapply(go_enrichment_down$ID, function(go_id) {
        # Using select to get the term description
        term <- tryCatch({
            AnnotationDbi::select(GO.db, keys = go_id, columns = "TERM", keytype = "GOID")
        }, error = function(e) {
            message(paste("Error for GO term:", go_id))  # Print which GO ID caused the error
            return(data.frame(TERM = NA))  # In case of error, return NA
        })
    
        if (nrow(term) > 0) {
            return(term$TERM)
        } else {
            return(NA)  # If no description found, return NA
        }
        })
    
        addWorksheet(wb, "GO_Enrichment_Up")
        writeData(wb, "GO_Enrichment_Up", as.data.frame(go_enrichment_up))
    
        addWorksheet(wb, "GO_Enrichment_Down")
        writeData(wb, "GO_Enrichment_Down", as.data.frame(go_enrichment_down))
    
        # Save the workbook with enrichment results
        saveWorkbook(wb, "KEGG_and_GO_Enrichments_01_AYE_T_ctr_vs_AYE_WT_ctr.xlsx", overwrite = TRUE)
  3. BUG_1: The Count column does not match the number of gene IDs listed in the geneID column.

        The discrepancy between the Count and the number of listed GeneIDs is not an error, but a result of KEGG's many-to-many mapping between physical genes and KOs.
    
        KEGG enrichment statistics (Count, GeneRatio, p-values) are strictly calculated based on unique KOs, not GeneIDs. The geneID column simply maps these enriched KOs back to our specific genome for display.
        Concrete Examples (see KEGG_and_GO_Enrichments_01_AYE_T_ctr_vs_AYE_WT_ctr.xlsx):
    
            * ko03070 (Bacterial secretion system): Count = 2 (KOs: K02456/K02457). However, both KOs map to the exact same physical gene (ABAYE2071), so only 1 geneID is listed.
            * ko05111 (Biofilm formation): Count = 3 (KOs: K02456/K02457/K03092), which map back to 2 unique genes (ABAYE2071/ABAYE3136).
    
        To ensure full transparency, I have added a new KEGG_ko column (placed between Count and geneID). It explicitly lists the exact KOs contributing to the Count.