Daily Archives: June 10, 2026

From bioBakery Processing to Paired Differential Analysis (Data_Tam_Metagenomics_2026_Wastewater)

Direct Answer

Yes and No.

  • Yes, you can use biobakery_workflows wmgx to process all your FASTQ files into taxonomic and functional profiles, and wmgx_vis to generate basic visualizations (such as alpha/beta diversity plots).
  • No (not recommended): do not use wmgx_vis for the differential analysis itself. Your study has a paired/longitudinal design (Pre- vs. Post-treatment samples from the same subjects across multiple time points), so the model has to account for repeated measurements on each subject, typically with a per-subject random effect. The wmgx_vis command line has no option for specifying random effects, and a test that treats paired samples as independent gives unreliable p-values.

Instead, a common approach in the bioBakery ecosystem is to use wmgx for processing and then run MaAsLin 2 directly for the differential analysis.

The complete workflow is as follows:


Step 1: Process the data with wmgx

wmgx pairs files using a pair identifier, which defaults to .R1 (with the mate expected as .R2). Since your paired-end files are named _1.fastq.gz and _2.fastq.gz, pass --pair-identifier _1 so that the pairs are detected.

Assuming all your symlinks are in a folder called /data/fastq_files/:

biobakery_workflows wmgx \
  --input /data/fastq_files/ \
  --output /data/wmgx_results/ \
  --pair-identifier _1 \
  --threads 20

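(Note: If you don't need strain profiling, you can speed this up by adding --bypass-strain-profiling. The available flags differ between biobakery_workflows versions, so confirm with biobakery_workflows wmgx --help.)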
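
Before launching a long run, it can be worth confirming that every forward file has its mate. The following is a minimal Python sketch, not part of bioBakery: the function names find_unpaired and check_directory and the demo file names are made up for illustration, and it assumes the _1.fastq.gz / _2.fastq.gz suffixes described above.

```python
from pathlib import Path


def find_unpaired(filenames, fwd="_1.fastq.gz", rev="_2.fastq.gz"):
    """Return (sample names with both mates, files lacking a mate)."""
    names = set(filenames)
    paired, orphans = [], []
    for name in sorted(names):
        if name.endswith(fwd):
            sample = name[: -len(fwd)]
            if sample + rev in names:
                paired.append(sample)
            else:
                orphans.append(name)
        elif name.endswith(rev):
            sample = name[: -len(rev)]
            if sample + fwd not in names:
                orphans.append(name)
    return paired, orphans


def check_directory(fastq_dir):
    """Apply find_unpaired to the entries of a directory (symlinks included)."""
    return find_unpaired(p.name for p in Path(fastq_dir).iterdir())


# Demo on illustrative file names: the Post sample is missing its _2 mate.
demo = [
    "2025_Nov_Pre_A1_1.fastq.gz",
    "2025_Nov_Pre_A1_2.fastq.gz",
    "2025_Nov_Post_B1_1.fastq.gz",
]
paired, orphans = find_unpaired(demo)
print(paired)   # ['2025_Nov_Pre_A1']
print(orphans)  # ['2025_Nov_Post_B1_1.fastq.gz']
```

On the real data you would call check_directory("/data/fastq_files/") and resolve any orphans before starting wmgx.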


Step 2: Basic Visualization with wmgx_vis (Optional)

You can use wmgx_vis to generate standard diversity plots and heatmaps. You will need a metadata file (see Step 3 for the format).

biobakery_workflows wmgx_vis \
  --input /data/wmgx_results/ \
  --project-name Metagenome_Study \
  --input-metadata /data/metadata.tsv \
  --metadata-categorical Group Timepoint \
  --output /data/visualization_results/

Step 3: Proper Differential Analysis (Using MaAsLin 2 directly)

To compare Pre vs. Post correctly, the model has to reflect that the samples are paired and, where relevant, that they come from different time points. Run MaAsLin 2 directly on the merged output tables generated by wmgx.
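
As a rough sketch of what "accounting for pairing" means, the per-feature model fitted with a subject-level random intercept can be written as below. The notation is illustrative rather than taken from the MaAsLin 2 documentation: y_ij is the abundance of one feature in sample j of subject i, f is whichever transform is chosen, and Timepoint is shown as a single two-level term for simplicity.

```latex
f(y_{ij}) = \beta_0
          + \beta_1\,\mathrm{Group}_{ij}
          + \beta_2\,\mathrm{Timepoint}_{ij}
          + b_i + \varepsilon_{ij},
\qquad
b_i \sim \mathcal{N}(0, \sigma_b^2),
\quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Here \beta_1 is the Pre/Post effect of interest, and b_i absorbs each subject's own baseline, so the Pre and Post samples of one subject are not treated as independent observations.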

1. Prepare your Metadata File (metadata.tsv)

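You need a tab-separated file in which each SampleID exactly matches the sample name that wmgx derives from your FASTQ files, i.e. the file-name prefix (e.g., 2025_Nov_Pre_A1) as it appears in the column headers of the merged tables.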

SampleID          Group  Timepoint  Subject_ID
2025_Nov_Pre_A1   Pre    Nov        Subject_1
2025_Nov_Post_B1  Post   Nov        Subject_1
2026_Jan_Pre_A    Pre    Jan        Subject_2
2026_Jan_Post_B   Post   Jan        Subject_2

(Note: You must correctly map which “Pre” sample belongs to which “Post” sample in the Subject_ID column. This is how the model knows they are paired).
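
Because a mistake in the Subject_ID column silently breaks the pairing, a quick sanity check can help. The sketch below is one possible way to do it in Python; the function name subjects_missing_levels is hypothetical, and it assumes the column names and Pre/Post labels from the example table above. The demo deliberately omits the Post sample of Subject_2.

```python
import csv
import io
from collections import defaultdict


def subjects_missing_levels(metadata_text, subject_col="Subject_ID",
                            group_col="Group", levels=("Pre", "Post")):
    """Map each subject lacking a Pre or Post sample to its missing levels."""
    seen = defaultdict(set)
    for row in csv.DictReader(io.StringIO(metadata_text), delimiter="\t"):
        seen[row[subject_col]].add(row[group_col])
    return {
        subject: [lvl for lvl in levels if lvl not in groups]
        for subject, groups in seen.items()
        if not set(levels) <= groups
    }


# Demo: tab-separated text in the same layout as metadata.tsv.
demo = "\n".join([
    "SampleID\tGroup\tTimepoint\tSubject_ID",
    "2025_Nov_Pre_A1\tPre\tNov\tSubject_1",
    "2025_Nov_Post_B1\tPost\tNov\tSubject_1",
    "2026_Jan_Pre_A\tPre\tJan\tSubject_2",
])
print(subjects_missing_levels(demo))  # {'Subject_2': ['Post']}
```

For the real file, pass the file contents, e.g. subjects_missing_levels(open("/data/metadata.tsv").read()); an empty dictionary means every subject has both a Pre and a Post sample.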

2. Run MaAsLin 2 in R

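You can run MaAsLin 2 from R (install it from Bioconductor if it is not already present in your bioBakery environment). Running it directly lets you define Group (Pre/Post) as a fixed effect and Subject_ID as a random effect, which is how the model accounts for the non-independence of samples taken from the same subject.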

# Install Maaslin2 if you haven't already
# BiocManager::install("Maaslin2")
library(Maaslin2)

# Run the differential abundance analysis
fit_data <- Maaslin2(
    # Adjust this path to the merged table in your wmgx output folder
    input_data     = "/data/wmgx_results/wmgx/humann/genefamilies.tsv", # or pathabundance.tsv
    input_metadata = "/data/metadata.tsv",
    output         = "/data/maaslin2_results/",

    # Define your variables
    fixed_effects  = c("Group", "Timepoint"), # Comparing Pre vs Post, while controlling for Timepoint
    random_effects = c("Subject_ID"),         # CRUCIAL: Accounts for the paired Pre/Post design!
    # If Timepoint has more than two levels, also pass
    # reference = c("Timepoint,<baseline level>")

    # Data processing parameters
    normalization  = "TSS",                   # HUMAnN gene family tables are in RPK; use "NONE" only for tables already in relative abundance
    transform      = "LOG",
    min_abundance  = 0.01,                    # Interpret on the scale of your table; 0.01 is very strict for 0-1 relative abundances
    min_prevalence = 0.10,
    correction     = "BH"                     # Benjamini-Hochberg for multiple testing (FDR)
)
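
One practical snag: MaAsLin 2 matches samples by name, so the sample columns of the input table have to agree with the SampleID values in the metadata. HUMAnN tables may carry a suffix on each sample column (for example _Abundance-RPKs) and also contain per-species "stratified" rows (feature|taxon), which many analyses drop before testing. The Python sketch below shows one way to handle both under those assumptions; the function names and the UniRef90_EXAMPLE feature are invented for illustration, and the suffix pattern should be checked against your own table header.

```python
import csv
import io


def clean_humann_table(table_text, suffix_marker="_Abundance"):
    """Drop stratified rows (feature|taxon) and trim sample-name suffixes."""
    rows = [r for r in csv.reader(io.StringIO(table_text), delimiter="\t") if r]
    header, body = rows[0], rows[1:]
    samples = [name.split(suffix_marker)[0] for name in header[1:]]
    kept = [row for row in body if "|" not in row[0]]
    return ["feature"] + samples, kept


def unmatched_samples(table_samples, metadata_ids):
    """Return (samples only in the table, samples only in the metadata)."""
    table_set, meta_set = set(table_samples), set(metadata_ids)
    return sorted(table_set - meta_set), sorted(meta_set - table_set)


# Demo on a two-sample table with made-up values.
demo = (
    "# Gene Family\t2025_Nov_Pre_A1_Abundance-RPKs\t2025_Nov_Post_B1_Abundance-RPKs\n"
    "UniRef90_EXAMPLE\t10\t4\n"
    "UniRef90_EXAMPLE|g__Bacteroides.s__Bacteroides_vulgatus\t6\t4\n"
)
header, kept = clean_humann_table(demo)
print(header)  # ['feature', '2025_Nov_Pre_A1', '2025_Nov_Post_B1']
print(kept)    # [['UniRef90_EXAMPLE', '10', '4']]
print(unmatched_samples(header[1:], ["2025_Nov_Pre_A1", "2026_Jan_Pre_A"]))
# (['2025_Nov_Post_B1'], ['2026_Jan_Pre_A'])
```

If both lists returned by unmatched_samples are empty, the table and metadata.tsv refer to the same set of samples.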

Summary

  1. Use wmgx to process your raw FASTQs.
  2. Use wmgx_vis if you just want quick, automated PCA/PCoA plots and heatmaps.
  3. Do not use wmgx_vis for the differential statistics. Extract the genefamilies.tsv or metaphlan merged tables and run MaAsLin 2 directly so you can include Subject_ID as a random effect to properly handle your Pre/Post paired design.