Direct Answer
Yes and No.
- Yes, you can use biobakery_workflows wmgx to process all your FASTQ files into taxonomic and functional profiles, and wmgx_vis to generate basic visualizations (like alpha/beta diversity).
- No (Not Recommended), you should not use wmgx_vis for the actual differential analysis. Because your study is a paired/longitudinal design (Pre- vs. Post-treatment from the same subjects across multiple time points), you must account for the paired nature of the data (using random effects). The command-line wrapper for wmgx_vis does not support specifying random effects, so relying on it would lead to statistically flawed results.
Instead, the standard practice in the bioBakery ecosystem is to use wmgx for processing, and then run MaAsLin 2 directly for the differential analysis.
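To make "random effects" concrete, the model fitted for each feature can be sketched roughly as below. The notation is illustrative and not taken from the bioBakery documentation: y_ij is the transformed abundance of one feature in sample j of subject i, beta_1 is the Pre/Post effect of interest, and u_i is a subject-specific intercept shared by that subject's samples. Further covariates would enter as additional fixed-effect terms.

```latex
y_{ij} = \beta_0 + \beta_1\,\mathrm{Group}_{ij} + u_i + \varepsilon_{ij},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2),
\quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Dropping u_i treats the Pre and Post samples of one subject as independent, which is the flaw described above.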
Here is the complete workflow:
Step 1: Process the data with wmgx
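Your paired-end files follow the _1.fastq.gz and _2.fastq.gz naming convention. By default wmgx looks for .R1 as the pair identifier, so pass --pair-identifier _1 to make sure the files are detected and paired correctly (confirm the option with biobakery_workflows wmgx --help for your installed version).
Assuming all your symlinks are in a folder called /data/fastq_files/:
biobakery_workflows wmgx \
--input /data/fastq_files/ \
--output /data/wmgx_results/ \
--pair-identifier _1 \
--threads 20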
(Note: If you don’t need strain profiling, you can speed this up by adding --bypass-strain-profiling)
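Before launching a long run, it can help to confirm that every forward-read file has a mate. The following is a minimal sketch, not part of bioBakery: check_pairs is a hypothetical helper name, and it assumes the _1.fastq.gz / _2.fastq.gz naming described above.

```shell
# Hypothetical helper (not a bioBakery command): report forward-read files
# in a directory that have no matching reverse-read file.
# The body runs in a subshell, so its variables do not leak into the caller.
check_pairs() (
    dir="$1"
    missing=0
    for fwd in "$dir"/*_1.fastq.gz; do
        # Skip the literal pattern when the glob matched nothing.
        # Note: -e is also false for a dangling symlink.
        [ -e "$fwd" ] || continue
        rev="${fwd%_1.fastq.gz}_2.fastq.gz"
        if [ ! -e "$rev" ]; then
            echo "missing mate for: $fwd"
            missing=$((missing + 1))
        fi
    done
    echo "unpaired files: $missing"
    # Exit status is 0 only when every forward file has a mate
    [ "$missing" -eq 0 ]
)

# Example: check_pairs /data/fastq_files
```

Because the files here are symlinks, a dangling reverse-read link is reported as a missing mate, which is usually what you want to catch before the workflow starts.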
Step 2: Basic Visualization with wmgx_vis (Optional)
You can use wmgx_vis to generate standard diversity plots and heatmaps. You will need a metadata file (see Step 3 for the format).
biobakery_workflows wmgx_vis \
--input /data/wmgx_results/ \
--project-name Metagenome_Study \
--input-metadata /data/metadata.tsv \
--metadata-categorical Group \
--metadata-categorical Timepoint \
--output /data/visualization_results/
Step 3: Proper Differential Analysis (Using MaAsLin 2 directly)
To correctly compare Pre vs. Post while accounting for the fact that the samples are paired (and potentially accounting for the different time points), you should use MaAsLin 2 directly on the output tables generated by wmgx.
1. Prepare your Metadata File (metadata.tsv)
You need a tab-separated file where the SampleID exactly matches the prefix of your FASTQ files (e.g., 2025_Nov_Pre_A1).
| SampleID | Group | Timepoint | Subject_ID |
|---|---|---|---|
| 2025_Nov_Pre_A1 | Pre | Nov | Subject_1 |
| 2025_Nov_Post_B1 | Post | Nov | Subject_1 |
| 2026_Jan_Pre_A | Pre | Jan | Subject_2 |
| 2026_Jan_Post_B | Post | Jan | Subject_2 |
| … | … | … | … |
(Note: You must correctly map which “Pre” sample belongs to which “Post” sample in the Subject_ID column. This is how the model knows they are paired).
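Since the pairing lives entirely in the Subject_ID column, a quick consistency check on the metadata file can catch mapping mistakes early. This is a sketch under assumptions: check_pairing is a hypothetical helper, the file is tab-separated with a header row, and the Group values are spelled exactly Pre and Post as in the table above.

```shell
# Hypothetical helper: list subjects that lack at least one Pre and one Post row.
# Columns are located by header name, so their order in the file does not matter.
check_pairing() {
    awk -F'\t' '
        NR == 1 {
            for (i = 1; i <= NF; i++) col[$i] = i   # header name -> column index
            if (!("Group" in col) || !("Subject_ID" in col)) {
                print "metadata needs Group and Subject_ID columns"
                bad = 1
                exit
            }
            next
        }
        {
            subj = $(col["Subject_ID"])
            seen[subj] = 1
            n[subj, $(col["Group"])]++              # count rows per subject and group
        }
        END {
            if (bad) exit 1
            for (s in seen)
                if (n[s, "Pre"] < 1 || n[s, "Post"] < 1) {
                    printf "%s: Pre=%d Post=%d\n", s, n[s, "Pre"], n[s, "Post"]
                    bad = 1
                }
            exit bad + 0
        }
    ' "$1"
}

# Example: check_pairing /data/metadata.tsv
```

The function prints nothing and returns 0 when every subject has both groups; otherwise it prints one line per incomplete subject and returns a non-zero status.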
2. Run MaAsLin 2 in R
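You can run MaAsLin 2 (part of the bioBakery suite) in R. This lets you define Group (Pre/Post) as a fixed effect and Subject_ID as a random effect, which is what accounts for the paired design. MaAsLin 2 matches samples by name, so check that the sample names in the abundance table are identical to the SampleID column; HUMAnN-derived tables can append a suffix to sample names in their column headers.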
# Install Maaslin2 if you haven't already
# BiocManager::install("Maaslin2")
library(Maaslin2)
# Run the differential abundance analysis
fit_data <- Maaslin2(
  # Relative-abundance table written by wmgx; confirm the exact path in your output folder
  input_data = "/data/wmgx_results/humann/merged/pathabundance_relab.tsv", # or genefamilies_relab.tsv
  input_metadata = "/data/metadata.tsv",
  output = "/data/maaslin2_results/",
  # Define your variables
  fixed_effects = c("Group", "Timepoint"), # Comparing Pre vs Post, while controlling for Timepoint
  random_effects = c("Subject_ID"), # CRUCIAL: Accounts for the paired Pre/Post design!
  # Data processing parameters
  normalization = "NONE", # The *_relab.tsv tables are already relative abundances; use "TSS" for the unnormalized genefamilies.tsv
  transform = "LOG",
  min_abundance = 0.01, # Applied on the scale of the input table; tune this for your data
  min_prevalence = 0.10,
  correction = "BH" # Benjamini-Hochberg for multiple testing (FDR)
)
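MaAsLin 2 writes its per-feature statistics to tab-separated files such as all_results.tsv in the output folder, with columns including feature, metadata, value, coef, and qval. As a hedged sketch for pulling out only the Pre/Post associations, the hypothetical helper group_hits below keeps rows whose metadata column is Group and whose q-value is at or below a cutoff (0.25 here, matching MaAsLin 2's default significance threshold).

```shell
# Hypothetical helper: print feature, coefficient, and q-value for Group
# associations at or below a q-value cutoff (second argument, default 0.25).
# Columns are located by header name rather than by position.
group_hits() {
    awk -F'\t' -v OFS='\t' -v qmax="${2:-0.25}" '
        NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
        $(col["metadata"]) == "Group" &&
        $(col["qval"]) != "NA" &&
        $(col["qval"]) + 0 <= qmax + 0 {
            print $(col["feature"]), $(col["coef"]), $(col["qval"])
        }
    ' "$1"
}

# Example: group_hits /data/maaslin2_results/all_results.tsv 0.05
```

When reading the coefficients, check the value column to see which level is being reported: R orders factor levels alphabetically by default, so with Post and Pre the reference level is typically Post and the coefficient describes Pre relative to Post.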
Summary
- Use wmgx to process your raw FASTQs.
- Use wmgx_vis if you just want quick, automated PCA/PCoA plots and heatmaps.
- Do not use wmgx_vis for the differential statistics. Take the merged HUMAnN tables (e.g., gene families or pathway abundances) or the merged MetaPhlAn tables and run MaAsLin 2 directly so you can include Subject_ID as a random effect to properly handle your Pre/Post paired design.