Daily Archives: 2026年4月9日

Phylogenetic tree construction

core_tree_v2

core_tree

scripts_tree_construction_v2.zip

scripts_tree_construction.zip

conda activate /home/jhuang/miniconda3/envs/bengal3_ac3

1. Download genomes from Genbank to local disk

LOCAL_ONLY_ACCESSIONS=(“CP002773.1” “CP093323.1” “CP012097.1”, “CP014137.1”, “JBUDMV010000001.1”) saving in the directory local_Sply_fastas, GE11174.fasta should be ./

2. Replace EXTRA_ASSEMBLIES (manual edit or use sed/python as shown above)

EXTRA_ASSEMBLIES=(

=== Gibbsiella (core comparison) ===

"GCF_002291425.1"  # G. quercinecans FRB97ᵀ
"GCF_004342245.1"  # G. quercinecans DSM 25889
#"GCF_039539505.1"  # G. papilionis PWX6
#    "GCF_039539505.1": "Gibbsiella papilionis PWX6 (GCF_039539505.1)",
"JBUDMV010000001.1"
"GCA_039540155.1"  # G. greigii USA56
#"GCA_004342265.1"  # G. dentisursi DSM 23818ᵀ

# === Serratia (address biochemical mis-ID) ===
#mkdir -p local_Sply_fastas
#mv ~/Downloads/CP002773.1.fasta local_Sply_fastas/
#mv ~/Downloads/CP093323.1.fasta local_Sply_fastas/
#mv ~/Downloads/CP012097.1.fasta local_Sply_fastas/
#"CP002773.1"       # S. plymuthica AS9
#"CP093323.1"       # S. plymuthica B37/06
#"CP012097.1"       # S. plymuthica 3Re4-18
"GCA_001663135.1"  # S. odorifera 4Rx13 (already validated)
#"GCA_002951025.1"  # S. fonticola CDC 470-73ᵀ

# === Close outgroup (same family Yersiniaceae) ===
#"GCF_005484965.1"  # Brenneria nigrifluens LMG 5956ᵀ
#   "GCF_005484965.1": "Brenneria nigrifluens LMG 5956 (GCF_005484965.1)"
#"GCF_000239205.1"  # Brenneria goodwinii BC-1
"CP014137.1"

)

3. EXACT desired display labels (as requested)

EXACT display names

desired = { “JBVKFS000000000”: “Gibbsiella quercinecans HH131225 (JBVKFS000000000) ⭐”, “GCF_002291425.1”: “Gibbsiella quercinecans FRB97 (GCF_002291425.1)”, “GCF_004342245.1”: “Gibbsiella quercinecans DSM 25889 (GCF_004342245.1)”, “JBUDMV010000001.1”: “Gibbsiella dentisursi 2e (JBUDMV010000001)”, “GCA_039540155.1”: “Gibbsiella greigii USA56 (GCA_039540155.1)”, “CP002773.1”: “Serratia plymuthica AS9 (CP002773.1)”, “CP093323.1”: “Serratia plymuthica B37/06 (CP093323.1)”, “CP012097.1”: “Serratia plymuthica 3Re4-18 (CP012097.1)”, “GCA_001663135.1”: “Serratia odorifera 4Rx13 (GCA_001663135.1)”, “CP014137.1”: “Brenneria goodwinii FRB141 (CP014137.1)” }

4. Re-run the pipeline

./build_wgs_tree_fig3B_v3.sh

5. Verify output labels

cat work_wgs_tree/plot/labels.tsv

6. Delete the unnecessary isolates from labels.tsv, fastas and gffs, for example delete two isolates starting with ‘-‘

sample  display
GCF_002291425.1 Gibbsiella quercinecans FRB97 (GCF_002291425.1)
GCA_039540155.1 Gibbsiella greigii USA56 (GCA_039540155.1)
- GCF_039539505.1 G_papilionis_PWX6 (GCF_039539505.1)
- GCF_005484965.1 B_nigrifluens_LMG5956 (GCF_005484965.1)
JBVKFS000000000 Gibbsiella quercinecans HH131225 (JBVKFS000000000) ⭐
GCF_004342245.1 Gibbsiella quercinecans DSM 25889 (GCF_004342245.1)
JBUDMV010000001.1       Gibbsiella dentisursi 2e (JBUDMV010000001)
CP002773.1      Serratia plymuthica AS9 (CP002773.1)
CP093323.1      Serratia plymuthica B37/06 (CP093323.1)
CP012097.1      Serratia plymuthica 3Re4-18 (CP012097.1)
GCA_001663135.1 Serratia odorifera 4Rx13 (GCA_001663135.1)
CP014137.1      Brenneria goodwinii FRB141 (CP014137.1)

🔧 If you want to skip re-running everything If Prokka/GFFs are already done and you just need to re-run Roary→RAxML→Plot with correct inputs:

A. Ensure collect_fastas copied all 10 FASTAs

ls -1 work_wgs_tree_expanded/fastas/*.fna | wc -l # Should be 10

B. Re-run Roary with relaxed core threshold (for mixed genera)

roary -e –mafft -p 64 -cd 95 -i 95 \ -f work_wgs_tree_expanded/roary_final \ work_wgs_tree_expanded/gffs/*.gff #–> 60736 SNPs

C. Run RAxML-NG with explicit FASTA format

CORE_ALN=”work_wgs_tree_expanded/roary_final/core_gene_alignment.aln” raxml-ng –all \ –msa “$CORE_ALN” \ –msa-format FASTA \ –model GTR+G \ –bs-trees 1000 \ –threads 80 \ –prefix work_wgs_tree_expanded/raxmlng/core \ –redo

D. Run plotting with correct labels

Adapt the plot_tree.R by entering the outgroup-id, e.g. CP014137.1

Version 1: WITHOUT bootstrap values

Rscript plot_tree.R \ work_wgs_tree_expanded/raxmlng/core.raxml.support \ work_wgs_tree_expanded/plot/labels.tsv \ work_wgs_tree_expanded/plot/core_tree_fig3B_no_bootstrap.pdf \ work_wgs_tree_expanded/plot/core_tree_fig3B_no_bootstrap.png \ FALSE

Version 2: WITH bootstrap values

Rscript plot_tree.R \ work_wgs_tree_expanded/raxmlng/core.raxml.support \ work_wgs_tree_expanded/plot/labels.tsv \ work_wgs_tree_expanded/plot/core_tree_fig3B_with_bootstrap.pdf \ work_wgs_tree_expanded/plot/core_tree_fig3B_with_bootstrap.png \ TRUE

REPORT

I agree that showing where Gibbsiella sits among related Gram-negative bacteria would be interesting. However, for the generation of the phylogenetic tree, we are using a core-genome alignment approach (Roary + RAxML-NG), which has an important technical constraint: Core-genome methods require sufficient gene content overlap. When taxa are too evolutionarily distant, the number of shared (“core”) genes drops sharply, resulting in sparse alignments and reduced branch support. With too few core genes, the tree may reflect alignment artifacts rather than true evolutionary history.

I have currently selected the following species for tree construction: TARGET ISOLATE • G. quercinecans HH131225 – Clinical case (highlighted in red) GIBBSIELLA (intra-genus diversity) • G. quercinecans (2 strains) • G. papilionis • G. greigii SERRATIA (address biochemical misidentification) • S. plymuthica (3 strains) • S. odorifera OUTGROUP (rooting) • Brenneria nigrifluens This ensures a robust core alignment (51,495 SNPs) and high-confidence branching for the clinically relevant comparison.

Figure Legend: Figure 1. Core-genome maximum-likelihood phylogeny of Gibbsiella quercinecans HH131225 (red) with reference strains from the genus Gibbsiella, biochemically similar Serratia species, and Brenneria outgroup. Tip labels show species, strain designation, and GenBank assembly accession or chromosome accession. Scale bar: substitutions per site. Tree inferred from 60,736 core-genome SNPs using Roary (Page et al., 2015) and RAxML-NG (Kozlov et al., 2019) with the GTR+G model and 1,000 bootstrap replicates.

References: Page et al., 2015: Page, A. J., Cummins, C. A., Hunt, M., Wong, V. K., Reuter, S., Holden, M. T., … & Parkhill, J. (2015). Roary: rapid large-scale prokaryote pan genome analysis. Bioinformatics, 31(22), 3691–3693. https://doi.org/10.1093/bioinformatics/btv421 Kozlov et al., 2019: Kozlov, A. M., Darriba, D., Flouri, T., Morel, B., & Stamatakis, A. (2019). RAxML-NG: a fast, scalable and user-friendly tool for maximum likelihood phylogenetic inference. Bioinformatics, 35(21), 4453–4455. https://doi.org/10.1093/bioinformatics/btz305