Run nextflow bacass
conda deactivate
# Download k2_standard_08_GB_20251015.tar.gz from https://benlangmead.github.io/aws-indexes/k2#kraken2--bracken
# Download 20190108_kmerfinder_stable_dirs.tar.gz from https://zenodo.org/records/13447056; 'tar xzf 20190108_kmerfinder_stable_dirs.tar.gz'
# -> This database does not work!
# Download the KmerFinder database: https://www.genomicepidemiology.org/services/ --> https://cge.food.dtu.dk/services/KmerFinder/ --> https://cge.food.dtu.dk/services/KmerFinder/etc/kmerfinder_db.tar.gz
# -> This database works!
# DEBUG: --kmerfinderdb /mnt/nvme1n1p1/REFs/kmerfinder/bacteria/ not working!
nextflow run nf-core/bacass -r 2.6.0 -profile docker --help

# -- Hybrid assembly --
nextflow run nf-core/bacass -r 2.6.0 -profile docker \
  --input samplesheet_bacass.tsv \
  --outdir bacass_out \
  --assembly_type hybrid \
  --assembler unicycler,dragonflye \
  --kraken2db /mnt/nvme1n1p1/REFs/k2_standard_08_GB_20251015.tar.gz \
  --skip_kmerfinder \
  -resume \
  -work-dir bacass_out/work

# -- Short assembly --
# Possible bug in -r 2.6.0 (hence --skip_kmerfinder above); the KmerFinder database is used with -r 2.5.0 instead.
nextflow run nf-core/bacass -r 2.5.0 -profile docker \
  --input samplesheet.tsv \
  --outdir bacass_out \
  --assembly_type short \
  --kraken2db /mnt/nvme1n1p1/REFs/k2_standard_08_GB_20251015.tar.gz \
  --kmerfinderdb /mnt/nvme1n1p1/REFs/kmerfinder/bacteria/ \
  -resume \
  -work-dir bacass_out/work

# Using the Prokka assemblies, since the Medaka assembly was not generated:
jhuang@WS-2290C:~/DATA/Data_Tam_DNAseq_2026_An6_An7_An22_Acinetobacter_sp/bacass_out/Prokka/An7.fna
jhuang@WS-2290C:~/DATA/Data_Tam_DNAseq_2026_An6_An7_An22_Acinetobacter_sp/bacass_out/Prokka/An22.fna
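Both runs read a tab-separated samplesheet (`samplesheet_bacass.tsv`, `samplesheet.tsv`) whose contents are not shown here. Below is a minimal Python sketch for writing one. It assumes the column layout `ID, R1, R2, LongFastQ, Fast5, GenomeSize` with `NA` for missing fields, as described in the nf-core/bacass usage docs. Check the docs for the release you actually run (2.5.0 vs 2.6.0). The helper name `write_bacass_samplesheet` is hypothetical.

```python
import csv
from pathlib import Path

# Assumed column order (nf-core/bacass usage docs); verify for your release.
BACASS_COLUMNS = ["ID", "R1", "R2", "LongFastQ", "Fast5", "GenomeSize"]


def write_bacass_samplesheet(rows, out_path):
    """Write a tab-separated bacass samplesheet (hypothetical helper).

    rows: list of dicts keyed by BACASS_COLUMNS; absent or empty fields become "NA".
    """
    out_path = Path(out_path)
    with out_path.open("w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(BACASS_COLUMNS)
        for row in rows:
            # Catch typos in column names before bacass rejects the sheet.
            unknown = set(row) - set(BACASS_COLUMNS)
            if unknown:
                raise ValueError("unexpected columns: %s" % sorted(unknown))
            writer.writerow([row.get(col) or "NA" for col in BACASS_COLUMNS])
    return out_path
```

For a short-read sample you would pass something like `{"ID": "An7", "R1": ".../An7_R1.fastq.gz", "R2": ".../An7_R2.fastq.gz"}`, and add `LongFastQ` for the hybrid run. The paths are placeholders.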
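Species Identification: rapid screening with Mash → precise classification with GTDB-Tk → species-level validation with FastANI. Used together, the three tools improve both the accuracy and the interpretability of species identification.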
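The sketch below shows one way to combine the outputs of the three tools into a single decision. It relies on several assumptions:

- Mash distance D approximates 1 − ANI. This is a common approximation and is only rough below about 90% ANI.
- FastANI's matched/total query fragments serve as an alignment-fraction proxy.
- A genome belongs to a species at 95% ANI, the radius GTDB-Tk reported here, with a minimum AF of 0.5.

The 90% Mash screening cutoff and all function names are hypothetical and should be tuned.

```python
def mash_distance_to_ani(distance):
    """Approximate ANI (%) from a Mash distance via ANI ~ 1 - D (rough below ~90% ANI)."""
    if not 0.0 <= distance <= 1.0:
        raise ValueError("Mash distance must be between 0 and 1")
    return (1.0 - distance) * 100.0


def fragment_af(matched_fragments, total_fragments):
    """AF proxy from FastANI fragment counts (matched / total query fragments)."""
    if total_fragments <= 0:
        raise ValueError("total_fragments must be positive")
    return matched_fragments / float(total_fragments)


def same_species(ani, af, ani_threshold=95.0, min_af=0.5):
    """True if ANI (%) and alignment fraction (0-1) both pass the assumed cutoffs."""
    return ani >= ani_threshold and af >= min_af


def identify(mash_distance, gtdb_species, fastani_ani, af, screen_ani=90.0):
    """Combine Mash screen -> GTDB-Tk label -> FastANI check into a verdict string."""
    # Step 1: Mash is only a coarse screen for any close reference.
    if mash_distance_to_ani(mash_distance) < screen_ani:
        return "no close reference in Mash screen"
    # Steps 2-3: accept the GTDB-Tk species only if FastANI supports it.
    if same_species(fastani_ani, af):
        return "confirmed: " + gtdb_species
    return "unconfirmed: " + gtdb_species
```

With the An7 values reported by GTDB-Tk (ANI 97.43%, AF 0.882), `same_species` returns `True`. For An22's best related reference (87.23%, 0.32) it returns `False`.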
# 1. Create the environment (mamba recommended)
mamba create -n gtdbtk -c conda-forge -c bioconda gtdbtk
mamba activate gtdbtk

# 2. Download the database (first time only, ~60 GB)
gtdbtk download --data_dir ./gtdb_data --release 220
wget https://data.gtdb.aau.ecogenomic.org/releases/release232/232.0/auxillary_files/gtdbtk_package/full_package/gtdbtk_r232_data.tar.g
mamba env config vars set GTDBTK_DATA_PATH="/mnt/nvme4n1p1/gtdb_data/release232"
# Deactivate the current environment, then reactivate it
mamba deactivate
mamba activate gtdbtk
# Check that the environment variable is loaded
echo $GTDBTK_DATA_PATH
# Expected output: /mnt/nvme4n1p1/gtdb_data/release232

# 3. Run the classification (the provided command plus practical parameters)
gtdbtk classify_wf \
  --genome_dir ./bacass_out/Prokka \
  --out_dir gtdb_out \
  --cpus 64 \
  --extension .fna \
  --prefix mygenome

# 4. Inspect the results
cat gtdb_out/classify/mygenome.bac120.summary.tsv   # bacterial results

# For An7:
user_genome: An7
classification: d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Pseudomonadales;f__Moraxellaceae;g__Acinetobacter;s__Acinetobacter harbinensis
closest_genome_reference: GCF_000816495.1
closest_genome_reference_radius: 95
closest_genome_taxonomy: d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Pseudomonadales;f__Moraxellaceae;g__Acinetobacter;s__Acinetobacter harbinensis
closest_genome_ani: 97.43
closest_genome_af: 0.882

# For An22:
user_genome: An22
classification: d__Bacteria;p__Actinomycetota;c__Actinomycetes;o__Actinomycetales;f__Micrococcaceae;g__Arthrobacter;s__Arthrobacter sp024124825
closest_genome_reference: GCF_029964055.1
closest_genome_reference_radius: 95
closest_genome_taxonomy: d__Bacteria;p__Actinomycetota;c__Actinomycetes;o__Actinomycetales;f__Micrococcaceae;g__Arthrobacter;s__Arthrobacter sp024124825
closest_genome_ani: 99.23
closest_genome_af: 0.929
other_related_references(genome_id,species_name,radius,ANI,AF):
GCA_052245515.1, s__Arthrobacter sp052245515, 95.0, 83.71, 0.177;
GCF_020532825.1, s__Arthrobacter sp020532825, 95.0, 85.21, 0.268;
GCF_937873245.1, s__Arthrobacter sp937873245, 95.0, 85.74, 0.21;
GCF_009928425.1, s__Arthrobacter sp009928425, 95.0, 86.58, 0.356;
GCA_963698285.1, s__Arthrobacter sp963698285, 95.0, 83.14, 0.159;
GCF_020532805.1, s__Arthrobacter sp020532805, 95.0, 83.75, 0.189;
GCF_001512305.1, s__Arthrobacter sp001512305, 95.0, 83.7, 0.199;
GCF_001750145.1, s__Arthrobacter sp001750145, 95.0, 84.97, 0.268;
GCF_019977335.1, s__Arthrobacter sp019977335, 95.0, 84.61, 0.231;
GCA_035466775.1, s__Arthrobacter sp035466775, 95.0, 85.13, 0.243;
GCA_035467435.1, s__Arthrobacter sp035467435, 95.0, 86.43, 0.25;
GCF_001422645.1, s__Arthrobacter sp001422645, 95.0, 84.35, 0.224;
GCF_000427315.1, s__Arthrobacter sp000427315, 95.0, 87.23, 0.32;
GCA_039636775.1, s__Arthrobacter sp039636775, 95.0, 83.78, 0.217;
GCF_029960645.1, s__Arthrobacter sp029960645, 95.0, 84.79, 0.253;
GCF_031456935.1, s__Arthrobacter ginsengisoli, 95.0, 84.68, 0.223;
GCF_052113365.1, s__Arthrobacter sp052113365, 95.0, 84.71, 0.234;
GCA_036390955.1, s__Arthrobacter sp036390955, 95.0, 86.36, 0.167;
GCF_030812335.1, s__Arthrobacter oxydans_B, 95.0, 85.33, 0.273;
GCF_040547365.1, s__Arthrobacter sp040547365, 95.0, 85.47, 0.29;
GCF_007679325.1, s__Arthrobacter sp007679325, 95.0, 84.42, 0.219;
GCF_040547025.1, s__Arthrobacter sp040547025, 95.0, 85.41, 0.318;
GCF_030433895.1, s__Arthrobacter sp030433895, 95.0, 84.8, 0.252;
GCF_050157025.1, s__Arthrobacter sp050157025, 95.0, 82.71, 0.153;
GCF_040547005.1, s__Arthrobacter sp040547005, 95.0, 83.22, 0.151;
GCA_034376805.1, s__Arthrobacter sp034376805, 95.0, 85.48, 0.304;
GCA_028370155.1, s__Arthrobacter sp028370155, 95.0, 84.56, 0.235
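The summary combines a `;`-separated GTDB taxonomy string with a packed `other_related_references` column. Below is a small Python sketch for parsing both from `mygenome.bac120.summary.tsv`. It assumes the column names shown above. It also assumes that GTDB-Tk writes `N/A` when a genome has no related references; that detail is an assumption. All helper names are hypothetical.

```python
import csv

RANKS = {"d": "domain", "p": "phylum", "c": "class", "o": "order",
         "f": "family", "g": "genus", "s": "species"}
RELATED_COL = "other_related_references(genome_id,species_name,radius,ANI,AF)"


def read_summary(path):
    """Read a GTDB-Tk *.summary.tsv file into a list of dicts (one per genome)."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def parse_taxonomy(classification):
    """Map 'd__Bacteria;...;s__Genus species' to {'domain': 'Bacteria', ...}."""
    ranks = {}
    for field in classification.split(";"):
        prefix, _, name = field.strip().partition("__")
        if prefix in RANKS:
            ranks[RANKS[prefix]] = name
    return ranks


def parse_related_references(text):
    """Parse 'genome_id, species_name, radius, ANI, AF' entries separated by ';'."""
    if text is None or text.strip() in ("", "N/A"):
        return []
    refs = []
    for entry in text.split(";"):
        if not entry.strip():
            continue
        genome_id, species, radius, ani, af = [x.strip() for x in entry.split(",")]
        refs.append({"genome_id": genome_id, "species": species,
                     "radius": float(radius), "ani": float(ani), "af": float(af)})
    return refs
```

Consider An22's row. `max(refs, key=lambda r: r["ani"])` picks GCF_000427315.1 (87.23%), and `all(r["ani"] < r["radius"] for r in refs)` confirms that no related reference reaches the 95% species radius.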
Resistome (antimicrobial resistance genes) and virulence profiling with ABRicate and RGI (Resistance Gene Identifier)
conda activate /home/jhuang/miniconda3/envs/bengal3_ac3
abricate --list
conda deactivate

ENV_NAME=/home/jhuang/miniconda3/envs/bengal3_ac3 \
ASM=bacass_out/Prokka/An22.fna \
SAMPLE=An22 \
OUTDIR=resistome_virulence_An22 \
MINID=70 MINCOV=50 \
THREADS=32 \
~/Scripts/run_abricate_resistome_virulome_one_per_gene.sh

# An7, ABRicate thresholds: MINID=80 MINCOV=60
Database   Hit_lines  File
MEGARes    0          resistome_virulence_An7/raw/An7.megares.tab
CARD       0          resistome_virulence_An7/raw/An7.card.tab
ResFinder  0          resistome_virulence_An7/raw/An7.resfinder.tab
VFDB       0          resistome_virulence_An7/raw/An7.vfdb.tab

# An7, ABRicate thresholds: MINID=70 MINCOV=50
Database   Hit_lines  File
MEGARes    5          resistome_virulence_An7/raw/An7.megares.tab
CARD       5          resistome_virulence_An7/raw/An7.card.tab
ResFinder  0          resistome_virulence_An7/raw/An7.resfinder.tab
VFDB       3          resistome_virulence_An7/raw/An7.vfdb.tab

# An22
Database   Hit_lines  File
MEGARes    2          resistome_virulence_An22/raw/An22.megares.tab
CARD       1          resistome_virulence_An22/raw/An22.card.tab
ResFinder  0          resistome_virulence_An22/raw/An22.resfinder.tab
VFDB       2          resistome_virulence_An22/raw/An22.vfdb.tab

conda activate /home/jhuang/miniconda3/envs/bengal3_ac3
# NEED_TO_ADAPT (inside merge_amr_sources_by_gene.py): OUTDIR = Path("resistome_virulence_An7")
# NEED_TO_ADAPT (inside merge_amr_sources_by_gene.py): SAMPLE = "An7"
python ~/Scripts/merge_amr_sources_by_gene.py
python ~/Scripts/export_resistome_virulence_to_excel_py36.py \
  --workdir resistome_virulence_An22 \
  --sample An22 \
  --out Resistome_Virulence_An22.xlsx
# Delete the column 'COVERAGE_MAP' in all 'Raw_*' sheets
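The 80/60 run returned no hits for An7, while the 70/50 run did. Rather than re-running ABRicate, you can re-filter one permissive run at stricter cutoffs. Below is a minimal sketch. It assumes ABRicate's tab-separated report columns (`#FILE ... COVERAGE_MAP ... %COVERAGE %IDENTITY ...`) and the `raw/<sample>.<db>.tab` layout seen above. It counts every row, so its numbers may differ from the wrapper script's one-per-gene `Hit_lines`. `write_without_columns` drops `COVERAGE_MAP` at the tab-file level, as an alternative to deleting it from the Excel sheets by hand. All function names are hypothetical.

```python
import csv
from pathlib import Path


def read_abricate_tab(path):
    """Read one ABRicate report (tab-separated, header starting with '#FILE')."""
    with open(str(path), newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def filter_hits(rows, minid, mincov):
    """Keep hits with %IDENTITY >= minid and %COVERAGE >= mincov."""
    return [row for row in rows
            if float(row["%IDENTITY"]) >= minid and float(row["%COVERAGE"]) >= mincov]


def hit_table(raw_dir, sample, databases, minid, mincov):
    """Return (label, hit_count, path) per database, like the summaries above.

    databases: list of (label, file_key) pairs, e.g. ("CARD", "card").
    Missing report files are counted as 0 hits.
    """
    table = []
    for label, key in databases:
        path = Path(raw_dir) / "{}.{}.tab".format(sample, key)
        rows = read_abricate_tab(path) if path.exists() else []
        table.append((label, len(filter_hits(rows, minid, mincov)), str(path)))
    return table


def write_without_columns(rows, out_path, drop=("COVERAGE_MAP",)):
    """Write rows back as TSV, omitting the columns listed in `drop`."""
    if not rows:
        return
    fields = [name for name in rows[0] if name not in drop]
    with open(str(out_path), "w", newline="") as handle:
        # extrasaction="ignore" silently skips the dropped keys in each row dict.
        writer = csv.DictWriter(handle, fieldnames=fields, delimiter="\t",
                                extrasaction="ignore", lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
```

Re-filtering can only tighten thresholds. Hits below the MINID/MINCOV of the original run were never written, so a 70/50 run can reproduce the 80/60 counts but not the reverse.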
Report
Dear XXXX,
Please find below a summary of genomic analyses for samples An7 and An22.
1. Species Identification
Sample An7: Acinetobacter harbinensis ✅ Confirmed
| Parameter | Value | Interpretation |
|---|---|---|
| Closest Reference | GCF_000816495.1 | Type strain of A. harbinensis |
| ANI | 97.43% | ✅ Well above 95% species threshold |
| AF (Alignment Fraction) | 0.882 | ✅ 88.2% of the genome aligns; ANI estimate is robust |
| Final Taxonomy | d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Pseudomonadales;f__Moraxellaceae;g__Acinetobacter;s__Acinetobacter harbinensis | Consistent with genomic expectations |
🟢 Conclusion: An7 is confidently assigned to Acinetobacter harbinensis.
Sample An22: Arthrobacter sp. strain An22 🟡 Potential Novel Species
| Parameter | Value | Interpretation |
|---|---|---|
| Closest Reference | GCF_029964055.1 (Arthrobacter sp024124825) | 🟡 GTDB species cluster without a validly published name |
| ANI | 99.23% | ✅ Well above 95% species threshold; An22 falls within this GTDB species cluster |
| AF (Alignment Fraction) | 0.929 | ✅ Reliable ANI estimate |
| Final Taxonomy | d__Bacteria;p__Actinomycetota;c__Actinomycetes;o__Actinomycetales;f__Micrococcaceae;g__Arthrobacter;s__Arthrobacter sp024124825 | Clear genus assignment; species-level novelty |
Comparison with Other Arthrobacter Reference Genomes (GTDB-Tk other_related_references):
| Reference Species | ANI (%) | AF | Same Species? |
|---|---|---|---|
| A. ginsengisoli (GCF_031456935.1) | 84.68 | 0.223 | ❌ ANI < 95% |
| A. oxydans_B (GCF_030812335.1) | 85.33 | 0.273 | ❌ ANI < 95% |
| A. sp000427315 (GCF_000427315.1) | 87.23 | 0.320 | ❌ (highest ANI among these references) |
| A. sp035467435 (GCA_035467435.1) | 86.43 | 0.250 | ❌ |
| A. sp036390955 (GCA_036390955.1) | 86.36 | 0.167 | ❌ |
| A. sp009928425 (GCF_009928425.1) | 86.58 | 0.356 | ❌ |
| A. sp040547365 (GCF_040547365.1) | 85.47 | 0.290 | ❌ |
| A. sp040547025 (GCF_040547025.1) | 85.41 | 0.318 | ❌ |
| A. sp034376805 (GCA_034376805.1) | 85.48 | 0.304 | ❌ |
| A. sp020532825 (GCF_020532825.1) | 85.21 | 0.268 | ❌ |
| A. sp035466775 (GCA_035466775.1) | 85.13 | 0.243 | ❌ |
| A. sp052113365 (GCF_052113365.1) | 84.71 | 0.234 | ❌ |
| A. sp029960645 (GCF_029960645.1) | 84.79 | 0.253 | ❌ |
| A. sp019977335 (GCF_019977335.1) | 84.61 | 0.231 | ❌ |
| A. sp030433895 (GCF_030433895.1) | 84.80 | 0.252 | ❌ |
| A. sp028370155 (GCA_028370155.1) | 84.56 | 0.235 | ❌ |
| A. sp001750145 (GCF_001750145.1) | 84.97 | 0.268 | ❌ |
| A. sp001422645 (GCF_001422645.1) | 84.35 | 0.224 | ❌ |
| A. sp039636775 (GCA_039636775.1) | 83.78 | 0.217 | ❌ |
| A. sp020532805 (GCF_020532805.1) | 83.75 | 0.189 | ❌ |
| A. sp052245515 (GCA_052245515.1) | 83.71 | 0.177 | ❌ |
| A. sp001512305 (GCF_001512305.1) | 83.70 | 0.199 | ❌ |
| A. sp040547005 (GCF_040547005.1) | 83.22 | 0.151 | ❌ |
| A. sp963698285 (GCA_963698285.1) | 83.14 | 0.159 | ❌ |
| A. sp050157025 (GCF_050157025.1) | 82.71 | 0.153 | ❌ |
| A. sp937873245 (GCF_937873245.1) | 85.74 | 0.210 | ❌ |
| A. sp007679325 (GCF_007679325.1) | 84.42 | 0.219 | ❌ |
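🟡 Conclusion: An22 shows 99.23% ANI (AF 0.929) to GCF_029964055.1. This is the reference genome of the GTDB species cluster Arthrobacter sp024124825, which has no validly published name. An22 shows <86% ANI to all named Arthrobacter species in the comparison (A. ginsengisoli and A. oxydans_B). No other related reference exceeds 87.23% ANI. This supports An22 representing a candidate novel (not yet formally described) species, tentatively labeled Arthrobacter sp. strain An22.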
2. AMR Genes Summary
An7 (A. harbinensis): 6 genes detected (CARD/MEGARes consensus)
- adeIJK (RND efflux pump complex) → multidrug resistance (carbapenems, cephalosporins, fluoroquinolones, macrolides, tetracyclines, etc.)
- abeM (MATE efflux pump) → fluoroquinolones, disinfecting agents & antiseptics
- LpsB → intrinsic resistance to colistin and other peptide antibiotics
- MEXT → RND efflux regulator (multi-compound & biocide resistance)
An22 (Arthrobacter sp. strain An22): 3 genes detected
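- rpoB mutants (CARD) → rifamycin resistance (mutations in the rifampicin-binding pocket). ABRicate detects rpoB by sequence homology only and does not check for these point mutations, so rifamycin resistance is not confirmed.
- MTRAD (MEGARes) → multidrug RND efflux regulator
- PARY (MEGARes) → aminocoumarin-resistant DNA topoisomerase (aminocoumarin resistance)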
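📝 Note: Efflux regulators (MEXT, MTRAD) and intrinsic/target-modification genes are frequently observed in environmental Arthrobacter/Acinetobacter isolates. The hits above were obtained with relaxed ABRicate thresholds (≥70% identity, ≥50% coverage). With stricter thresholds (≥80% identity, ≥60% coverage), no hits were found for An7. Phenotypic antimicrobial susceptibility testing (AST) is recommended if clinical or biotechnological applications are planned.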
3. Virulence Factors (VFDB)
| Sample | Hits | Key Genes | Implication |
|---|---|---|---|
| An7 | 3 | htpB (Hsp60), katA (catalase), pilT (twitching motility) | Stress survival, oxidative defense, adhesion/biofilm formation |
| An22 | 2 | icl (isocitrate lyase), ideR (iron-dependent regulator) | Metabolic adaptation (glyoxylate shunt), iron homeostasis & potential persistence |
4. Methylome Data
“Could you please clarify if the datasets include methylome data?”
✅ Yes: the datasets include POD5 files (Oxford Nanopore raw signal data), which can be used to detect base modifications. Methylome analysis is in progress.
5. Attachments
- Resistome_Virulence_An7.xlsx: detailed AMR/virulence tables for A. harbinensis An7
- Resistome_Virulence_An22.xlsx: detailed AMR/virulence tables for Arthrobacter sp. strain An22
Each file includes CARD/MEGARes/ResFinder annotations and VFDB virulence factors (%ID, coverage, genomic coordinates, and strand orientation).
Please let me know if you need further breakdowns or phenotypic correlation analysis.
Best, YYYY