Monthly Archives: April 2026

Pipeline: Whole-Genome Validation of *Acinetobacter baumannii* ATCC19606 Deletion Mutants (Data_Foong_RNAseq_2021_ATCC19606_Cm)

This workflow describes the assembly, scaffolding, variant calling, and structural variant validation used to confirm the genomic integrity of A. baumannii ATCC19606 WT and the ΔadeAB, ΔadeIJ, and ΔcraA mutants.


1. Raw Read Preparation and Assembly

Create project structure

mkdir bacto_DNAseq
cd bacto_DNAseq
mkdir raw_data
cd raw_data

Link raw FASTQ files

ln -s ../../X101SC26025981-Z02-J001/01.RawData/19606_adeAB/19606_adeAB_1.fq.gz 19606adeAB_R1.fastq.gz
ln -s ../../X101SC26025981-Z02-J001/01.RawData/19606_adeAB/19606_adeAB_2.fq.gz 19606adeAB_R2.fastq.gz

# Additional files present in the project
A10CraA_R1.fastq.gz
A10CraA_R2.fastq.gz
A6WT_R1.fastq.gz
A6WT_R2.fastq.gz
adeIJ_R1.fastq.gz
adeIJ_R2.fastq.gz

Install and run the bacto_DNAseq pipeline including assembly

git clone https://github.com/huang/bacto_DNAseq
mv bacto_DNAseq/* ./
rm -rf bacto_DNAseq

conda activate /home/jhuang/miniconda3/envs/bengal3_ac3
snakemake --printshellcmds

Notes

  • Edit bacto_DNAseq-0.1.json to enable only:

    • assembly
    • typing_mlst
    • variants_calling
    • pangenome (optional)
  • The pipeline requires access to the following file, so the Titisee disk must be mounted:
    /media/jhuang/Titisee/GAMOLA2/TIGRfam_db/TIGRFAMs_15.0_HMM.LIB

Original commands

# ---------------------------- Assembly using bacto ----------------------------

mkdir bacto_DNAseq; cd bacto_DNAseq;
mkdir raw_data; cd raw_data;
ln -s ../../X101SC26025981-Z02-J001/01.RawData/19606_adeAB/19606_adeAB_1.fq.gz     19606adeAB_R1.fastq.gz
ln -s ../../X101SC26025981-Z02-J001/01.RawData/19606_adeAB/19606_adeAB_2.fq.gz     19606adeAB_R2.fastq.gz
# Additional read files already present in raw_data:
#   ./A10CraA_R1.fastq.gz
#   ./A10CraA_R2.fastq.gz
#   ./A6WT_R1.fastq.gz
#   ./A6WT_R2.fastq.gz
#   ./adeIJ_R1.fastq.gz
#   ./adeIJ_R2.fastq.gz

git clone https://github.com/huang/bacto_DNAseq
mv bacto_DNAseq/* ./
rm -rf bacto_DNAseq
conda activate /home/jhuang/miniconda3/envs/bengal3_ac3

(bengal3_ac3) jhuang@WS-2290C:~/DATA/Data_Tam_DNAseq_2023_A6WT_A10CraA_A12AYE_A1917978$ which snakemake
/home/jhuang/miniconda3/envs/bengal3_ac3/bin/snakemake
(bengal3_ac3) jhuang@WS-2290C:~/DATA/Data_Tam_DNAseq_2023_A6WT_A10CraA_A12AYE_A1917978$ snakemake -v
4.0.0 --> CORRECT!

#NOTE_1: edit bacto_DNAseq-0.1.json so that only the steps assembly, typing_mlst, variants_calling and (optionally) pangenome are set to true.
#NOTE_2: the Titisee disk must be mounted, because the pipeline reads /media/jhuang/Titisee/GAMOLA2/TIGRfam_db/TIGRFAMs_15.0_HMM.LIB
snakemake --printshellcmds

2. Contig Filtering and Chromosome Scaffolding

After SPAdes assembly, contigs shorter than 500 bp are removed:

seqkit seq -m 500 A6WT/contigs.fa > A6WT_contigs.min500.fasta
seqkit seq -m 500 19606adeAB/contigs.fa > adeAB_contigs.min500.fasta
seqkit seq -m 500 A10CraA/contigs.fa > A10CraA_contigs.min500.fasta
seqkit seq -m 500 adeIJ/contigs.fa > adeIJ_contigs.min500.fasta
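The 500 bp cutoff is inclusive: seqkit seq -m keeps sequences that are at least that long. Where seqkit is not available, the same filter can be sketched in Python. The helper names read_fasta and filter_min_length and the two demo contigs are illustrative assumptions, not part of the pipeline.

```python
import io

def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def filter_min_length(records, min_len=500):
    """Keep records whose sequence has at least min_len bases (inclusive)."""
    return [(h, s) for h, s in records if len(s) >= min_len]

# Two made-up contigs: one of 600 bp and one of 120 bp.
demo = io.StringIO(">contig00001\n" + "A" * 600 + "\n>contig00002\n" + "C" * 120 + "\n")
kept = filter_min_length(read_fasta(demo))
print([header for header, _ in kept])
```

Only the 600 bp contig survives the filter; a contig of exactly 500 bp would also be kept.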

Chromosomal contigs are then identified by alignment against the reference genome CP059040.fasta using minimap2. Contigs lacking alignment are interpreted as putative plasmids and excluded from scaffolding.
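With --paf-no-hit, minimap2 also writes unaligned queries to the PAF file and puts "*" in the target-name column (column 6), which is what the awk filter in the original commands tests. A minimal Python sketch of the same selection follows; the function name and the two demo PAF lines are made up for illustration.

```python
def unaligned_contigs(paf_lines):
    """Return query names whose PAF target column (column 6) is '*'."""
    names = []
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 6 and fields[5] == "*" and fields[0] not in names:
            names.append(fields[0])
    return names

# Invented demo records: one aligned contig, one contig without a hit.
demo_paf = [
    "contig00001\t1000\t0\t1000\t+\tCP059040\t3980000\t100\t1100\t1000\t1000\t60",
    "contig00016\t8000\t0\t0\t*\t*\t0\t0\t0\t0\t0\t0",
]
print(unaligned_contigs(demo_paf))
```

A contig that lacks an alignment is only a plasmid candidate; confirming that it is a plasmid needs separate evidence.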

WT

Excluded plasmid contig:

contig00016

ΔadeAB

Excluded plasmid contigs:

contig00029
contig00030
contig00033
contig00039

ΔcraA

Excluded plasmid contig:

contig00096

ΔadeIJ

Excluded plasmid contigs:

contig00017
contig00019
contig00020
contig00021
contig00025

Scaffold chromosome using RagTag

Example for ΔadeAB:

ragtag.py scaffold CP059040.fasta adeAB_contigs.min500.no29_30_33_39.fasta -o ragtag_adeAB -C

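The scaffolded chromosome is then concatenated with the excluded plasmid contigs to generate the final assembly. Because RagTag is run with -C, contigs it cannot place are collected in a single Chr0_RagTag record; the sed step in the original commands replaces that header line with a run of Ns, so the unplaced sequence is appended to the chromosome record behind an N spacer.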
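The cat and sed steps in the original commands can be pictured as list operations on (header, sequence) records. The sketch below assumes the record order produced by those commands (chromosome scaffold, then Chr0_RagTag, then plasmid contigs). The function name fold_unplaced, the toy sequences, and the three-N spacer are illustrative only.

```python
def fold_unplaced(records, unplaced_id, spacer):
    """Append the unplaced record to the record before it, separated by spacer."""
    merged = []
    for header, seq in records:
        if header == unplaced_id and merged:
            prev_header, prev_seq = merged[-1]
            merged[-1] = (prev_header, prev_seq + spacer + seq)
        else:
            merged.append((header, seq))
    return merged

# Toy records standing in for ragtag.scaffold.fasta and the plasmid contig file.
scaffold = [("CP059040_RagTag", "ACGT"), ("Chr0_RagTag", "GGCC")]
plasmids = [("contig00016", "TTAA")]
final_assembly = fold_unplaced(scaffold + plasmids, "Chr0_RagTag", "NNN")
print(final_assembly)
```

The Chr0_RagTag record disappears as a separate entry, and the plasmid contig stays a record of its own.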

Original commands

# ----------------------------- Scaffolding ------------------------------
cd shovill

seqkit seq -m 500 contigs.fa > contigs.min500.fasta
#seqkit seq -g -m 500 contigs.fa > contigs.min500_g.fasta

#For project 2:
seqkit seq -m 500 adeABadeIJ_contigs.fa > adeABadeIJ_contigs.min500.fasta
seqkit seq -m 500 adeIJK_contigs.fa > adeIJK_contigs.min500.fasta

#For project 1:
seqkit seq -m 500 A6WT/contigs.fa > A6WT_contigs.min500.fasta
seqkit seq -m 500 19606adeAB/contigs.fa > adeAB_contigs.min500.fasta
seqkit seq -m 500 A10CraA/contigs.fa > A10CraA_contigs.min500.fasta
seqkit seq -m 500 adeIJ/contigs.fa > adeIJ_contigs.min500.fasta

# No longer needed: online scaffolding with Multi-CSAR v1.1 (https://genome.cs.nthu.edu.tw/Multi-CSAR/) has been replaced by minimap2 + RagTag.

#2
adeABadeIJ    29 contigs
adeIJK        22 contigs

#1
A6WT          22 contigs -1
adeAB         40 contigs -4

adeIJ         27 contigs -5
A10CraA       24 contigs -1

#2
minimap2 -cx asm20 --paf-no-hit ../CP059040.fasta adeABadeIJ_contigs.min500.fasta > asm20_all.paf
awk '$6=="*"{print $1}' asm20_all.paf
#-->contig00020
#-->contig00021
#-->contig00027
seqkit grep -v -r \
  -p "^contig00020([[:space:]]|$)" \
  -p "^contig00021([[:space:]]|$)" \
  -p "^contig00027([[:space:]]|$)" \
  adeABadeIJ_contigs.min500.fasta > adeABadeIJ_contigs.min500.no20_21_27.fasta
(ragtag_env) ragtag.py scaffold ../CP059040.fasta adeABadeIJ_contigs.min500.no20_21_27.fasta -o ragtag_adeABadeIJ  -C

minimap2 -cx asm20 --paf-no-hit ../CP059040.fasta adeIJK_contigs.min500.fasta > asm20_all.paf
awk '$6=="*"{print $1}' asm20_all.paf
#-->contig00016
seqkit grep -v -r -p "^contig00016(\s|$)" adeIJK_contigs.min500.fasta > adeIJK_contigs.min500.no16.fasta
(ragtag_env) ragtag.py scaffold ../CP059040.fasta adeIJK_contigs.min500.no16.fasta -o ragtag_adeIJK  -C

#1
(ragtag_env) minimap2 -cx asm20 --paf-no-hit ../CP059040.fasta A6WT_contigs.min500.fasta > asm20_all.paf
awk '$6=="*"{print $1}' asm20_all.paf
#-->contig00016
seqkit grep -v -r -p "^contig00016(\s|$)" A6WT_contigs.min500.fasta > A6WT_contigs.min500.no16.fasta
seqkit grep -r -p "^contig00016(\s|$)" A6WT_contigs.min500.fasta > A6WT_contigs.min500.16.fasta
(ragtag_env) ragtag.py scaffold ../CP059040.fasta A6WT_contigs.min500.no16.fasta -o ragtag_A6WT  -C
cat ragtag_A6WT/ragtag.scaffold.fasta A6WT_contigs.min500.16.fasta > A6WT_chr_plasmids_.fasta
sed 's/^>Chr0_RagTag$/NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN/' A6WT_chr_plasmids_.fasta > A6WT_chr_plasmids__.fasta
seqkit seq A6WT_chr_plasmids__.fasta > A6WT_chr_plasmids.fasta

(ragtag_env) minimap2 -cx asm20 --paf-no-hit ../CP059040.fasta adeAB_contigs.min500.fasta > asm20_all.paf
awk '$6=="*"{print $1}' asm20_all.paf
#-->contig00029
#-->contig00030
#-->contig00033
#-->contig00039
seqkit grep -v -r \
  -p "^contig00029([[:space:]]|$)" \
  -p "^contig00030([[:space:]]|$)" \
  -p "^contig00033([[:space:]]|$)" \
  -p "^contig00039([[:space:]]|$)" \
  adeAB_contigs.min500.fasta > adeAB_contigs.min500.no29_30_33_39.fasta
seqkit grep -r \
  -p "^contig00029([[:space:]]|$)" \
  -p "^contig00030([[:space:]]|$)" \
  -p "^contig00033([[:space:]]|$)" \
  -p "^contig00039([[:space:]]|$)" \
  adeAB_contigs.min500.fasta > adeAB_contigs.min500.29_30_33_39.fasta
(ragtag_env) ragtag.py scaffold ../CP059040.fasta adeAB_contigs.min500.no29_30_33_39.fasta -o ragtag_adeAB  -C
cat ragtag_adeAB/ragtag.scaffold.fasta adeAB_contigs.min500.29_30_33_39.fasta > adeAB_chr_plasmids_.fasta
sed 's/^>Chr0_RagTag$/NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN/' adeAB_chr_plasmids_.fasta > adeAB_chr_plasmids__.fasta
seqkit seq adeAB_chr_plasmids__.fasta > adeAB_chr_plasmids.fasta
samtools faidx adeAB_chr_plasmids.fasta

(ragtag_env) minimap2 -cx asm20 --paf-no-hit ../CP059040.fasta A10CraA_clean.fasta > asm20_all.paf
awk '$6=="*"{print $1}' asm20_all.paf
#-->contig00096
seqkit grep -v -r \
  -p "^contig00096([[:space:]]|$)" \
  A10CraA_clean.fasta > A10CraA_contigs.min500.no96.fasta
seqkit grep -r \
  -p "^contig00096([[:space:]]|$)" \
  A10CraA_clean.fasta > A10CraA_contigs.min500.96.fasta
(ragtag_env) ragtag.py scaffold ../CP059040.fasta A10CraA_contigs.min500.no96.fasta -o ragtag_A10CraA  -C
cat ragtag_A10CraA/ragtag.scaffold.fasta A10CraA_contigs.min500.96.fasta > A10CraA_chr_plasmids_.fasta
sed 's/^>Chr0_RagTag$/NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN/' A10CraA_chr_plasmids_.fasta > A10CraA_chr_plasmids__.fasta
seqkit seq A10CraA_chr_plasmids__.fasta > A10CraA_chr_plasmids.fasta

(ragtag_env) minimap2 -cx asm20 --paf-no-hit ../CP059040.fasta adeIJ_contigs.min500.fasta > asm20_all.paf
awk '$6=="*"{print $1}' asm20_all.paf
#-->contig00017
#-->contig00019
#-->contig00020
#-->contig00021
#-->contig00025
seqkit grep -v -r \
  -p "^contig00017([[:space:]]|$)" \
  -p "^contig00019([[:space:]]|$)" \
  -p "^contig00020([[:space:]]|$)" \
  -p "^contig00021([[:space:]]|$)" \
  -p "^contig00025([[:space:]]|$)" \
  adeIJ_contigs.min500.fasta > adeIJ_contigs.min500.no17_19_20_21_25.fasta
seqkit grep -r \
  -p "^contig00017([[:space:]]|$)" \
  -p "^contig00019([[:space:]]|$)" \
  -p "^contig00020([[:space:]]|$)" \
  -p "^contig00021([[:space:]]|$)" \
  -p "^contig00025([[:space:]]|$)" \
  adeIJ_contigs.min500.fasta > adeIJ_contigs.min500.17_19_20_21_25.fasta
(ragtag_env) ragtag.py scaffold ../CP059040.fasta adeIJ_contigs.min500.no17_19_20_21_25.fasta -o ragtag_adeIJ  -C
cat ragtag_adeIJ/ragtag.scaffold.fasta adeIJ_contigs.min500.17_19_20_21_25.fasta > adeIJ_chr_plasmids_.fasta
sed 's/^>Chr0_RagTag$/NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN/' adeIJ_chr_plasmids_.fasta > adeIJ_chr_plasmids__.fasta
seqkit seq adeIJ_chr_plasmids__.fasta > adeIJ_chr_plasmids.fasta
samtools faidx adeIJ_chr_plasmids.fasta

3. Final FASTA Header Format for NCBI Submission

Example headers:

>Chr [location=chromosome] [topology=circular] [completeness=partial]
>contig00029 [plasmid-name=pAdeAB1] [topology=circular] [completeness=partial]

Important: completeness=incomplete is not accepted by NCBI and must be replaced with:

completeness=partial

Automatic correction:

sed -i 's/completeness=incomplete/completeness=partial/g' *.fasta
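The header convention can also be generated and corrected programmatically. This is a minimal sketch that only reproduces the header strings shown above; the function names plasmid_header and fix_completeness are hypothetical helpers, not part of any NCBI tool.

```python
def plasmid_header(contig_id, plasmid_name):
    """Build a plasmid FASTA header in the format used in this post."""
    return (f">{contig_id} [plasmid-name={plasmid_name}] "
            "[topology=circular] [completeness=partial]")

def fix_completeness(header):
    """Replace the rejected completeness value with the accepted one."""
    return header.replace("completeness=incomplete", "completeness=partial")

print(plasmid_header("contig00029", "pAdeAB1"))
print(fix_completeness(">Chr [location=chromosome] [topology=circular] [completeness=incomplete]"))
```

Unlike the sed one-liner, this does not edit files in place, so the original FASTA stays untouched until the output has been checked.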

Original commands

# INPUT: assembled and scaffolded files
#./shovill/A6WT_chr_plasmids.fasta
#./shovill/A10CraA_chr_plasmids.fasta
#./shovill/adeAB_chr_plasmids.fasta
#./shovill/adeIJ_chr_plasmids.fasta

# Back up the original file
cp A6WT_chr_plasmids.fasta A6WT_chr_plasmids.fasta.backup

# Replace the invalid completeness=incomplete with completeness=partial
sed -i 's/completeness=incomplete/completeness=partial/g' A6WT_chr_plasmids.fasta

## Alternatively, remove all topology and completeness tags (the safest option)
#sed -i 's/ \[topology=[^]]*\]//g' A6WT_chr_plasmids.fasta
#sed -i 's/ \[completeness=[^]]*\]//g' A6WT_chr_plasmids.fasta

(bengal3_ac3) jhuang@WS-2290C:/mnt/md1/DATA/Data_Foong_RNAseq_2021_ATCC19606_Cm/bacto_DNAseq/shovill$ grep ">" A6WT_chr_plasmids.fasta
>Chr [location=chromosome] [topology=circular] [completeness=partial]
>contig00016 [plasmid-name=pWT1] [topology=circular] [completeness=partial]
(bengal3_ac3) jhuang@WS-2290C:/mnt/md1/DATA/Data_Foong_RNAseq_2021_ATCC19606_Cm/bacto_DNAseq/shovill$ grep ">" A10CraA_chr_plasmids.fasta
>Chr [location=chromosome] [topology=circular] [completeness=partial]
>contig00096 [plasmid-name=pCraA1] [topology=circular] [completeness=partial]
(bengal3_ac3) jhuang@WS-2290C:/mnt/md1/DATA/Data_Foong_RNAseq_2021_ATCC19606_Cm/bacto_DNAseq/shovill$ grep ">" adeAB_chr_plasmids.fasta
>Chr [location=chromosome] [topology=circular] [completeness=partial]
>contig00029 [plasmid-name=pAdeAB1] [topology=circular] [completeness=partial]
>contig00030 [plasmid-name=pAdeAB2] [topology=circular] [completeness=partial]
>contig00033 [plasmid-name=pAdeAB3] [topology=circular] [completeness=partial]
>contig00039 [plasmid-name=pAdeAB4] [topology=circular] [completeness=partial]
(bengal3_ac3) jhuang@WS-2290C:/mnt/md1/DATA/Data_Foong_RNAseq_2021_ATCC19606_Cm/bacto_DNAseq/shovill$ grep ">" adeIJ_chr_plasmids.fasta
>Chr [location=chromosome] [topology=circular] [completeness=partial]
>contig00017 [plasmid-name=pAdeIJ1] [topology=circular] [completeness=partial]
>contig00019 [plasmid-name=pAdeIJ2] [topology=circular] [completeness=partial]
>contig00020 [plasmid-name=pAdeIJ3] [topology=circular] [completeness=partial]
>contig00021 [plasmid-name=pAdeIJ4] [topology=circular] [completeness=partial]
>contig00025 [plasmid-name=pAdeIJ5] [topology=circular] [completeness=partial]

4. SNP and Small Indel Detection

Two independent pipelines are used:

  1. Snippy v4.6.0
  2. SPANDx v3.2

Only variants detected by both methods are retained.
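The consensus rule amounts to a set intersection on the variant position, which is also how the comm -12 check in the original commands compares the two outputs. A minimal sketch, assuming each call is reduced to a (chromosome, position) pair; positions 200000 and 300000 are invented to show calls made by only one tool.

```python
def shared_positions(snippy_calls, spandx_calls):
    """Return the (chrom, pos) keys reported by both callers, sorted."""
    return sorted(set(snippy_calls) & set(spandx_calls))

snippy_calls = [("CP059040", 136447), ("CP059040", 152318), ("CP059040", 200000)]
spandx_calls = [("CP059040", 136447), ("CP059040", 152318), ("CP059040", 300000)]
print(shared_positions(snippy_calls, spandx_calls))
```

Matching on position alone ignores allele differences between the callers, so indels that the two tools represent differently may still need a manual look.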

Snippy summary

python3 ~/Scripts/summarize_snippy_res.py snippy

SPANDx run

nextflow run spandx/main.nf \
  --fastq "*_P_{1,2}.fastq.gz" \
  --ref CP059040.fasta \
  --annotation \
  --database CP059040 \
  -resume

Merge final variant calls

python3 ~/Scripts/merge_snps_indels.py \
  bacto_DNAseq/snippy/summary_snps_indels.csv \
  spandx/Outputs/Phylogeny_and_annotation/f1_7___ \
  merged_variants.csv

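Result: no SNPs or small indels unique to any mutant are reported outside the intended deletion loci. Two records in the listing below differ between samples and were flagged during manual review: position 1527276 (an in-frame deletion called only in A10CraA) and position 3124917 (a SNP called only in 19606adeAB, marked for deletion in the notes).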
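To screen the merged table for sample-specific calls, it is enough to list every row in which the four sample columns disagree. The sketch below uses three rows copied from the table in the original commands, truncated to the first eight columns for brevity; the function name sample_specific is an illustrative assumption.

```python
import csv
import io

SAMPLES = ["A6WT", "A10CraA", "19606adeAB", "adeIJ"]

def sample_specific(csv_text):
    """Return positions of rows where the sample alleles are not all identical."""
    positions = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if len({row[sample] for sample in SAMPLES}) > 1:
            positions.append(int(row["POS"]))
    return positions

demo_csv = """CHROM,POS,REF,TYPE,A6WT,A10CraA,19606adeAB,adeIJ
CP059040,136447,T,snp,A,A,A,A
CP059040,3124917,T,snp,T,T,C,T
CP059040,1527276,TTGAACC,del,TTGAACC,T,TTGAACC,TTGAACC
"""
print(sample_specific(demo_csv))
```

Rows where all four samples carry the same non-reference allele, such as position 136447, are differences between the parental strain and the CP059040 reference rather than mutant-specific changes.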

Original commands

# ---------------------------- SNP+Indel using snippy ----------------------------

#Summarize all SNPs and indels from the snippy result directory
#NOTE: first adapt the sample names in ~/Scripts/summarize_snippy_res.py: "A6WT", "A10CraA", "19606adeAB", "adeIJ"
#Output: snippy_CP133676/snippy/summary_snps_indels.csv
python3 ~/Scripts/summarize_snippy_res.py snippy
#? Two records are not identical across all four samples:
#   CP059040,1527276,TTGAACC,del,TTGAACC,T,TTGAACC,TTGAACC,conservative_inframe_deletion c.1327_1332delGAACCT p.Glu443_Pro444del,,,,,,H0N29_07175,nan
#   CP059040,3124917,T,snp,T,T,C,T,nan,,,,,,nan,nan --> gene AB!

# ---------------------------- SNP+Indel using spandx ----------------------------

mkdir ~/miniconda3/envs/spandx/share/snpeff-5.1-2/data/CP059040
cp CP059040.gb  ~/miniconda3/envs/spandx/share/snpeff-5.1-2/data/CP059040/genes.gbk
vim ~/miniconda3/envs/spandx/share/snpeff-5.1-2/snpEff.config
/home/jhuang/miniconda3/envs/spandx/bin/snpEff build CP059040     -d

mkdir spandx
cp bacto_DNAseq/trimmed/*_P_*fastq spandx
cd spandx
gzip A6WT_trimmed_P_1.fastq A6WT_trimmed_P_2.fastq 19606adeAB_trimmed_P_1.fastq 19606adeAB_trimmed_P_2.fastq ...

cp CP059040.fasta spandx
#grep ">" CP059040.fasta
#>CP059040
conda activate /home/jhuang/miniconda3/envs/spandx
ln -s /home/jhuang/Tools/spandx/ spandx
(spandx) nextflow run spandx/main.nf --fastq "*_P_{1,2}.fastq.gz" --ref CP059040.fasta --annotation --database CP059040 -resume

# ---------------------------- Post-processing of the SPANDx Outputs. BUG: all SNPs and indels are annotated as MODIFIER, so only the positions are taken from SPANDx and the annotations are taken from snippy ----------------------
    #-- _CP133676 produced by SPANDx (temporary not necessary) --
    cd Outputs/Phylogeny_and_annotation
    #awk '{if($3!=$7) print}' < All_SNPs_indels_annotated.txt > All_SNPs_indels_annotated_.txt
    cut -d$'\t' -f1-7 All_SNPs_indels_annotated.txt > f1_7
    grep -v "/" f1_7 > f1_7_
    grep -v "\." f1_7_ > f1_7__
    grep -v "*" f1_7__ > f1_7___ #(35 records)

# ----------------------------  Merge the two files summary_snps_indels.csv (192) and All_SNPs_indels_annotated.txt (248) --> merged_variants.csv (94) ----------------------------

#Note: the results reported for the project are the manually selected SNPs and indels common to both callers, with the annotation taken from bacto_DNAseq/snippy/summary_snps_indels.csv

CHROM,POS,REF,TYPE,A6WT,A10CraA,19606adeAB,adeIJ,Effect,Impact,Functional_Class,Codon_change,Protein_and_nucleotide_change,Amino_Acid_Length,Gene_name,Biotype
CP059040,136447,T,snp,A,A,A,A,nan,,,,,,nan,nan
CP059040,152318,T,snp,A,A,A,A,missense_variant c.106A>T p.Thr36Ser,,,,,,H0N29_00700,nan
CP059040,171965,T,snp,A,A,A,A,missense_variant c.167T>A p.Val56Asp,,,,,,H0N29_00780,nan
CP059040,194020,C,ins,CT,CT,CT,CT,frameshift_variant c.3260dupA p.Arg1088fs,,,,,,H0N29_00885,nan
CP059040,375297,A,snp,T,T,T,T,nan,,,,,,nan,nan
CP059040,468931,A,ins,AT,AT,AT,AT,nan,,,,,,nan,nan
CP059040,468976,A,ins,AT,AT,AT,AT,nan,,,,,,nan,nan
CP059040,609979,A,snp,T,T,T,T,stop_gained c.1009A>T p.Lys337*,,,,,,H0N29_02925,nan
CP059040,1036730,T,ins,TA,TA,TA,TA,nan,,,,,,nan,nan
CP059040,1059847,C,snp,T,T,T,T,missense_variant c.119G>A p.Gly40Asp,,,,,,H0N29_04880,nan
CP059040,1272639,T,snp,G,G,G,G,nan,,,,,,nan,nan
CP059040,1300706,A,snp,G,G,G,G,synonymous_variant c.351A>G p.Ser117Ser,,,,,,H0N29_06100,nan
CP059040,1970200,T,snp,C,C,C,C,missense_variant c.281T>C p.Val94Ala,,,,,,H0N29_09260,nan
CP059040,2383727,A,ins,AT,AT,AT,AT,nan,,,,,,nan,nan
CP059040,2477628,T,ins,TA,TA,TA,TA,nan,,,,,,nan,nan
CP059040,2525852,A,ins,AT,AT,AT,AT,frameshift_variant c.529dupA p.Ile177fs,,,,,,H0N29_11845,nan
CP059040,3016359,A,ins,AT,AT,AT,AT,frameshift_variant c.2136dupA p.Phe713fs,,,,,,H0N29_14350,cas3f
CP059040,3111299,A,snp,G,G,G,G,synonymous_variant c.957T>C p.Ile319Ile,,,,,,H0N29_14775,nan
CP059040,3124917,T,snp,T,T,C,T,nan,,,,,,nan,nan
CP059040,3310021,C,ins,CT,CT,CT,CT,nan,,,,,,nan,nan
CP059040,3542741,GC,del,G,G,G,G,frameshift_variant c.1746delC p.Arg583fs,,,,,,H0N29_16865,nan
CP059040,3542934,CT,del,C,C,C,C,frameshift_variant c.1938delT p.Met647fs,,,,,,H0N29_16865,nan
CP059040,3570717,AC,del,A,A,A,A,frameshift_variant c.187delC p.Gln63fs,,,,,,H0N29_16960,nan
CP059040,3629616,C,ins,CT,CT,CT,CT,nan,,,,,,nan,nan
CP059040,3873573,A,snp,G,G,G,G,missense_variant c.1705A>G p.Asn569Asp,,,,,,H0N29_18380,nan
CP059040,1527276,TTGAACC,del,TTGAACC,T,TTGAACC,TTGAACC,conservative_inframe_deletion c.1327_1332delGAACCT p.Glu443_Pro444del,,,,,,H0N29_07175,nan

Deleted: *CP059040,3124917,T,snp,T,C,nan,,,,,,nan,nan
* 1527276

# #python3 ~/Scripts/merge_snps_indels.py bacto_DNAseq/snippy/summary_snps_indels.csv spandx/Outputs/Phylogeny_and_annotation/All_SNPs_indels_annotated.txt merged_variants.csv
# python3 ~/Scripts/merge_snps_indels.py bacto_DNAseq/snippy/summary_snps_indels.csv spandx/Outputs/Phylogeny_and_annotation/f1_7___ merged_variants.csv
# #check if the number of the output file is correct?
# comm -12 <(cut -d, -f2 bacto_DNAseq/snippy/summary_snps_indels.csv | sort | uniq) <(cut -f2 spandx/Outputs/Phylogeny_and_annotation/All_SNPs_indels_annotated.txt | sort | uniq) | wc -l  #26
# comm -12 <(cut -d, -f2 bacto_DNAseq/snippy/summary_snps_indels.csv | sort | uniq) <(cut -f2 spandx/Outputs/Phylogeny_and_annotation/All_SNPs_indels_annotated.txt | sort | uniq)

5. Structural Variant Detection

Structural variants are identified by assembly-to-reference comparison using:

  • MUMmer / nucmer
  • delta-filter
  • Assemblytics

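Workflow for each sample: align the final assembly to CP059040 with nucmer, keep 1-to-1 alignments with delta-filter -1, and call variants with Assemblytics (unique anchor length 1000 bp, variant size 100 to 50000 bp).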
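To confirm that a reported deletion coincides with the intended locus, the Assemblytics interval can be intersected with the gene coordinates from the GenBank record. The sketch below uses the craA, adeA and adeB coordinates and the two Deletion rows quoted later in this post. It treats all coordinates as plain intervals and ignores the off-by-one difference between GenBank and BED conventions; the 0.9 coverage threshold is an arbitrary illustrative choice.

```python
GENES = {
    "craA": (364609, 365838),
    "adeA": (1844319, 1845509),
    "adeB": (1845506, 1848616),
}
DELETIONS = {
    "A10CraA": (364652, 365809),
    "19606adeAB": (1844323, 1848605),
}

def overlap(a, b):
    """Length of the overlap between two (start, end) intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))

def covered_genes(deletion, genes, min_fraction=0.9):
    """Return genes whose span is covered by the deletion to at least min_fraction."""
    hits = {}
    for name, span in genes.items():
        fraction = overlap(deletion, span) / (span[1] - span[0])
        if fraction >= min_fraction:
            hits[name] = round(fraction, 3)
    return hits

for sample, deletion in DELETIONS.items():
    print(sample, covered_genes(deletion, GENES))
```

With these inputs the A10CraA deletion covers craA only, and the 19606adeAB deletion covers both adeA and adeB.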

Original commands

# install Environment

mamba create -n sv_assembly \
  python=3.10 \
  minimap2 \
  mummer4 \
  samtools \
  bcftools \
  syri \
  assemblytics \
  -c conda-forge -c bioconda

mamba activate sv_assembly
mamba install numpy pandas -c conda-forge

which paftools.js

minimap2 --version
nucmer --version
syri -h
Assemblytics

#mamba install mummerplot igv -c bioconda

# Four approaches were tried; only the fourth works

# Scenario                Tool                   Status
# reads vs reference      Sniffles / SVIM        NOT WORKING
# assembly vs reference   SyRI (best)            NOT WORKING
# quick & simple          minimap2 + paftools    NOT WORKING
# classic SV detection    Assemblytics           WORKING (see below)

#--> Choose option 4: Assemblytics (classic SV tool for assemblies)

mamba activate sv_assembly

# ---------------- A6WT vs CP059040 (-->Nothing) ----------------

#nucmer --maxmatch -l 100 -c 500 CP059040.fasta shovill/A6WT/contigs.fa -p A6WT
nucmer --maxmatch -l 100 -c 500 CP059040.fasta ./shovill/A6WT_chr_plasmids.fasta -p A6WT
delta-filter -1 -q A6WT.delta > A6WT.filtered.delta
Assemblytics A6WT.filtered.delta A6WT_assemblytics 1000 100 50000
grep -w "Insertion" A6WT_assemblytics.Assemblytics_structural_variants.bed > A6WT_insertions.bed
wc -l A6WT_assemblytics.Assemblytics_structural_variants.bed
cut -f7 A6WT_assemblytics.Assemblytics_structural_variants.bed | sort | uniq -c
cat A6WT_assemblytics.Assemblytics_assembly_stats.txt
show-coords -rcl A6WT.filtered.delta | head -20
Assemblytics A6WT.filtered.delta A6WT_assemblytics_v2 500 50 100000

CP059040    216043  226883  Assemblytics_b_1    10739   +   Repeat_contraction  10840   101 CP059040_RagTag:192505-192606:+ between_alignments
CP059040    491041  496665  Assemblytics_b_2    5523    +   Repeat_contraction  5624    101 CP059040_RagTag:456766-456867:+ between_alignments
CP059040    523243  528864  Assemblytics_b_3    5520    +   Repeat_contraction  5621    101 CP059040_RagTag:483445-483546:+ between_alignments
CP059040    785771  794627  Assemblytics_b_4    8755    +   Repeat_contraction  8856    101 CP059040_RagTag:740453-740554:+ between_alignments
CP059040    1327560 1328441 Assemblytics_b_5    780 +   Repeat_contraction  881 101 CP059040_RagTag:1273488-1273589:+   between_alignments
CP059040    2190218 2191651 Assemblytics_b_7    1332    +   Repeat_contraction  1433    101 CP059040_RagTag:2135360-2135461:+   between_alignments
CP059040    2259736 2260384 Assemblytics_b_8    135 +   Tandem_contraction  -648    -783    CP059040_RagTag:2203411-2204194:-   between_alignments
CP059040    2477722 2488562 Assemblytics_b_9    10739   +   Repeat_contraction  10840   101 CP059040_RagTag:2421405-2421506:+   between_alignments
CP059040    2579725 2581158 Assemblytics_b_10   1332    +   Repeat_contraction  1433    101 CP059040_RagTag:2512670-2512771:+   between_alignments
CP059040    2810861 2863470 Assemblytics_b_11   52668   +   Tandem_contraction  52609   -59 CP059040_RagTag:2742415-2742474:-   between_alignments
CP059040    3089638 3090457 Assemblytics_b_12   718 +   Repeat_contraction  819 101 CP059040_RagTag:2968584-2968685:+   between_alignments
CP059040    3124916 3125037 Assemblytics_b_13   198 +   Tandem_contraction  121 -77 CP059040_RagTag:3003067-3003144:-   between_alignments
CP059040    3282693 3288043 Assemblytics_b_14   5249    +   Repeat_contraction  5350    101 CP059040_RagTag:3160723-3160824:+   between_alignments
CP059040    3642789 3643608 Assemblytics_b_15   718 +   Repeat_contraction  819 101 CP059040_RagTag:3515569-3515670:+   between_alignments
CP059040    3732412 3737994 Assemblytics_b_16   5481    +   Repeat_contraction  5582    101 CP059040_RagTag:3604474-3604575:+   between_alignments
CP059040    3881974 3882855 Assemblytics_b_17   780 +   Repeat_contraction  881 101 CP059040_RagTag:3748555-3748656:+   between_alignments
CP059040    3942815 3948161 Assemblytics_b_18   5245    +   Repeat_contraction  5346    101 CP059040_RagTag:3808616-3808717:+   between_alignments

# ---------------- A10CraA vs CP059040 (1157 nt deletion) ----------------

#nucmer --maxmatch -l 100 -c 500 CP059040.fasta shovill/A10CraA_clean.fasta -p A10CraA
nucmer --maxmatch -l 100 -c 500 CP059040.fasta ./shovill/A10CraA_chr_plasmids.fasta -p A10CraA
delta-filter -1 -q A10CraA.delta > A10CraA.filtered.delta
Assemblytics A10CraA.filtered.delta A10CraA_assemblytics 1000 100 50000
grep -w "Insertion" A10CraA_assemblytics.Assemblytics_structural_variants.bed > A10CraA_insertions.bed
wc -l A10CraA_assemblytics.Assemblytics_structural_variants.bed
cut -f7 A10CraA_assemblytics.Assemblytics_structural_variants.bed | sort | uniq -c
cat A10CraA_assemblytics.Assemblytics_assembly_stats.txt
show-coords -rcl A10CraA.filtered.delta | head -20
Assemblytics A10CraA.filtered.delta A10CraA_assemblytics_v2 500 50 100000

#reference      ref_start       ref_stop        ID      size    strand  type    ref_gap_size    query_gap_size  query_coordinates       method
#CP059040        2810861 2863470 Assemblytics_b_2        52668   +       Tandem_contraction      52609   -59     contig00003:229645-229704:-     between_alignments
#CP059040        3124916 3125037 Assemblytics_b_3        198     +       Tandem_contraction      121     -77     contig00009:157580-157657:-     between_alignments
CP059040        364652  365809  Assemblytics_b_4        1154    +       Deletion        1157    3       contig00011:137770-137773:+     between_alignments

#reference  ref_start   ref_stop    ID  size    strand  type    ref_gap_size    query_gap_size  query_coordinates   method
CP059040    216043  226883  Assemblytics_b_1    10739   +   Repeat_contraction  10840   101 CP059040_RagTag:192505-192606:+ between_alignments
* CP059040  364652  365809  Assemblytics_b_2    1154    +   Deletion    1157    3   CP059040_RagTag:330375-330378:+ between_alignments
CP059040    414041  414151  Assemblytics_b_3    211 +   Tandem_expansion    -110    101 CP059040_RagTag:378720-378821:+ between_alignments
CP059040    465315  465425  Assemblytics_b_4    211 +   Tandem_expansion    -110    101 CP059040_RagTag:430205-430306:+ between_alignments
CP059040    491041  496665  Assemblytics_b_5    5523    +   Repeat_contraction  5624    101 CP059040_RagTag:456034-456135:+ between_alignments
CP059040    523243  528864  Assemblytics_b_6    5520    +   Repeat_contraction  5621    101 CP059040_RagTag:482713-482814:+ between_alignments
CP059040    785771  794627  Assemblytics_b_7    8755    +   Repeat_contraction  8856    101 CP059040_RagTag:739721-739822:+ between_alignments
CP059040    1327560 1328441 Assemblytics_b_8    780 +   Repeat_contraction  881 101 CP059040_RagTag:1272756-1272857:+   between_alignments
CP059040    2190218 2191651 Assemblytics_b_10   1332    +   Repeat_contraction  1433    101 CP059040_RagTag:2134628-2134729:+   between_alignments
CP059040    2477722 2488562 Assemblytics_b_11   10739   +   Repeat_contraction  10840   101 CP059040_RagTag:2420802-2420903:+   between_alignments
CP059040    2579725 2581158 Assemblytics_b_12   1332    +   Repeat_contraction  1433    101 CP059040_RagTag:2512067-2512168:+   between_alignments
CP059040    2810861 2863470 Assemblytics_b_13   52668   +   Tandem_contraction  52609   -59 CP059040_RagTag:2741812-2741871:-   between_alignments
CP059040    3089638 3090457 Assemblytics_b_14   718 +   Repeat_contraction  819 101 CP059040_RagTag:2967981-2968082:+   between_alignments
CP059040    3124916 3125037 Assemblytics_b_15   198 +   Tandem_contraction  121 -77 CP059040_RagTag:3002464-3002541:-   between_alignments
CP059040    3282693 3288043 Assemblytics_b_16   5249    +   Repeat_contraction  5350    101 CP059040_RagTag:3160120-3160221:+   between_alignments
CP059040    3642789 3643608 Assemblytics_b_17   718 +   Repeat_contraction  819 101 CP059040_RagTag:3514966-3515067:+   between_alignments
CP059040    3732412 3737887 Assemblytics_b_18   5374    +   Repeat_contraction  5475    101 CP059040_RagTag:3603871-3603972:+   between_alignments
CP059040    3881974 3882855 Assemblytics_b_19   780 +   Repeat_contraction  881 101 CP059040_RagTag:3748059-3748160:+   between_alignments
CP059040    3942815 3948161 Assemblytics_b_20   5245    +   Repeat_contraction  5346    101 CP059040_RagTag:3808120-3808221:+   between_alignments
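Most rows in the mutant table also appear in the A6WT table, so the mutant-specific events stand out once the shared ones are subtracted. The following sketch keys each event on (ref_start, ref_stop, type). The rows are copied from the tables above and truncated to seven columns; the function names are illustrative.

```python
def sv_keys(bed_lines):
    """Collect (ref_start, ref_stop, type) keys from Assemblytics BED rows."""
    keys = set()
    for line in bed_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        keys.add((int(fields[1]), int(fields[2]), fields[6]))
    return keys

def mutant_specific(mutant_lines, wt_lines):
    """Return events present in the mutant but absent from the wild type."""
    return sorted(sv_keys(mutant_lines) - sv_keys(wt_lines))

wt_rows = [
    "CP059040 216043 226883 Assemblytics_b_1 10739 + Repeat_contraction",
    "CP059040 491041 496665 Assemblytics_b_2 5523 + Repeat_contraction",
]
cra_rows = [
    "CP059040 216043 226883 Assemblytics_b_1 10739 + Repeat_contraction",
    "CP059040 364652 365809 Assemblytics_b_2 1154 + Deletion",
    "CP059040 414041 414151 Assemblytics_b_3 211 + Tandem_expansion",
    "CP059040 491041 496665 Assemblytics_b_5 5523 + Repeat_contraction",
]
print(mutant_specific(cra_rows, wt_rows))
```

Exact coordinate matching is strict. The repeat contraction starting at 3732412 ends at 3737994 in A6WT and at 3737887 in A10CraA, so it would be listed as mutant-specific unless a tolerance is added.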

     gene            364609..365838
                     /locus_tag="craA"

     CDS             364609..365838
                     /locus_tag="H0N29_01695"
                     /inference="COORDINATES: similar to AA
                     sequence:RefSeq:YP_004996877.1"
                     /note="Derived by automated computational analysis using
                     gene prediction method: Protein Homology."
                     /codon_start=1
                     /transl_table=11
                     /product="MFS transporter"
                     /protein_id="QNT85352.1"
                     /db_xref="GI:1906908782"
                     /translation="MKNIQTTALNRTTLMFPLALVLFEFAVYIGNDLIQPAMLAITEDFGVSATWAPSSMSFYLLGGASVAWLLGPLSDRLGRKKVLLSGVLFFALCCFLILLTRQIEHFLTLRFLQGIGLSVISAVGYAAIQENFAERDAIKVMALMANISLLAPLLGPVLGAFLIDYVSWHWGFVAIALLALLSWVGLKKQMPSHKVSVTKQPFSYLFDDFKKVFSNRQFLGLTLALPLVGMPLMLWIALSPIILVDELKLTSVQYGLAQFPVFLGLIVGNIVLIKIIDRLALGKTVLIGLPIMLTGTLILILGVVWQAYLIPCLLIGMTLICFGEGISFSVLYRFALMSSEVSKGTVAAAVSMLLMTSFFAMIELVRYLYTQFHLWAFVLSAFAFIALWFTQPRLALKREMQERVAQDLH"
        chloramphenicol efflux MFS transporter CraA [Acinetobacter pittii]
        Sequence ID: WP_016142916.1, Length: 409, Number of Matches: 1

# ---------------- 19606adeAB vs CP059040 (4282 nt deletion) ----------------

# Step 1: Align assemblies to reference
#nucmer --maxmatch -l 100 -c 500 CP059040.fasta shovill/19606adeAB/contigs.fa -p 19606adeAB
nucmer --maxmatch -l 100 -c 500 CP059040.fasta ./shovill/adeAB_chr_plasmids.fasta -p 19606adeAB
# Step 2: Filter alignments (1-to-1 best matches)
delta-filter -1 -q 19606adeAB.delta > 19606adeAB.filtered.delta
# Note: Use -1 for 1-to-1, not -r -q
# Step 3: Run Assemblytics with all five arguments (delta file, output prefix, unique anchor length, min and max variant size)
Assemblytics 19606adeAB.filtered.delta 19606adeAB_assemblytics 1000 100 50000
# Step 4: Extract insertions only
grep -w "Insertion" 19606adeAB_assemblytics.Assemblytics_structural_variants.bed > 19606adeAB_insertions.bed
# Step 5: Check whether any variants were detected (any size)
wc -l 19606adeAB_assemblytics.Assemblytics_structural_variants.bed
# Step 6: View the variant type distribution (the type is column 7 of the BED file)
cut -f7 19606adeAB_assemblytics.Assemblytics_structural_variants.bed | sort | uniq -c
# Step 7: Check alignment coverage (are contigs aligning well?)
cat 19606adeAB_assemblytics.Assemblytics_assembly_stats.txt
# Step 8: Check the filtered delta file for alignment blocks
show-coords -rcl 19606adeAB.filtered.delta | head -20
# Step 9: If the BED file is empty, relax the parameters and re-run:
Assemblytics 19606adeAB.filtered.delta 19606adeAB_assemblytics_v2 500 50 100000
#                          └─unique─┘ └min┘ └──max──┘

CP059040        2810861 2863470 Assemblytics_b_1        52668   +       Tandem_contraction      52609   -59     contig00005:229645-229704:-     between_alignments
CP059040        3124916 3125037 Assemblytics_b_2        198     +       Tandem_contraction      121     -77     contig00012:92096-92173:-       between_alignments
# Position 3124917 (the SNP called only in 19606adeAB) lies inside this tandem contraction --> DELETE!
CP059040        1844323 1848605 Assemblytics_b_5        4282    +       Deletion        4282    0       contig00024:26693-26693:+       between_alignments

#reference  ref_start   ref_stop    ID  size    strand  type    ref_gap_size    query_gap_size  query_coordinates   method
CP059040    216043  226883  Assemblytics_b_1    10739   +   Repeat_contraction  10840   101 CP059040_RagTag:192505-192606:+ between_alignments
CP059040    375124  375138  Assemblytics_b_2    115 +   Insertion   -14 101 CP059040_RagTag:340861-340962:+ between_alignments
CP059040    491041  496665  Assemblytics_b_5    5523    +   Repeat_contraction  5624    101 CP059040_RagTag:456894-456995:+ between_alignments
CP059040    523243  528864  Assemblytics_b_6    5520    +   Repeat_contraction  5621    101 CP059040_RagTag:483573-483674:+ between_alignments
CP059040    785771  794627  Assemblytics_b_8    8755    +   Repeat_contraction  8856    101 CP059040_RagTag:740606-740707:+ between_alignments
CP059040    1327560 1328441 Assemblytics_b_9    780 +   Repeat_contraction  881 101 CP059040_RagTag:1273641-1273742:+   between_alignments
CP059040    1607031 1607049 Assemblytics_b_12   83  +   Insertion   18  101 CP059040_RagTag:1552326-1552427:+   between_alignments
* CP059040  1844323 1848605 Assemblytics_b_14   4282    +   Deletion    4282    0   CP059040_RagTag:1789745-1789745:+   between_alignments
CP059040    1852044 1852303 Assemblytics_b_15   158 +   Repeat_contraction  259 101 CP059040_RagTag:1793184-1793285:+   between_alignments
CP059040    2147239 2147252 Assemblytics_b_17   114 +   Insertion   -13 101 CP059040_RagTag:2088243-2088344:+   between_alignments
CP059040    2178043 2178056 Assemblytics_b_18   114 +   Insertion   -13 101 CP059040_RagTag:2119161-2119262:+   between_alignments
CP059040    2180008 2180010 Assemblytics_b_19   99  +   Insertion   2   101 CP059040_RagTag:2121227-2121328:+   between_alignments
CP059040    2190218 2191651 Assemblytics_b_20   1332    +   Repeat_contraction  1433    101 CP059040_RagTag:2131536-2131637:+   between_alignments
CP059040    2488562 2488672 Assemblytics_b_22   211 +   Tandem_expansion    -110    101 CP059040_RagTag:2428616-2428717:+   between_alignments
CP059040    2579725 2581158 Assemblytics_b_23   1332    +   Repeat_contraction  1433    101 CP059040_RagTag:2519881-2519982:+   between_alignments
CP059040    2810861 2863470 Assemblytics_b_24   52668   +   Tandem_contraction  52609   -59 CP059040_RagTag:2749626-2749685:-   between_alignments
CP059040    2873400 2873644 Assemblytics_b_25   143 +   Repeat_contraction  244 101 CP059040_RagTag:2759556-2759657:+   between_alignments
CP059040    3089638 3090457 Assemblytics_b_26   718 +   Repeat_contraction  819 101 CP059040_RagTag:2975652-2975753:+   between_alignments
CP059040    3124916 3125037 Assemblytics_b_27   198 +   Tandem_contraction  121 -77 CP059040_RagTag:3010135-3010212:-   between_alignments
CP059040    3217209 3217578 Assemblytics_b_28   268 +   Repeat_contraction  369 101 CP059040_RagTag:3102307-3102408:+   between_alignments
CP059040    3282677 3288043 Assemblytics_b_29   5265    +   Repeat_contraction  5366    101 CP059040_RagTag:3167507-3167608:+   between_alignments
CP059040    3642789 3643608 Assemblytics_b_30   718 +   Repeat_contraction  819 101 CP059040_RagTag:3522353-3522454:+   between_alignments
CP059040    3732428 3737887 Assemblytics_b_31   5358    +   Repeat_contraction  5459    101 CP059040_RagTag:3611274-3611375:+   between_alignments
CP059040    3881974 3882855 Assemblytics_b_33   780 +   Repeat_contraction  881 101 CP059040_RagTag:3755461-3755562:+   between_alignments
CP059040    3942815 3948161 Assemblytics_b_34   5245    +   Repeat_contraction  5346    101 CP059040_RagTag:3815522-3815623:+   between_alignments

     gene            1844319..1845509
                     /gene="adeA"
                     /locus_tag="H0N29_08675"
     gene            1845506..1848616
                     /gene="adeB"
                     /locus_tag="H0N29_08680"

# Step 1: Align the mutant assembly (query) to the WT assembly (reference)
nucmer --maxmatch -l 100 -c 500 shovill/A6WT/contigs.fa shovill/19606adeAB/contigs.fa -p 19606adeAB_vs_A6WT

# Step 2: Filter alignments (1-to-1 best matches)
delta-filter -1 -q 19606adeAB_vs_A6WT.delta > 19606adeAB_vs_A6WT.filtered.delta
# Note: -1 keeps 1-to-1 alignments (the intersection of the -r and -q filters); do not substitute -r -q for it

# Step 3: Run Assemblytics with ALL 5 parameters
Assemblytics 19606adeAB_vs_A6WT.filtered.delta 19606adeAB_vs_A6WT_assemblytics 1000 100 50000

# Step 4: Extract insertion calls only (grep does not filter by size)
grep -w "Insertion" 19606adeAB_vs_A6WT_assemblytics.Assemblytics_structural_variants.bed > 19606adeAB_vs_A6WT_insertions.bed

# Step 5: Check whether any variants were detected (any size)
wc -l 19606adeAB_vs_A6WT_assemblytics.Assemblytics_structural_variants.bed

# Step 6: View variant type distribution (the type is column 7 of the BED file; column 4 is the variant ID)
cut -f7 19606adeAB_vs_A6WT_assemblytics.Assemblytics_structural_variants.bed | sort | uniq -c

# Step 7: Check alignment coverage (are contigs aligning well?)
cat 19606adeAB_vs_A6WT_assemblytics.Assemblytics_assembly_stats.txt

# Step 8: Check the filtered delta file for alignment blocks
show-coords -rcl 19606adeAB_vs_A6WT.filtered.delta | head -20

# Step 9: If the BED file is empty, relax the parameters and re-run
# (arguments after the output prefix: unique anchor length, min variant size, max variant size)
Assemblytics 19606adeAB_vs_A6WT.filtered.delta 19606adeAB_vs_A6WT_assemblytics_v2 500 50 100000
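
The inspection steps above can be condensed into a small awk filter. The following is a minimal sketch, not part of the original workflow: it assumes the tab-separated BED layout seen in the tables of this section (variant type in column 7, reported size in column 5), and it uses a hypothetical temporary file holding three rows copied from the adeAB table. The 1000-bp cutoff is an illustrative choice.

```shell
# Hypothetical sample file: three rows copied from the adeAB table, tab-separated.
bed=$(mktemp)
printf 'CP059040\t1607031\t1607049\tAssemblytics_b_12\t83\t+\tInsertion\t18\t101\tCP059040_RagTag:1552326-1552427:+\tbetween_alignments\n' > "$bed"
printf 'CP059040\t1844323\t1848605\tAssemblytics_b_14\t4282\t+\tDeletion\t4282\t0\tCP059040_RagTag:1789745-1789745:+\tbetween_alignments\n' >> "$bed"
printf 'CP059040\t1852044\t1852303\tAssemblytics_b_15\t158\t+\tRepeat_contraction\t259\t101\tCP059040_RagTag:1793184-1793285:+\tbetween_alignments\n' >> "$bed"

# Count rows per variant type (column 7), ignoring any '#' header line.
type_counts=$(awk -F'\t' '!/^#/ { n[$7]++ } END { for (t in n) print n[t], t }' "$bed" | sort -k2)
echo "$type_counts"

# Keep only Deletion rows whose reported size (column 5) is at least 1000 bp
# (assumed cutoff, chosen only for illustration).
large_deletions=$(awk -F'\t' '!/^#/ && $7 == "Deletion" && $5 >= 1000' "$bed")
echo "$large_deletions"

rm -f "$bed"
```

On a real run, replace the temporary file with the `.Assemblytics_structural_variants.bed` file produced in Step 3.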

# ---------------- adeIJ vs CP059040 (4436 nt deletion) ----------------

#nucmer --maxmatch -l 100 -c 500 CP059040.fasta shovill/adeIJ/contigs.fa -p adeIJ
nucmer --maxmatch -l 100 -c 500 CP059040.fasta ./shovill/adeIJ_chr_plasmids.fasta -p adeIJ
delta-filter -1 -q adeIJ.delta > adeIJ.filtered.delta
Assemblytics adeIJ.filtered.delta adeIJ_assemblytics 1000 100 50000
grep -w "Insertion" adeIJ_assemblytics.Assemblytics_structural_variants.bed > adeIJ_insertions.bed
wc -l adeIJ_assemblytics.Assemblytics_structural_variants.bed
cut -f7 adeIJ_assemblytics.Assemblytics_structural_variants.bed | sort | uniq -c
cat adeIJ_assemblytics.Assemblytics_assembly_stats.txt
show-coords -rcl adeIJ.filtered.delta | head -20
Assemblytics adeIJ.filtered.delta adeIJ_assemblytics_v2 500 50 100000

#CP059040        2810861 2863470 Assemblytics_b_2        52668   +       Tandem_contraction      52609   -59     contig00003:229645-229704:-     between_alignments
#CP059040        2259736 2260384 Assemblytics_b_3        135     +       Tandem_contraction      -648    -783    contig00005:217212-217995:-     between_alignments
CP059040        737224  741667  Assemblytics_b_4        4436    +       Deletion        4443    7       contig00007:208361-208368:+     between_alignments
#CP059040        3124916 3125037 Assemblytics_b_5        198     +       Tandem_contraction      121     -77     contig00009:157580-157657:-     between_alignments

CP059040    216043  226883  Assemblytics_b_1    10739   +   Repeat_contraction  10840   101 CP059040_RagTag:192505-192606:+ between_alignments
CP059040    491041  496665  Assemblytics_b_2    5523    +   Repeat_contraction  5624    101 CP059040_RagTag:456766-456867:+ between_alignments
CP059040    523243  528864  Assemblytics_b_3    5520    +   Repeat_contraction  5621    101 CP059040_RagTag:483445-483546:+ between_alignments
* CP059040  737224  741667  Assemblytics_b_4    4436    +   Deletion    4443    7   CP059040_RagTag:691906-691913:+ between_alignments
CP059040    785771  794627  Assemblytics_b_5    8755    +   Repeat_contraction  8856    101 CP059040_RagTag:736017-736118:+ between_alignments
CP059040    1327560 1328441 Assemblytics_b_6    780 +   Repeat_contraction  881 101 CP059040_RagTag:1269052-1269153:+   between_alignments
CP059040    2190218 2191651 Assemblytics_b_8    1332    +   Repeat_contraction  1433    101 CP059040_RagTag:2130924-2131025:+   between_alignments
CP059040    2259736 2260384 Assemblytics_b_9    135 +   Tandem_contraction  -648    -783    CP059040_RagTag:2198975-2199758:-   between_alignments
CP059040    2477722 2488562 Assemblytics_b_10   10739   +   Repeat_contraction  10840   101 CP059040_RagTag:2416969-2417070:+   between_alignments
CP059040    2579725 2581158 Assemblytics_b_11   1332    +   Repeat_contraction  1433    101 CP059040_RagTag:2508234-2508335:+   between_alignments
CP059040    3089638 3090457 Assemblytics_b_13   718 +   Repeat_contraction  819 101 CP059040_RagTag:2964148-2964249:+   between_alignments
CP059040    3124916 3125037 Assemblytics_b_14   198 +   Tandem_contraction  121 -77 CP059040_RagTag:2998631-2998708:-   between_alignments
CP059040    3282693 3288043 Assemblytics_b_15   5249    +   Repeat_contraction  5350    101 CP059040_RagTag:3156287-3156388:+   between_alignments
CP059040    3642789 3643608 Assemblytics_b_16   718 +   Repeat_contraction  819 101 CP059040_RagTag:3511133-3511234:+   between_alignments
CP059040    3732412 3737994 Assemblytics_b_17   5481    +   Repeat_contraction  5582    101 CP059040_RagTag:3600038-3600139:+   between_alignments
CP059040    3881974 3882855 Assemblytics_b_18   780 +   Repeat_contraction  881 101 CP059040_RagTag:3744119-3744220:+   between_alignments
CP059040    3942815 3948161 Assemblytics_b_19   5245    +   Repeat_contraction  5346    101 CP059040_RagTag:3804180-3804281:+   between_alignments
CP059040    788949  794737  Assemblytics_b_27   5889    +   Tandem_expansion    -5788   101 CP059040_RagTag:3876490-3876591:+   between_alignments

     gene            complement(737233..740409)
                     /gene="adeJ"
                     /locus_tag="H0N29_03545"
     gene            complement(740422..741672)
                     /gene="adeI"
                     /locus_tag="H0N29_03550"

6. Confirmed Targeted Deletions

ΔadeAB

  • 4,282-bp deletion
  • Corresponds to CP059040 positions 1,844,323–1,848,605
  • Removes nearly the entire adeAB operon:

    • adeA (H0N29_08675; 1,844,319–1,845,509)
    • adeB (H0N29_08680; 1,845,506–1,848,616)

ΔadeIJ

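  • 4,443-bp reference interval deleted and replaced by 7 bp in the mutant assembly (net size 4,436 bp as reported by Assemblytics)
  • Corresponds to CP059040 positions 737,224–741,667
  • Removes nearly the entire reverse-strand adeIJ operon (all of adeJ and all but about 5 bp of adeI):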

    • adeJ (H0N29_03545; complement 737,233–740,409)
    • adeI (H0N29_03550; complement 740,422–741,672)

ΔcraA

  • 1,157-bp deletion
  • Corresponds to CP059040 positions 364,652–365,809
  • Removes most of craA (H0N29_01695; 364,609–365,838), leaving roughly 43 bp and 29 bp of the gene at its two ends
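
The coordinates listed above can be cross-checked with simple interval arithmetic. The sketch below is illustrative only: `covered` is a hypothetical helper, and it treats every coordinate as 1-based and inclusive, which is an assumption. Because BED intervals may follow a different convention, the counts can differ by one base from the sizes reported by Assemblytics.

```shell
# covered DEL_START DEL_END GENE_START GENE_END
# Print "<deleted bases within the gene> <gene length>",
# treating all coordinates as 1-based inclusive (assumption).
covered() (
  ds=$1; de=$2; gs=$3; ge=$4
  lo=$(( ds > gs ? ds : gs ))          # start of the overlap
  hi=$(( de < ge ? de : ge ))          # end of the overlap
  ov=$(( hi >= lo ? hi - lo + 1 : 0 )) # zero if the intervals do not overlap
  echo "$ov $(( ge - gs + 1 ))"
)

covered 1844323 1848605 1844319 1845509   # adeA
covered 1844323 1848605 1845506 1848616   # adeB
covered 737224 741667 737233 740409       # adeJ
covered 737224 741667 740422 741672       # adeI
covered 364652 365809 364609 365838       # craA
```

Each line of output gives the number of gene bases inside the deletion followed by the annotated gene length, which makes it easy to see how much of each target gene remains.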

Together, the structural variant and SNP/indel analyses confirm that each mutant differs from WT only at the intended deletion locus.


7. Manuscript-Ready Summary

Whole-genome sequencing and comparative assembly analysis confirmed the genomic integrity of the ΔadeAB, ΔadeIJ, and ΔcraA mutants. Structural variant analysis identified a single targeted deletion in each strain, while consensus SNP/indel calling using Snippy and SPANDx detected no additional variants relative to WT outside the engineered loci.