This workflow describes the assembly, scaffolding, variant calling, and structural variant validation used to confirm the genomic integrity of A. baumannii ATCC19606 WT and the ΔadeAB, ΔadeIJ, and ΔcraA mutants.
1. Raw Read Preparation and Assembly
Create project structure
mkdir bacto_DNAseq
cd bacto_DNAseq
mkdir raw_data
cd raw_data
Link raw FASTQ files
ln -s ../../X101SC26025981-Z02-J001/01.RawData/19606_adeAB/19606_adeAB_1.fq.gz 19606adeAB_R1.fastq.gz
ln -s ../../X101SC26025981-Z02-J001/01.RawData/19606_adeAB/19606_adeAB_2.fq.gz 19606adeAB_R2.fastq.gz
# Additional files present in the project
A10CraA_R1.fastq.gz
A10CraA_R2.fastq.gz
A6WT_R1.fastq.gz
A6WT_R2.fastq.gz
adeIJ_R1.fastq.gz
adeIJ_R2.fastq.gz
Install and run the bacto_DNAseq pipeline including assembly
git clone https://github.com/huang/bacto_DNAseq
mv bacto_DNAseq/* ./
rm -rf bacto_DNAseq
conda activate /home/jhuang/miniconda3/envs/bengal3_ac3
snakemake --printshellcmds
Notes
- Edit bacto_DNAseq-0.1.json to enable only:
  - assembly
  - typing_mlst
  - optionally pangenome and variants_calling
- The pipeline requires access to:
/media/jhuang/Titisee/GAMOLA2/TIGRfam_db/TIGRFAMs_15.0_HMM.LIB
Original commands
# ---------------------------- Assembly using bacto ----------------------------
mkdir bacto_DNAseq; cd bacto_DNAseq;
mkdir raw_data; cd raw_data;
ln -s ../../X101SC26025981-Z02-J001/01.RawData/19606_adeAB/19606_adeAB_1.fq.gz 19606adeAB_R1.fastq.gz
ln -s ../../X101SC26025981-Z02-J001/01.RawData/19606_adeAB/19606_adeAB_2.fq.gz 19606adeAB_R2.fastq.gz
# Additional files present in the project:
# ./A10CraA_R1.fastq.gz
# ./A10CraA_R2.fastq.gz
# ./A6WT_R1.fastq.gz
# ./A6WT_R2.fastq.gz
# ./adeIJ_R1.fastq.gz
# ./adeIJ_R2.fastq.gz
git clone https://github.com/huang/bacto_DNAseq
mv bacto_DNAseq/* ./
rm -rf bacto_DNAseq
conda activate /home/jhuang/miniconda3/envs/bengal3_ac3
(bengal3_ac3) jhuang@WS-2290C:~/DATA/Data_Tam_DNAseq_2023_A6WT_A10CraA_A12AYE_A1917978$ which snakemake
/home/jhuang/miniconda3/envs/bengal3_ac3/bin/snakemake
(bengal3_ac3) jhuang@WS-2290C:~/DATA/Data_Tam_DNAseq_2023_A6WT_A10CraA_A12AYE_A1917978$ snakemake -v
4.0.0 --> CORRECT!
#NOTE_1: edit bacto_DNAseq-0.1.json so that only the steps assembly, typing_mlst, and optionally pangenome and variants_calling are set to true.
#NOTE_2: the disk Titisee must be mounted, since the pipeline needs /media/jhuang/Titisee/GAMOLA2/TIGRfam_db/TIGRFAMs_15.0_HMM.LIB
snakemake --printshellcmds
2. Contig Filtering and Chromosome Scaffolding
After SPAdes assembly, contigs shorter than 500 bp are removed:
seqkit seq -m 500 A6WT/contigs.fa > A6WT_contigs.min500.fasta
seqkit seq -m 500 19606adeAB/contigs.fa > adeAB_contigs.min500.fasta
seqkit seq -m 500 A10CraA/contigs.fa > A10CraA_contigs.min500.fasta
seqkit seq -m 500 adeIJ/contigs.fa > adeIJ_contigs.min500.fasta
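For readers without seqkit at hand, the length filter can be sketched in plain Python. This is only an illustration of what `seqkit seq -m 500` does (keep records of at least 500 bp), not part of the pipeline; the helper names `read_fasta` and `filter_by_length` and the demo sequences are hypothetical.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) tuples from a FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(handle, min_len=500):
    """Keep records whose sequence is at least min_len bases long."""
    return [(h, s) for h, s in read_fasta(handle) if len(s) >= min_len]


# Made-up demo input: one 4 bp contig and one 500 bp contig.
demo = io.StringIO(">contig00001\nACGT\n>contig00002\n" + "A" * 500 + "\n")
kept = filter_by_length(demo)
print([header for header, _ in kept])
```

Only the second record survives the filter, mirroring the inclusive minimum-length behaviour of the `-m` option.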
Chromosomal contigs are then identified by alignment against the reference genome CP059040.fasta using minimap2. Contigs lacking alignment are interpreted as putative plasmids and excluded from scaffolding.
WT
Excluded plasmid contig:
contig00016
ΔadeAB
Excluded plasmid contigs:
contig00029
contig00030
contig00033
contig00039
ΔcraA
Excluded plasmid contig:
contig00096
ΔadeIJ
Excluded plasmid contigs:
contig00017
contig00019
contig00020
contig00021
contig00025
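The contig lists above come from the minimap2 PAF output: with `--paf-no-hit`, unaligned queries are written with `*` in the target-name column (column 6), which the original commands pick out with awk. A minimal Python sketch of the same selection follows; the function name and the two demo PAF records are made up for illustration.

```python
def unaligned_contigs(paf_lines):
    """Return query names whose target-name column (6th field) is '*'."""
    names = []
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 6 and fields[5] == "*" and fields[0] not in names:
            names.append(fields[0])
    return names


# Made-up PAF records: contig00001 aligns to CP059040, contig00016 does not.
demo_paf = [
    "contig00001\t250000\t0\t250000\t+\tCP059040\t3980000\t100\t250100\t249000\t250000\t60",
    "contig00016\t9500\t0\t0\t*\t*\t0\t0\t0\t0\t0\t0",
]
print(unaligned_contigs(demo_paf))
```

A contig with no hit is only a plasmid candidate; the sketch does not check circularity or plasmid markers.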
Scaffold chromosome using RagTag
Example for ΔadeAB:
ragtag.py scaffold CP059040.fasta adeAB_contigs.min500.no29_30_33_39.fasta -o ragtag_adeAB -C
The scaffolded chromosome is concatenated with excluded plasmid contigs to generate the final assembly.
Original commands
# ----------------------------- Scaffolding ------------------------------
cd shovill
seqkit seq -m 500 contigs.fa > contigs.min500.fasta
#seqkit seq -g -m 500 contigs.fa > contigs.min500_g.fasta
#For project 2:
seqkit seq -m 500 adeABadeIJ_contigs.fa > adeABadeIJ_contigs.min500.fasta
seqkit seq -m 500 adeIJK_contigs.fa > adeIJK_contigs.min500.fasta
#For project 1:
seqkit seq -m 500 A6WT/contigs.fa > A6WT_contigs.min500.fasta
seqkit seq -m 500 19606adeAB/contigs.fa > adeAB_contigs.min500.fasta
seqkit seq -m 500 A10CraA/contigs.fa > A10CraA_contigs.min500.fasta
seqkit seq -m 500 adeIJ/contigs.fa > adeIJ_contigs.min500.fasta
# NO LONGER NEEDED: online scaffolding with Multi-CSAR v1.1 (https://genome.cs.nthu.edu.tw/Multi-CSAR/) has been replaced by the new minimap2 + RagTag method.
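# Contig counts per sample (number of unaligned, putative plasmid contigs in parentheses)
# Project 2:
#   adeABadeIJ  29 contigs
#   adeIJK      22 contigs
# Project 1:
#   A6WT        22 contigs (1 unaligned)
#   adeAB       40 contigs (4 unaligned)
#   adeIJ       27 contigs (5 unaligned)
#   A10CraA     24 contigs (1 unaligned)
# Project 2: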
minimap2 -cx asm20 --paf-no-hit ../CP059040.fasta adeABadeIJ_contigs.min500.fasta > asm20_all.paf
awk '$6=="*"{print $1}' asm20_all.paf
#-->contig00020
#-->contig00021
#-->contig00027
seqkit grep -v -r \
-p "^contig00020([[:space:]]|$)" \
-p "^contig00021([[:space:]]|$)" \
-p "^contig00027([[:space:]]|$)" \
adeABadeIJ_contigs.min500.fasta > adeABadeIJ_contigs.min500.no20_21_27.fasta
(ragtag_env) ragtag.py scaffold ../CP059040.fasta adeABadeIJ_contigs.min500.no20_21_27.fasta -o ragtag_adeABadeIJ -C
minimap2 -cx asm20 --paf-no-hit ../CP059040.fasta adeIJK_contigs.min500.fasta > asm20_all.paf
awk '$6=="*"{print $1}' asm20_all.paf
#-->contig00016
seqkit grep -v -r -p "^contig00016(\s|$)" adeIJK_contigs.min500.fasta > adeIJK_contigs.min500.no16.fasta
(ragtag_env) ragtag.py scaffold ../CP059040.fasta adeIJK_contigs.min500.no16.fasta -o ragtag_adeIJK -C
# Project 1:
(ragtag_env) minimap2 -cx asm20 --paf-no-hit ../CP059040.fasta A6WT_contigs.min500.fasta > asm20_all.paf
awk '$6=="*"{print $1}' asm20_all.paf
#-->contig00016
seqkit grep -v -r -p "^contig00016(\s|$)" A6WT_contigs.min500.fasta > A6WT_contigs.min500.no16.fasta
seqkit grep -r -p "^contig00016(\s|$)" A6WT_contigs.min500.fasta > A6WT_contigs.min500.16.fasta
(ragtag_env) ragtag.py scaffold ../CP059040.fasta A6WT_contigs.min500.no16.fasta -o ragtag_A6WT -C
cat ragtag_A6WT/ragtag.scaffold.fasta A6WT_contigs.min500.16.fasta > A6WT_chr_plasmids_.fasta
sed 's/^>Chr0_RagTag$/NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN/' A6WT_chr_plasmids_.fasta > A6WT_chr_plasmids__.fasta
seqkit seq A6WT_chr_plasmids__.fasta > A6WT_chr_plasmids.fasta
(ragtag_env) minimap2 -cx asm20 --paf-no-hit ../CP059040.fasta adeAB_contigs.min500.fasta > asm20_all.paf
awk '$6=="*"{print $1}' asm20_all.paf
#-->contig00029
#-->contig00030
#-->contig00033
#-->contig00039
seqkit grep -v -r \
-p "^contig00029([[:space:]]|$)" \
-p "^contig00030([[:space:]]|$)" \
-p "^contig00033([[:space:]]|$)" \
-p "^contig00039([[:space:]]|$)" \
adeAB_contigs.min500.fasta > adeAB_contigs.min500.no29_30_33_39.fasta
seqkit grep -r \
-p "^contig00029([[:space:]]|$)" \
-p "^contig00030([[:space:]]|$)" \
-p "^contig00033([[:space:]]|$)" \
-p "^contig00039([[:space:]]|$)" \
adeAB_contigs.min500.fasta > adeAB_contigs.min500.29_30_33_39.fasta
(ragtag_env) ragtag.py scaffold ../CP059040.fasta adeAB_contigs.min500.no29_30_33_39.fasta -o ragtag_adeAB -C
cat ragtag_adeAB/ragtag.scaffold.fasta adeAB_contigs.min500.29_30_33_39.fasta > adeAB_chr_plasmids_.fasta
sed 's/^>Chr0_RagTag$/NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN/' adeAB_chr_plasmids_.fasta > adeAB_chr_plasmids__.fasta
seqkit seq adeAB_chr_plasmids__.fasta > adeAB_chr_plasmids.fasta
samtools faidx adeAB_chr_plasmids.fasta
(ragtag_env) minimap2 -cx asm20 --paf-no-hit ../CP059040.fasta A10CraA_clean.fasta > asm20_all.paf
awk '$6=="*"{print $1}' asm20_all.paf
#-->contig00096
seqkit grep -v -r \
-p "^contig00096([[:space:]]|$)" \
A10CraA_clean.fasta > A10CraA_contigs.min500.no96.fasta
seqkit grep -r \
-p "^contig00096([[:space:]]|$)" \
A10CraA_clean.fasta > A10CraA_contigs.min500.96.fasta
(ragtag_env) ragtag.py scaffold ../CP059040.fasta A10CraA_contigs.min500.no96.fasta -o ragtag_A10CraA -C
cat ragtag_A10CraA/ragtag.scaffold.fasta A10CraA_contigs.min500.96.fasta > A10CraA_chr_plasmids_.fasta
sed 's/^>Chr0_RagTag$/NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN/' A10CraA_chr_plasmids_.fasta > A10CraA_chr_plasmids__.fasta
seqkit seq A10CraA_chr_plasmids__.fasta > A10CraA_chr_plasmids.fasta
(ragtag_env) minimap2 -cx asm20 --paf-no-hit ../CP059040.fasta adeIJ_contigs.min500.fasta > asm20_all.paf
awk '$6=="*"{print $1}' asm20_all.paf
#contig00017
#contig00019
#contig00020
#contig00021
#contig00025
seqkit grep -v -r \
-p "^contig00017([[:space:]]|$)" \
-p "^contig00019([[:space:]]|$)" \
-p "^contig00020([[:space:]]|$)" \
-p "^contig00021([[:space:]]|$)" \
-p "^contig00025([[:space:]]|$)" \
adeIJ_contigs.min500.fasta > adeIJ_contigs.min500.no17_19_20_21_25.fasta
seqkit grep -r \
-p "^contig00017([[:space:]]|$)" \
-p "^contig00019([[:space:]]|$)" \
-p "^contig00020([[:space:]]|$)" \
-p "^contig00021([[:space:]]|$)" \
-p "^contig00025([[:space:]]|$)" \
adeIJ_contigs.min500.fasta > adeIJ_contigs.min500.17_19_20_21_25.fasta
(ragtag_env) ragtag.py scaffold ../CP059040.fasta adeIJ_contigs.min500.no17_19_20_21_25.fasta -o ragtag_adeIJ -C
cat ragtag_adeIJ/ragtag.scaffold.fasta adeIJ_contigs.min500.17_19_20_21_25.fasta > adeIJ_chr_plasmids_.fasta
sed 's/^>Chr0_RagTag$/NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN/' adeIJ_chr_plasmids_.fasta > adeIJ_chr_plasmids__.fasta
seqkit seq adeIJ_chr_plasmids__.fasta > adeIJ_chr_plasmids.fasta
samtools faidx adeIJ_chr_plasmids.fasta
3. Final FASTA Header Format for NCBI Submission
Example headers:
>Chr [location=chromosome] [topology=circular] [completeness=partial]
>contig00029 [plasmid-name=pAdeAB1] [topology=circular] [completeness=partial]
Important: completeness=incomplete is not accepted by NCBI and must be replaced with:
completeness=partial
Automatic correction:
sed -i 's/completeness=incomplete/completeness=partial/g' *.fasta
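The header layout can also be generated programmatically instead of being patched afterwards. The sketch below builds headers in the format shown in the examples above; `ncbi_header` is a hypothetical helper, and the only validation it performs is the rule stated in this section (reject `completeness=incomplete`).

```python
def ncbi_header(seq_id, plasmid_name=None, topology="circular", completeness="partial"):
    """Build a FASTA header with source modifiers in the style used in this workflow."""
    if completeness == "incomplete":
        raise ValueError("use completeness=partial instead of completeness=incomplete")
    if plasmid_name is None:
        source = "[location=chromosome]"
    else:
        source = f"[plasmid-name={plasmid_name}]"
    return f">{seq_id} {source} [topology={topology}] [completeness={completeness}]"


# Headers for the adeAB assembly: one chromosome plus four putative plasmid contigs.
plasmid_contigs = ["contig00029", "contig00030", "contig00033", "contig00039"]
headers = [ncbi_header("Chr")]
headers += [ncbi_header(c, f"pAdeAB{i}") for i, c in enumerate(plasmid_contigs, start=1)]
print("\n".join(headers))
```

Generating the headers in one place avoids the later `sed` correction, at the cost of maintaining the contig-to-plasmid-name mapping by hand.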
Original commands
# INPUT: assembled and scaffolded files
#./shovill/A6WT_chr_plasmids.fasta
#./shovill/A10CraA_chr_plasmids.fasta
#./shovill/adeAB_chr_plasmids.fasta
#./shovill/adeIJ_chr_plasmids.fasta
# Back up the original file
cp A6WT_chr_plasmids.fasta A6WT_chr_plasmids.fasta.backup
# Replace the invalid completeness=incomplete with completeness=partial
sed -i 's/completeness=incomplete/completeness=partial/g' A6WT_chr_plasmids.fasta
## Or remove all topology and completeness tags (safest)
#sed -i 's/ \[topology=[^]]*\]//g' A6WT_chr_plasmids.fasta
#sed -i 's/ \[completeness=[^]]*\]//g' A6WT_chr_plasmids.fasta
(bengal3_ac3) jhuang@WS-2290C:/mnt/md1/DATA/Data_Foong_RNAseq_2021_ATCC19606_Cm/bacto_DNAseq/shovill$ grep ">" A6WT_chr_plasmids.fasta
>Chr [location=chromosome] [topology=circular] [completeness=partial]
>contig00016 [plasmid-name=pWT1] [topology=circular] [completeness=partial]
(bengal3_ac3) jhuang@WS-2290C:/mnt/md1/DATA/Data_Foong_RNAseq_2021_ATCC19606_Cm/bacto_DNAseq/shovill$ grep ">" A10CraA_chr_plasmids.fasta
>Chr [location=chromosome] [topology=circular] [completeness=partial]
>contig00096 [plasmid-name=pCraA1] [topology=circular] [completeness=partial]
(bengal3_ac3) jhuang@WS-2290C:/mnt/md1/DATA/Data_Foong_RNAseq_2021_ATCC19606_Cm/bacto_DNAseq/shovill$ grep ">" adeAB_chr_plasmids.fasta
>Chr [location=chromosome] [topology=circular] [completeness=partial]
>contig00029 [plasmid-name=pAdeAB1] [topology=circular] [completeness=partial]
>contig00030 [plasmid-name=pAdeAB2] [topology=circular] [completeness=partial]
>contig00033 [plasmid-name=pAdeAB3] [topology=circular] [completeness=partial]
>contig00039 [plasmid-name=pAdeAB4] [topology=circular] [completeness=partial]
(bengal3_ac3) jhuang@WS-2290C:/mnt/md1/DATA/Data_Foong_RNAseq_2021_ATCC19606_Cm/bacto_DNAseq/shovill$ grep ">" adeIJ_chr_plasmids.fasta
>Chr [location=chromosome] [topology=circular] [completeness=partial]
>contig00017 [plasmid-name=pAdeIJ1] [topology=circular] [completeness=partial]
>contig00019 [plasmid-name=pAdeIJ2] [topology=circular] [completeness=partial]
>contig00020 [plasmid-name=pAdeIJ3] [topology=circular] [completeness=partial]
>contig00021 [plasmid-name=pAdeIJ4] [topology=circular] [completeness=partial]
>contig00025 [plasmid-name=pAdeIJ5] [topology=circular] [completeness=partial]
4. SNP and Small Indel Detection
Two independent pipelines are used:
- Snippy v4.6.0
- SPANDx v3.2
Only variants detected by both methods are retained.
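The "both methods" rule amounts to a set intersection on variant coordinates. The sketch below is not the author's `merge_snps_indels.py`, whose contents are not shown here; it assumes that both tables carry CHROM and POS in their first two columns (comma-separated for the Snippy summary, tab-separated for the SPANDx table), and the demo rows are partly made up.

```python
import csv
import io


def load_keys(handle, delimiter):
    """Collect (CHROM, POS) keys; rows without an integer POS (e.g. headers) are skipped."""
    keys = set()
    for row in csv.reader(handle, delimiter=delimiter):
        if len(row) < 2:
            continue
        try:
            keys.add((row[0], int(row[1])))
        except ValueError:
            continue
    return keys


def shared_variants(snippy_handle, spandx_handle):
    """Return the sorted positions reported by both callers."""
    return sorted(load_keys(snippy_handle, ",") & load_keys(spandx_handle, "\t"))


# Demo tables (positions partly made up for illustration).
snippy = io.StringIO("CHROM,POS,REF\nCP059040,136447,T\nCP059040,999,A\n")
spandx = io.StringIO("CHROM\tPOS\tREF\nCP059040\t136447\tT\nCP059040\t152318\tT\n")
print(shared_variants(snippy, spandx))
```

Matching on position alone does not compare the called alleles, so a stricter version could add REF and the per-sample alleles to the key.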
Snippy summary
python3 ~/Scripts/summarize_snippy_res.py snippy
SPANDx run
nextflow run spandx/main.nf \
--fastq "*_P_{1,2}.fastq.gz" \
--ref CP059040.fasta \
--annotation \
--database CP059040 \
-resume
Merge final variant calls
python3 ~/Scripts/merge_snps_indels.py \
bacto_DNAseq/snippy/summary_snps_indels.csv \
spandx/Outputs/Phylogeny_and_annotation/f1_7___ \
merged_variants.csv
Result: no SNPs or small indels unique to any mutant outside the intended deletion loci.
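One way to screen the merged table for mutant-specific calls is to flag rows in which the four samples do not all carry the same allele. The following is a minimal sketch under the assumption that the table uses the column names of the Snippy summary (CHROM, POS, and one column per sample); the function name is hypothetical.

```python
import csv
import io

SAMPLES = ["A6WT", "A10CraA", "19606adeAB", "adeIJ"]


def sample_specific_positions(handle, samples=SAMPLES):
    """Return POS values of rows where the samples do not all share one allele."""
    flagged = []
    for row in csv.DictReader(handle):
        alleles = {row[s] for s in samples}
        if len(alleles) > 1:
            flagged.append(int(row["POS"]))
    return flagged


# Three rows in the layout of the Snippy summary table.
demo = io.StringIO(
    "CHROM,POS,REF,TYPE,A6WT,A10CraA,19606adeAB,adeIJ\n"
    "CP059040,136447,T,snp,A,A,A,A\n"
    "CP059040,3124917,T,snp,T,T,C,T\n"
    "CP059040,1527276,TTGAACC,del,TTGAACC,T,TTGAACC,TTGAACC\n"
)
print(sample_specific_positions(demo))
```

Rows flagged this way are only candidates; each still needs manual review against both callers and the structural variant calls before it is accepted or discarded.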
Original commands
# ---------------------------- SNP+Indel using snippy ----------------------------
#Summarize all SNPs and Indels from the snippy result directory
#NOTE: the isolate names in summarize_snippy_res.py need to be adapted
#Output: snippy_CP133676/snippy/summary_snps_indels.csv
#Adapt the sample names in ~/Scripts/summarize_snippy_res.py: "A6WT", "A10CraA", "19606adeAB", "adeIJ"
python3 ~/Scripts/summarize_snippy_res.py snippy
#? in the record they are not 100% identical: CP059040,1527276,TTGAACC,del,TTGAACC,T,TTGAACC,TTGAACC,conservative_inframe_deletion c.1327_1332delGAACCT p.Glu443_Pro444del,,,,,,H0N29_07175,nan
# CP059040,3124917,T,snp,T,T,C,T,nan,,,,,,nan,nan --> gene AB!
# ---------------------------- SNP+Indel using spandx ----------------------------
mkdir ~/miniconda3/envs/spandx/share/snpeff-5.1-2/data/CP059040
cp CP059040.gb ~/miniconda3/envs/spandx/share/snpeff-5.1-2/data/CP059040/genes.gbk
vim ~/miniconda3/envs/spandx/share/snpeff-5.1-2/snpEff.config
/home/jhuang/miniconda3/envs/spandx/bin/snpEff build CP059040 -d
mkdir spandx
cp bacto_DNAseq/trimmed/*_P_*fastq spandx
cd spandx
gzip A6WT_trimmed_P_1.fastq A6WT_trimmed_P_2.fastq 19606adeAB_trimmed_P_1.fastq 19606adeAB_trimmed_P_2.fastq ...
cp CP059040.fasta spandx
#grep ">" CP059040.fasta
#>CP059040
conda activate /home/jhuang/miniconda3/envs/spandx
ln -s /home/jhuang/Tools/spandx/ spandx
(spandx) nextflow run spandx/main.nf --fastq "*_P_{1,2}.fastq.gz" --ref CP059040.fasta --annotation --database CP059040 -resume
# ---------------------------- Post-processing of Outputs (BUG: all indels and SNPs are annotated as MODIFIER, so only the positions are taken from SPANDx; the annotations are taken from snippy) ----------------------
#-- _CP133676 produced by SPANDx (temporarily not necessary) --
cd Outputs/Phylogeny_and_annotation
#awk '{if($3!=$7) print}' < All_SNPs_indels_annotated.txt > All_SNPs_indels_annotated_.txt
cut -d$'\t' -f1-7 All_SNPs_indels_annotated.txt > f1_7
grep -v "/" f1_7 > f1_7_
grep -v "\." f1_7_ > f1_7__
grep -v "*" f1_7__ > f1_7___ #(35 records)
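The cut/grep chain above keeps the first seven tab-separated columns and drops every row in which they contain `/`, `.`, or `*`. A rough Python equivalent, given only as a sketch (the function name and the demo rows are made up):

```python
def clean_spandx_rows(lines, n_fields=7, bad_chars="/.*"):
    """Keep the first n_fields columns; drop rows containing any of bad_chars."""
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")[:n_fields]
        joined = "\t".join(fields)
        if any(ch in joined for ch in bad_chars):
            continue
        kept.append(joined)
    return kept


# Made-up rows: CHROM, POS, REF, four sample columns, plus one extra annotation column.
demo_rows = [
    "CP059040\t136447\tT\tA\tA\tA\tA\tMODIFIER",
    "CP059040\t200000\tG\tA\t.\tA\tA\tMODIFIER",
    "CP059040\t300000\tC\tC/T\tT\tT\tT\tMODIFIER",
    "CP059040\t400000\tA\t*\tG\tG\tG\tMODIFIER",
]
print(clean_spandx_rows(demo_rows))
```

As with the grep version, the test is applied to the whole seven-column row, so a reference name containing a dot would also cause its rows to be dropped.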
# ---------------------------- Merge the two files summary_snps_indels.csv (192) and All_SNPs_indels_annotated.txt (248) --> merged_variants.csv (94) ----------------------------
# Note: the results for this project come from manually selecting the common annotated SNPs and indels (bacto_DNAseq/snippy/summary_snps_indels.csv)
CHROM,POS,REF,TYPE,A6WT,A10CraA,19606adeAB,adeIJ,Effect,Impact,Functional_Class,Codon_change,Protein_and_nucleotide_change,Amino_Acid_Length,Gene_name,Biotype
CP059040,136447,T,snp,A,A,A,A,nan,,,,,,nan,nan
CP059040,152318,T,snp,A,A,A,A,missense_variant c.106A>T p.Thr36Ser,,,,,,H0N29_00700,nan
CP059040,171965,T,snp,A,A,A,A,missense_variant c.167T>A p.Val56Asp,,,,,,H0N29_00780,nan
CP059040,194020,C,ins,CT,CT,CT,CT,frameshift_variant c.3260dupA p.Arg1088fs,,,,,,H0N29_00885,nan
CP059040,375297,A,snp,T,T,T,T,nan,,,,,,nan,nan
CP059040,468931,A,ins,AT,AT,AT,AT,nan,,,,,,nan,nan
CP059040,468976,A,ins,AT,AT,AT,AT,nan,,,,,,nan,nan
CP059040,609979,A,snp,T,T,T,T,stop_gained c.1009A>T p.Lys337*,,,,,,H0N29_02925,nan
CP059040,1036730,T,ins,TA,TA,TA,TA,nan,,,,,,nan,nan
CP059040,1059847,C,snp,T,T,T,T,missense_variant c.119G>A p.Gly40Asp,,,,,,H0N29_04880,nan
CP059040,1272639,T,snp,G,G,G,G,nan,,,,,,nan,nan
CP059040,1300706,A,snp,G,G,G,G,synonymous_variant c.351A>G p.Ser117Ser,,,,,,H0N29_06100,nan
CP059040,1970200,T,snp,C,C,C,C,missense_variant c.281T>C p.Val94Ala,,,,,,H0N29_09260,nan
CP059040,2383727,A,ins,AT,AT,AT,AT,nan,,,,,,nan,nan
CP059040,2477628,T,ins,TA,TA,TA,TA,nan,,,,,,nan,nan
CP059040,2525852,A,ins,AT,AT,AT,AT,frameshift_variant c.529dupA p.Ile177fs,,,,,,H0N29_11845,nan
CP059040,3016359,A,ins,AT,AT,AT,AT,frameshift_variant c.2136dupA p.Phe713fs,,,,,,H0N29_14350,cas3f
CP059040,3111299,A,snp,G,G,G,G,synonymous_variant c.957T>C p.Ile319Ile,,,,,,H0N29_14775,nan
CP059040,3124917,T,snp,T,T,C,T,nan,,,,,,nan,nan
CP059040,3310021,C,ins,CT,CT,CT,CT,nan,,,,,,nan,nan
CP059040,3542741,GC,del,G,G,G,G,frameshift_variant c.1746delC p.Arg583fs,,,,,,H0N29_16865,nan
CP059040,3542934,CT,del,C,C,C,C,frameshift_variant c.1938delT p.Met647fs,,,,,,H0N29_16865,nan
CP059040,3570717,AC,del,A,A,A,A,frameshift_variant c.187delC p.Gln63fs,,,,,,H0N29_16960,nan
CP059040,3629616,C,ins,CT,CT,CT,CT,nan,,,,,,nan,nan
CP059040,3873573,A,snp,G,G,G,G,missense_variant c.1705A>G p.Asn569Asp,,,,,,H0N29_18380,nan
CP059040,1527276,TTGAACC,del,TTGAACC,T,TTGAACC,TTGAACC,conservative_inframe_deletion c.1327_1332delGAACCT p.Glu443_Pro444del,,,,,,H0N29_07175,nan
Deleted: *CP059040,3124917,T,snp,T,C,nan,,,,,,nan,nan
* 1527276
# #python3 ~/Scripts/merge_snps_indels.py bacto_DNAseq/snippy/summary_snps_indels.csv spandx/Outputs/Phylogeny_and_annotation/All_SNPs_indels_annotated.txt merged_variants.csv
# python3 ~/Scripts/merge_snps_indels.py bacto_DNAseq/snippy/summary_snps_indels.csv spandx/Outputs/Phylogeny_and_annotation/f1_7___ merged_variants.csv
# # Check whether the number of records in the output file is correct
# comm -12 <(cut -d, -f2 bacto_DNAseq/snippy/summary_snps_indels.csv | sort | uniq) <(cut -f2 spandx/Outputs/Phylogeny_and_annotation/All_SNPs_indels_annotated.txt | sort | uniq) | wc -l #26
# comm -12 <(cut -d, -f2 bacto_DNAseq/snippy/summary_snps_indels.csv | sort | uniq) <(cut -f2 spandx/Outputs/Phylogeny_and_annotation/All_SNPs_indels_annotated.txt | sort | uniq)
5. Structural Variant Detection
Structural variants are identified by assembly-to-reference comparison using:
- MUMmer / nucmer
- delta-filter
- Assemblytics
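Example workflow for ΔadeAB:
nucmer --maxmatch -l 100 -c 500 CP059040.fasta ./shovill/adeAB_chr_plasmids.fasta -p 19606adeAB
delta-filter -1 -q 19606adeAB.delta > 19606adeAB.filtered.delta
Assemblytics 19606adeAB.filtered.delta 19606adeAB_assemblytics 1000 100 50000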
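Once Assemblytics has written its structural-variant BED file, the calls can be summarized with a few lines of Python. The sketch below assumes the column order given in the header line reproduced later in this document (reference, ref_start, ref_stop, ID, size, strand, type, ...); the function names are hypothetical, and the three demo rows are copied from the A10CraA output.

```python
from collections import Counter

COLUMNS = ["reference", "ref_start", "ref_stop", "ID", "size", "strand", "type",
           "ref_gap_size", "query_gap_size", "query_coordinates", "method"]
INT_COLUMNS = ("ref_start", "ref_stop", "size", "ref_gap_size", "query_gap_size")


def parse_sv_bed(lines):
    """Parse whitespace-separated Assemblytics BED rows into dictionaries."""
    records = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header comment
        record = dict(zip(COLUMNS, line.split()))
        for key in INT_COLUMNS:
            record[key] = int(record[key])
        records.append(record)
    return records


def large_deletions(records, min_size=1000):
    """Return Deletion calls of at least min_size bp."""
    return [r for r in records if r["type"] == "Deletion" and r["size"] >= min_size]


demo_bed = [
    "#reference ref_start ref_stop ID size strand type ref_gap_size query_gap_size query_coordinates method",
    "CP059040 216043 226883 Assemblytics_b_1 10739 + Repeat_contraction 10840 101 CP059040_RagTag:192505-192606:+ between_alignments",
    "CP059040 364652 365809 Assemblytics_b_2 1154 + Deletion 1157 3 CP059040_RagTag:330375-330378:+ between_alignments",
    "CP059040 414041 414151 Assemblytics_b_3 211 + Tandem_expansion -110 101 CP059040_RagTag:378720-378821:+ between_alignments",
]
records = parse_sv_bed(demo_bed)
print(Counter(r["type"] for r in records))
print([(r["ref_start"], r["ref_stop"]) for r in large_deletions(records)])
```

Counting by the `type` column separates true deletions from repeat and tandem contractions.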
Original commands
# install Environment
mamba create -n sv_assembly \
python=3.10 \
minimap2 \
mummer4 \
samtools \
bcftools \
syri \
assemblytics \
-c conda-forge -c bioconda
mamba activate sv_assembly
mamba install numpy pandas -c conda-forge
which paftools.js
minimap2 --version
nucmer --version
syri -h
Assemblytics
#mamba install mummerplot igv -c bioconda
# Four approaches were tried; only the fourth worked.
# Scenario               | Tool                  | Status
# reads vs reference     | Sniffles / SVIM       | NOT WORKING
# assembly vs reference  | SyRI (best)           | NOT WORKING
# quick & simple         | minimap2 + paftools   | NOT WORKING
# classic SV detection   | Assemblytics          | WORKING (see below)
# --> Choose option 4: Assemblytics (classic SV tool for assemblies)
mamba activate sv_assembly
# ---------------- A6WT vs CP059040 (-->Nothing) ----------------
#nucmer --maxmatch -l 100 -c 500 CP059040.fasta shovill/A6WT/contigs.fa -p A6WT
nucmer --maxmatch -l 100 -c 500 CP059040.fasta ./shovill/A6WT_chr_plasmids.fasta -p A6WT
delta-filter -1 -q A6WT.delta > A6WT.filtered.delta
Assemblytics A6WT.filtered.delta A6WT_assemblytics 1000 100 50000
grep -w "Insertion" A6WT_assemblytics.Assemblytics_structural_variants.bed > A6WT_insertions.bed
wc -l A6WT_assemblytics.Assemblytics_structural_variants.bed
cut -f7 A6WT_assemblytics.Assemblytics_structural_variants.bed | sort | uniq -c
cat A6WT_assemblytics.Assemblytics_assembly_stats.txt
show-coords -rcl A6WT.filtered.delta | head -20
Assemblytics A6WT.filtered.delta A6WT_assemblytics_v2 500 50 100000
CP059040 216043 226883 Assemblytics_b_1 10739 + Repeat_contraction 10840 101 CP059040_RagTag:192505-192606:+ between_alignments
CP059040 491041 496665 Assemblytics_b_2 5523 + Repeat_contraction 5624 101 CP059040_RagTag:456766-456867:+ between_alignments
CP059040 523243 528864 Assemblytics_b_3 5520 + Repeat_contraction 5621 101 CP059040_RagTag:483445-483546:+ between_alignments
CP059040 785771 794627 Assemblytics_b_4 8755 + Repeat_contraction 8856 101 CP059040_RagTag:740453-740554:+ between_alignments
CP059040 1327560 1328441 Assemblytics_b_5 780 + Repeat_contraction 881 101 CP059040_RagTag:1273488-1273589:+ between_alignments
CP059040 2190218 2191651 Assemblytics_b_7 1332 + Repeat_contraction 1433 101 CP059040_RagTag:2135360-2135461:+ between_alignments
CP059040 2259736 2260384 Assemblytics_b_8 135 + Tandem_contraction -648 -783 CP059040_RagTag:2203411-2204194:- between_alignments
CP059040 2477722 2488562 Assemblytics_b_9 10739 + Repeat_contraction 10840 101 CP059040_RagTag:2421405-2421506:+ between_alignments
CP059040 2579725 2581158 Assemblytics_b_10 1332 + Repeat_contraction 1433 101 CP059040_RagTag:2512670-2512771:+ between_alignments
CP059040 2810861 2863470 Assemblytics_b_11 52668 + Tandem_contraction 52609 -59 CP059040_RagTag:2742415-2742474:- between_alignments
CP059040 3089638 3090457 Assemblytics_b_12 718 + Repeat_contraction 819 101 CP059040_RagTag:2968584-2968685:+ between_alignments
CP059040 3124916 3125037 Assemblytics_b_13 198 + Tandem_contraction 121 -77 CP059040_RagTag:3003067-3003144:- between_alignments
CP059040 3282693 3288043 Assemblytics_b_14 5249 + Repeat_contraction 5350 101 CP059040_RagTag:3160723-3160824:+ between_alignments
CP059040 3642789 3643608 Assemblytics_b_15 718 + Repeat_contraction 819 101 CP059040_RagTag:3515569-3515670:+ between_alignments
CP059040 3732412 3737994 Assemblytics_b_16 5481 + Repeat_contraction 5582 101 CP059040_RagTag:3604474-3604575:+ between_alignments
CP059040 3881974 3882855 Assemblytics_b_17 780 + Repeat_contraction 881 101 CP059040_RagTag:3748555-3748656:+ between_alignments
CP059040 3942815 3948161 Assemblytics_b_18 5245 + Repeat_contraction 5346 101 CP059040_RagTag:3808616-3808717:+ between_alignments
# ---------------- A10CraA vs CP059040 (1157 nt deletion) ----------------
#nucmer --maxmatch -l 100 -c 500 CP059040.fasta shovill/A10CraA_clean.fasta -p A10CraA
nucmer --maxmatch -l 100 -c 500 CP059040.fasta ./shovill/A10CraA_chr_plasmids.fasta -p A10CraA
delta-filter -1 -q A10CraA.delta > A10CraA.filtered.delta
Assemblytics A10CraA.filtered.delta A10CraA_assemblytics 1000 100 50000
grep -w "Insertion" A10CraA_assemblytics.Assemblytics_structural_variants.bed > A10CraA_insertions.bed
wc -l A10CraA_assemblytics.Assemblytics_structural_variants.bed
cut -f7 A10CraA_assemblytics.Assemblytics_structural_variants.bed | sort | uniq -c
cat A10CraA_assemblytics.Assemblytics_assembly_stats.txt
show-coords -rcl A10CraA.filtered.delta | head -20
Assemblytics A10CraA.filtered.delta A10CraA_assemblytics_v2 500 50 100000
#reference ref_start ref_stop ID size strand type ref_gap_size query_gap_size query_coordinates method
#CP059040 2810861 2863470 Assemblytics_b_2 52668 + Tandem_contraction 52609 -59 contig00003:229645-229704:- between_alignments
#CP059040 3124916 3125037 Assemblytics_b_3 198 + Tandem_contraction 121 -77 contig00009:157580-157657:- between_alignments
CP059040 364652 365809 Assemblytics_b_4 1154 + Deletion 1157 3 contig00011:137770-137773:+ between_alignments
#reference ref_start ref_stop ID size strand type ref_gap_size query_gap_size query_coordinates method
CP059040 216043 226883 Assemblytics_b_1 10739 + Repeat_contraction 10840 101 CP059040_RagTag:192505-192606:+ between_alignments
* CP059040 364652 365809 Assemblytics_b_2 1154 + Deletion 1157 3 CP059040_RagTag:330375-330378:+ between_alignments
CP059040 414041 414151 Assemblytics_b_3 211 + Tandem_expansion -110 101 CP059040_RagTag:378720-378821:+ between_alignments
CP059040 465315 465425 Assemblytics_b_4 211 + Tandem_expansion -110 101 CP059040_RagTag:430205-430306:+ between_alignments
CP059040 491041 496665 Assemblytics_b_5 5523 + Repeat_contraction 5624 101 CP059040_RagTag:456034-456135:+ between_alignments
CP059040 523243 528864 Assemblytics_b_6 5520 + Repeat_contraction 5621 101 CP059040_RagTag:482713-482814:+ between_alignments
CP059040 785771 794627 Assemblytics_b_7 8755 + Repeat_contraction 8856 101 CP059040_RagTag:739721-739822:+ between_alignments
CP059040 1327560 1328441 Assemblytics_b_8 780 + Repeat_contraction 881 101 CP059040_RagTag:1272756-1272857:+ between_alignments
CP059040 2190218 2191651 Assemblytics_b_10 1332 + Repeat_contraction 1433 101 CP059040_RagTag:2134628-2134729:+ between_alignments
CP059040 2477722 2488562 Assemblytics_b_11 10739 + Repeat_contraction 10840 101 CP059040_RagTag:2420802-2420903:+ between_alignments
CP059040 2579725 2581158 Assemblytics_b_12 1332 + Repeat_contraction 1433 101 CP059040_RagTag:2512067-2512168:+ between_alignments
CP059040 2810861 2863470 Assemblytics_b_13 52668 + Tandem_contraction 52609 -59 CP059040_RagTag:2741812-2741871:- between_alignments
CP059040 3089638 3090457 Assemblytics_b_14 718 + Repeat_contraction 819 101 CP059040_RagTag:2967981-2968082:+ between_alignments
CP059040 3124916 3125037 Assemblytics_b_15 198 + Tandem_contraction 121 -77 CP059040_RagTag:3002464-3002541:- between_alignments
CP059040 3282693 3288043 Assemblytics_b_16 5249 + Repeat_contraction 5350 101 CP059040_RagTag:3160120-3160221:+ between_alignments
CP059040 3642789 3643608 Assemblytics_b_17 718 + Repeat_contraction 819 101 CP059040_RagTag:3514966-3515067:+ between_alignments
CP059040 3732412 3737887 Assemblytics_b_18 5374 + Repeat_contraction 5475 101 CP059040_RagTag:3603871-3603972:+ between_alignments
CP059040 3881974 3882855 Assemblytics_b_19 780 + Repeat_contraction 881 101 CP059040_RagTag:3748059-3748160:+ between_alignments
CP059040 3942815 3948161 Assemblytics_b_20 5245 + Repeat_contraction 5346 101 CP059040_RagTag:3808120-3808221:+ between_alignments
gene 364609..365838
/locus_tag="craA"
CDS 364609..365838
/locus_tag="H0N29_01695"
/inference="COORDINATES: similar to AA
sequence:RefSeq:YP_004996877.1"
/note="Derived by automated computational analysis using
gene prediction method: Protein Homology."
/codon_start=1
/transl_table=11
/product="MFS transporter"
/protein_id="QNT85352.1"
/db_xref="GI:1906908782"
/translation="MKNIQTTALNRTTLMFPLALVLFEFAVYIGNDLIQPAMLAITEDFGVSATWAPSSMSFYLLGGASVAWLLGPLSDRLGRKKVLLSGVLFFALCCFLILLTRQIEHFLTLRFLQGIGLSVISAVGYAAIQENFAERDAIKVMALMANISLLAPLLGPVLGAFLIDYVSWHWGFVAIALLALLSWVGLKKQMPSHKVSVTKQPFSYLFDDFKKVFSNRQFLGLTLALPLVGMPLMLWIALSPIILVDELKLTSVQYGLAQFPVFLGLIVGNIVLIKIIDRLALGKTVLIGLPIMLTGTLILILGVVWQAYLIPCLLIGMTLICFGEGISFSVLYRFALMSSEVSKGTVAAAVSMLLMTSFFAMIELVRYLYTQFHLWAFVLSAFAFIALWFTQPRLALKREMQERVAQDLH"
chloramphenicol efflux MFS transporter CraA [Acinetobacter pittii]
Sequence ID: WP_016142916.1; Length: 409; Number of Matches: 1
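To check that the reported deletion falls inside the annotated craA gene, the coordinates above can be compared directly. The sketch assumes that the Assemblytics interval is BED-style (0-based, half-open), which agrees with ref_stop minus ref_start equalling the reported ref_gap_size of 1157, and that the GenBank gene range is 1-based and inclusive; the helper names are hypothetical.

```python
def genbank_to_half_open(start, end):
    """Convert a 1-based inclusive GenBank range to a 0-based half-open interval."""
    return start - 1, end


def overlap_bp(a, b):
    """Number of bases shared by two half-open intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def is_contained(inner, outer):
    """True if the inner interval lies entirely within the outer interval."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


cra_a = genbank_to_half_open(364609, 365838)  # gene 364609..365838
deletion = (364652, 365809)                   # Assemblytics Deletion call

gene_len = cra_a[1] - cra_a[0]
shared = overlap_bp(deletion, cra_a)
print(f"gene length: {gene_len} bp, deleted within gene: {shared} bp")
print(f"deletion inside craA: {is_contained(deletion, cra_a)}")
```

Under these assumptions the deletion removes 1157 of the 1230 bp of craA and does not extend beyond the gene boundaries.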
# ---------------- 19606adeAB vs CP059040 (4282 nt deletion) ----------------
# Step 1: Align assemblies to reference
#nucmer --maxmatch -l 100 -c 500 CP059040.fasta shovill/19606adeAB/contigs.fa -p 19606adeAB
nucmer --maxmatch -l 100 -c 500 CP059040.fasta ./shovill/adeAB_chr_plasmids.fasta -p 19606adeAB
# Step 2: Filter alignments (1-to-1 best matches)
delta-filter -1 -q 19606adeAB.delta > 19606adeAB.filtered.delta
# Note: Use -1 for 1-to-1, not -r -q
# Step 3: Run Assemblytics with ALL 5 parameters
Assemblytics 19606adeAB.filtered.delta 19606adeAB_assemblytics 1000 100 50000
# Step 4: Extract large insertions only
grep -w "Insertion" 19606adeAB_assemblytics.Assemblytics_structural_variants.bed > 19606adeAB_insertions.bed
# Step 5: Check if ANY variants were detected (any size)
wc -l 19606adeAB_assemblytics.Assemblytics_structural_variants.bed
# Step 6: View variant type distribution (the type is column 7 of the BED file)
cut -f7 19606adeAB_assemblytics.Assemblytics_structural_variants.bed | sort | uniq -c
# Step 7: Check alignment coverage (are contigs aligning well?)
cat 19606adeAB_assemblytics.Assemblytics_assembly_stats.txt
# Step 8: Check raw delta file for alignment blocks
show-coords -rcl 19606adeAB.filtered.delta | head -20
# Step 9: If bed file is empty, try relaxing parameters and re-run:
Assemblytics 19606adeAB.filtered.delta 19606adeAB_assemblytics_v2 500 50 100000
# Arguments: unique anchor length = 500, minimum variant size = 50, maximum variant size = 100000
CP059040 2810861 2863470 Assemblytics_b_1 52668 + Tandem_contraction 52609 -59 contig00005:229645-229704:- between_alignments
CP059040 3124916 3125037 Assemblytics_b_2 198 + Tandem_contraction 121 -77 contig00012:92096-92173:- between_alignments
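# The SNP at position 3124917 reported by snippy lies at the start of this tandem contraction (3124916-3125037) --> DELETE it from the variant list!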
CP059040 1844323 1848605 Assemblytics_b_5 4282 + Deletion 4282 0 contig00024:26693-26693:+ between_alignments
#reference ref_start ref_stop ID size strand type ref_gap_size query_gap_size query_coordinates method
CP059040 216043 226883 Assemblytics_b_1 10739 + Repeat_contraction 10840 101 CP059040_RagTag:192505-192606:+ between_alignments
CP059040 375124 375138 Assemblytics_b_2 115 + Insertion -14 101 CP059040_RagTag:340861-340962:+ between_alignments
CP059040 491041 496665 Assemblytics_b_5 5523 + Repeat_contraction 5624 101 CP059040_RagTag:456894-456995:+ between_alignments
CP059040 523243 528864 Assemblytics_b_6 5520 + Repeat_contraction 5621 101 CP059040_RagTag:483573-483674:+ between_alignments
CP059040 785771 794627 Assemblytics_b_8 8755 + Repeat_contraction 8856 101 CP059040_RagTag:740606-740707:+ between_alignments
CP059040 1327560 1328441 Assemblytics_b_9 780 + Repeat_contraction 881 101 CP059040_RagTag:1273641-1273742:+ between_alignments
CP059040 1607031 1607049 Assemblytics_b_12 83 + Insertion 18 101 CP059040_RagTag:1552326-1552427:+ between_alignments
* CP059040 1844323 1848605 Assemblytics_b_14 4282 + Deletion 4282 0 CP059040_RagTag:1789745-1789745:+ between_alignments
CP059040 1852044 1852303 Assemblytics_b_15 158 + Repeat_contraction 259 101 CP059040_RagTag:1793184-1793285:+ between_alignments
CP059040 2147239 2147252 Assemblytics_b_17 114 + Insertion -13 101 CP059040_RagTag:2088243-2088344:+ between_alignments
CP059040 2178043 2178056 Assemblytics_b_18 114 + Insertion -13 101 CP059040_RagTag:2119161-2119262:+ between_alignments
CP059040 2180008 2180010 Assemblytics_b_19 99 + Insertion 2 101 CP059040_RagTag:2121227-2121328:+ between_alignments
CP059040 2190218 2191651 Assemblytics_b_20 1332 + Repeat_contraction 1433 101 CP059040_RagTag:2131536-2131637:+ between_alignments
CP059040 2488562 2488672 Assemblytics_b_22 211 + Tandem_expansion -110 101 CP059040_RagTag:2428616-2428717:+ between_alignments
CP059040 2579725 2581158 Assemblytics_b_23 1332 + Repeat_contraction 1433 101 CP059040_RagTag:2519881-2519982:+ between_alignments
CP059040 2810861 2863470 Assemblytics_b_24 52668 + Tandem_contraction 52609 -59 CP059040_RagTag:2749626-2749685:- between_alignments
CP059040 2873400 2873644 Assemblytics_b_25 143 + Repeat_contraction 244 101 CP059040_RagTag:2759556-2759657:+ between_alignments
CP059040 3089638 3090457 Assemblytics_b_26 718 + Repeat_contraction 819 101 CP059040_RagTag:2975652-2975753:+ between_alignments
CP059040 3124916 3125037 Assemblytics_b_27 198 + Tandem_contraction 121 -77 CP059040_RagTag:3010135-3010212:- between_alignments
CP059040 3217209 3217578 Assemblytics_b_28 268 + Repeat_contraction 369 101 CP059040_RagTag:3102307-3102408:+ between_alignments
CP059040 3282677 3288043 Assemblytics_b_29 5265 + Repeat_contraction 5366 101 CP059040_RagTag:3167507-3167608:+ between_alignments
CP059040 3642789 3643608 Assemblytics_b_30 718 + Repeat_contraction 819 101 CP059040_RagTag:3522353-3522454:+ between_alignments
CP059040 3732428 3737887 Assemblytics_b_31 5358 + Repeat_contraction 5459 101 CP059040_RagTag:3611274-3611375:+ between_alignments
CP059040 3881974 3882855 Assemblytics_b_33 780 + Repeat_contraction 881 101 CP059040_RagTag:3755461-3755562:+ between_alignments
CP059040 3942815 3948161 Assemblytics_b_34 5245 + Repeat_contraction 5346 101 CP059040_RagTag:3815522-3815623:+ between_alignments
gene 1844319..1845509
/gene="adeA"
/locus_tag="H0N29_08675"
gene 1845506..1848616
/gene="adeB"
/locus_tag="H0N29_08680"
# Step 1: Align assemblies to reference
nucmer --maxmatch -l 100 -c 500 shovill/A6WT/contigs.fa shovill/19606adeAB/contigs.fa -p 19606adeAB_vs_A6WT
# Step 2: Filter alignments (1-to-1 best matches)
delta-filter -1 -q 19606adeAB_vs_A6WT.delta > 19606adeAB_vs_A6WT.filtered.delta
# Note: Use -1 for 1-to-1, not -r -q
# Step 3: Run Assemblytics with ALL 5 parameters
Assemblytics 19606adeAB_vs_A6WT.filtered.delta 19606adeAB_vs_A6WT_assemblytics 1000 100 50000
# Step 4: Extract large insertions only
grep -w "Insertion" 19606adeAB_vs_A6WT_assemblytics.Assemblytics_structural_variants.bed > 19606adeAB_vs_A6WT_insertions.bed
# Step 5: Check whether any variants were detected (any size)
wc -l 19606adeAB_vs_A6WT_assemblytics.Assemblytics_structural_variants.bed
# Step 6: View variant type distribution (the type is in column 7; column 4 is the variant ID)
cut -f7 19606adeAB_vs_A6WT_assemblytics.Assemblytics_structural_variants.bed | sort | uniq -c
# Step 7: Check alignment coverage (are contigs aligning well?)
cat 19606adeAB_vs_A6WT_assemblytics.Assemblytics_assembly_stats.txt
# Step 8: Check the filtered delta file for alignment blocks
show-coords -rcl 19606adeAB_vs_A6WT.filtered.delta | head -20
# Step 9: If the bed file is empty, relax the parameters and re-run:
Assemblytics 19606adeAB_vs_A6WT.filtered.delta 19606adeAB_vs_A6WT_assemblytics_v2 500 50 100000
# Last three arguments: unique anchor length = 500, min variant size = 50, max variant size = 100000
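The inspection steps above can be wrapped in two small helpers. This is only a sketch: the function names `sv_type_counts` and `sv_deletions_in_region` are hypothetical, and it assumes the Assemblytics BED layout seen in this workflow (contig in column 1, reference start and end in columns 2 and 3, variant type in column 7) and that any header line starts with `#`.

```shell
#!/usr/bin/env bash

# Hypothetical helper: count structural variant types in an Assemblytics BED file.
# Assumes the variant type is in column 7; lines starting with '#' are skipped.
sv_type_counts() {
    local bed="$1"
    awk '!/^#/ && NF >= 7 { n[$7]++ }
         END { for (t in n) print n[t] "\t" t }' "$bed" | sort -k2,2
}

# Hypothetical helper: print Deletion rows whose reference interval
# overlaps a region given as: contig start end
sv_deletions_in_region() {
    local bed="$1" contig="$2" start="$3" end="$4"
    awk -v c="$contig" -v s="$start" -v e="$end" \
        '!/^#/ && $1 == c && $7 == "Deletion" && $2 <= e && $3 >= s' "$bed"
}
```

For instance, `sv_deletions_in_region adeIJ_assemblytics.Assemblytics_structural_variants.bed CP059040 737233 741672` would list any deletion call overlapping the annotated adeJ/adeI coordinates, which makes the targeted event easier to spot among the repeat and tandem contractions.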
# ---------------- adeIJ vs CP059040 (4436 nt deletion) ----------------
#nucmer --maxmatch -l 100 -c 500 CP059040.fasta shovill/adeIJ/contigs.fa -p adeIJ
nucmer --maxmatch -l 100 -c 500 CP059040.fasta ./shovill/adeIJ_chr_plasmids.fasta -p adeIJ
delta-filter -1 -q adeIJ.delta > adeIJ.filtered.delta
Assemblytics adeIJ.filtered.delta adeIJ_assemblytics 1000 100 50000
grep -w "Insertion" adeIJ_assemblytics.Assemblytics_structural_variants.bed > adeIJ_insertions.bed
wc -l adeIJ_assemblytics.Assemblytics_structural_variants.bed
cut -f7 adeIJ_assemblytics.Assemblytics_structural_variants.bed | sort | uniq -c
cat adeIJ_assemblytics.Assemblytics_assembly_stats.txt
show-coords -rcl adeIJ.filtered.delta | head -20
Assemblytics adeIJ.filtered.delta adeIJ_assemblytics_v2 500 50 100000
#CP059040 2810861 2863470 Assemblytics_b_2 52668 + Tandem_contraction 52609 -59 contig00003:229645-229704:- between_alignments
#CP059040 2259736 2260384 Assemblytics_b_3 135 + Tandem_contraction -648 -783 contig00005:217212-217995:- between_alignments
CP059040 737224 741667 Assemblytics_b_4 4436 + Deletion 4443 7 contig00007:208361-208368:+ between_alignments
#CP059040 3124916 3125037 Assemblytics_b_5 198 + Tandem_contraction 121 -77 contig00009:157580-157657:- between_alignments
CP059040 216043 226883 Assemblytics_b_1 10739 + Repeat_contraction 10840 101 CP059040_RagTag:192505-192606:+ between_alignments
CP059040 491041 496665 Assemblytics_b_2 5523 + Repeat_contraction 5624 101 CP059040_RagTag:456766-456867:+ between_alignments
CP059040 523243 528864 Assemblytics_b_3 5520 + Repeat_contraction 5621 101 CP059040_RagTag:483445-483546:+ between_alignments
* CP059040 737224 741667 Assemblytics_b_4 4436 + Deletion 4443 7 CP059040_RagTag:691906-691913:+ between_alignments
CP059040 785771 794627 Assemblytics_b_5 8755 + Repeat_contraction 8856 101 CP059040_RagTag:736017-736118:+ between_alignments
CP059040 1327560 1328441 Assemblytics_b_6 780 + Repeat_contraction 881 101 CP059040_RagTag:1269052-1269153:+ between_alignments
CP059040 2190218 2191651 Assemblytics_b_8 1332 + Repeat_contraction 1433 101 CP059040_RagTag:2130924-2131025:+ between_alignments
CP059040 2259736 2260384 Assemblytics_b_9 135 + Tandem_contraction -648 -783 CP059040_RagTag:2198975-2199758:- between_alignments
CP059040 2477722 2488562 Assemblytics_b_10 10739 + Repeat_contraction 10840 101 CP059040_RagTag:2416969-2417070:+ between_alignments
CP059040 2579725 2581158 Assemblytics_b_11 1332 + Repeat_contraction 1433 101 CP059040_RagTag:2508234-2508335:+ between_alignments
CP059040 3089638 3090457 Assemblytics_b_13 718 + Repeat_contraction 819 101 CP059040_RagTag:2964148-2964249:+ between_alignments
CP059040 3124916 3125037 Assemblytics_b_14 198 + Tandem_contraction 121 -77 CP059040_RagTag:2998631-2998708:- between_alignments
CP059040 3282693 3288043 Assemblytics_b_15 5249 + Repeat_contraction 5350 101 CP059040_RagTag:3156287-3156388:+ between_alignments
CP059040 3642789 3643608 Assemblytics_b_16 718 + Repeat_contraction 819 101 CP059040_RagTag:3511133-3511234:+ between_alignments
CP059040 3732412 3737994 Assemblytics_b_17 5481 + Repeat_contraction 5582 101 CP059040_RagTag:3600038-3600139:+ between_alignments
CP059040 3881974 3882855 Assemblytics_b_18 780 + Repeat_contraction 881 101 CP059040_RagTag:3744119-3744220:+ between_alignments
CP059040 3942815 3948161 Assemblytics_b_19 5245 + Repeat_contraction 5346 101 CP059040_RagTag:3804180-3804281:+ between_alignments
CP059040 788949 794737 Assemblytics_b_27 5889 + Tandem_expansion -5788 101 CP059040_RagTag:3876490-3876591:+ between_alignments
gene complement(737233..740409)
/gene="adeJ"
/locus_tag="H0N29_03545"
gene complement(740422..741672)
/gene="adeI"
/locus_tag="H0N29_03550"
6. Confirmed Targeted Deletions
ΔadeAB
- 4,282-bp deletion
- Corresponds to CP059040 positions 1,844,323–1,848,605
- Removes nearly the entire adeAB operon:
  - adeA (H0N29_08675; 1,844,319–1,845,509)
  - adeB (H0N29_08680; 1,845,506–1,848,616)
ΔadeIJ
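- 4,443-bp reference interval deleted (Assemblytics reports a net size of 4,436 bp after subtracting a 7-bp query gap)
- Corresponds to CP059040 positions 737,224–741,667
- Removes essentially the entire reverse-strand adeIJ operon:
  - adeJ (H0N29_03545; complement 737,233–740,409)
  - adeI (H0N29_03550; complement 740,422–741,672)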
ΔcraA
- 1,157-bp deletion
- Corresponds to CP059040 positions 364,652–365,809
- Removes the central portion of craA (H0N29_01695; 364,609–365,838)
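How much of each gene falls inside its deletion can be checked with simple interval arithmetic. The sketch below uses a hypothetical helper, `gene_overlap`, and assumes that both the deletion and the gene coordinates are 1-based and inclusive; if the BED coordinates are 0-based, the result shifts by one base.

```shell
#!/usr/bin/env bash

# Hypothetical helper: gene_overlap DEL_START DEL_END GENE_START GENE_END
# Prints, tab-separated: overlapping bp, gene length, percent of gene deleted.
# Assumes 1-based inclusive coordinates for both intervals.
gene_overlap() {
    awk -v ds="$1" -v de="$2" -v gs="$3" -v ge="$4" 'BEGIN {
        lo = (ds > gs) ? ds : gs          # later of the two starts
        hi = (de < ge) ? de : ge          # earlier of the two ends
        ov = (hi >= lo) ? hi - lo + 1 : 0 # overlap length, 0 if disjoint
        glen = ge - gs + 1                # gene length
        printf "%d\t%d\t%.1f\n", ov, glen, 100 * ov / glen
    }'
}
```

Under that assumption, `gene_overlap 1844323 1848605 1844319 1845509` reports 1187 of the 1191 bp of adeA (99.7%) inside the ΔadeAB deletion, and `gene_overlap 364652 365809 364609 365838` reports 1158 of the 1230 bp of craA (94.1%).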
Together, the structural variant and SNP/indel analyses confirm that each mutant differs from WT only at the intended deletion locus.
7. Manuscript-Ready Summary
Whole-genome sequencing and comparative assembly analysis confirmed the genomic integrity of the ΔadeAB, ΔadeIJ, and ΔcraA mutants. Structural variant analysis identified a single targeted deletion in each strain, while consensus SNP/indel calling using Snippy and SPANDx detected no additional variants relative to WT outside the engineered loci.