-
prepare the input sequencing data
NGS.id   Sample.name   ONT_barcode
jk3332   5179R1        Native Barcode NB01
jk3333   1585          Native Barcode NB02
jk3334   1585V         Native Barcode NB03
jk3335   5179          Native Barcode NB04
jk3336   HD_05_2       Native Barcode NB05
jk3337   HD_05_2_K5    Native Barcode NB06
jk3338   HD_05_2_K6    Native Barcode NB07
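The sample sheet above is used further down to match each barcode directory to a sample name. As a minimal sketch, the lookup could be scripted as follows; `sample_sheet` and `barcode_to_sample` are hypothetical helper names, and the three-column layout (NGS id, sample name, short barcode id) is an assumption made for this example.

```shell
# Sample sheet copied from the table above: NGS id, sample name, barcode id.
sample_sheet() {
  cat <<'EOF'
jk3332 5179R1 NB01
jk3333 1585 NB02
jk3334 1585V NB03
jk3335 5179 NB04
jk3336 HD_05_2 NB05
jk3337 HD_05_2_K5 NB06
jk3338 HD_05_2_K6 NB07
EOF
}

# Hypothetical helper: print the sample name that belongs to a barcode id.
barcode_to_sample() {
  sample_sheet | awk -v bc="$1" '$3 == bc { print $2 }'
}

barcode_to_sample NB05
```

An unknown barcode prints nothing, so an empty result can be used as an error signal in a calling script.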
-
assembly using unicycler
cat FAN41335_pass_barcode01_46d24d87_69a75752_0.fastq.gz FAN41335_pass_barcode01_46d24d87_69a75752_1.fastq.gz > FAN41335_pass_barcode01.fastq.gz
cat FAN41335_pass_barcode03_46d24d87_69a75752_0.fastq.gz FAN41335_pass_barcode03_46d24d87_69a75752_1.fastq.gz > FAN41335_pass_barcode03.fastq.gz
cat FAN41335_pass_barcode04_46d24d87_69a75752_0.fastq.gz FAN41335_pass_barcode04_46d24d87_69a75752_1.fastq.gz FAN41335_pass_barcode04_46d24d87_69a75752_2.fastq.gz > FAN41335_pass_barcode04.fastq.gz
cat FAN41335_pass_barcode05_46d24d87_69a75752_0.fastq.gz FAN41335_pass_barcode05_46d24d87_69a75752_1.fastq.gz > FAN41335_pass_barcode05.fastq.gz

unicycler -1 s-epidermidis-5179-r1_R1.fastq.gz -2 s-epidermidis-5179-r1_R2.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode01/FAN41335_pass_barcode01.fastq.gz --mode normal -t 55 -o 5179R1_normal
unicycler -1 s-epidermidis-1585_R1.fastq.gz -2 s-epidermidis-1585_R2.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode02/FAN41335_pass_barcode02.fastq.gz --mode normal -t 55 -o 1585_normal
#3 no short sequencing
unicycler -1 s-epidermidis-5179_R1.fastq.gz -2 s-epidermidis-5179_R2.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode04/FAN41335_pass_barcode04.fastq.gz --mode normal -t 55 -o 5179_normal
unicycler -1 HD5_2_S38_R1_001.fastq.gz -2 HD5_2_S38_R2_001.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode05/FAN41335_pass_barcode05.fastq.gz --mode normal -t 55 -o HD05_2_normal
unicycler -1 275_K5_Holger_S92_R1_001.fastq.gz -2 275_K5_Holger_S92_R2_001.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode06/FAN41335_pass_barcode06.fastq.gz --mode normal -t 55 -o HD05_2_K5_normal
unicycler -1 276_K6_Holger_S95_R1_001.fastq.gz -2 276_K6_Holger_S95_R2_001.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode07/FAN41335_pass_barcode07.fastq.gz --mode normal -t 55 -o HD05_2_K6_normal

unicycler -1 s-epidermidis-5179-r1_R1.fastq.gz -2 s-epidermidis-5179-r1_R2.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode01/FAN41335_pass_barcode01.fastq.gz --mode bold -t 55 -o 5179R1_bold
unicycler -1 s-epidermidis-1585_R1.fastq.gz -2 s-epidermidis-1585_R2.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode02/FAN41335_pass_barcode02.fastq.gz --mode bold -t 55 -o 1585_bold
#3 no short sequencing
unicycler -1 s-epidermidis-5179_R1.fastq.gz -2 s-epidermidis-5179_R2.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode04/FAN41335_pass_barcode04.fastq.gz --mode bold -t 55 -o 5179_bold
unicycler -1 HD5_2_S38_R1_001.fastq.gz -2 HD5_2_S38_R2_001.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode05/FAN41335_pass_barcode05.fastq.gz --mode bold -t 55 -o HD05_2_bold
unicycler -1 275_K5_Holger_S92_R1_001.fastq.gz -2 275_K5_Holger_S92_R2_001.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode06/FAN41335_pass_barcode06.fastq.gz --mode bold -t 55 -o HD05_2_K5_bold
unicycler -1 276_K6_Holger_S95_R1_001.fastq.gz -2 276_K6_Holger_S95_R2_001.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode07/FAN41335_pass_barcode07.fastq.gz --mode bold -t 55 -o HD05_2_K6_bold

unicycler -1 s-epidermidis-5179-r1_R1.fastq.gz -2 s-epidermidis-5179-r1_R2.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode01/FAN41335_pass_barcode01.fastq.gz --mode conservative -t 55 -o 5179R1_conservative
unicycler -1 s-epidermidis-1585_R1.fastq.gz -2 s-epidermidis-1585_R2.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode02/FAN41335_pass_barcode02.fastq.gz --mode conservative -t 55 -o 1585_conservative
#3 no short sequencing
unicycler -1 s-epidermidis-5179_R1.fastq.gz -2 s-epidermidis-5179_R2.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode04/FAN41335_pass_barcode04.fastq.gz --mode conservative -t 55 -o 5179_conservative
unicycler -1 HD5_2_S38_R1_001.fastq.gz -2 HD5_2_S38_R2_001.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode05/FAN41335_pass_barcode05.fastq.gz --mode conservative -t 55 -o HD05_2_conservative
unicycler -1 275_K5_Holger_S92_R1_001.fastq.gz -2 275_K5_Holger_S92_R2_001.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode06/FAN41335_pass_barcode06.fastq.gz --mode conservative -t 55 -o HD05_2_K5_conservative
unicycler -1 276_K6_Holger_S95_R1_001.fastq.gz -2 276_K6_Holger_S95_R2_001.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode07/FAN41335_pass_barcode07.fastq.gz --mode conservative -t 55 -o HD05_2_K6_conservative

ragtag.py scaffold ../assembly_flye_HD05_2/assembly.fasta assembly.fasta
ragtag.py patch ragtag.scaffold.fasta ../../assembly_flye_HD05_2/assembly.fasta
grep -o 'N' ragtag.patch.fasta | wc -l
makeblastdb -in ../assembly_flye_HD05_2/assembly.fasta -dbtype nucl
blastn -db ../assembly_flye_HD05_2/assembly.fasta -query 1-4.fasta -out assmbly_vs_flye.blastn -evalue 0.00000000001 -num_threads 15 -outfmt 6 -strand both -max_target_seqs 1
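The Unicycler calls above differ only in the read files and in `--mode`. The following is a sketch, not the script that was actually run: it prints the three commands for one sample as a dry run. `unicycler_cmds` is a hypothetical helper name, and only flags that already appear in these notes are used.

```shell
# Dry run: print the Unicycler command for each bridging mode of one sample.
# unicycler_cmds is a hypothetical helper; nothing is executed here.
unicycler_cmds() {
  sample=$1 r1=$2 r2=$3 long=$4
  for mode in conservative normal bold; do
    echo "unicycler -1 $r1 -2 $r2 -l $long --mode $mode -t 55 -o ${sample}_${mode}"
  done
}

unicycler_cmds 1585 s-epidermidis-1585_R1.fastq.gz s-epidermidis-1585_R2.fastq.gz \
  long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode02/FAN41335_pass_barcode02.fastq.gz
```

Piping the output to `sh` would run the commands; keeping it as a dry run makes it easy to check the sample-to-barcode pairing first.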
-
install the trycycler environment
nextdenovo_dir="/path/to/NextDenovo"
nextpolish_dir="/path/to/NextPolish"
genome_size="2500000" #2 503 927

/home/jhuang/Tools/canu/build/bin/canu -p canu -d canu_temp -fast genomeSize="$genome_size" useGrid=false maxThreads="$threads" -nanopore read_subsets/sample_"$i".fastq
/home/jhuang/Tools/Trycycler/scripts/canu_trim.py canu_temp/canu.contigs.fasta > assemblies/assembly_"$i".fasta
/home/jhuang/Tools/Minipolish/miniasm_and_minipolish.sh read_subsets/sample_"$i".fastq "$threads" > assemblies/assembly_"$i".gfa
/home/jhuang/Tools/NECAT/Linux-amd64/bin/necat.pl config config.txt
/home/jhuang/Tools/NECAT/Linux-amd64/bin/necat.pl bridge config.txt
/home/jhuang/Tools/raven/build/bin/raven --threads "$threads" --disable-checkpoints --graphical-fragment-assembly assemblies/assembly_"$i".gfa read_subsets/sample_"$i".fastq > assemblies/assembly_"$i".fasta

#https://github.com/rrwick (Bandage, Unicycler, Filtlong, Trycycler, Polypolish)
#install canu, flye, raven, miniasm, minipolish, any2fasta via 'mamba install'
#install fastp, medaka, polypolish, masurca (install Polca) with 'mamba install'

#install NextDenovo and NextPolish from https://github.com/Nextomics
wget https://github.com/Nextomics/NextDenovo/releases/latest/download/NextDenovo.tgz
tar -vxzf NextDenovo.tgz && cd NextDenovo
#cd NextDenovo && make
wget https://github.com/Nextomics/NextPolish/releases/download/v1.4.1/NextPolish.tgz
pip install paralleltask
tar -vxzf NextPolish.tgz && cd NextPolish #&& make

git clone https://github.com/rrwick/Minipolish.git

wget https://github.com/xiaochuanle/NECAT/releases/download/v0.0.1_update20200803/necat_20200803_Linux-amd64.tar.gz
tar xzvf necat_20200803_Linux-amd64.tar.gz
cd NECAT/Linux-amd64/bin
export PATH=$PATH:$(pwd)

# Install canu and raven under ~/Tools/
git clone https://github.com/marbl/canu.git
cd canu/src
make -j 50 #<number of threads>
git clone https://github.com/lbcb-sci/raven && cd raven
cmake -S ./ -B./build -DRAVEN_BUILD_EXE=1 -DCMAKE_BUILD_TYPE=Release
cmake --build build

# Adapt the script trycycler_assembly_extra-thorough.sh with the following complete paths.
/home/jhuang/Tools/NECAT/Linux-amd64/bin/necat.pl
/home/jhuang/Tools/Minipolish/miniasm_and_minipolish.sh
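After the installation steps above, it can help to confirm that the tools are reachable before starting a long assembly run. This is a small sketch under the assumption that each tool is expected on `PATH` under the command name listed; `missing_tools` is a hypothetical helper name, and tools installed under `~/Tools/` with full paths would need to be checked separately.

```shell
# Print every tool name that cannot be found on PATH (hypothetical helper).
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Command names as they are used elsewhere in these notes.
missing_tools trycycler canu flye raven miniasm minipolish any2fasta \
  fastp medaka_consensus polypolish polca.sh
```

An empty output means every listed command was found.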
-
assembly using trycycler
TODO (IMPORTANT): assemble all genomes using the following methods and compare them to the Unicycler results.
(trycycler) jhuang@hamm:~/DATA/Data_Holger_S.epidermidis_1585_5179_HD05$ ./trycycler_assembly_extra-thorough.sh

#In the HD05 project, we use the following strategies!
#I. First construct the genome only with Trycycler (Trycycler: a consensus long-read assembly tool).
cp long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode01/FAN41335_pass_barcode01.fastq.gz trycycler_5179R1/reads.fastq.gz
cp long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode02/FAN41335_pass_barcode02.fastq.gz trycycler_1585/reads.fastq.gz
cp long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode03/FAN41335_pass_barcode03.fastq.gz trycycler_1585v/reads.fastq.gz
cp long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode04/FAN41335_pass_barcode04.fastq.gz trycycler_5179/reads.fastq.gz
cp long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode05/FAN41335_pass_barcode05.fastq.gz trycycler_HD05_2/reads.fastq.gz
cp long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode06/FAN41335_pass_barcode06.fastq.gz trycycler_HD05_2_K5/reads.fastq.gz
cp long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode07/FAN41335_pass_barcode07.fastq.gz trycycler_HD05_2_K6/reads.fastq.gz

for sample in trycycler_5179R1 trycycler_1585 trycycler_1585v trycycler_5179 trycycler_HD05_2 trycycler_HD05_2_K5 trycycler_HD05_2_K6; do
  cd ${sample}; ../trycycler_assembly_extra-thorough.sh; cd ..;
done
#TODO: further steps (see https://github.com/rrwick/Trycycler/wiki)

for sample in trycycler_5179R1 trycycler_1585 trycycler_1585v trycycler_5179 trycycler_HD05_2 trycycler_HD05_2_K5 trycycler_HD05_2_K6; do
  cd ${sample}; ../trycycler_assembly_extra-thorough_raven.sh; cd ..;
done
#TODO: further steps (see https://github.com/rrwick/Trycycler/wiki)

for sample in trycycler_5179R1 trycycler_1585 trycycler_1585v trycycler_5179 trycycler_HD05_2 trycycler_HD05_2_K5 trycycler_HD05_2_K6; do
  cd ${sample}; ../trycycler_assembly_extra-thorough_canu.sh; cd ..;
done
#TODO: further steps (see https://github.com/rrwick/Trycycler/wiki)

for sample in trycycler_5179R1 trycycler_1585 trycycler_1585v trycycler_5179 trycycler_HD05_2 trycycler_HD05_2_K5 trycycler_HD05_2_K6; do
  cd ${sample}; trycycler cluster --threads 55 --assemblies assemblies/*.fasta --reads reads.fastq --out_dir trycycler; cd ..;
done

trycycler reconcile --threads 55 --reads reads.fastq --cluster_dir trycycler/cluster_001
#Error: failed to circularise sequence D_bctg00000000 because its start could not be found in other sequences. You can either trim some sequence off the start of D_bctg00000000 or exclude the sequence altogether and try again.
#Error: failed to circularise sequence E_ctg000010 for multiple reasons. You must either repair this sequence or exclude it and then try running trycycler reconcile again.
#Error: failed to circularise sequence W_ctg000000 because its start could not be found in other sequences. You can either trim some sequence off the start of W_ctg000000 or exclude the sequence altogether and try again.

trycycler reconcile --threads 55 --reads reads.fastq --cluster_dir trycycler/cluster_001
#Error: failed to circularise sequence K_ctg000000 because its end could not be found in other sequences. You can either trim some sequence off the end of K_ctg000000 or exclude the sequence altogether and try again.
#Worst-1kbp: W_Utg714

trycycler reconcile --threads 55 --reads reads.fastq --cluster_dir trycycler/cluster_001
#Error: failed to circularise sequence T_contig_1 because its end could not be found in other sequences. You can either trim some sequence off the end of T_contig_1 or exclude the sequence altogether and try again.
# Worst-1kbp: D_bctg00000000, J_bctg00000000, P_bctg00000000

trycycler reconcile --threads 55 --reads reads.fastq --cluster_dir trycycler/cluster_001
#Error: failed to circularise sequence A_tig00000003 because its start could not be found in other sequences. You can either trim some sequence off the start of A_tig00000003 or exclude the sequence altogether and try again.
#Error: failed to circularise sequence E_ctg000000 because its start could not be found in other sequences. You can either trim some sequence off the start of E_ctg000000 or exclude the sequence altogether and try again.
#Error: failed to circularise sequence Q_ctg000000 because its end could not be found in other sequences. You can either trim some sequence off the end of Q_ctg000000 or exclude the sequence altogether and try again.
# Worst-1kbp: L_Utg716, X_Utg654

trycycler reconcile --threads 55 --reads reads.fastq --cluster_dir trycycler/cluster_001
trycycler reconcile --threads 55 --reads reads.fastq --cluster_dir trycycler/cluster_001
trycycler reconcile --threads 55 --reads reads.fastq --cluster_dir trycycler/cluster_001
trycycler reconcile --threads 55 --reads reads.fastq --cluster_dir trycycler/cluster_002
#M_tig00000002, S_tig00000003, A_tig00000003, C_utg000003l, G_tig00000002, I_utg000002l
#E_ctg000000, K_ctg000000, Q_ctg000000
trycycler reconcile --threads 55 --reads reads.fastq --cluster_dir trycycler/cluster_003
trycycler reconcile --threads 55 --reads reads.fastq --cluster_dir trycycler/cluster_004
trycycler reconcile --threads 55 --reads reads.fastq --cluster_dir trycycler/cluster_005
trycycler reconcile --threads 55 --reads reads.fastq --cluster_dir trycycler/cluster_006
#--> When finished, Trycycler reconcile will make 2_all_seqs.fasta in the cluster directory, a multi-FASTA file containing each of the contigs ready for multiple sequence alignment.

trycycler msa --threads 55 --cluster_dir trycycler/cluster_001
trycycler msa --threads 55 --cluster_dir trycycler/cluster_002
trycycler msa --threads 55 --cluster_dir trycycler/cluster_003
trycycler msa --threads 55 --cluster_dir trycycler/cluster_004
trycycler msa --threads 55 --cluster_dir trycycler/cluster_005
#--> When finished, Trycycler msa will make a 3_msa.fasta file in the cluster directory.

#generate 4_reads.fastq for each contig!
trycycler partition --threads 55 --reads reads.fastq --cluster_dirs trycycler/cluster_*
#trycycler partition --threads 55 --reads reads.fastq --cluster_dirs trycycler/cluster_001 trycycler/cluster_002 trycycler/cluster_003

trycycler consensus --threads 55 --cluster_dir trycycler/cluster_001
trycycler consensus --threads 55 --cluster_dir trycycler/cluster_002
trycycler consensus --threads 55 --cluster_dir trycycler/cluster_003
trycycler consensus --threads 55 --cluster_dir trycycler/cluster_004
trycycler consensus --threads 55 --cluster_dir trycycler/cluster_005

#!!NOTE that we take the isolates of HD05_2_K5 and HD05_2_K6 assembled by Unicycler instead of Trycycler!!
# TODO (TODAY): generate the 3 datasets below!
# TODO (IMPORTANT): write an email to Holger saying that the short-read sequencing of HD5_2 is not correct (see the 3 datasets)! However, the MTxxxxxxx is confirmed not in K5 and K6!
# TODO: variant calling needs the short reads; it is not doable without the correct short reads! Resequencing? It is difficult to call variants from long reads only, since long reads contain too many errors!
#TODO: check whether the MT sequence is in the isolates; more detailed annotations come later!
#II. Compare the results of Trycycler with Unicycler.
#III. Eventually add the plasmids assembled by Unicycler to the final results, e.g. add the 4 plasmids to K5 and K6.
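The per-cluster Trycycler steps above follow a fixed order: reconcile and msa for each cluster, one partition call over all clusters, then consensus for each cluster. The sketch below only prints that plan as a dry run; `trycycler_plan` is a hypothetical helper name, and the commands use only the subcommands and flags already shown in these notes. In practice reconcile usually needs manual intervention between attempts, so this is a checklist rather than an unattended pipeline.

```shell
# Dry run: print the Trycycler commands for a set of cluster directories
# in the order used above. trycycler_plan is a hypothetical helper.
trycycler_plan() {
  threads=$1 reads=$2
  shift 2
  for c in "$@"; do
    echo "trycycler reconcile --threads $threads --reads $reads --cluster_dir $c"
  done
  for c in "$@"; do
    echo "trycycler msa --threads $threads --cluster_dir $c"
  done
  # partition is run once over all cluster directories
  echo "trycycler partition --threads $threads --reads $reads --cluster_dirs $*"
  for c in "$@"; do
    echo "trycycler consensus --threads $threads --cluster_dir $c"
  done
}

trycycler_plan 55 reads.fastq trycycler/cluster_001 trycycler/cluster_002
```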
-
Polishing after Trycycler
#1. Oxford Nanopore sequencer (Ignored due to the samtools version incompatibility!)
# for c in trycycler/cluster_*; do
#   medaka_consensus -i "$c"/4_reads.fastq -d "$c"/7_final_consensus.fasta -o "$c"/medaka -m r941_min_sup_g507 -t 12
#   mv "$c"/medaka/consensus.fasta "$c"/8_medaka.fasta
#   rm -r "$c"/medaka "$c"/*.fai "$c"/*.mmi  # clean up
# done
# cat trycycler/cluster_*/8_medaka.fasta > trycycler/consensus.fasta

#2. Short-read polishing

#---- 5179_R1 (2) ----
# mean read depth: 205.8x
# 188 bp have a depth of zero (99.9924% coverage)
# 355 positions changed (0.0144% of total positions)
# estimated pre-polishing sequence accuracy: 99.9856% (Q38.42)

#Step 1: read QC
fastp --in1 ../../s-epidermidis-5179-r1_R1.fastq.gz --in2 ../../s-epidermidis-5179-r1_R2.fastq.gz --out1 1.fastq.gz --out2 2.fastq.gz --unpaired1 u.fastq.gz --unpaired2 u.fastq.gz

#Step 2: Polypolish
for cluster in cluster_001 cluster_002; do
  bwa index ${cluster}/7_final_consensus.fasta
  bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 1.fastq.gz > ${cluster}/alignments_1.sam
  bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 2.fastq.gz > ${cluster}/alignments_2.sam
  polypolish polish ${cluster}/7_final_consensus.fasta ${cluster}/alignments_1.sam ${cluster}/alignments_2.sam > ${cluster}/polypolish.fasta
done

#Step 3: POLCA
for cluster in cluster_001 cluster_002; do
  cd ${cluster}
  polca.sh -a polypolish.fasta -r "../../../s-epidermidis-5179-r1_R1.fastq.gz ../../../s-epidermidis-5179-r1_R2.fastq.gz" -t 55 -m 120G
  cd ..
done
#Substitution Errors: 37
#Insertion/Deletion Errors: 2
#Assembly Size: 2470001
#Consensus Quality: 99.9984
#
#Substitution Errors: 4
#Insertion/Deletion Errors: 0
#Assembly Size: 17748
#Consensus Quality: 99.9775

#Step 4: (optional) more rounds and/or other polishers
#After one round of Polypolish and one round of POLCA, your assembly should be in very good shape!
#However, there may still be a few lingering errors. You can try running additional rounds of Polypolish or POLCA to see if they make any more changes.
for cluster in cluster_001 cluster_002; do
  cd ${cluster}
  polca.sh -a polypolish.fasta.PolcaCorrected.fa -r "../../../s-epidermidis-5179-r1_R1.fastq.gz ../../../s-epidermidis-5179-r1_R2.fastq.gz" -t 55 -m 120G
  cd ..
done
#Substitution Errors: 13
#Insertion/Deletion Errors: 0
#Assembly Size: 2470004
#Consensus Quality: 99.9995
#
#Substitution Errors: 0
#Insertion/Deletion Errors: 0
#Assembly Size: 17748
#Consensus Quality: 100

for cluster in cluster_001; do
  cd ${cluster}
  polca.sh -a polypolish.fasta.PolcaCorrected.fa.PolcaCorrected.fa -r "../../../s-epidermidis-5179-r1_R1.fastq.gz ../../../s-epidermidis-5179-r1_R2.fastq.gz" -t 55 -m 120G
  cd ..
done
#Substitution Errors: 0
#Insertion/Deletion Errors: 0
#Assembly Size: 2470004
#Consensus Quality: 100

#---- 1585 (4) ----
# mean read depth: 174.7x
# 8,297 bp have a depth of zero (99.6604% coverage)
# 271 positions changed (0.0111% of total positions)
# estimated pre-polishing sequence accuracy: 99.9889% (Q39.55)

#Step 1: read QC
fastp --in1 ../../s-epidermidis-1585_R1.fastq.gz --in2 ../../s-epidermidis-1585_R2.fastq.gz --out1 1.fastq.gz --out2 2.fastq.gz --unpaired1 u.fastq.gz --unpaired2 u.fastq.gz

#Step 2: Polypolish
for cluster in cluster_001 cluster_002 cluster_003 cluster_004; do
  bwa index ${cluster}/7_final_consensus.fasta
  bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 1.fastq.gz > ${cluster}/alignments_1.sam
  bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 2.fastq.gz > ${cluster}/alignments_2.sam
  polypolish polish ${cluster}/7_final_consensus.fasta ${cluster}/alignments_1.sam ${cluster}/alignments_2.sam > ${cluster}/polypolish.fasta
done

#Step 3: POLCA
for cluster in cluster_001 cluster_002 cluster_003 cluster_004; do
  cd ${cluster}
  polca.sh -a polypolish.fasta -r "../../../s-epidermidis-1585_R1.fastq.gz ../../../s-epidermidis-1585_R2.fastq.gz" -t 55 -m 120G
  cd ..
done
#Substitution Errors: 7
#Insertion/Deletion Errors: 4
#Assembly Size: 2443174
#Consensus Quality: 99.9995
#
#Substitution Errors: 0
#Insertion/Deletion Errors: 0
#Assembly Size: 9014
#Consensus Quality: 100
#
#Substitution Errors: 0
#Insertion/Deletion Errors: 0
#Assembly Size: 9014
#Consensus Quality: 100
#
#Substitution Errors: 0
#Insertion/Deletion Errors: 0
#Assembly Size: 2344
#Consensus Quality: 100

#Step 4: (optional) more rounds and/or other polishers
for cluster in cluster_001; do
  cd ${cluster}
  polca.sh -a polypolish.fasta.PolcaCorrected.fa -r "../../../s-epidermidis-1585_R1.fastq.gz ../../../s-epidermidis-1585_R2.fastq.gz" -t 55 -m 120G
  cd ..
done
#Substitution Errors: 0
#Insertion/Deletion Errors: 0
#Assembly Size: 2443176
#Consensus Quality: 100

#---- 1585 derived from unicycler, under 1585_normal/unicycler (4) ----
#Step 0: copy chrom and plasmid1, plasmid2, plasmid3 to cluster_001/7_final_consensus.fasta, ...
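The extra POLCA rounds above are repeated until the report shows no remaining corrections. As a sketch of that stopping rule, the helper below sums the substitution and indel counts from report text in the format quoted in these notes. `polca_errors` is a hypothetical name, and the report is read from standard input because the report file name is not given here.

```shell
# Sum "Substitution Errors" and "Insertion/Deletion Errors" from POLCA
# report text on stdin (hypothetical helper; format as quoted above).
polca_errors() {
  awk -F': *' '
    /^Substitution Errors|^Insertion\/Deletion Errors/ { total += $2 }
    END { print total + 0 }
  '
}

# Numbers from the first POLCA round of 5179_R1, cluster_001.
printf '%s\n' \
  'Substitution Errors: 37' \
  'Insertion/Deletion Errors: 2' \
  'Assembly Size: 2470001' \
  'Consensus Quality: 99.9984' | polca_errors
```

A result of 0 corresponds to the point where the notes stop running further rounds for a cluster.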
#Step 1: read QC
fastp --in1 ../../s-epidermidis-1585_R1.fastq.gz --in2 ../../s-epidermidis-1585_R2.fastq.gz --out1 1.fastq.gz --out2 2.fastq.gz --unpaired1 u.fastq.gz --unpaired2 u.fastq.gz

#Step 2: Polypolish
for cluster in cluster_001 cluster_002 cluster_003 cluster_004; do
  bwa index ${cluster}/7_final_consensus.fasta
  bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 1.fastq.gz > ${cluster}/alignments_1.sam
  bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 2.fastq.gz > ${cluster}/alignments_2.sam
  polypolish polish ${cluster}/7_final_consensus.fasta ${cluster}/alignments_1.sam ${cluster}/alignments_2.sam > ${cluster}/polypolish.fasta
done
#Polishing 1 (2,443,574 bp):
#mean read depth: 174.7x
#8,298 bp have a depth of zero (99.6604% coverage)
#52 positions changed (0.0021% of total positions)
#estimated pre-polishing sequence accuracy: 99.9979% (Q46.72)
#
#Polishing 2 (9,014 bp):
#mean read depth: 766.5x
#3 bp have a depth of zero (99.9667% coverage)
#0 positions changed (0.0000% of total positions)
#estimated pre-polishing sequence accuracy: 100.0000% (Q∞)
#
#Polishing 7 (2,344 bp):
#mean read depth: 2893.0x
#4 bp have a depth of zero (99.8294% coverage)
#0 positions changed (0.0000% of total positions)
#estimated pre-polishing sequence accuracy: 100.0000% (Q∞)
#
#Polishing 8 (2,255 bp):
#mean read depth: 2719.6x
#4 bp have a depth of zero (99.8226% coverage)
#0 positions changed (0.0000% of total positions)
#estimated pre-polishing sequence accuracy: 100.0000% (Q∞)

#Step 3: POLCA
for cluster in cluster_001 cluster_002 cluster_003 cluster_004; do
  cd ${cluster}
  polca.sh -a polypolish.fasta -r "../../../s-epidermidis-1585_R1.fastq.gz ../../../s-epidermidis-1585_R2.fastq.gz" -t 55 -m 120G
  cd ..
done
#Substitution Errors: 7
#Insertion/Deletion Errors: 4
#Assembly Size: 2443598
#Consensus Quality: 99.9995
#
#Substitution Errors: 0
#Insertion/Deletion Errors: 0
#Assembly Size: 9014
#Consensus Quality: 100
#
#Substitution Errors: 0
#Insertion/Deletion Errors: 0
#Assembly Size: 2344
#Consensus Quality: 100
#
#Substitution Errors: 0
#Insertion/Deletion Errors: 0
#Assembly Size: 2255
#Consensus Quality: 100

#Step 4: (optional) more rounds and/or other polishers
for cluster in cluster_001; do
  cd ${cluster}
  polca.sh -a polypolish.fasta.PolcaCorrected.fa -r "../../../s-epidermidis-1585_R1.fastq.gz ../../../s-epidermidis-1585_R2.fastq.gz" -t 55 -m 120G
  cd ..
done
#Substitution Errors: 0
#Insertion/Deletion Errors: 0
#Assembly Size: 2443600
#Consensus Quality: 100

#-- 1585v (1, no short reads, waiting) --
# TODO!
#-- 5179 (2) --
#mean read depth: 120.7x
#7,547 bp have a depth of zero (99.6946% coverage)
#356 positions changed (0.0144% of total positions)
#estimated pre-polishing sequence accuracy: 99.9856% (Q38.41)

#Step 1: read QC
fastp --in1 ../../s-epidermidis-5179_R1.fastq.gz --in2 ../../s-epidermidis-5179_R2.fastq.gz --out1 1.fastq.gz --out2 2.fastq.gz --unpaired1 u.fastq.gz --unpaired2 u.fastq.gz

#Step 2: Polypolish
for cluster in cluster_001 cluster_002; do
  bwa index ${cluster}/7_final_consensus.fasta
  bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 1.fastq.gz > ${cluster}/alignments_1.sam
  bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 2.fastq.gz > ${cluster}/alignments_2.sam
  polypolish polish ${cluster}/7_final_consensus.fasta ${cluster}/alignments_1.sam ${cluster}/alignments_2.sam > ${cluster}/polypolish.fasta
done

#Step 3: POLCA
for cluster in cluster_001 cluster_002; do
  cd ${cluster}
  polca.sh -a polypolish.fasta -r "../../../s-epidermidis-5179_R1.fastq.gz ../../../s-epidermidis-5179_R2.fastq.gz" -t 55 -m 120G
  cd ..
done
#Substitution Errors: 49
#Insertion/Deletion Errors: 23
#Assembly Size: 2471418
#Consensus Quality: 99.9971
#
#Substitution Errors: 0
#Insertion/Deletion Errors: 0
#Assembly Size: 17748
#Consensus Quality: 100

#Step 4: (optional) more rounds POLCA
for cluster in cluster_001; do
  cd ${cluster}
  polca.sh -a polypolish.fasta.PolcaCorrected.fa -r "../../../s-epidermidis-5179_R1.fastq.gz ../../../s-epidermidis-5179_R2.fastq.gz" -t 55 -m 120G
  cd ..
done
#Substitution Errors: 10
#Insertion/Deletion Errors: 5
#Assembly Size: 2471442
#Consensus Quality: 99.9994

for cluster in cluster_001; do
  cd ${cluster}
  polca.sh -a polypolish.fasta.PolcaCorrected.fa.PolcaCorrected.fa -r "../../../s-epidermidis-5179_R1.fastq.gz ../../../s-epidermidis-5179_R2.fastq.gz" -t 55 -m 120G
  cd ..
done
#Substitution Errors: 6
#Insertion/Deletion Errors: 0
#Assembly Size: 2471445
#Consensus Quality: 99.9998

for cluster in cluster_001; do
  cd ${cluster}
  polca.sh -a polypolish.fasta.PolcaCorrected.fa.PolcaCorrected.fa.PolcaCorrected.fa -r "../../../s-epidermidis-5179_R1.fastq.gz ../../../s-epidermidis-5179_R2.fastq.gz" -t 55 -m 120G
  cd ..
done
#Substitution Errors: 0
#Insertion/Deletion Errors: 0
#Assembly Size: 2471445
#Consensus Quality: 100

#-- HD5_2 (2): without the short-sequencing we cannot correct the base-calling! --
# !ERROR to be REPORTED (see the Polypolish statistics below):
#Polishing cluster_001_consensus (2,504,140 bp):
#mean read depth: 94.4x
#240,420 bp have a depth of zero (90.3991% coverage)
#56,894 positions changed (2.2720% of total positions)
#estimated pre-polishing sequence accuracy: 97.7280% (Q16.44)

# /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_1_S37_R1_001.fastq
# /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_1_S37_R2_001.fastq
# /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_2_S38_R1_001.fastq
# /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_2_S38_R2_001.fastq
# /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_3_S39_R1_001.fastq
# /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_3_S39_R2_001.fastq
# /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_4_S40_R1_001.fastq
# /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_4_S40_R2_001.fastq
# /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_5_S41_R1_001.fastq
# /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_5_S41_R2_001.fastq
# /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_6_S42_R1_001.fastq
# /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_6_S42_R2_001.fastq
# /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_7_S43_R1_001.fastq
# /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_7_S43_R2_001.fastq
# /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_8_S44_R1_001.fastq
# /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_8_S44_R2_001.fastq
# /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_9_S45_R1_001.fastq
# /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_9_S45_R2_001.fastq
# /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_10_S46_R1_001.fastq
# /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_10_S46_R2_001.fastq

#Step 1: read QC
fastp --in1 ../../HD5_2_S38_R1_001.fastq.gz --in2 ../../HD5_2_S38_R2_001.fastq.gz --out1 1.fastq.gz --out2 2.fastq.gz --unpaired1 u.fastq.gz --unpaired2 u.fastq.gz

# NOTE that the following steps are not run since the short-reads are not correct!
# #Step 2: Polypolish
# for cluster in cluster_001 cluster_005; do
#   bwa index ${cluster}/7_final_consensus.fasta
#   bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 1.fastq.gz > ${cluster}/alignments_1.sam
#   bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 2.fastq.gz > ${cluster}/alignments_2.sam
#   polypolish polish ${cluster}/7_final_consensus.fasta ${cluster}/alignments_1.sam ${cluster}/alignments_2.sam > ${cluster}/polypolish.fasta
# done
# #Step 3: POLCA
# for cluster in cluster_001 cluster_005; do
#   cd ${cluster}
#   polca.sh -a polypolish.fasta -r "../../../HD5_2_S38_R1_001.fastq.gz ../../../HD5_2_S38_R2_001.fastq.gz" -t 55 -m 120G
#   cd ..
# done
# #Step 4: (optional) more rounds POLCA
# for cluster in cluster_001; do
#   cd ${cluster}
#   polca.sh -a polypolish.fasta.PolcaCorrected.fa -r "../../../HD5_2_S38_R1_001.fastq.gz ../../../HD5_2_S38_R2_001.fastq.gz" -t 55 -m 120G
#   cd ..
# done

# NOTE that the plasmids of HD5_2_K5 and HD5_2_K6 were copied from Unicycler!
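The HD5_2 mismatch shows up in the Polypolish summary as low short-read coverage. The coverage figure is simply the fraction of positions with non-zero depth, which the following sketch recomputes from the two numbers reported above. `coverage_pct` is a hypothetical helper name; it takes the zero-depth base count and the sequence length, without thousands separators.

```shell
# Percentage of positions covered by at least one short read
# (hypothetical helper): (1 - zero_depth_bp / total_bp) * 100.
coverage_pct() {
  awk -v zero="$1" -v total="$2" \
    'BEGIN { printf "%.4f\n", (1 - zero / total) * 100 }'
}

# HD5_2 cluster_001_consensus: 240,420 zero-depth bp out of 2,504,140 bp.
coverage_pct 240420 2504140   # prints 90.3991
```

The other isolates in these notes are all above 99.6% by this measure, which is why the 90.4% value for HD5_2 was flagged.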
#-- HD5_2_K5 (4) --
#mean read depth: 87.1x
#25 bp have a depth of zero (99.9990% coverage)
#1,085 positions changed (0.0433% of total positions)
#estimated pre-polishing sequence accuracy: 99.9567% (Q33.63)

#Step 1: read QC
fastp --in1 ../../275_K5_Holger_S92_R1_001.fastq.gz --in2 ../../275_K5_Holger_S92_R2_001.fastq.gz --out1 1.fastq.gz --out2 2.fastq.gz --unpaired1 u.fastq.gz --unpaired2 u.fastq.gz

#Step 2: Polypolish
for cluster in cluster_001 cluster_002 cluster_003 cluster_004; do
  bwa index ${cluster}/7_final_consensus.fasta
  bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 1.fastq.gz > ${cluster}/alignments_1.sam
  bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 2.fastq.gz > ${cluster}/alignments_2.sam
  polypolish polish ${cluster}/7_final_consensus.fasta ${cluster}/alignments_1.sam ${cluster}/alignments_2.sam > ${cluster}/polypolish.fasta
done

#Step 3: POLCA
for cluster in cluster_001 cluster_002 cluster_003 cluster_004; do
  cd ${cluster}
  polca.sh -a polypolish.fasta -r "../../../275_K5_Holger_S92_R1_001.fastq.gz ../../../275_K5_Holger_S92_R2_001.fastq.gz" -t 55 -m 120G
  cd ..
done
#Substitution Errors: 146
#Insertion/Deletion Errors: 2
#Assembly Size: 2504401
#Consensus Quality: 99.9941
#
#Substitution Errors: 41
#Insertion/Deletion Errors: 0
#Assembly Size: 41288
#Consensus Quality: 99.9007
#
#Substitution Errors: 0
#Insertion/Deletion Errors: 0
#Assembly Size: 9191
#Consensus Quality: 100
#
#Substitution Errors: 0
#Insertion/Deletion Errors: 0
#Assembly Size: 2767
#Consensus Quality: 100

#Step 4: (optional) more rounds POLCA
for cluster in cluster_001 cluster_002; do
  cd ${cluster}
  polca.sh -a polypolish.fasta.PolcaCorrected.fa -r "../../../275_K5_Holger_S92_R1_001.fastq.gz ../../../275_K5_Holger_S92_R2_001.fastq.gz" -t 55 -m 120G
  cd ..
done
#Substitution Errors: 41
#Insertion/Deletion Errors: 0
#Assembly Size: 2504401
#Consensus Quality: 99.9984
#
#Substitution Errors: 8
#Insertion/Deletion Errors: 0
#Assembly Size: 41288
#Consensus Quality: 99.9806

for cluster in cluster_001 cluster_002; do
  cd ${cluster}
  polca.sh -a polypolish.fasta.PolcaCorrected.fa.PolcaCorrected.fa -r "../../../275_K5_Holger_S92_R1_001.fastq.gz ../../../275_K5_Holger_S92_R2_001.fastq.gz" -t 55 -m 120G
  cd ..
done
#Substitution Errors: 8
#Insertion/Deletion Errors: 0
#Assembly Size: 2504401
#Consensus Quality: 99.9997
#
#Substitution Errors: 4
#Insertion/Deletion Errors: 0
#Assembly Size: 41288
#Consensus Quality: 99.9903

for cluster in cluster_001 cluster_002; do
  cd ${cluster}
  polca.sh -a polypolish.fasta.PolcaCorrected.fa.PolcaCorrected.fa.PolcaCorrected.fa -r "../../../275_K5_Holger_S92_R1_001.fastq.gz ../../../275_K5_Holger_S92_R2_001.fastq.gz" -t 55 -m 120G
  cd ..
done
#Substitution Errors: 8
#Insertion/Deletion Errors: 0
#Assembly Size: 2504401
#Consensus Quality: 99.9997
#
#Substitution Errors: 4
#Insertion/Deletion Errors: 0
#Assembly Size: 41288
#Consensus Quality: 99.9903

#-- HD5_2_K6 (4) --
#mean read depth: 116.7x
#4 bp have a depth of zero (99.9998% coverage)
#1,022 positions changed (0.0408% of total positions)
#estimated pre-polishing sequence accuracy: 99.9592% (Q33.89)

#Step 1: read QC
fastp --in1 ../../276_K6_Holger_S95_R1_001.fastq.gz --in2 ../../276_K6_Holger_S95_R2_001.fastq.gz --out1 1.fastq.gz --out2 2.fastq.gz --unpaired1 u.fastq.gz --unpaired2 u.fastq.gz

#Step 2: Polypolish
for cluster in cluster_001 cluster_002 cluster_003 cluster_004; do
  bwa index ${cluster}/7_final_consensus.fasta
  bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 1.fastq.gz > ${cluster}/alignments_1.sam
  bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 2.fastq.gz > ${cluster}/alignments_2.sam
  polypolish polish ${cluster}/7_final_consensus.fasta ${cluster}/alignments_1.sam ${cluster}/alignments_2.sam > ${cluster}/polypolish.fasta
done
#Step 3: POLCA for cluster in cluster_001 cluster_002 cluster_003 cluster_004; do cd ${cluster} polca.sh -a polypolish.fasta -r "../../../276_K6_Holger_S95_R1_001.fastq.gz ../../../276_K6_Holger_S95_R2_001.fastq.gz" -t 55 -m 120G cd .. done #Substitution Errors: 164 #Insertion/Deletion Errors: 2 #Assembly Size: 2504398 #Consensus Quality: 99.9934 #Substitution Errors: 22 #Insertion/Deletion Errors: 0 #Assembly Size: 41288 #Consensus Quality: 99.9467 #Substitution Errors: 0 #Insertion/Deletion Errors: 0 #Assembly Size: 9191 #Consensus Quality: 100 #Substitution Errors: 0 #Insertion/Deletion Errors: 0 #Assembly Size: 2767 #Consensus Quality: 100 #Step 4: (optional) more rounds POLCA for cluster in cluster_001 cluster_002; do cd ${cluster} polca.sh -a polypolish.fasta.PolcaCorrected.fa -r "../../../276_K6_Holger_S95_R1_001.fastq.gz ../../../276_K6_Holger_S95_R2_001.fastq.gz" -t 55 -m 120G cd .. done #Substitution Errors: 32 #Insertion/Deletion Errors: 0 #Assembly Size: 2504400 #Consensus Quality: 99.9987 #Substitution Errors: 0 #Insertion/Deletion Errors: 0 #Assembly Size: 41288 #Consensus Quality: 100 for cluster in cluster_001; do cd ${cluster} polca.sh -a polypolish.fasta.PolcaCorrected.fa.PolcaCorrected.fa -r "../../../276_K6_Holger_S95_R1_001.fastq.gz ../../../276_K6_Holger_S95_R2_001.fastq.gz" -t 55 -m 120G cd .. done #Substitution Errors: 4 #Insertion/Deletion Errors: 0 #Assembly Size: 2504400 #Consensus Quality: 99.9998 for cluster in cluster_001; do cd ${cluster} polca.sh -a polypolish.fasta.PolcaCorrected.fa.PolcaCorrected.fa.PolcaCorrected.fa -r "../../../276_K6_Holger_S95_R1_001.fastq.gz ../../../276_K6_Holger_S95_R2_001.fastq.gz" -t 55 -m 120G cd .. done #Substitution Errors: 2 #Insertion/Deletion Errors: 0 #Assembly Size: 2504400 #Consensus Quality: 99.9999
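The POLCA statistics above were compared by eye from round to round. As a rough sketch, the four-line blocks could be collapsed into one row per replicon, with the consensus quality percentage also expressed as a Phred score (Q = -10 * log10(1 - quality/100)). The function name polca_summary is hypothetical, and the input format is assumed to be exactly the four lines quoted in these notes (with or without the leading '#').

```shell
#!/usr/bin/env bash
# polca_summary (hypothetical helper): read POLCA statistics blocks from stdin.
# Assumed input: lines such as "#Substitution Errors: 146" (leading '#' optional).
# Output: one tab-separated row per block:
#   assembly size, substitution errors, indel errors, consensus quality (%), Phred Q
polca_summary() {
  awk '
    { sub(/^#[ \t]*/, "") }                      # drop the comment marker used in the notes
    /^Substitution Errors/        { subs  = $NF }
    /^Insertion\/Deletion Errors/ { indel = $NF }
    /^Assembly Size/              { size  = $NF }
    /^Consensus Quality/ {
      err = 1 - $NF / 100                        # per-base error rate
      q = (err > 0) ? sprintf("%.1f", -10 * log(err) / log(10)) : "inf"
      printf "%s\t%s\t%s\t%s\t%s\n", size, subs, indel, $NF, q
    }'
}

# Example with two blocks copied from the HD5_2_K5 notes (chromosome and smallest plasmid)
polca_summary <<'EOF'
#Substitution Errors: 146
#Insertion/Deletion Errors: 2
#Assembly Size: 2504401
#Consensus Quality: 99.9941
#Substitution Errors: 0
#Insertion/Deletion Errors: 0
#Assembly Size: 2767
#Consensus Quality: 100
EOF
```

Saving the rows of every round to a file makes it easy to see when another POLCA round no longer lowers the error counts.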
-
Results by directly using Unicycler
#----------------------- 5179R1_normal -----------------------
>1 length=2468563 depth=1.00x circular=true
>2 length=17748 depth=1.42x circular=true

Component  Segments  Links  Length     N50        Longest segment  Status
total      2         2      2,486,311  2,468,563  2,468,563
1          1         1      2,468,563  2,468,563  2,468,563        complete
2          1         1      17,748     17,748     17,748           complete

Segment  Length     Depth   Starting gene        Position   Strand   Identity  Coverage
1        2,468,563  1.00x   UniRef90_Q5HJZ9      1,212,460  forward  100.0%    100.0%
2        17,748     1.42x   UniRef90_A0A0H2VIR3  4,804      reverse  93.2%     99.7%

# ---- 5179_bold ----
Segment  Length     Depth   Starting gene        Position   Strand   Identity  Coverage
1        2,469,173  1.00x   UniRef90_Q5HJZ9      1,901,872  reverse  100.0%    100.0%
2        17,749     2.27x   UniRef90_A0A0H2VIR3  4,771      forward  93.2%     99.7%
4        4,595      10.19x  none found
8        2,449      17.14x  none found

>1 length=2469173 depth=1.00x circular=true
>2 length=17749 depth=2.27x circular=true
>3 length=4761 depth=0.44x
>4 length=4595 depth=10.19x circular=true
>5 length=3735 depth=0.29x
>6 length=3718 depth=0.42x
>7 length=3573 depth=0.52x
>8 length=2449 depth=17.14x circular=true
>9 length=2411 depth=0.35x
>10 length=2371 depth=0.32x
>11 length=2365 depth=0.43x
>12 length=1637 depth=0.44x
>13 length=1568 depth=0.66x
>14 length=1505 depth=0.65x
>15 length=1403 depth=0.93x
>16 length=1329 depth=0.55x

makeblastdb -in assembly.fasta -dbtype nucl
blastn -task blastn-short -db ../HD05_2_K5_conservative/assembly.fasta -query assembly.fasta -out 2-16_vs_1.blastn -evalue 0.00000000001 -num_threads 15 -outfmt 6 -strand both -max_target_seqs 1
#TODO: manually fill the gap in the HD05_2 genome!

#qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
5 1 99.946 3728 1 1 1 3728 1535666 1539392 0.0 7366
6 1 99.973 3718 0 1 1 3718 702963 706679 0.0 7355
7 1 99.888 3573 1 3 1 3573 1764622 1768191 0.0 7027
9 1 100.000 2411 0 0 1 2411 1060914 1063324 0.0 4779
10 1 100.000 2371 0 0 1 2371 615275 612905 0.0 4700
11 1 99.958 2365 0 1 1 2365 1088713 1086350 0.0 4672
12 1 100.000 1637 0 0 1 1637 146635 144999 0.0 3245
13 1 99.936 1568 0 1 1 1568 2024197 2025763 0.0 3092
14 1 100.000 1505 0 0 1 1505 2445480 2443976 0.0 2983
15 1 100.000 1403 0 0 1 1403 197723 196321 0.0 2781
16 1 99.925 1329 1 0 1 1329 49854 48526 0.0 2627

# -------------------- 1585_normal --------------------
>1 length=2443574 depth=1.00x circular=true
#contig_1 2442282 10 60 61
>2 length=9014 depth=3.72x circular=true
>3 length=4388 depth=0.89x
>4 length=3443 depth=0.48x
>5 length=3338 depth=0.48x
>6 length=3336 depth=0.45x
>7 length=2344 depth=11.44x circular=true
>8 length=2255 depth=9.81x circular=true
>9 length=1929 depth=0.37x
>10 length=1703 depth=1.67x
>11 length=1605 depth=0.26x
>12 length=1381 depth=0.56x
>13 length=1360 depth=0.39x
>14 length=1281 depth=0.41x
>15 length=1163 depth=0.51x
>16 length=1088 depth=0.24x
2594107

ragtag.py scaffold ../HD05_2_K5_normal/assembly.fasta assembly.fasta
ragtag.py patch ragtag.scaffold.fasta ../../HD05_2_K5_normal/assembly.fasta
grep -o 'N' ragtag.patch.fasta | wc -l

#qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
3 1 99.977 4388 0 1 1 4388 2410738 2406352 0.0 8683
4 1 99.942 3443 0 2 1 3443 2222741 2219301 0.0 6794
5 1 99.970 3338 0 1 1 3338 455636 452300 0.0 6601
6 1 99.940 3336 0 2 1 3336 1617740 1614407 0.0 6581
9 1 99.948 1929 0 1 1 1929 1321522 1319595 0.0 3808
10 1 99.941 1703 1 0 1 1703 90503 88801 0.0 3368
11 1 99.938 1605 0 1 1 1605 2361795 2363398 0.0 3166
12 1 99.928 1381 0 1 1 1381 241092 242471 0.0 2722
13 1 100.000 1360 0 0 1 1360 1157897 1159256 0.0 2696
14 1 100.000 1281 0 0 1 1281 218323 219603 0.0 2539
15 1 100.000 1163 0 0 1 1163 2077536 2078698 0.0 2305
16 1 100.000 1088 0 0 1 1088 283284 284371 0.0 2157

>1 length=2503585 depth=1.00x circular=true
>2 length=41288 depth=3.32x circular=true
>3 length=9191 depth=8.29x circular=true
>4 length=2767 depth=9.36x circular=true

>1 length=2503927 depth=1.00x circular=true
>2 length=41288 depth=3.77x circular=true
>3 length=9191 depth=7.83x circular=true
>4 length=2767 depth=10.11x circular=true

#-------------------------- 1585V
#[2024-01-17 13:42:28] INFO: Assembly statistics:
Total length: 2438882 vs 2443574
Fragments: 1
Fragments N50: 2438882
Largest frg: 2438882
Scaffolds: 0
Mean coverage: 47

unicycler -1 s-epidermidis-5179-r1_R1.fastq.gz -2 s-epidermidis-5179-r1_R2.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode01/FAN41335_pass_barcode01.fastq.gz --mode conservative -t 55 -o 5179R1_conservative
unicycler -1 s-epidermidis-1585_R1.fastq.gz -2 s-epidermidis-1585_R2.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode02/FAN41335_pass_barcode02.fastq.gz --mode conservative -t 55 -o 1585_conservative
#3 no short sequencing
unicycler -1 s-epidermidis-5179_R1.fastq.gz -2 s-epidermidis-5179_R2.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode04/FAN41335_pass_barcode04.fastq.gz --mode conservative -t 55 -o 5179_conservative
unicycler -1 HD5_2_S38_R1_001.fastq.gz -2 HD5_2_S38_R2_001.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode05/FAN41335_pass_barcode05.fastq.gz --mode conservative -t 55 -o HD05_2_conservative
unicycler -1 275_K5_Holger_S92_R1_001.fastq.gz -2 275_K5_Holger_S92_R2_001.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode06/FAN41335_pass_barcode06.fastq.gz --mode conservative -t 55 -o HD05_2_K5_conservative
unicycler -1 276_K6_Holger_S95_R1_001.fastq.gz -2 276_K6_Holger_S95_R2_001.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode07/FAN41335_pass_barcode07.fastq.gz --mode conservative -t 55 -o HD05_2_K6_conservative

# ---- 1 5179R1 2469692 ----
>1 length=2468563 depth=1.00x circular=true
>2 length=17748 depth=1.42x circular=true

# ---- 2 1585 2442282 ---- (for comparison, the Trycycler chromosome is 2443176 nt)
>1 length=2443574 depth=1.00x circular=true
>2 length=9014 depth=3.72x circular=true
>7 length=2344 depth=11.44x circular=true
>8 length=2255 depth=9.81x circular=true

# ---- 3 1585v 2438882 ---- #using long sequencing only
1

# ---- 4 5179_bold 2471107+17740 ----
>1 length=2469173 depth=1.00x circular=true
>2 length=17749 depth=2.27x circular=true
>4 length=4595 depth=10.19x circular=true
>8 length=2449 depth=17.14x circular=true

# ---- 5 HD05_2 2504622 ----
# Note: HD05_2_bold_hq_lq includes the bad long reads.
>1 length=965875 depth=0.95x
>2 length=855325 depth=1.00x
>3 length=582944 depth=1.02x
>4 length=183656 depth=1.02x
>5 length=13570 depth=4.73x circular=true
>6 length=1503 depth=4.85x
>7 length=1271 depth=5.06x
>8 length=227 depth=2.03x
>9 length=153 depth=0.93x
>10 length=152 depth=1.09x
# trycycler: 2503231 (yes), 9183 (yes), 22394, 18541 --

# ---- 6 HD05_2_K5 2504656+41290+9191 ---- conservative
>1 length=2503585 depth=1.00x circular=true
>2 length=41288 depth=3.32x circular=true
>3 length=9191 depth=8.29x circular=true
>4 length=2767 depth=9.36x circular=true

# ---- 7 HD05_2_K6 2504588+41285+9192 ---- conservative
>1 length=2503927 depth=1.00x circular=true
>2 length=41288 depth=3.77x circular=true
>3 length=9191 depth=7.83x circular=true
>4 length=2767 depth=10.11x circular=true

ragtag.py scaffold ../assembly_flye_HD05_2/assembly.fasta assembly.fasta
ragtag.py patch ragtag.scaffold.fasta ../../assembly_flye_HD05_2/assembly.fasta
grep -o 'N' ragtag.patch.fasta | wc -l

makeblastdb -in ../assembly_flye_HD05_2/assembly.fasta -dbtype nucl
blastn -db ../assembly_flye_HD05_2/assembly.fasta -query 1-4.fasta -out assmbly_vs_flye.blastn -evalue 0.000001 -num_threads 15 -outfmt 6 -strand minus -max_target_seqs 1
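The per-sample summaries above were put together by hand: keep the circular contigs, and check with BLAST whether the short low-depth linear contigs are already contained in the chromosome. A minimal sketch of both checks follows. It assumes the Unicycler header layout shown in these notes and BLAST tabular output (-outfmt 6). The function names and the default thresholds (99% identity, 99% of the contig length aligned) are illustrative choices, not values taken from the original analysis.

```shell
#!/usr/bin/env bash
# circular_contigs (hypothetical helper): read FASTA or header lines from stdin
# and print "id <TAB> length <TAB> depth" for every contig marked circular=true.
# Header layout assumed: ">1 length=2468563 depth=1.00x circular=true"
circular_contigs() {
  awk '/^>/ && /circular=true/ {
    id = substr($1, 2)
    len = depth = ""
    for (i = 2; i <= NF; i++) {
      if ($i ~ /^length=/) len   = substr($i, 8)   # text after "length="
      if ($i ~ /^depth=/)  depth = substr($i, 7)   # text after "depth="
    }
    printf "%s\t%s\t%s\n", id, len, depth
  }'
}

# contained_contigs (hypothetical helper)
# usage: contained_contigs assembly.fasta hits.blastn [min_identity] [min_fraction]
# Prints "query <TAB> identity <TAB> aligned fraction" for each hit whose identity
# and aligned fraction of the query contig reach the thresholds.
contained_contigs() {
  awk -v min_id="${3:-99}" -v min_cov="${4:-0.99}" '
    NR == FNR {                                    # first file: collect contig lengths
      if ($0 ~ /^>/)
        for (i = 2; i <= NF; i++)
          if ($i ~ /^length=/) len[substr($1, 2)] = substr($i, 8)
      next
    }
    # second file: outfmt 6 columns are qseqid sseqid pident length mismatch
    # gapopen qstart qend sstart send evalue bitscore
    ($1 in len) && $3 >= min_id && ($8 - $7 + 1) / len[$1] >= min_cov {
      printf "%s\t%s\t%.3f\n", $1, $3, ($8 - $7 + 1) / len[$1]
    }' "$1" "$2"
}

# Example: headers copied from the 5179_bold assembly above
circular_contigs <<'EOF'
>1 length=2469173 depth=1.00x circular=true
>3 length=4761 depth=0.44x
>4 length=4595 depth=10.19x circular=true
>8 length=2449 depth=17.14x circular=true
EOF
```

With the real files, a call such as contained_contigs assembly.fasta 2-16_vs_1.blastn would list the linear contigs that are nearly fully covered by a chromosome hit.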
-
Submit all genomes to NCBI
TODO: 1585v: assemble the genome using only the long reads (no short reads are available for this sample)!

BioProject: PRJNA1038700 (Staphylococcus epidermidis strain:1585v | isolate:1585v, Genome sequencing)
BioSample accession: SAMN38198576 (Pathogen: clinical or host-associated sample from Staphylococcus epidermidis)
SRAs: 0
Status: To be released
Release date: 2027-11-10
Created: 2023-11-10 15:24
Updated: 2023-11-17 16:57
Sample name: 1585v
Package: Pathogen: clinical or host-associated; version 1.0
Organism: Staphylococcus epidermidis (Taxonomy ID: 1282)

Attributes
collected by: H R
collection date: 2004
geographic location: Germany: Hamburg
host: Homo sapiens
host disease: port-catheter infection
isolation source: port-catheter
isolate: missing
strain: 1585v
latitude and longitude: 53.551672 N 9.955081 E

https://www.ncbi.nlm.nih.gov/genbank/genomesubmit/#run_pgap
-
Background of 1585v and 1585
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8346721/
S. epidermidis 1585 is known to be biofilm-negative in laboratory media, but to form biofilm in the presence of human serum.
In contrast, S. epidermidis 1585v is a variant derived from strain 1585 in which, due to a chromosomal rearrangement, a 460 kDa isoform of Embp is overexpressed even in TSB. Mutant M135 is an isogenic mutant of 1585v in which expression of Embp is interrupted by insertion of transposon Tn917.
Staphylococcus epidermidis (S. epidermidis) 1585 is a specific strain of S. epidermidis that is classified as a wild-type strain. Wild-type in bacterial terminology refers to the strain of an organism that is found in nature, as opposed to those that have been modified or mutated in a laboratory setting. Here are some key points about the 1585 wild-type strain of S. epidermidis:
-
No Embp Production in TSB: One notable characteristic of the S. epidermidis 1585 strain is that it does not produce Embp (extracellular matrix binding protein) when grown in TSB (Tryptic Soy Broth). Embp is a protein that plays a crucial role in the biofilm formation and adherence of bacteria to surfaces. The absence of Embp production in this strain could impact its ability to form biofilms, a common virulence factor in Staphylococcus infections.
-
Biofilm Formation: S. epidermidis is known for its ability to form biofilms, especially on medical devices, leading to infections that are difficult to treat. The fact that the 1585 strain doesn’t produce Embp in TSB suggests it may have a reduced capacity for biofilm formation under these conditions, which could be significant in understanding and managing such infections.
-
Research and Clinical Implications: Studying wild-type strains like S. epidermidis 1585 is important for understanding the natural behavior and characteristics of the species. Since this strain behaves differently from other strains in terms of Embp production and possibly biofilm formation, it can provide insights into the mechanisms and genetic factors that control these processes. This knowledge is valuable for developing strategies to prevent and treat infections, especially in hospital and healthcare settings where S. epidermidis infections are common.
-
Genetic Studies: The 1585 strain can also serve as a baseline or control in genetic studies. By comparing the genome and behavior of 1585 with other strains of S. epidermidis, researchers can identify genetic variations and mutations that may be responsible for different phenotypes, such as increased virulence or antibiotic resistance.
-
Model Organism for Understanding Staphylococcal Behavior: As a wild-type strain, 1585 offers a model for studying the natural state of S. epidermidis. This is crucial for understanding the fundamental biology of the bacterium, which can help in the development of treatments and interventions against infections caused by more virulent or drug-resistant strains.
In summary, the S. epidermidis 1585 wild-type strain is significant in microbiological research due to its natural characteristics, particularly its behavior in biofilm formation and Embp production. Understanding these aspects can contribute to better insights into the pathogenicity and treatment of Staphylococcus infections, particularly in clinical settings where these bacteria are a common source of nosocomial infections.