Unicycler vs. Trycycler

  1. prepare the input sequencing data

     NGS.id  Sample.name  ONT_barcode
     jk3332  5179R1  Native Barcode NB01
     jk3333  1585  Native Barcode NB02
     jk3334  1585V  Native Barcode NB03
     jk3335  5179  Native Barcode NB04
     jk3336  HD_05_2  Native Barcode NB05
     jk3337  HD_05_2_K5  Native Barcode NB06
     jk3338  HD_05_2_K6  Native Barcode NB07
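
     The table above can also be kept as a small lookup so that later commands do not hard-code barcode numbers. This is only a sketch: it assumes that Native Barcode NB01 to NB07 correspond to the fastq_pass directories barcode01 to barcode07 (as the paths in step 2 suggest), and the function name barcode_for_sample is hypothetical.

     ```shell
     # Map a sample name from the table to its barcode directory name.
     # Assumption: NB01..NB07 correspond to barcode01..barcode07.
     barcode_for_sample() {
       case "$1" in
         5179R1)     echo barcode01 ;;
         1585)       echo barcode02 ;;
         1585V)      echo barcode03 ;;
         5179)       echo barcode04 ;;
         HD_05_2)    echo barcode05 ;;
         HD_05_2_K5) echo barcode06 ;;
         HD_05_2_K6) echo barcode07 ;;
         *) echo "unknown sample: $1" >&2; return 1 ;;
       esac
     }

     # Example: look up the barcode directory of sample 1585V.
     barcode_for_sample 1585V
     ```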
  2. assembly using unicycler

     cat FAN41335_pass_barcode01_46d24d87_69a75752_0.fastq.gz FAN41335_pass_barcode01_46d24d87_69a75752_1.fastq.gz > FAN41335_pass_barcode01.fastq.gz
     cat FAN41335_pass_barcode03_46d24d87_69a75752_0.fastq.gz FAN41335_pass_barcode03_46d24d87_69a75752_1.fastq.gz > FAN41335_pass_barcode03.fastq.gz
     cat FAN41335_pass_barcode04_46d24d87_69a75752_0.fastq.gz FAN41335_pass_barcode04_46d24d87_69a75752_1.fastq.gz FAN41335_pass_barcode04_46d24d87_69a75752_2.fastq.gz > FAN41335_pass_barcode04.fastq.gz
     cat FAN41335_pass_barcode05_46d24d87_69a75752_0.fastq.gz FAN41335_pass_barcode05_46d24d87_69a75752_1.fastq.gz > FAN41335_pass_barcode05.fastq.gz
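
     The per-barcode concatenation above can be written once as a helper, since concatenated gzip files form a valid gzip stream. A minimal sketch, assuming all chunk files of a barcode sit in one directory and follow the FAN41335_pass_<barcode>_*.fastq.gz naming seen above; the function name merge_barcode_chunks is hypothetical.

     ```shell
     # Concatenate all chunk files of one barcode into a single merged file.
     # The merged file name has no "_" after the barcode, so it never matches
     # the input glob on a rerun.
     merge_barcode_chunks() {
       local dir="$1" bc="$2"
       cat "$dir"/FAN41335_pass_"$bc"_*.fastq.gz > "$dir"/FAN41335_pass_"$bc".fastq.gz
     }

     # Example: merge the barcodes handled above, if their directories exist.
     base=long_reads/240109_FAN41335_S_epidermidis/fastq_pass
     for bc in barcode01 barcode03 barcode04 barcode05; do
       if [ -d "$base/$bc" ]; then
         merge_barcode_chunks "$base/$bc" "$bc"
       fi
     done
     ```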
    
     unicycler -1 s-epidermidis-5179-r1_R1.fastq.gz -2 s-epidermidis-5179-r1_R2.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode01/FAN41335_pass_barcode01.fastq.gz --mode normal -t 55 -o 5179R1_normal
     unicycler -1 s-epidermidis-1585_R1.fastq.gz -2 s-epidermidis-1585_R2.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode02/FAN41335_pass_barcode02.fastq.gz --mode normal -t 55 -o 1585_normal
     # sample 3 (1585V, barcode03): no short reads available
     unicycler -1 s-epidermidis-5179_R1.fastq.gz -2 s-epidermidis-5179_R2.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode04/FAN41335_pass_barcode04.fastq.gz --mode normal -t 55 -o 5179_normal
     unicycler -1 HD5_2_S38_R1_001.fastq.gz -2 HD5_2_S38_R2_001.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode05/FAN41335_pass_barcode05.fastq.gz --mode normal -t 55 -o HD05_2_normal
     unicycler -1 275_K5_Holger_S92_R1_001.fastq.gz -2 275_K5_Holger_S92_R2_001.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode06/FAN41335_pass_barcode06.fastq.gz --mode normal -t 55 -o HD05_2_K5_normal
     unicycler -1 276_K6_Holger_S95_R1_001.fastq.gz -2 276_K6_Holger_S95_R2_001.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode07/FAN41335_pass_barcode07.fastq.gz --mode normal -t 55 -o HD05_2_K6_normal
    
     unicycler -1 s-epidermidis-5179-r1_R1.fastq.gz -2 s-epidermidis-5179-r1_R2.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode01/FAN41335_pass_barcode01.fastq.gz --mode bold -t 55 -o 5179R1_bold
     unicycler -1 s-epidermidis-1585_R1.fastq.gz -2 s-epidermidis-1585_R2.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode02/FAN41335_pass_barcode02.fastq.gz --mode bold -t 55 -o 1585_bold
     # sample 3 (1585V, barcode03): no short reads available
     unicycler -1 s-epidermidis-5179_R1.fastq.gz -2 s-epidermidis-5179_R2.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode04/FAN41335_pass_barcode04.fastq.gz --mode bold -t 55 -o 5179_bold
     unicycler -1 HD5_2_S38_R1_001.fastq.gz -2 HD5_2_S38_R2_001.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode05/FAN41335_pass_barcode05.fastq.gz --mode bold -t 55 -o HD05_2_bold
     unicycler -1 275_K5_Holger_S92_R1_001.fastq.gz -2 275_K5_Holger_S92_R2_001.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode06/FAN41335_pass_barcode06.fastq.gz --mode bold -t 55 -o HD05_2_K5_bold
     unicycler -1 276_K6_Holger_S95_R1_001.fastq.gz -2 276_K6_Holger_S95_R2_001.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode07/FAN41335_pass_barcode07.fastq.gz --mode bold -t 55 -o HD05_2_K6_bold
    
     unicycler -1 s-epidermidis-5179-r1_R1.fastq.gz -2 s-epidermidis-5179-r1_R2.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode01/FAN41335_pass_barcode01.fastq.gz --mode conservative -t 55 -o 5179R1_conservative
     unicycler -1 s-epidermidis-1585_R1.fastq.gz -2 s-epidermidis-1585_R2.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode02/FAN41335_pass_barcode02.fastq.gz --mode conservative -t 55 -o 1585_conservative
     # sample 3 (1585V, barcode03): no short reads available
     unicycler -1 s-epidermidis-5179_R1.fastq.gz -2 s-epidermidis-5179_R2.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode04/FAN41335_pass_barcode04.fastq.gz --mode conservative -t 55 -o 5179_conservative
    
     unicycler -1 HD5_2_S38_R1_001.fastq.gz -2 HD5_2_S38_R2_001.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode05/FAN41335_pass_barcode05.fastq.gz --mode conservative -t 55 -o HD05_2_conservative
     unicycler -1 275_K5_Holger_S92_R1_001.fastq.gz -2 275_K5_Holger_S92_R2_001.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode06/FAN41335_pass_barcode06.fastq.gz --mode conservative -t 55 -o HD05_2_K5_conservative
     unicycler -1 276_K6_Holger_S95_R1_001.fastq.gz -2 276_K6_Holger_S95_R2_001.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode07/FAN41335_pass_barcode07.fastq.gz --mode conservative -t 55 -o HD05_2_K6_conservative
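
     The Unicycler calls above differ only in the read files, the barcode and the bridging mode (conservative, normal, bold), so they can be generated from one template. The sketch below only prints the commands (a dry run) and does not call unicycler itself; the function name unicycler_cmd is hypothetical.

     ```shell
     # Print the hybrid Unicycler command for one sample and one mode.
     # Arguments: output prefix, short-read R1, short-read R2, barcode, mode.
     unicycler_cmd() {
       local out="$1" r1="$2" r2="$3" bc="$4" mode="$5"
       local lr="long_reads/240109_FAN41335_S_epidermidis/fastq_pass/${bc}/FAN41335_pass_${bc}.fastq.gz"
       echo "unicycler -1 $r1 -2 $r2 -l $lr --mode $mode -t 55 -o ${out}_${mode}"
     }

     # Example: print the three commands for sample 5179R1.
     for mode in conservative normal bold; do
       unicycler_cmd 5179R1 s-epidermidis-5179-r1_R1.fastq.gz s-epidermidis-5179-r1_R2.fastq.gz barcode01 "$mode"
     done
     ```

     Once the printed commands look right, the output can be piped to bash.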
    
     ragtag.py scaffold  ../assembly_flye_HD05_2/assembly.fasta assembly.fasta
     ragtag.py patch  ragtag.scaffold.fasta ../../assembly_flye_HD05_2/assembly.fasta
     grep -o 'N' ragtag.patch.fasta | wc -l
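
     Note that grep -o 'N' on the whole file would also count any N that occurs in a FASTA header line. A minimal sketch that counts gap bases in the sequence lines only; it additionally counts lowercase n, which is an assumption about what should be treated as a gap, and the function name count_gap_bases is hypothetical.

     ```shell
     # Count N/n characters in the sequence lines of a FASTA file,
     # skipping header lines that start with ">".
     count_gap_bases() {
       grep -v '^>' "$1" | tr -cd 'Nn' | wc -c | tr -d ' '
     }

     # Example: count the remaining gap bases after patching, if the file exists.
     if [ -f ragtag.patch.fasta ]; then
       count_gap_bases ragtag.patch.fasta
     fi
     ```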
    
     makeblastdb -in ../assembly_flye_HD05_2/assembly.fasta -dbtype nucl
     blastn -db ../assembly_flye_HD05_2/assembly.fasta -query 1-4.fasta -out assmbly_vs_flye.blastn -evalue 0.00000000001 -num_threads 15 -outfmt 6 -strand both -max_target_seqs 1 
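
     With -outfmt 6 the result is a tab-separated table whose first four columns are query id, subject id, percent identity and alignment length. A query can still produce several HSP lines against its single target, so a short summary helps. The sketch below keeps the first line reported per query; treating the first line as the best HSP is an assumption, and the function name first_hsp_per_query is hypothetical.

     ```shell
     # Print query id, subject id, percent identity and alignment length
     # for the first HSP line of each query in a BLAST tabular (outfmt 6) file.
     first_hsp_per_query() {
       awk -F'\t' -v OFS='\t' '!seen[$1]++ { print $1, $2, $3, $4 }' "$1"
     }

     # Example: summarise the BLAST result written above, if it exists.
     if [ -f assmbly_vs_flye.blastn ]; then
       first_hsp_per_query assmbly_vs_flye.blastn
     fi
     ```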
  3. install the trycycler environment

     nextdenovo_dir="/path/to/NextDenovo"
     nextpolish_dir="/path/to/NextPolish"
     genome_size="2500000" #2 503 927
     /home/jhuang/Tools/canu/build/bin/canu    -p canu -d canu_temp -fast genomeSize="$genome_size" useGrid=false maxThreads="$threads" -nanopore read_subsets/sample_"$i".fastq
     /home/jhuang/Tools/Trycycler/scripts/canu_trim.py    canu_temp/canu.contigs.fasta > assemblies/assembly_"$i".fasta
     /home/jhuang/Tools/Minipolish/miniasm_and_minipolish.sh    read_subsets/sample_"$i".fastq "$threads" > assemblies/assembly_"$i".gfa
     /home/jhuang/Tools/NECAT/Linux-amd64/bin/necat.pl    config config.txt
     /home/jhuang/Tools/NECAT/Linux-amd64/bin/necat.pl    bridge config.txt
     /home/jhuang/Tools/raven/build/bin/raven    --threads "$threads" --disable-checkpoints --graphical-fragment-assembly assemblies/assembly_"$i".gfa read_subsets/sample_"$i".fastq > assemblies/assembly_"$i".fasta
    
     # Tools by rrwick, see https://github.com/rrwick (Bandage, Unicycler, Filtlong, Trycycler, Polypolish)
     # install canu, flye, raven, miniasm, minipolish, any2fasta via 'mamba install'
     # install fastp, medaka, polypolish, masurca (provides POLCA) with 'mamba install'
    
     # install NextDenovo and NextPolish from https://github.com/Nextomics
     wget https://github.com/Nextomics/NextDenovo/releases/latest/download/NextDenovo.tgz
     tar -vxzf NextDenovo.tgz && cd NextDenovo
     #cd NextDenovo && make
     wget https://github.com/Nextomics/NextPolish/releases/download/v1.4.1/NextPolish.tgz
     pip install paralleltask
     tar -vxzf NextPolish.tgz && cd NextPolish   #&& make
    
     git clone https://github.com/rrwick/Minipolish.git
    
     wget https://github.com/xiaochuanle/NECAT/releases/download/v0.0.1_update20200803/necat_20200803_Linux-amd64.tar.gz
     tar xzvf necat_20200803_Linux-amd64.tar.gz
     cd NECAT/Linux-amd64/bin
     export PATH=$PATH:$(pwd)
    
     # Install canu and raven under ~/Tools/
     git clone https://github.com/marbl/canu.git
     cd canu/src
     make -j 50  #<number of threads>
    
     git clone https://github.com/lbcb-sci/raven && cd raven
     cmake -S ./ -B./build -DRAVEN_BUILD_EXE=1 -DCMAKE_BUILD_TYPE=Release
     cmake --build build
    
     # Adapt the script trycycler_assembly_extra-thorough.sh with the following complete paths.
     /home/jhuang/Tools/NECAT/Linux-amd64/bin/necat.pl
     /home/jhuang/Tools/Minipolish/miniasm_and_minipolish.sh
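
     Before running the assembly script it can help to confirm that the installed tools are actually reachable. A minimal sketch that lists tools missing from PATH; the executable names are assumed to match the package names listed above, and the function name missing_tools is hypothetical.

     ```shell
     # Print every given tool name that cannot be found on PATH.
     missing_tools() {
       local t
       for t in "$@"; do
         command -v "$t" >/dev/null 2>&1 || echo "$t"
       done
     }

     # Example: check the assemblers and helpers used by the Trycycler workflow.
     missing_tools canu flye raven miniasm minipolish any2fasta trycycler
     ```

     Tools that were built from source under ~/Tools/ and are called by full path will be reported as missing unless their directories were added to PATH.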
  4. assembly using trycycler

     TODO (IMPORTANT): assemble all genomes using the following methods and compare them to the Unicycler results.
    
     (trycycler) jhuang@hamm:~/DATA/Data_Holger_S.epidermidis_1585_5179_HD05$ ./trycycler_assembly_extra-thorough.sh 
    
     #In the HD05 project, we use the following strategies!
    
     I. First, construct the genome using only Trycycler (a consensus long-read assembly tool).
    
     cp long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode01/FAN41335_pass_barcode01.fastq.gz trycycler_5179R1/reads.fastq.gz
     cp long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode02/FAN41335_pass_barcode02.fastq.gz trycycler_1585/reads.fastq.gz
     cp long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode03/FAN41335_pass_barcode03.fastq.gz trycycler_1585v/reads.fastq.gz
     cp long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode04/FAN41335_pass_barcode04.fastq.gz trycycler_5179/reads.fastq.gz
     cp long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode05/FAN41335_pass_barcode05.fastq.gz trycycler_HD05_2/reads.fastq.gz
     cp long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode06/FAN41335_pass_barcode06.fastq.gz trycycler_HD05_2_K5/reads.fastq.gz
     cp long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode07/FAN41335_pass_barcode07.fastq.gz trycycler_HD05_2_K6/reads.fastq.gz
    
     for sample in trycycler_5179R1 trycycler_1585 trycycler_1585v trycycler_5179 trycycler_HD05_2 trycycler_HD05_2_K5 trycycler_HD05_2_K6; do
     cd ${sample};
     ../trycycler_assembly_extra-thorough.sh;
     cd ..;
     done
     #TODO: further steps (see https://github.com/rrwick/Trycycler/wiki)
    
     for sample in trycycler_5179R1 trycycler_1585 trycycler_1585v trycycler_5179 trycycler_HD05_2 trycycler_HD05_2_K5 trycycler_HD05_2_K6; do
     cd ${sample};
     ../trycycler_assembly_extra-thorough_raven.sh;
     cd ..;
     done
     #TODO: further steps (see https://github.com/rrwick/Trycycler/wiki)
    
     for sample in trycycler_5179R1 trycycler_1585 trycycler_1585v trycycler_5179 trycycler_HD05_2 trycycler_HD05_2_K5 trycycler_HD05_2_K6; do
     cd ${sample};
     ../trycycler_assembly_extra-thorough_canu.sh;
     cd ..;
     done
     #TODO: further steps (see https://github.com/rrwick/Trycycler/wiki)
    
     for sample in trycycler_5179R1 trycycler_1585 trycycler_1585v trycycler_5179 trycycler_HD05_2 trycycler_HD05_2_K5 trycycler_HD05_2_K6; do
     cd ${sample};
     trycycler cluster --threads 55 --assemblies assemblies/*.fasta --reads reads.fastq --out_dir trycycler;
     cd ..;
     done
    
     trycycler reconcile --threads 55 --reads reads.fastq --cluster_dir trycycler/cluster_001
     #Error: failed to circularise sequence D_bctg00000000 because its start could not be found in other sequences. You can either trim some sequence off the start of D_bctg00000000 or exclude the sequence altogether and try again.
     #Error: failed to circularise sequence E_ctg000010 for multiple reasons. You must either repair this sequence or exclude it and then try running trycycler reconcile again.
     #Error: failed to circularise sequence W_ctg000000 because its start could not be found in other sequences. You can either trim some sequence off the start of W_ctg000000 or exclude the sequence altogether and try again.
    
     trycycler reconcile --threads 55 --reads reads.fastq --cluster_dir trycycler/cluster_001
     #Error: failed to circularise sequence K_ctg000000 because its end could not be found in other sequences. You can either trim some sequence off the end of K_ctg000000 or exclude the sequence altogether and try again.
     #Worst-1kbp: W_Utg714
    
     trycycler reconcile --threads 55 --reads reads.fastq --cluster_dir trycycler/cluster_001
     #Error: failed to circularise sequence T_contig_1 because its end could not be found in other sequences. You can either trim some sequence off the end of T_contig_1 or exclude the sequence altogether and try again.
     # Worst-1kbp: D_bctg00000000, J_bctg00000000, P_bctg00000000
    
     trycycler reconcile --threads 55 --reads reads.fastq --cluster_dir trycycler/cluster_001
     #Error: failed to circularise sequence A_tig00000003 because its start could not be found in other sequences. You can either trim some sequence off the start of A_tig00000003 or exclude the sequence altogether and try again.
     #Error: failed to circularise sequence E_ctg000000 because its start could not be found in other sequences. You can either trim some sequence off the start of E_ctg000000 or exclude the sequence altogether and try again.
     #Error: failed to circularise sequence Q_ctg000000 because its end could not be found in other sequences. You can either trim some sequence off the end of Q_ctg000000 or exclude the sequence altogether and try again.
     # Worst-1kbp: L_Utg716, X_Utg654
    
     trycycler reconcile --threads 55 --reads reads.fastq --cluster_dir trycycler/cluster_001
    
     trycycler reconcile --threads 55 --reads reads.fastq --cluster_dir trycycler/cluster_001
    
     trycycler reconcile --threads 55 --reads reads.fastq --cluster_dir trycycler/cluster_001
    
     trycycler reconcile --threads 55 --reads reads.fastq --cluster_dir trycycler/cluster_002
     #M_tig00000002, S_tig00000003, A_tig00000003, C_utg000003l, G_tig00000002, I_utg000002l
     #E_ctg000000, K_ctg000000, Q_ctg000000
     trycycler reconcile --threads 55 --reads reads.fastq --cluster_dir trycycler/cluster_003
     trycycler reconcile --threads 55 --reads reads.fastq --cluster_dir trycycler/cluster_004
     trycycler reconcile --threads 55 --reads reads.fastq --cluster_dir trycycler/cluster_005
     trycycler reconcile --threads 55 --reads reads.fastq --cluster_dir trycycler/cluster_006
     #
     #--> When finished, Trycycler reconcile will make 2_all_seqs.fasta in the cluster directory, a multi-FASTA file containing each of the contigs ready for multiple sequence alignment.
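
     Because reconcile often has to be repeated after removing or trimming contigs, it is useful to see which clusters are still unfinished. A minimal sketch that lists cluster directories where an expected output file is missing or empty; the function name clusters_missing is hypothetical.

     ```shell
     # Print each cluster directory that lacks a non-empty copy of the given file.
     # Arguments: file name, then one or more cluster directories.
     clusters_missing() {
       local file="$1" d
       shift
       for d in "$@"; do
         [ -s "$d/$file" ] || echo "$d"
       done
     }

     # Example: list clusters that have not passed reconcile yet.
     clusters_missing 2_all_seqs.fasta trycycler/cluster_*
     ```

     The same helper can be reused for later steps, for example with 3_msa.fasta or 7_final_consensus.fasta.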
    
     trycycler msa --threads 55 --cluster_dir trycycler/cluster_001
     trycycler msa --threads 55 --cluster_dir trycycler/cluster_002
     trycycler msa --threads 55 --cluster_dir trycycler/cluster_003
     trycycler msa --threads 55 --cluster_dir trycycler/cluster_004
     trycycler msa --threads 55 --cluster_dir trycycler/cluster_005
     #--> When finished, Trycycler msa will make a 3_msa.fasta file in the cluster directory
    
     #generate 4_reads.fastq for each contig!
     trycycler partition --threads 55 --reads reads.fastq --cluster_dirs trycycler/cluster_*
     #trycycler partition --threads 55 --reads reads.fastq --cluster_dirs trycycler/cluster_001 trycycler/cluster_002 trycycler/cluster_003
    
     trycycler consensus --threads 55 --cluster_dir trycycler/cluster_001
     trycycler consensus --threads 55 --cluster_dir trycycler/cluster_002
     trycycler consensus --threads 55 --cluster_dir trycycler/cluster_003
     trycycler consensus --threads 55 --cluster_dir trycycler/cluster_004
     trycycler consensus --threads 55 --cluster_dir trycycler/cluster_005
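
     The per-cluster calls of reconcile, msa and consensus follow one pattern, with only reconcile taking the reads. The sketch below prints the commands for any list of cluster directories instead of running them (a dry run); the function name trycycler_cmds is hypothetical.

     ```shell
     # Print one Trycycler command per cluster directory for the given step.
     # Only "reconcile" gets the --reads option, as in the calls above.
     trycycler_cmds() {
       local step="$1" d extra=""
       shift
       if [ "$step" = reconcile ]; then
         extra=" --reads reads.fastq"
       fi
       for d in "$@"; do
         echo "trycycler $step --threads 55${extra} --cluster_dir $d"
       done
     }

     # Example: print the msa commands for the first two clusters.
     trycycler_cmds msa trycycler/cluster_001 trycycler/cluster_002
     ```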
    
     #!!NOTE that for the isolates HD05_2_K5 and HD05_2_K6 we take the assemblies produced by Unicycler instead of Trycycler!!
    
     # TODO (TODAY): generate the 3 datasets below!
     # TODO (IMPORTANT): write an email to Holger saying that the short-read sequencing of HD5_2 is not correct (see the 3 datasets)! However, MTxxxxxxx is confirmed to be absent from K5 and K6!
     # TODO: variant calling needs the short reads; it is not doable without the correct short reads! Resequencing? It is difficult to call variants from long reads alone since long reads contain too many errors!
     # TODO: check whether the MT sequence is present in the isolates; more detailed annotations will come later!
     #II. Comparing the results of Trycycler with Unicycler.
     #III. Eventually add the plasmids assembled from unicycler to the final results. E.g. add the 4 plasmids to K5 and K6
  5. Polishing after Trycycler

     #1. Long-read polishing with Medaka (Oxford Nanopore reads; skipped due to a samtools version incompatibility!)
     # for c in trycycler/cluster_*; do
     #     medaka_consensus -i "$c"/4_reads.fastq -d "$c"/7_final_consensus.fasta -o "$c"/medaka -m r941_min_sup_g507 -t 12
     #     mv "$c"/medaka/consensus.fasta "$c"/8_medaka.fasta
     #     rm -r "$c"/medaka "$c"/*.fai "$c"/*.mmi  # clean up
     # done
     # cat trycycler/cluster_*/8_medaka.fasta > trycycler/consensus.fasta
    
     #2. Short-read polishing
    
     #---- 5179_R1 (2) ----
     #  mean read depth: 205.8x
     #  188 bp have a depth of zero (99.9924% coverage)
     #  355 positions changed (0.0144% of total positions)
     #  estimated pre-polishing sequence accuracy: 99.9856% (Q38.42)
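
     The Q value in parentheses is the Phred-scaled form of the accuracy, Q = -10 * log10(1 - accuracy). A minimal sketch of that conversion with awk; the function name accuracy_to_q is hypothetical. Because it starts from the rounded percentage, the last digit can differ slightly from the value reported by the polisher.

     ```shell
     # Convert an accuracy given in percent to a Phred-scaled quality value.
     accuracy_to_q() {
       LC_ALL=C awk -v acc="$1" 'BEGIN { printf "Q%.2f\n", -10 * log(1 - acc / 100) / log(10) }'
     }

     # Example: the 5179_R1 accuracy reported above.
     accuracy_to_q 99.9856
     ```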
    
     #Step 1: read QC
     fastp --in1 ../../s-epidermidis-5179-r1_R1.fastq.gz --in2 ../../s-epidermidis-5179-r1_R2.fastq.gz --out1 1.fastq.gz --out2 2.fastq.gz --unpaired1 u.fastq.gz --unpaired2 u.fastq.gz
    
     #Step 2: Polypolish
     for cluster in cluster_001 cluster_002; do
     bwa index ${cluster}/7_final_consensus.fasta
     bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 1.fastq.gz > ${cluster}/alignments_1.sam
     bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 2.fastq.gz > ${cluster}/alignments_2.sam
     polypolish polish ${cluster}/7_final_consensus.fasta ${cluster}/alignments_1.sam ${cluster}/alignments_2.sam > ${cluster}/polypolish.fasta
     done
    
     #Step 3: POLCA
     for cluster in cluster_001 cluster_002; do
     cd ${cluster}
     polca.sh -a polypolish.fasta -r "../../../s-epidermidis-5179-r1_R1.fastq.gz ../../../s-epidermidis-5179-r1_R2.fastq.gz" -t 55 -m 120G
     cd ..
     done
    
     #Substitution Errors: 37
     #Insertion/Deletion Errors: 2
     #Assembly Size: 2470001
     #Consensus Quality: 99.9984
    
     #Substitution Errors: 4
     #Insertion/Deletion Errors: 0
     #Assembly Size: 17748
     #Consensus Quality: 99.9775
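
     The consensus quality values reported in these notes are consistent with 100 * (1 - (substitutions + indels) / assembly size). This is an observation from the numbers above, not a statement about how POLCA computes the value internally. A minimal sketch to recompute it; the function name consensus_quality is hypothetical.

     ```shell
     # Recompute the consensus quality in percent from a POLCA report.
     # Arguments: substitution errors, insertion/deletion errors, assembly size.
     consensus_quality() {
       LC_ALL=C awk -v s="$1" -v i="$2" -v n="$3" 'BEGIN { printf "%.6g\n", 100 * (1 - (s + i) / n) }'
     }

     # Example: the two reports above (chromosome and plasmid of 5179_R1).
     consensus_quality 37 2 2470001
     consensus_quality 4 0 17748
     ```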
    
     #Step 4: (optional) more rounds and/or other polishers
     #After one round of Polypolish and one round of POLCA, your assembly should be in very good shape! 
     #However, there may still be a few lingering errors. You can try running additional rounds of Polypolish or POLCA to see if they make any more changes.
    
     for cluster in cluster_001 cluster_002; do
     cd ${cluster}
     polca.sh -a polypolish.fasta.PolcaCorrected.fa -r "../../../s-epidermidis-5179-r1_R1.fastq.gz ../../../s-epidermidis-5179-r1_R2.fastq.gz" -t 55 -m 120G
     cd ..
     done
    
     #Substitution Errors: 13
     #Insertion/Deletion Errors: 0
     #Assembly Size: 2470004
     #Consensus Quality: 99.9995
    
     #Substitution Errors: 0
     #Insertion/Deletion Errors: 0
     #Assembly Size: 17748
     #Consensus Quality: 100
    
     for cluster in cluster_001; do
     cd ${cluster}
     polca.sh -a polypolish.fasta.PolcaCorrected.fa.PolcaCorrected.fa -r "../../../s-epidermidis-5179-r1_R1.fastq.gz ../../../s-epidermidis-5179-r1_R2.fastq.gz" -t 55 -m 120G
     cd ..
     done
    
     #Substitution Errors: 0
     #Insertion/Deletion Errors: 0
     #Assembly Size: 2470004
     #Consensus Quality: 100
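
     Each POLCA round writes its result with the suffix .PolcaCorrected.fa appended to the input name, so the input of round n can be derived instead of typed by hand. A minimal sketch, assuming round 1 starts from polypolish.fasta as above; the function name polca_input is hypothetical.

     ```shell
     # Print the POLCA input file name for a given round (1 = first round).
     polca_input() {
       local round="$1" name="polypolish.fasta" k=1
       while [ "$k" -lt "$round" ]; do
         name="$name.PolcaCorrected.fa"
         k=$((k + 1))
       done
       echo "$name"
     }

     # Example: input file names for rounds 1 to 3.
     for r in 1 2 3; do
       polca_input "$r"
     done
     ```

     Rounds can be repeated until the report shows zero substitution and zero insertion/deletion errors, as happened here in the third round.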
    
     #---- 1585 (4) ----
     #  mean read depth: 174.7x
     #  8,297 bp have a depth of zero (99.6604% coverage)
     #  271 positions changed (0.0111% of total positions)
     #  estimated pre-polishing sequence accuracy: 99.9889% (Q39.55)
    
     #Step 1: read QC
     fastp --in1 ../../s-epidermidis-1585_R1.fastq.gz --in2 ../../s-epidermidis-1585_R2.fastq.gz --out1 1.fastq.gz --out2 2.fastq.gz --unpaired1 u.fastq.gz --unpaired2 u.fastq.gz
    
     #Step 2: Polypolish
     for cluster in cluster_001 cluster_002 cluster_003 cluster_004; do
       bwa index ${cluster}/7_final_consensus.fasta
       bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 1.fastq.gz > ${cluster}/alignments_1.sam
       bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 2.fastq.gz > ${cluster}/alignments_2.sam
       polypolish polish ${cluster}/7_final_consensus.fasta ${cluster}/alignments_1.sam ${cluster}/alignments_2.sam > ${cluster}/polypolish.fasta
     done
    
     #Step 3: POLCA
     for cluster in cluster_001 cluster_002 cluster_003 cluster_004; do
       cd ${cluster}
       polca.sh -a polypolish.fasta -r "../../../s-epidermidis-1585_R1.fastq.gz ../../../s-epidermidis-1585_R2.fastq.gz" -t 55 -m 120G
       cd ..
     done
    
     #Substitution Errors: 7
     #Insertion/Deletion Errors: 4
     #Assembly Size: 2443174
     #Consensus Quality: 99.9995
    
     #Substitution Errors: 0
     #Insertion/Deletion Errors: 0
     #Assembly Size: 9014
     #Consensus Quality: 100
    
     #Substitution Errors: 0
     #Insertion/Deletion Errors: 0
     #Assembly Size: 9014
     #Consensus Quality: 100
    
     #Substitution Errors: 0
     #Insertion/Deletion Errors: 0
     #Assembly Size: 2344
     #Consensus Quality: 100
    
     #Step 4: (optional) more rounds and/or other polishers
     #After one round of Polypolish and one round of POLCA, your assembly should be in very good shape! 
     #However, there may still be a few lingering errors. You can try running additional rounds of Polypolish or POLCA to see if they make any more changes.
    
     for cluster in cluster_001; do
       cd ${cluster}
       polca.sh -a polypolish.fasta.PolcaCorrected.fa -r "../../../s-epidermidis-1585_R1.fastq.gz ../../../s-epidermidis-1585_R2.fastq.gz" -t 55 -m 120G
       cd ..
     done
    
     #Substitution Errors: 0
     #Insertion/Deletion Errors: 0
     #Assembly Size: 2443176
     #Consensus Quality: 100
    
     #---- 1585 derived from unicycler, under 1585_normal/unicycler (4) ----
     #Step 0: copy chrom and plasmid1, plasmid2, plasmid3 to cluster_001/7_final_consensus.fasta, ...
    
     #Step 1: read QC
     fastp --in1 ../../s-epidermidis-1585_R1.fastq.gz --in2 ../../s-epidermidis-1585_R2.fastq.gz --out1 1.fastq.gz --out2 2.fastq.gz --unpaired1 u.fastq.gz --unpaired2 u.fastq.gz
    
     #Step 2: Polypolish
     for cluster in cluster_001 cluster_002 cluster_003 cluster_004; do
       bwa index ${cluster}/7_final_consensus.fasta
       bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 1.fastq.gz > ${cluster}/alignments_1.sam
       bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 2.fastq.gz > ${cluster}/alignments_2.sam
       polypolish polish ${cluster}/7_final_consensus.fasta ${cluster}/alignments_1.sam ${cluster}/alignments_2.sam > ${cluster}/polypolish.fasta
     done
     #Polishing 1 (2,443,574 bp):
     #mean read depth: 174.7x
     #8,298 bp have a depth of zero (99.6604% coverage)
     #52 positions changed (0.0021% of total positions)
     #estimated pre-polishing sequence accuracy: 99.9979% (Q46.72)
     #Polishing 2 (9,014 bp):
     #mean read depth: 766.5x
     #3 bp have a depth of zero (99.9667% coverage)
     #0 positions changed (0.0000% of total positions)
     #estimated pre-polishing sequence accuracy: 100.0000% (Q∞)
     #Polishing 7 (2,344 bp):
     #mean read depth: 2893.0x
     #4 bp have a depth of zero (99.8294% coverage)
     #0 positions changed (0.0000% of total positions)
     #estimated pre-polishing sequence accuracy: 100.0000% (Q∞)
     #Polishing 8 (2,255 bp):
     #mean read depth: 2719.6x
     #4 bp have a depth of zero (99.8226% coverage)
     #0 positions changed (0.0000% of total positions)
     #estimated pre-polishing sequence accuracy: 100.0000% (Q∞)
    
     #Step 3: POLCA
     for cluster in cluster_001 cluster_002 cluster_003 cluster_004; do
       cd ${cluster}
       polca.sh -a polypolish.fasta -r "../../../s-epidermidis-1585_R1.fastq.gz ../../../s-epidermidis-1585_R2.fastq.gz" -t 55 -m 120G
       cd ..
     done
    
     #Substitution Errors: 7
     #Insertion/Deletion Errors: 4
     #Assembly Size: 2443598
     #Consensus Quality: 99.9995
    
     #Substitution Errors: 0
     #Insertion/Deletion Errors: 0
     #Assembly Size: 9014
     #Consensus Quality: 100
    
     #Substitution Errors: 0
     #Insertion/Deletion Errors: 0
     #Assembly Size: 2344
     #Consensus Quality: 100
    
     #Substitution Errors: 0
     #Insertion/Deletion Errors: 0
     #Assembly Size: 2255
     #Consensus Quality: 100
    
     #Step 4: (optional) more rounds and/or other polishers
     #After one round of Polypolish and one round of POLCA, your assembly should be in very good shape! 
     #However, there may still be a few lingering errors. You can try running additional rounds of Polypolish or POLCA to see if they make any more changes.
    
     for cluster in cluster_001; do
       cd ${cluster}
       polca.sh -a polypolish.fasta.PolcaCorrected.fa -r "../../../s-epidermidis-1585_R1.fastq.gz ../../../s-epidermidis-1585_R2.fastq.gz" -t 55 -m 120G
       cd ..
     done
    
     #Substitution Errors: 0
     #Insertion/Deletion Errors: 0
     #Assembly Size: 2443600
     #Consensus Quality: 100
    
     #-- 1585v (1, no short reads, waiting) --
     # TODO!
    
     #-- 5179 (2) --
     #mean read depth: 120.7x
     #7,547 bp have a depth of zero (99.6946% coverage)
     #356 positions changed (0.0144% of total positions)
     #estimated pre-polishing sequence accuracy: 99.9856% (Q38.41)
    
     #Step 1: read QC
     fastp --in1 ../../s-epidermidis-5179_R1.fastq.gz --in2 ../../s-epidermidis-5179_R2.fastq.gz --out1 1.fastq.gz --out2 2.fastq.gz --unpaired1 u.fastq.gz --unpaired2 u.fastq.gz
    
     #Step 2: Polypolish
     for cluster in cluster_001 cluster_002; do
     bwa index ${cluster}/7_final_consensus.fasta
     bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 1.fastq.gz > ${cluster}/alignments_1.sam
     bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 2.fastq.gz > ${cluster}/alignments_2.sam
     polypolish polish ${cluster}/7_final_consensus.fasta ${cluster}/alignments_1.sam ${cluster}/alignments_2.sam > ${cluster}/polypolish.fasta
     done
    
     #Step 3: POLCA
     for cluster in cluster_001 cluster_002; do
     cd ${cluster}
     polca.sh -a polypolish.fasta -r "../../../s-epidermidis-5179_R1.fastq.gz ../../../s-epidermidis-5179_R2.fastq.gz" -t 55 -m 120G
     cd ..
     done
    
     #Substitution Errors: 49
     #Insertion/Deletion Errors: 23
     #Assembly Size: 2471418
     #Consensus Quality: 99.9971
    
     #Substitution Errors: 0
     #Insertion/Deletion Errors: 0
     #Assembly Size: 17748
     #Consensus Quality: 100
    
     #Step 4: (optional) more rounds of POLCA
     for cluster in cluster_001; do
     cd ${cluster}
     polca.sh -a polypolish.fasta.PolcaCorrected.fa -r "../../../s-epidermidis-5179_R1.fastq.gz ../../../s-epidermidis-5179_R2.fastq.gz" -t 55 -m 120G
     cd ..
     done
     #Substitution Errors: 10
     #Insertion/Deletion Errors: 5
     #Assembly Size: 2471442
     #Consensus Quality: 99.9994
    
     for cluster in cluster_001; do
     cd ${cluster}
     polca.sh -a polypolish.fasta.PolcaCorrected.fa.PolcaCorrected.fa -r "../../../s-epidermidis-5179_R1.fastq.gz ../../../s-epidermidis-5179_R2.fastq.gz" -t 55 -m 120G
     cd ..
     done
     #Substitution Errors: 6
     #Insertion/Deletion Errors: 0
     #Assembly Size: 2471445
     #Consensus Quality: 99.9998
    
     for cluster in cluster_001; do
     cd ${cluster}
     polca.sh -a polypolish.fasta.PolcaCorrected.fa.PolcaCorrected.fa.PolcaCorrected.fa -r "../../../s-epidermidis-5179_R1.fastq.gz ../../../s-epidermidis-5179_R2.fastq.gz" -t 55 -m 120G
     cd ..
     done
     #Substitution Errors: 0
     #Insertion/Deletion Errors: 0
     #Assembly Size: 2471445
     #Consensus Quality: 100
    
     #-- HD5_2 (2): without matching short reads we cannot correct the base-calling errors! --
     # !ERROR to be REPORTED, the 
     #Polishing cluster_001_consensus (2,504,140 bp):
     #mean read depth: 94.4x
     #240,420 bp have a depth of zero (90.3991% coverage)
     #56,894 positions changed (2.2720% of total positions)
     #estimated pre-polishing sequence accuracy: 97.7280% (Q16.44)
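
     Compared with the other samples in these notes (at most about 0.04% of positions changed), 2.2720% changed positions together with only 90.4% coverage points to short reads that do not belong to this isolate. A minimal sketch of an automatic check; the 1% default threshold is an assumption chosen for illustration, and the function name polish_changes_ok is hypothetical.

     ```shell
     # Succeed if the percentage of changed positions is within the limit.
     # Arguments: percent changed, optional maximum percent (assumed default: 1).
     polish_changes_ok() {
       LC_ALL=C awk -v p="$1" -v max="${2:-1}" 'BEGIN { exit !(p + 0 <= max + 0) }'
     }

     # Example: the HD5_2 value reported above fails the check.
     polish_changes_ok 2.2720 || echo "suspicious: short reads may not match this assembly"
     ```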
    
     /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_1_S37_R1_001.fastq
     /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_1_S37_R2_001.fastq
     /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_2_S38_R1_001.fastq
     /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_2_S38_R2_001.fastq
     /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_3_S39_R1_001.fastq
     /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_3_S39_R2_001.fastq
     /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_4_S40_R1_001.fastq
     /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_4_S40_R2_001.fastq
     /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_5_S41_R1_001.fastq
     /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_5_S41_R2_001.fastq
     /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_6_S42_R1_001.fastq
     /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_6_S42_R2_001.fastq
     /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_7_S43_R1_001.fastq
     /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_7_S43_R2_001.fastq
     /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_8_S44_R1_001.fastq
     /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_8_S44_R2_001.fastq
     /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_9_S45_R1_001.fastq
     /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_9_S45_R2_001.fastq
     /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_10_S46_R1_001.fastq
     /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_10_S46_R2_001.fastq
     #Step 1: read QC
     fastp --in1 ../../HD5_2_S38_R1_001.fastq.gz --in2 ../../HD5_2_S38_R2_001.fastq.gz --out1 1.fastq.gz --out2 2.fastq.gz --unpaired1 u.fastq.gz --unpaired2 u.fastq.gz
    
     # NOTE that the following steps are not run since the short-reads are not correct!
     # #Step 2: Polypolish
     # for cluster in cluster_001 cluster_005; do
     #   bwa index ${cluster}/7_final_consensus.fasta
     #   bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 1.fastq.gz > ${cluster}/alignments_1.sam
     #   bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 2.fastq.gz > ${cluster}/alignments_2.sam
     #   polypolish polish ${cluster}/7_final_consensus.fasta ${cluster}/alignments_1.sam ${cluster}/alignments_2.sam > ${cluster}/polypolish.fasta
     # done
    
     # #Step 3: POLCA
     # for cluster in cluster_001 cluster_005; do
     #   cd ${cluster}
     #   polca.sh -a polypolish.fasta -r "../../../HD5_2_S38_R1_001.fastq.gz ../../../HD5_2_S38_R2_001.fastq.gz" -t 55 -m 120G
     #   cd ..
     # done
    
     # #Step 4: (optional) more rounds POLCA
     # for cluster in cluster_001; do
     #   cd ${cluster}
     #   polca.sh -a polypolish.fasta.PolcaCorrected.fa -r "../../../HD5_2_S38_R1_001.fastq.gz ../../../HD5_2_S38_R2_001.fastq.gz" -t 55 -m 120G
     #   cd ..
     # done
    
     # NOTE that the plasmids of HD5_2_K5 and HD5_2_K6 were copied from Unicycler!
     #-- HD5_2_K5 (4) --
     #mean read depth: 87.1x
     #25 bp have a depth of zero (99.9990% coverage)
     #1,085 positions changed (0.0433% of total positions)
     #estimated pre-polishing sequence accuracy: 99.9567% (Q33.63)
    
     #Step 1: read QC
     fastp --in1 ../../275_K5_Holger_S92_R1_001.fastq.gz --in2 ../../275_K5_Holger_S92_R2_001.fastq.gz --out1 1.fastq.gz --out2 2.fastq.gz --unpaired1 u.fastq.gz --unpaired2 u.fastq.gz
    
     #Step 2: Polypolish
     for cluster in cluster_001 cluster_002 cluster_003 cluster_004; do
       bwa index ${cluster}/7_final_consensus.fasta
       bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 1.fastq.gz > ${cluster}/alignments_1.sam
       bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 2.fastq.gz > ${cluster}/alignments_2.sam
       polypolish polish ${cluster}/7_final_consensus.fasta ${cluster}/alignments_1.sam ${cluster}/alignments_2.sam > ${cluster}/polypolish.fasta
     done
    
     #Step 3: POLCA
     for cluster in cluster_001 cluster_002 cluster_003 cluster_004; do
       cd ${cluster}
       polca.sh -a polypolish.fasta -r "../../../275_K5_Holger_S92_R1_001.fastq.gz ../../../275_K5_Holger_S92_R2_001.fastq.gz" -t 55 -m 120G
       cd ..
     done
     #Substitution Errors: 146
     #Insertion/Deletion Errors: 2
     #Assembly Size: 2504401
     #Consensus Quality: 99.9941
    
     #Substitution Errors: 41
     #Insertion/Deletion Errors: 0
     #Assembly Size: 41288
     #Consensus Quality: 99.9007
    
     #Substitution Errors: 0
     #Insertion/Deletion Errors: 0
     #Assembly Size: 9191
     #Consensus Quality: 100
    
     #Substitution Errors: 0
     #Insertion/Deletion Errors: 0
     #Assembly Size: 2767
     #Consensus Quality: 100
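
The four POLCA summaries above are consistent with Consensus Quality = 100 * (1 - (substitution errors + indel errors) / assembly size). This relation is inferred from the numbers reported here rather than taken from the POLCA documentation, so treat the following sketch as an assumption; `consensus_quality` is a hypothetical helper name.

```shell
# Recompute the POLCA "Consensus Quality" from the error counts and assembly size.
# Assumed relation: 100 * (1 - (substitutions + indels) / size).
consensus_quality() {
  awk -v sub_err="$1" -v indel_err="$2" -v size="$3" 'BEGIN {
    printf "%.4f\n", 100 * (1 - (sub_err + indel_err) / size)
  }'
}

consensus_quality 146 2 2504401   # 2.5 Mb replicon
consensus_quality 41 0 41288      # 41 kb replicon
```

Recomputing the value this way is a quick sanity check that a summary block belongs to the replicon it is assumed to describe.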
    
     #Step 4: (optional) more rounds POLCA
     for cluster in cluster_001 cluster_002; do
       cd ${cluster}
       polca.sh -a polypolish.fasta.PolcaCorrected.fa -r "../../../275_K5_Holger_S92_R1_001.fastq.gz ../../../275_K5_Holger_S92_R2_001.fastq.gz" -t 55 -m 120G
       cd ..
     done
     #Substitution Errors: 41
     #Insertion/Deletion Errors: 0
     #Assembly Size: 2504401
     #Consensus Quality: 99.9984
    
     #Substitution Errors: 8
     #Insertion/Deletion Errors: 0
     #Assembly Size: 41288
     #Consensus Quality: 99.9806
    
     for cluster in cluster_001 cluster_002; do
       cd ${cluster}
       polca.sh -a polypolish.fasta.PolcaCorrected.fa.PolcaCorrected.fa -r "../../../275_K5_Holger_S92_R1_001.fastq.gz ../../../275_K5_Holger_S92_R2_001.fastq.gz" -t 55 -m 120G
       cd ..
     done
     #Substitution Errors: 8
     #Insertion/Deletion Errors: 0
     #Assembly Size: 2504401
     #Consensus Quality: 99.9997
    
     #Substitution Errors: 4
     #Insertion/Deletion Errors: 0
     #Assembly Size: 41288
     #Consensus Quality: 99.9903
    
     for cluster in cluster_001 cluster_002; do
       cd ${cluster}
       polca.sh -a polypolish.fasta.PolcaCorrected.fa.PolcaCorrected.fa.PolcaCorrected.fa -r "../../../275_K5_Holger_S92_R1_001.fastq.gz ../../../275_K5_Holger_S92_R2_001.fastq.gz" -t 55 -m 120G
       cd ..
     done
     #Substitution Errors: 8
     #Insertion/Deletion Errors: 0
     #Assembly Size: 2504401
     #Consensus Quality: 99.9997
    
     #Substitution Errors: 4
     #Insertion/Deletion Errors: 0
     #Assembly Size: 41288
     #Consensus Quality: 99.9903
    
     #-- HD5_2_K6 (4) --
     #mean read depth: 116.7x
     #4 bp have a depth of zero (99.9998% coverage)
     #1,022 positions changed (0.0408% of total positions)
     #estimated pre-polishing sequence accuracy: 99.9592% (Q33.89)
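
The Q value in parentheses is the accuracy expressed on the Phred scale, Q = -10 * log10(1 - accuracy). A minimal sketch of that conversion follows; `accuracy_to_phred` is a name made up for illustration, and because it starts from the already rounded percentage, the last digit can differ from the value printed by the polishing tool.

```shell
# Convert a sequence accuracy given in percent to a Phred-scaled Q value.
# accuracy_to_phred is a hypothetical helper, not part of any polishing tool.
accuracy_to_phred() {
  awk -v acc="$1" 'BEGIN {
    err = (100 - acc) / 100            # error rate as a fraction
    if (err <= 0) { print "inf"; exit }
    printf "%.2f\n", -10 * log(err) / log(10)
  }'
}

accuracy_to_phred 99.9592   # accuracy reported above for HD5_2_K6
```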
    
     #Step 1: read QC
     fastp --in1 ../../276_K6_Holger_S95_R1_001.fastq.gz --in2 ../../276_K6_Holger_S95_R2_001.fastq.gz --out1 1.fastq.gz --out2 2.fastq.gz --unpaired1 u.fastq.gz --unpaired2 u.fastq.gz
    
     #Step 2: Polypolish
     for cluster in cluster_001 cluster_002 cluster_003 cluster_004; do
       bwa index ${cluster}/7_final_consensus.fasta
       bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 1.fastq.gz > ${cluster}/alignments_1.sam
       bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 2.fastq.gz > ${cluster}/alignments_2.sam
       polypolish polish ${cluster}/7_final_consensus.fasta ${cluster}/alignments_1.sam ${cluster}/alignments_2.sam > ${cluster}/polypolish.fasta
     done
    
     #Step 3: POLCA
     for cluster in cluster_001 cluster_002 cluster_003 cluster_004; do
       cd ${cluster}
       polca.sh -a polypolish.fasta -r "../../../276_K6_Holger_S95_R1_001.fastq.gz ../../../276_K6_Holger_S95_R2_001.fastq.gz" -t 55 -m 120G
       cd ..
     done
     #Substitution Errors: 164
     #Insertion/Deletion Errors: 2
     #Assembly Size: 2504398
     #Consensus Quality: 99.9934
    
     #Substitution Errors: 22
     #Insertion/Deletion Errors: 0
     #Assembly Size: 41288
     #Consensus Quality: 99.9467
    
     #Substitution Errors: 0
     #Insertion/Deletion Errors: 0
     #Assembly Size: 9191
     #Consensus Quality: 100
    
     #Substitution Errors: 0
     #Insertion/Deletion Errors: 0
     #Assembly Size: 2767
     #Consensus Quality: 100
    
     #Step 4: (optional) more rounds POLCA
     for cluster in cluster_001 cluster_002; do
       cd ${cluster}
       polca.sh -a polypolish.fasta.PolcaCorrected.fa -r "../../../276_K6_Holger_S95_R1_001.fastq.gz ../../../276_K6_Holger_S95_R2_001.fastq.gz" -t 55 -m 120G
       cd ..
     done
     #Substitution Errors: 32
     #Insertion/Deletion Errors: 0
     #Assembly Size: 2504400
     #Consensus Quality: 99.9987
    
     #Substitution Errors: 0
     #Insertion/Deletion Errors: 0
     #Assembly Size: 41288
     #Consensus Quality: 100
    
     for cluster in cluster_001; do
       cd ${cluster}
       polca.sh -a polypolish.fasta.PolcaCorrected.fa.PolcaCorrected.fa -r "../../../276_K6_Holger_S95_R1_001.fastq.gz ../../../276_K6_Holger_S95_R2_001.fastq.gz" -t 55 -m 120G
       cd ..
     done
     #Substitution Errors: 4
     #Insertion/Deletion Errors: 0
     #Assembly Size: 2504400
     #Consensus Quality: 99.9998
    
     for cluster in cluster_001; do
       cd ${cluster}
       polca.sh -a polypolish.fasta.PolcaCorrected.fa.PolcaCorrected.fa.PolcaCorrected.fa -r "../../../276_K6_Holger_S95_R1_001.fastq.gz ../../../276_K6_Holger_S95_R2_001.fastq.gz" -t 55 -m 120G
       cd ..
     done
     #Substitution Errors: 2
     #Insertion/Deletion Errors: 0
     #Assembly Size: 2504400
     #Consensus Quality: 99.9999
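
Each extra POLCA round above takes the previous round's output as input, so the input name grows by one `.PolcaCorrected.fa` suffix per round. The sketch below builds that name for a given round and only prints the commands (a dry run) instead of executing them; `polca_input` and `print_polca_rounds` are hypothetical helpers, and the read paths are passed in as arguments.

```shell
# Name of the POLCA input for round N (round 1 polishes polypolish.fasta).
polca_input() {
  name="polypolish.fasta"
  i=1
  while [ "$i" -lt "$1" ]; do
    name="${name}.PolcaCorrected.fa"
    i=$((i + 1))
  done
  printf '%s\n' "$name"
}

# Dry run: print the polca.sh command for rounds 1..N without executing it.
print_polca_rounds() {
  rounds="$1"; r1="$2"; r2="$3"
  n=1
  while [ "$n" -le "$rounds" ]; do
    printf 'polca.sh -a %s -r "%s %s" -t 55 -m 120G\n' "$(polca_input "$n")" "$r1" "$r2"
    n=$((n + 1))
  done
}

print_polca_rounds 4 ../../../276_K6_Holger_S95_R1_001.fastq.gz ../../../276_K6_Holger_S95_R2_001.fastq.gz
```

It is reasonable to stop once the error counts no longer drop between rounds, as with HD5_2_K5, where the last two rounds both report 8 and 4 substitution errors.
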
  6. Results by directly using Unicycler

     #----------------------- 5179R1_normal ----------------------- 
    
     >1 length=2468563 depth=1.00x circular=true
     >2 length=17748 depth=1.42x circular=true
    
     Component   Segments   Links   Length      N50         Longest segment   Status
         total          2       2   2,486,311   2,468,563         2,468,563
             1          1       1   2,468,563   2,468,563         2,468,563   complete
             2          1       1      17,748      17,748            17,748   complete
    
     Segment   Length      Depth   Starting gene         Position    Strand    Identity   Coverage
         1   2,468,563   1.00x   UniRef90_Q5HJZ9       1,212,460   forward     100.0%     100.0%
         2      17,748   1.42x   UniRef90_A0A0H2VIR3       4,804   reverse      93.2%      99.7%
    
     # ---- 5179_bold ----
    
     Segment   Length      Depth    Starting gene         Position    Strand    Identity   Coverage
         1   2,469,173    1.00x   UniRef90_Q5HJZ9       1,901,872   reverse     100.0%     100.0%
         2      17,749    2.27x   UniRef90_A0A0H2VIR3       4,771   forward      93.2%      99.7%
         4       4,595   10.19x   none found
         8       2,449   17.14x   none found
    
     >1 length=2469173 depth=1.00x circular=true
     >2 length=17749 depth=2.27x circular=true
     >3 length=4761 depth=0.44x
     >4 length=4595 depth=10.19x circular=true
     >5 length=3735 depth=0.29x
     >6 length=3718 depth=0.42x
     >7 length=3573 depth=0.52x
     >8 length=2449 depth=17.14x circular=true
     >9 length=2411 depth=0.35x
     >10 length=2371 depth=0.32x
     >11 length=2365 depth=0.43x
     >12 length=1637 depth=0.44x
     >13 length=1568 depth=0.66x
     >14 length=1505 depth=0.65x
     >15 length=1403 depth=0.93x
     >16 length=1329 depth=0.55x
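
Only the contigs flagged `circular=true` are kept as complete replicons in the per-sample summaries further down. A minimal sketch for pulling them out of Unicycler-style FASTA headers is given here; `list_circular` is a hypothetical helper, and it assumes the header layout shown above (`>ID length=... depth=... circular=true`).

```shell
# Print ID and length of every contig whose header carries circular=true,
# followed by the summed length. Reads FASTA (or just header lines) from stdin.
list_circular() {
  awk '/^>/ && /circular=true/ {
         id = substr($1, 2)                 # drop the leading ">"
         len = $2; sub(/^length=/, "", len) # keep only the number
         print id "\t" len
         total += len
       }
       END { print "total\t" total + 0 }'
}

printf '%s\n' \
  '>1 length=2469173 depth=1.00x circular=true' \
  '>3 length=4761 depth=0.44x' \
  '>4 length=4595 depth=10.19x circular=true' | list_circular
```

With a real assembly the call would be `list_circular < assembly.fasta`.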
    
     makeblastdb -in assembly.fasta -dbtype nucl
     blastn -task blastn-short -db ../HD05_2_K5_conservative/assembly.fasta -query assembly.fasta -out 2-16_vs_1.blastn -evalue 0.00000000001 -num_threads 15 -outfmt 6 -strand both -max_target_seqs 1 
    
     #TODO: manually fill the gap in the HD05_2 genome!
    
     5       1       99.946  3728    1       1       1       3728    1535666 1539392 0.0     7366
     6       1       99.973  3718    0       1       1       3718    702963  706679  0.0     7355
     7       1       99.888  3573    1       3       1       3573    1764622 1768191 0.0     7027
     9       1       100.000 2411    0       0       1       2411    1060914 1063324 0.0     4779
     10      1       100.000 2371    0       0       1       2371    615275  612905  0.0     4700
     11      1       99.958  2365    0       1       1       2365    1088713 1086350 0.0     4672
     12      1       100.000 1637    0       0       1       1637    146635  144999  0.0     3245
     13      1       99.936  1568    0       1       1       1568    2024197 2025763 0.0     3092
     14      1       100.000 1505    0       0       1       1505    2445480 2443976 0.0     2983
     15      1       100.000 1403    0       0       1       1403    197723  196321  0.0     2781
     16      1       99.925  1329    1       0       1       1329    49854   48526   0.0     2627
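
The table above is BLAST tabular output (`-outfmt 6`), whose default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue and bitscore. The sketch below condenses each hit to query, identity, subject interval and strand (a hit lies on the minus strand when sstart > send); `summarize_hits` is a hypothetical helper name.

```shell
# Summarize BLAST -outfmt 6 hits read from stdin, one line per hit.
summarize_hits() {
  awk '{
    strand = ($9 <= $10) ? "plus" : "minus"
    lo = ($9 <= $10) ? $9 : $10
    hi = ($9 <= $10) ? $10 : $9
    printf "contig %s: %s%% identity over %d bp, subject %d-%d (%s)\n", $1, $3, $4, lo, hi, strand
  }'
}

printf '%s\n' \
  '9 1 100.000 2411 0 0 1 2411 1060914 1063324 0.0 4779' \
  '10 1 100.000 2371 0 0 1 2371 615275 612905 0.0 4700' | summarize_hits
```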
    
     # -------------------- 1585_normal --------------------
     >1 length=2443574 depth=1.00x circular=true       #contig_1        2442282 10      60      61
     >2 length=9014 depth=3.72x circular=true
     >3 length=4388 depth=0.89x
     >4 length=3443 depth=0.48x
     >5 length=3338 depth=0.48x
     >6 length=3336 depth=0.45x
     >7 length=2344 depth=11.44x circular=true
     >8 length=2255 depth=9.81x circular=true
     >9 length=1929 depth=0.37x
     >10 length=1703 depth=1.67x
     >11 length=1605 depth=0.26x
     >12 length=1381 depth=0.56x
     >13 length=1360 depth=0.39x
     >14 length=1281 depth=0.41x
     >15 length=1163 depth=0.51x
     >16 length=1088 depth=0.24x
    
     2594107
    
     ragtag.py scaffold  ../HD05_2_K5_normal/assembly.fasta assembly.fasta
     ragtag.py patch    ragtag.scaffold.fasta ../../HD05_2_K5_normal/assembly.fasta
     grep -o 'N' ragtag.patch.fasta | wc -l
    
     3       1       99.977  4388    0       1       1       4388    2410738 2406352 0.0     8683
     4       1       99.942  3443    0       2       1       3443    2222741 2219301 0.0     6794
     5       1       99.970  3338    0       1       1       3338    455636  452300  0.0     6601
     6       1       99.940  3336    0       2       1       3336    1617740 1614407 0.0     6581
     9       1       99.948  1929    0       1       1       1929    1321522 1319595 0.0     3808
     10      1       99.941  1703    1       0       1       1703    90503   88801   0.0     3368
     11      1       99.938  1605    0       1       1       1605    2361795 2363398 0.0     3166
     12      1       99.928  1381    0       1       1       1381    241092  242471  0.0     2722
     13      1       100.000 1360    0       0       1       1360    1157897 1159256 0.0     2696
     14      1       100.000 1281    0       0       1       1281    218323  219603  0.0     2539
     15      1       100.000 1163    0       0       1       1163    2077536 2078698 0.0     2305
     16      1       100.000 1088    0       0       1       1088    283284  284371  0.0     2157
    
     # ---- HD05_2_K5 ----
     >1 length=2503585 depth=1.00x circular=true
     >2 length=41288 depth=3.32x circular=true
     >3 length=9191 depth=8.29x circular=true
     >4 length=2767 depth=9.36x circular=true
    
     # ---- HD05_2_K6 ----
     >1 length=2503927 depth=1.00x circular=true
     >2 length=41288 depth=3.77x circular=true
     >3 length=9191 depth=7.83x circular=true
     >4 length=2767 depth=10.11x circular=true
    
     #--------------------------
     1585V
     #[2024-01-17 13:42:28] INFO: Assembly statistics:
    
         Total length:   2438882 vs 2443574
         Fragments:  1
         Fragments N50:  2438882
         Largest frg:    2438882
         Scaffolds:  0
         Mean coverage:  47
    
     unicycler -1 s-epidermidis-5179-r1_R1.fastq.gz -2 s-epidermidis-5179-r1_R2.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode01/FAN41335_pass_barcode01.fastq.gz --mode conservative -t 55 -o 5179R1_conservative
     unicycler -1 s-epidermidis-1585_R1.fastq.gz -2 s-epidermidis-1585_R2.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode02/FAN41335_pass_barcode02.fastq.gz --mode conservative -t 55 -o 1585_conservative
     #3 (1585V): no short-read sequencing data
     unicycler -1 s-epidermidis-5179_R1.fastq.gz -2 s-epidermidis-5179_R2.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode04/FAN41335_pass_barcode04.fastq.gz --mode conservative -t 55 -o 5179_conservative
    
     unicycler -1 HD5_2_S38_R1_001.fastq.gz -2 HD5_2_S38_R2_001.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode05/FAN41335_pass_barcode05.fastq.gz --mode conservative -t 55 -o HD05_2_conservative
     unicycler -1 275_K5_Holger_S92_R1_001.fastq.gz -2 275_K5_Holger_S92_R2_001.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode06/FAN41335_pass_barcode06.fastq.gz --mode conservative -t 55 -o HD05_2_K5_conservative
     unicycler -1 276_K6_Holger_S95_R1_001.fastq.gz -2 276_K6_Holger_S95_R2_001.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode07/FAN41335_pass_barcode07.fastq.gz --mode conservative -t 55 -o HD05_2_K6_conservative
    
     # ---- 1  5179R1  2469692 ----
     >1 length=2468563 depth=1.00x circular=true
     >2 length=17748 depth=1.42x circular=true
    
     # ---- 2  1585    2442282 ---- (for comparison: the Trycycler chromosome is 2443176 nt)
     >1 length=2443574 depth=1.00x circular=true
     >2 length=9014 depth=3.72x circular=true
     >7 length=2344 depth=11.44x circular=true
     >8 length=2255 depth=9.81x circular=true
    
     # ---- 3  1585v   2438882 ----
     #assembled using long reads only: 1 fragment
    
     # ---- 4  5179_bold    2471107+17740 ----
     >1 length=2469173 depth=1.00x circular=true
     >2 length=17749 depth=2.27x circular=true
     >4 length=4595 depth=10.19x circular=true
     >8 length=2449 depth=17.14x circular=true
    
     # ---- 5  HD05_2  2504622 ----
     # Note: HD05_2_bold_hq_lq includes the bad long reads.
     >1 length=965875 depth=0.95x
     >2 length=855325 depth=1.00x
     >3 length=582944 depth=1.02x
     >4 length=183656 depth=1.02x
     >5 length=13570 depth=4.73x circular=true
     >6 length=1503 depth=4.85x
     >7 length=1271 depth=5.06x
     >8 length=227 depth=2.03x
     >9 length=153 depth=0.93x
     >10 length=152 depth=1.09x
    
     # trycycler: 2503231 (yes), 9183 (yes), 22394, 18541 --
    
     # ---- 6  HD05_2_K5  2504656+41290+9191 ----
     conservative
     >1 length=2503585 depth=1.00x circular=true
     >2 length=41288 depth=3.32x circular=true
     >3 length=9191 depth=8.29x circular=true
     >4 length=2767 depth=9.36x circular=true
    
     # ---- 7  HD05_2_K6  2504588+41285+9192 ----
     conservative
     >1 length=2503927 depth=1.00x circular=true
     >2 length=41288 depth=3.77x circular=true
     >3 length=9191 depth=7.83x circular=true
     >4 length=2767 depth=10.11x circular=true
    
     ragtag.py scaffold  ../assembly_flye_HD05_2/assembly.fasta assembly.fasta
     ragtag.py patch  ragtag.scaffold.fasta ../../assembly_flye_HD05_2/assembly.fasta
     grep -o 'N' ragtag.patch.fasta | wc -l
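
`grep -o 'N'` would also count any `N` that happens to occur in a FASTA header line. A slightly stricter sketch that skips header lines and also counts lowercase `n` follows; `count_gap_bases` is a hypothetical helper name.

```shell
# Count N/n characters in the sequence lines of a FASTA file (headers skipped).
count_gap_bases() {
  grep -v '^>' "$1" | tr -cd 'Nn' | wc -c | tr -d ' '
}

# Tiny made-up example file; with real data pass ragtag.patch.fasta instead.
tmp="$(mktemp)"
printf '%s\n' '>contig_N1' 'ACGTNNNNACGT' 'acgtnnacgt' > "$tmp"
count_gap_bases "$tmp"
rm -f "$tmp"
```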
    
     makeblastdb -in ../assembly_flye_HD05_2/assembly.fasta -dbtype nucl
     blastn -db ../assembly_flye_HD05_2/assembly.fasta -query 1-4.fasta -out assmbly_vs_flye.blastn -evalue 0.000001 -num_threads 15 -outfmt 6 -strand minus -max_target_seqs 1 
  7. Submit all genomes to NCBI

     TODO: check whether the 1585V genome should be assembled using only the long reads!
     BioProject:            PRJNA1038700 (Staphylococcus epidermidis strain:1585v | isolate:1585v Genome sequencing)
     BioSample accession:   SAMN38198576 (Pathogen: clinical or host-associated sample from Staphylococcus epidermidis; 0 SRAs)
     Status:                To be released
     Release date:          2027-11-10
     Created:               2023-11-10 15:24
     Updated:               2023-11-17 16:57
     Sample name:           1585v
     Package:               Pathogen: clinical or host-associated; version 1.0
     Organism:              Staphylococcus epidermidis (Taxonomy ID: 1282)
    
     Attributes
     Attribute name           Attribute value
     collected by             H R
     collection date          2004
     geographic location      Germany: Hamburg
     host                     Homo sapiens
     host disease             port-catheter infection
     isolation source         port-catheter
     isolate                  missing
     strain                   1585v
     latitude and longitude   53.551672 N 9.955081 E
    
             https://www.ncbi.nlm.nih.gov/genbank/genomesubmit/#run_pgap
  8. Background of 1585v and 1585

    https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8346721/

S. epidermidis 1585 is known to be biofilm-negative in laboratory media, but to form biofilm in the presence of human serum.

In contrast, S. epidermidis 1585v is a variant derived from strain 1585 in which, due to a chromosomal rearrangement, a 460 kDa isoform of Embp is overexpressed even in TSB, while mutant M135 is an isogenic mutant of 1585v in which expression of Embp is interrupted by insertion of transposon Tn917.

Staphylococcus epidermidis (S. epidermidis) 1585 is a specific strain of S. epidermidis that is classified as a wild-type strain. Wild-type in bacterial terminology refers to the strain of an organism that is found in nature, as opposed to those that have been modified or mutated in a laboratory setting. Here are some key points about the 1585 wild-type strain of S. epidermidis:

  1. No Embp Production in TSB: One notable characteristic of the S. epidermidis 1585 strain is that it does not produce Embp (extracellular matrix binding protein) when grown in TSB (Tryptic Soy Broth). Embp is a protein that plays a crucial role in the biofilm formation and adherence of bacteria to surfaces. The absence of Embp production in this strain could impact its ability to form biofilms, a common virulence factor in Staphylococcus infections.

  2. Biofilm Formation: S. epidermidis is known for its ability to form biofilms, especially on medical devices, leading to infections that are difficult to treat. The fact that the 1585 strain doesn’t produce Embp in TSB suggests it may have a reduced capacity for biofilm formation under these conditions, which could be significant in understanding and managing such infections.

  3. Research and Clinical Implications: Studying wild-type strains like S. epidermidis 1585 is important for understanding the natural behavior and characteristics of the species. Since this strain behaves differently from other strains in terms of Embp production and possibly biofilm formation, it can provide insights into the mechanisms and genetic factors that control these processes. This knowledge is valuable for developing strategies to prevent and treat infections, especially in hospital and healthcare settings where S. epidermidis infections are common.

  4. Genetic Studies: The 1585 strain can also serve as a baseline or control in genetic studies. By comparing the genome and behavior of 1585 with other strains of S. epidermidis, researchers can identify genetic variations and mutations that may be responsible for different phenotypes, such as increased virulence or antibiotic resistance.

  5. Model Organism for Understanding Staphylococcal Behavior: As a wild-type strain, 1585 offers a model for studying the natural state of S. epidermidis. This is crucial for understanding the fundamental biology of the bacterium, which can help in the development of treatments and interventions against infections caused by more virulent or drug-resistant strains.

In summary, the S. epidermidis 1585 wild-type strain is significant in microbiological research due to its natural characteristics, particularly its behavior in biofilm formation and Embp production. Understanding these aspects can contribute to better insights into the pathogenicity and treatment of Staphylococcus infections, particularly in clinical settings where these bacteria are a common source of nosocomial infections.
