gene_x 0 like s 44 view s
Tags: processing, pipeline
Using bacto pipeline to get trimmed reads
cp /home/jhuang/Tools/bacto/bacto-0.1.json .
cp /home/jhuang/Tools/bacto/cluster.json .
cp /home/jhuang/Tools/bacto/Snakefile .
ln -s /home/jhuang/Tools/bacto/local .
ln -s /home/jhuang/Tools/bacto/db .
ln -s /home/jhuang/Tools/bacto/envs .
mkdir raw_data; cd raw_data
ln -s ../20241028_FS10003086_74_BTR67801-2217/Alignment_Imported_1/20241029_175539/Fastq/HSV1_S1_L001_R1_001.fastq.gz HSV1_S1_R1.fastq.gz
ln -s ../20241028_FS10003086_74_BTR67801-2217/Alignment_Imported_1/20241029_175539/Fastq/HSV1_S1_L001_R2_001.fastq.gz HSV1_S1_R2.fastq.gz
ln -s ../20241028_FS10003086_74_BTR67801-2217/Alignment_Imported_1/20241029_175539/Fastq/HSV-Klinik_S2_L001_R1_001.fastq.gz HSV-Klinik_S2_R1.fastq.gz
ln -s ../20241028_FS10003086_74_BTR67801-2217/Alignment_Imported_1/20241029_175539/Fastq/HSV-Klinik_S2_L001_R2_001.fastq.gz HSV-Klinik_S2_R2.fastq.gz
ln -s ../20241028_FS10003086_74_BTR67801-2217/Alignment_Imported_1/20241029_175539/Fastq/NTC_S3_L001_R1_001.fastq.gz NTC_S3_R1.fastq.gz
ln -s ../20241028_FS10003086_74_BTR67801-2217/Alignment_Imported_1/20241029_175539/Fastq/NTC_S3_L001_R2_001.fastq.gz NTC_S3_R2.fastq.gz
(bengal3_ac3) /home/jhuang/miniconda3/envs/snakemake_4_3_1/bin/snakemake --printshellcmds
#0 HSV1_S1 snp=8 del=1 ins=0 het=97 unaligned=125842
#1 HSV-Klinik_S2 snp=94 del=1 ins=2 het=550 unaligned=78080
Filtering low complexity
fastp -i HSV1_S1_trimmed_P_1.fastq -I HSV1_S1_trimmed_P_2.fastq -o HSV1_S1_trimmed_R1.fastq -O HSV1_S1_trimmed_R2.fastq --low_complexity_filter --complexity_threshold 30
fastp -i HSV-Klinik_S2_trimmed_P_1.fastq -I HSV-Klinik_S2_trimmed_P_2.fastq -o HSV-Klinik_S2_trimmed_R1.fastq -O HSV-Klinik_S2_trimmed_R2.fastq --low_complexity_filter --complexity_threshold 30
Read1 before filtering:
total reads: 1755209
total bases: 163663141
Q20 bases: 162306612(99.1711%)
Q30 bases: 159234526(97.2941%)
Read2 before filtering:
total reads: 1755209
total bases: 163045950
Q20 bases: 161178082(98.8544%)
Q30 bases: 157052184(96.3239%)
Read1 after filtering:
total reads: 1733241
total bases: 161547828
Q20 bases: 160217907(99.1768%)
Q30 bases: 157196236(97.3063%)
Read2 aftering filtering:
total reads: 1733241
total bases: 160825521
Q20 bases: 159057902(98.9009%)
Q30 bases: 155354052(96.5979%)
Filtering result:
reads passed filter: 3466482
reads failed due to low quality: 550
reads failed due to too many N: 0
reads failed due to too short: 0
reads failed due to low complexity: 43386
reads with adapter trimmed: 21424
bases trimmed due to adapters: 159261
Duplication rate: 14.2379%
Insert size peak (evaluated by paired-end reads): 41
JSON report: fastp.json
HTML report: fastp.html
fastp -i HSV1_S1_trimmed_P_1.fastq -I HSV1_S1_trimmed_P_2.fastq -o HSV1_S1_trimmed_R1.fastq -O HSV1_S1_trimmed_R2.fastq --low_complexity_filter --complexity_threshold 30
fastp v0.20.1, time used: 7 seconds
Read1 before filtering:
total reads: 2688264
total bases: 330035144
Q20 bases: 326999269(99.0801%)
Q30 bases: 320136918(97.0009%)
Read2 before filtering:
total reads: 2688264
total bases: 327364405
Q20 bases: 323331005(98.7679%)
Q30 bases: 314500076(96.0703%)
Read1 after filtering:
total reads: 2660598
total bases: 326564634
Q20 bases: 323572956(99.0839%)
Q30 bases: 316783667(97.0049%)
Read2 aftering filtering:
total reads: 2660598
total bases: 324709841
Q20 bases: 320840657(98.8084%)
Q30 bases: 312570288(96.2614%)
Filtering result:
reads passed filter: 5321196
reads failed due to low quality: 1110
reads failed due to too many N: 0
reads failed due to too short: 0
reads failed due to low complexity: 54222
reads with adapter trimmed: 39080
bases trimmed due to adapters: 357915
Duplication rate: 9.91821%
Insert size peak (evaluated by paired-end reads): 96
JSON report: fastp.json
HTML report: fastp.html
fastp -i HSV-Klinik_S2_trimmed_P_1.fastq -I HSV-Klinik_S2_trimmed_P_2.fastq -o HSV-Klinik_S2_trimmed_R1.fastq -O HSV-Klinik_S2_trimmed_R2.fastq --low_complexity_filter --complexity_threshold 30
fastp v0.20.1, time used: 15 seconds
Using vrap to assembly and annotate the contigs, the spades-step was replaced with idba of DAMIAN
ln -s ~/Tools/vrap/ .
#CHANGE the txid10298 in download_db.py: txid10298[Organism] AND complete genome[Title]
gzip trimmed/*_R1.fastq trimmed/*_R2.fastq
mv trimmed/*.gz ./
#--host /home/jhuang/REFs/genome.fa -n /mnt/nvme0n1p1/blast/nt -a /mnt/nvme0n1p1/blast/nr
vrap/vrap.py -1 trimmed/HSV1_S1_R1.fastq.gz -2 trimmed/HSV1_S1_R2.fastq.gz -o HSV1_S1_vrap_out -t 40 -l 100
vrap/vrap.py -1 trimmed/HSV-Klinik_S2_R1.fastq.gz -2 trimmed/HSV-Klinik_S2_R2.fastq.gz -o HSV-Klinik_S2_vrap_out -t 40 -l 100
#--> ERROR in spades-assembly, we usding idba from DAMIAN assembly, copy the assembly to spades. IT WORKS!
damian.rb --host human3 --type dna -1 /home/jhuang/DATA/Data_Nicole_CaptureProbeSequencing/HSV1_S1_trimmed_R1.fastq.gz -2 /home/jhuang/DATA/Data_Nicole_CaptureProbeSequencing/HSV1_S1_trimmed_R2.fastq.gz --sample HSV1_S1_megablast --blastn never --blastp never --min_contiglength 100 --threads 56 --force
damian.rb --host human3 --type dna -1 /home/jhuang/DATA/Data_Nicole_CaptureProbeSequencing/HSV-Klinik_S2_trimmed_R1.fastq.gz -2 /home/jhuang/DATA/Data_Nicole_CaptureProbeSequencing/HSV-Klinik_S2_trimmed_R2.fastq.gz --sample HSV-Klinik_S2_megablast --blastn never --blastp never --min_contiglength 100 --threads 56 --force
cp ~/rtpd_files/Affe30_megablast/idba_ud_assembly/contig.fa contigs.fasta
cp ~/rtpd_files/Affe31_megablast/idba_ud_assembly/contig.fa contigs.fasta
#RERUN vrap/vrap.py again with the replaced contigs.fasta!
#vrap/vrap.py -1 Affe30_trimmed_R1.fastq.gz -2 Affe30_trimmed_R2.fastq.gz -o Affe30_trimmed_vrap_out -t 40 -l 100
#vrap/vrap.py -1 Affe31_trimmed_R1.fastq.gz -2 Affe31_trimmed_R2.fastq.gz -o Affe31_trimmed_vrap_out -t 40 -l 100
Analyses using viral-ngs
mv Snakefile Snakefile_bacto
mv trimmed trimmed_bacto
conda activate viral3
#conda install -c anaconda openjdk=8
ln -s ~/Tools/viral-ngs/Snakefile Snakefile
ln -s ~/Tools/viral-ngs/bin bin
cp ~/Tools/viral-ngs/refsel.acids refsel.acids
cp ~/Tools/viral-ngs/lastal.acids lastal.acids
cp ~/Tools/viral-ngs/config.yaml config.yaml
cp ~/Tools/viral-ngs/samples-runs.txt samples-runs.txt
cp ~/Tools/viral-ngs/samples-depletion.txt samples-depletion.txt
cp ~/Tools/viral-ngs/samples-metagenomics.txt samples-metagenomics.txt
cp ~/Tools/viral-ngs/samples-assembly.txt samples-assembly.txt
cp ~/Tools/viral-ngs/samples-assembly-failures.txt samples-assembly-failures.txt
mkdir data
cd data
mkdir 00_raw
cd ../..
Prepare lastal.acids, refsel.acids and accessions_for_ref_genome_build in config.yaml
#Herpes simplex virus 1 (HSV-1) and Human alphaherpesvirus 1 (also known as Simplexvirus humanalpha1) are indeed the same virus.
#The different names result from varied naming conventions:
# * Herpes simplex virus 1 (HSV-1) is the common name, often used in clinical and general contexts.
# * Human alphaherpesvirus 1 is the official taxonomic name, as defined by the International Committee on Taxonomy of Viruses (ICTV). This name is used in scientific classifications and databases like NCBI to specify its place in the Herpesviridae family under the Alphaherpesvirinae subfamily.
#In some databases or references, it might also appear under Simplexvirus humanalpha1, which refers to its taxonomic classification at the genus level (Simplexvirus) and species level (Human alphaherpesvirus 1). However, all these terms refer to the same virus, commonly known as HSV-1.
#https://www.uniprot.org/taxonomy?query=Human+herpesvirus
#https://www.uniprot.org/taxonomy/3050292
esearch -db nuccore -query "txid3050292[Organism]" | efetch -format fasta > taxon_3050292_sequences.fasta
esearch -db nuccore -query "txid3050292[Organism]" | efetch -format acc > taxon_3050292_accessions.txt
esearch -db nuccore -query "txid10298[Organism] AND complete genome[Title]" | efetch -format fasta > taxon_3050292_complete_genomes.fasta
esearch -db nuccore -query "txid10298[Organism] AND complete genome[Title]" | efetch -format acc > taxon_10298_complete_genomes.acc # 161 genomes
mv taxon_10298_complete_genomes.acc lastal.acids
https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=10298
Human alphaherpesvirus 1 (Herpes simplex virus type 1) Click on organism name to get more information.
Human alphaherpesvirus 1 strain 17
Human alphaherpesvirus 1 strain A44
Human alphaherpesvirus 1 strain Angelotti
Human alphaherpesvirus 1 strain CL101
Human alphaherpesvirus 1 strain CVG-2
Human alphaherpesvirus 1 strain F
Human alphaherpesvirus 1 strain H129
Human alphaherpesvirus 1 strain HFEM
Human alphaherpesvirus 1 strain HZT
Human alphaherpesvirus 1 strain KOS
Human alphaherpesvirus 1 strain MGH-10
Human alphaherpesvirus 1 strain MP
Human alphaherpesvirus 1 strain Patton
Human alphaherpesvirus 1 strain R-15
Human alphaherpesvirus 1 strain R19
Human alphaherpesvirus 1 strain RH2
Human alphaherpesvirus 1 strain SC16
Trimming using trimmomatic
mkdir trimmed bams
for sample in HSV1_S1 HSV-Klinik_S2 NTC_S3; do
for sample in HSV1_S1; do
java -jar /home/jhuang/Tools/Trimmomatic-0.36/trimmomatic-0.36.jar PE -threads 16 ./raw_data/${sample}_R1.fastq.gz ./raw_data/${sample}_R2.fastq.gz trimmed/${sample}_R1.fastq.gz trimmed/${sample}_unpaired_R1.fastq.gz trimmed/${sample}_R2.fastq.gz trimmed/${sample}_unpaired_R2.fastq.gz ILLUMINACLIP:/home/jhuang/Tools/Trimmomatic-0.36/adapters/TruSeq3-PE-2.fa:2:30:10:8:TRUE LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 AVGQUAL:20; \
done
Mapping
cd trimmed
seqtk sample -s100 HSV1_S1_R1.fastq.gz 0.1 > HSV1_S1_sampled_R1.fastq
seqtk sample -s100 HSV1_S1_R2.fastq.gz 0.1 > HSV1_S1_sampled_R2.fastq
gzip HSV1_S1_sampled_R1.fastq HSV1_S1_sampled_R2.fastq
ref_fa="NC_001806.fasta";
for sample in HSV1_S1 HSV-Klinik_S2 NTC_S3; do
for sample in HSV1_S1; do
for sample in HSV1_S1_sampled; do
bwa index ${ref_fa}; \
bwa mem -M -t 16 ${ref_fa} trimmed/${sample}_R1.fastq.gz trimmed/${sample}_R2.fastq.gz | samtools view -bS - > bams/${sample}_genome_alignment.bam; \
#for table filling using the following commands! -->3000000 \
#bwa mem -M -t 14 ${ref_fa} ${sample}_R1.fastq.gz ${sample}_R2.fastq.gz | samtools view -bS -F 256 - > bams/${sample}_uniqmap.bam; \
done
AddOrReplaceReadGroup is IMPORTANT step, otherwise the step viral_ngs cannot run correctly
for sample in HSV1_S1 HSV-Klinik_S2 NTC_S3; do
for sample in HSV1_S1; do
for sample in HSV1_S1_sampled; do
picard AddOrReplaceReadGroups I=bams/${sample}_genome_alignment.bam O=data/00_raw/${sample}.bam SORT_ORDER=coordinate CREATE_INDEX=true RGPL=illumina RGID=$sample RGSM=$sample RGLB=standard RGPU=$sample VALIDATION_STRINGENCY=LENIENT; \
done
Configure the viral-ngs conda environment
conda config --add channels r
conda config --add channels defaults
conda config --add channels conda-forge
conda config --add channels bioconda
conda config --add channels broad-viral
mamba env remove -n viral-ngs3
mamba create -n viral-ngs3 python=3.6 blast=2.6.0 bmtagger biopython pysam pyyaml picard mvicuna pybedtools fastqc matplotlib spades last=876 -c conda-forge -c bioconda
conda activate viral-ngs3
mamba create -n viral-ngs4 python=3.6
conda activate viral-ngs4
#vim requirements-conda.txt
mamba install blast=2.6.0 bmtagger biopython pysam pyyaml picard mvicuna pybedtools fastqc matplotlib spades last=876 -c conda-forge -c bioconda
# -- Eventually DEBUG --
#mamba remove picard
#mamba clean --packages
#mamba install -c bioconda picard
##mamba install libgfortran=5 sqlite=3.46.0
##mamba install picard --clobber
##mamba create -n viral-ngs-fresh -c bioconda -c conda-forge picard python=3.6 sqlite=3.46.0 libgfortran=5
mamba install cd-hit cd-hit-auxtools diamond gap2seq=2.1 mafft=7.221 mummer4 muscle=3.8 parallel pigz prinseq samtools=1.6 tbl2asn trimmomatic trinity unzip vphaser2 bedtools -c r -c defaults -c conda-forge -c bioconda #-c broad-viral
mamba install snpeff=4.1l
mamba install gatk=3.6
mamba install bwa
#IMPORTANT_REPLACE "sudo cp /home/jhuang/miniconda3/envs/viral-ngs4/bin/gatk3 /usr/local/bin/gatk"
#IMPORTANT_UPDATE jar_file in the file with "/home/jhuang/Tools/GenomeAnalysisTK-3.6/GenomeAnalysisTK.jar"
#IMPORTANT_SET /home/jhuang/Tools/GenomeAnalysisTK-3.6 as GATK_PATH in config.yaml
#IMPORTANT_CHECK if it works
# java -jar /home/jhuang/Tools/GenomeAnalysisTK-3.6/GenomeAnalysisTK.jar -T RealignerTargetCreator --help
# /usr/local/bin/gatk -T RealignerTargetCreator --help
#IMPORTANT_NOTE that the env viral-ngs4 cannot logined from the base env due to the python3-conflict!
# -- NO ERROR --> INSTALL END HERE --
# -- DEBUG: ClobberError: This transaction has incompatible packages due to a shared path. --
# SafetyError: The package for snpeff located at /home/jhuang/miniconda3/pkgs/snpeff-4.1l-hdfd78af_8
# appears to be corrupted. The path 'share/snpeff-4.1l-8/snpEff.config'
# has an incorrect size.
# reported size: 9460047 bytes
# actual size: 9460357 bytes
#
# ClobberError: This transaction has incompatible packages due to a shared path.
# packages: bioconda/linux-64::bowtie2-2.5.4-h7071971_4, bioconda/linux-64::bowtie-1.3.1-py36h769816f_3
# path: 'bin/scripts/convert_quals.pl'
# sovle confilict between bowtie, bowtie2 and snpeff
mamba remove bowtie
mamba install bowtie2
mamba remove snpeff
mamba install snpeff=4.1l
# -- WITH ERROR caused by bowtie and snpeff --> INSTALL END HERE --
#mamba install -c bioconda viral-ngs #so that gatk3-register and novoalign-license-register available --> ERROR
#Due to license restrictions, the viral-ngs conda package cannot distribute and install GATK directly. To fully install GATK, you must download a licensed copy of GATK v3.8 from the Broad Institute, and call “gatk3-register,” which will copy GATK into your viral-ngs conda environment:
mkdir -p /path/to/gatk_dir
wget -O - 'https://software.broadinstitute.org/gatk/download/auth?package=GATK-archive&version=3.6-0-g89b7209' | tar -xjvC /path/to/gatk_dir
gatk3-register /path/to/gatk_dir/GenomeAnalysisTK.jar
#The single-threaded version of Novoalign is installed by default. If you have a license for Novoalign to enable multi-threaded operation, viral-ngs will copy it to the viral-ngs conda environment if the NOVOALIGN_LICENSE_PATH environment variable is set. Alternatively, the conda version of Novoalign can be overridden if the NOVOALIGN_PATH environment variable is set. If you obtain a Novoalign license after viral-ngs has already been installed, it can be added to the conda environment by calling:
# obtain a Novoalign license file: novoalign.lic
novoalign-license-register /path/to/novoalign.lic
# # --We don't have registers, so we have to manually install novoalign and gatk--
# #At first install novoalign, then samtools
# mamba remove samtools
# mamba install -c bioconda novoalign # Eventually not necessary, since the path is defined in config.yaml NOVOALIGN_PATH: "/home/jhuang/Tools/novocraft_v3", and novoalign.lic is also in the same path.
# mamba install -c bioconda samtools
#
# mamba install -c bioconda gatk #(3.8) #IN /usr/local/bin/gatk FROM /home/jhuang/Tools/SPANDx_v3.2/GenomeAnalysisTK.jar
# #UPDATED TO: '/home/jhuang/Tools/GenomeAnalysisTK-3.6/GenomeAnalysisTK.jar'
# # If necessary, clean up the conda cache. This will remove any partially installed or corrupted packages.
# conda clean --all
## reinstall samtools 1.6 --> NOT RELEVANT
#mamba install samtools=1.6
Run snakemake
#Set values in samples-*.txt before running viral-ngs
#conda list --export > environment2.yml
#mamba create --name viral-ngs2 --file environment2.yml
#samtools fastq HSV1_S1_sampled.bam > HSV1_S1_sampled.fastq
snakemake --printshellcmds --cores 4
/usr/local/bin/gatk
https://software.broadinstitute.org/gatk/documentation/tooldocs/org_broadinstitute_gatk_tools_walkers_indels_RealignerTargetCreator.php
java -jar ~/Tools/GenomeAnalysisTK-3.8/GenomeAnalysisTK.jar -T RealignerTargetCreator --help #--> CORRECT
java -jar ~/Tools/GenomeAnalysisTK-3.6/GenomeAnalysisTK.jar -T RealignerTargetCreator --help #--> CORRECT
/usr/local/bin/gatk -T RealignerTargetCreator --help
https://www.broadinstitute.org/gatk/guide/tooldocs/org_broadinstitute_gatk_tools_walkers_indels_RealignerTargetCreator.php
Djava.io.tmpdir=/tmp/tmp-assembly-refine_assembly-2d9z3pcr
java -jar ~/Tools/GenomeAnalysisTK-2.8-1/GenomeAnalysisTK.jar -T RealignerTargetCreator -I /tmp/tmp-assembly-refine_assembly-2d9z3pcr/tmp0_vh27ji.rmdup.bam -R /tmp/tmp-assembly-refine_assembly-2d9z3pcr/tmppwwyriob.deambig.fasta -o /tmp/tmp-assembly-refine_assembly-2d9z3pcr/tmp_o2f2e0o.intervals --num_threads 120
java -jar ~/Tools/GenomeAnalysisTK-3.8/GenomeAnalysisTK.jar -T RealignerTargetCreator -I /tmp/tmp-assembly-refine_assembly-2d9z3pcr/tmp0_vh27ji.rmdup.bam -R /tmp/tmp-assembly-refine_assembly-2d9z3pcr/tmppwwyriob.deambig.fasta -o /tmp/tmp-assembly-refine_assembly-2d9z3pcr/tmp_o2f2e0o.intervals --num_threads 120
~/Tools/GenomeAnalysisTK-4.1.2.0/gatk -T RealignerTargetCreator -I /tmp/tmp-assembly-refine_assembly-2d9z3pcr/tmp0_vh27ji.rmdup.bam -R /tmp/tmp-assembly-refine_assembly-2d9z3pcr/tmppwwyriob.deambig.fasta -o /tmp/tmp-assembly-refine_assembly-2d9z3pcr/tmp_o2f2e0o.intervals --num_threads 120
# -- DEBUG_1: Configure the Conda Environment to Use Host's Java (version 17) while keeping BLAST 2.6.0+ --
bin/taxon_filter.py deplete data/00_raw/HSV1_S1.bam tmp/01_cleaned/HSV1_S1.raw.bam tmp/01_cleaned/HSV1_S1.bmtagger_depleted.bam tmp/01_cleaned/HSV1_S1.rmdup.bam data/01_cleaned/HSV1_S1.cleaned.bam --bmtaggerDbs /home/jhuang/REFs/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3 /home/jhuang/REFs/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA /home/jhuang/REFs/viral_ngs_dbs/bmtagger_dbs_remove/hg19 --blastDbs /home/jhuang/REFs/viral_ngs_dbs/blast_dbs_remove/metag_v3.ncRNA.mRNA.mitRNA.consensus /home/jhuang/REFs/viral_ngs_dbs/blast_dbs_remove/hybsel_probe_adapters --threads 120 --srprismMemory 142500000 --JVMmemory 256g --loglevel DEBUG
#2024-11-06 15:55:01,162 - __init__:444:_attempt_install - DEBUG - Currently installed version of blast: 2.16.0-hc155240_2
#2024-11-06 15:55:01,162 - __init__:448:_attempt_install - DEBUG - Expected version of blast: 2.6.0
#2024-11-06 15:55:01,162 - __init__:449:_attempt_install - DEBUG - Incorrect version of blast installed. Removing it...
# + (blast 2.6.0 needs java 17, therefore java="/usr/lib/jvm/java-17-openjdk-amd64/bin/java" in /home/jhuang/miniconda3/envs/viral-ngs2/bin/picard) blast 2.6.0 boost1.64_2 bioconda Cached
# + (bmtagger 3.101 needs blast 2.6.0) blast=2.6.0 + bmtagger 3.101 h470a237_4 bioconda Cached
# + pango 1.50.7 hbd2fdc8_0 conda-forge Cached
# + openjdk 11.0.15 hc6918da_0 conda-forge Cached
# + r-base 4.2.0 h1ae530e_0 pkgs/r Cached
# + picard 3.0.0 hdfd78af_0 bioconda Cached
# + java -version openjdk version "11.0.15-internal" 2022-04-19
Then, edit in the following file so that it can use the host java (version 17) for the viral-ngs2 picard 3.0.0! --
vim /home/jhuang/miniconda3/envs/viral-ngs2/bin/picard
# ---------------------------------------------------------
# Use Java installed with Anaconda to ensure correct version
java="$ENV_PREFIX/bin/java"
# if JAVA_HOME is set (non-empty), use it. Otherwise keep "java"
if [ -n "${JAVA_HOME:=}" ]; then
if [ -e "$JAVA_HOME/bin/java" ]; then
java="$JAVA_HOME/bin/java"
fi
fi
# -------------------------------------------------------->
#COMMENTED
# Use Java installed with Anaconda to ensure correct version
#java="$ENV_PREFIX/bin/java"
#MODIFIED
## if JAVA_HOME is set (non-empty), use it. Otherwise keep "java"
#if [ -n "${JAVA_HOME:=}" ]; then
# if [ -e "$JAVA_HOME/bin/java" ]; then
# java="$JAVA_HOME/bin/java"
# fi
#fi
java="/usr/lib/jvm/java-17-openjdk-amd64/bin/java"
# ---------------------------------------------------------
# -- DEBUG_2: lastal version not compatible --
bin/ncbi.py fetch_fastas j.huang@uke.de lastal_db NC_001806.2 --combinedFilePrefix lastal --removeSeparateFiles --forceOverwrite --chunkSize 300
bin/taxon_filter.py filter_lastal_bam data/01_cleaned/HSV1_S1.cleaned.bam lastal_db/lastal.fasta data/01_cleaned/HSV1_S1.taxfilt.bam --threads 120 --JVMmemory 256g --loglevel DEBUG
mamba remove last
mamba install -c bioconda last=876
lastal -V
bin/taxon_filter.py filter_lastal_bam data/01_cleaned/HSV1_S1.cleaned.bam lastal_db/lastal.fasta data/01_cleaned/HSV1_S1.taxfilt.bam --threads 120 --JVMmemory 256g --loglevel DEBUG
# -- DEBUG_3: lastal version not compatible --
bin/assembly.py gapfill_gap2seq tmp/02_assembly/HSV1_S1_sampled.assembly2-scaffolded.fasta data/01_per_sample/HSV1_S1_sampled.cleaned.bam tmp/02_assembly/HSV1_S1_sampled.assembly2-gapfilled.fasta --memLimitGb 12 --maskErrors --randomSeed 0 --loglevel DEBUG
#2024-11-07 12:34:14,732 - __init__:460:_attempt_install - DEBUG - Attempting install...
#2024-11-07 12:34:14,733 - __init__:545:install_package - DEBUG - Creating conda environment and installing package gap2seq=2.1
mamba install gap2seq=2.1
# -- DEBUG_4 --
bin/assembly.py impute_from_reference tmp/02_assembly/HSV1_S1_sampled.assembly2-gapfilled.fasta tmp/02_assembly/HSV1_S1_sampled.assembly2-scaffold_ref.fasta tmp/02_assembly/HSV1_S1_sampled.assembly3-modify.fasta --newName HSV1_S1_sampled --replaceLength 55 --minLengthFraction 0.05 --minUnambig 0.05 --index --loglevel DEBUG
2024-11-07 14:05:20,438 - __init__:445:_attempt_install - DEBUG - Currently installed version of muscle: 5.2-h4ac6f70_0
2024-11-07 14:05:20,438 - __init__:448:_attempt_install - DEBUG - Expected version of muscle: 3.8.1551
2024-11-07 14:05:20,438 - __init__:449:_attempt_install - DEBUG - Incorrect version of muscle installed. Removing it...
mamba install muscle=3.8
#- muscle 5.2 h4ac6f70_0 bioconda Cached
#+ muscle 3.8.1551 h7d875b9_6 bioconda Cached
#/home/jhuang/Tools/novocraft_v3/novoalign -f data/01_per_sample/HSV1_S1.cleaned.bam -r Random -l 20 -g 40 -x 20 -t 100 -F BAM -d tmp/02_assembly/HSV1_S1.assembly4-refined.nix -o SAM
# -- DEBUG_5 --
bin/assembly.py refine_assembly tmp/02_assembly/HSV1_S1_sampled.assembly3-modify.fasta data/01_per_sample/HSV1_S1_sampled.cleaned.bam tmp/02_assembly/HSV1_S1_sampled.assembly4-refined.fasta --outVcf tmp/02_assembly/HSV1_S1_sampled.assembly3.vcf.gz --min_coverage 2 --novo_params '-r Random -l 20 -g 40 -x 20 -t 502' --threads 120 --loglevel DEBUG
#Shebang in /usr/local/bin/gatk is corrupt.
# -- DEBUG_6 --
bin/interhost.py multichr_mafft ref_genome/reference.fasta data/02_assembly/HSV1_S1_sampled.fasta data/03_multialign_to_ref --ep 0.123 --maxiters 1000 --preservecase --localpair --outFilePrefix aligned --sampleNameListFile data/03_multialign_to_ref/sampleNameList.txt --threads 120 --loglevel DEBUG
2024-11-07 14:47:34,163 - __init__:445:_attempt_install - DEBUG - Currently installed version of mafft: 7.526-h4bc722e_0
2024-11-07 14:47:34,163 - __init__:448:_attempt_install - DEBUG - Expected version of mafft: 7.221
2024-11-07 14:47:34,164 - __init__:449:_attempt_install - DEBUG - Incorrect version of mafft installed. Removing it...
mamba install mafft=7.221
# -- DEBUG_7 --
bin/intrahost.py vphaser_one_sample data/02_align_to_self/HSV1_S1_sampled.mapped.bam data/02_assembly/HSV1_S1_sampled.fasta data/04_intrahost/vphaser2.HSV1_S1_sampled.txt.gz --vphaserNumThreads 120 --removeDoublyMappedReads --minReadsEach 5 --maxBias 10 --loglevel DEBUG
export TMPDIR=/home/jhuang/tmp
(viral-ngs) jhuang@WS-2290C:~/DATA/Data_Nicole_CaptureProbeSequencing$ /home/jhuang/miniconda3/envs/viral-ngs/bin/vphaser2 -i /home/jhuang/tmp/tmp_bq17yoq.mapped-withdoublymappedremoved.bam -o /home/jhuang/tmp/tmpyg8mlj5qvphaser2
samtools depth /home/jhuang/tmp/tmp_bq17yoq.mapped-withdoublymappedremoved.bam > coverage.txt
# -- DEBUG_8 --
snakemake --printshellcmds --cores 100
bin/intrahost.py merge_to_vcf ref_genome/reference.fasta data/04_intrahost/isnvs.vcf.gz --samples HSV1_S1 --isnvs data/04_intrahost/vphaser2.HSV1_S1.txt.gz --alignments data/03_multialign_to_ref/aligned_1.fasta --strip_chr_version --parse_accession
snakemake --printshellcmds --cores 100
# -- DEBUG_9 --
bin/assembly.py refine_assembly tmp/02_assembly/HSV-Klinik_S2.assembly3-modify.fasta data/01_per_sample/HSV-Klinik_S2.cleaned.bam tmp/02_assembly/HSV-Klinik_S2.assembly4-refined.fasta --outVcf tmp/02_assembly/HSV-Klinik_S2.assembly3.vcf.gz --min_coverage 2 --novo_params '-r Random -l 20 -g 40 -x 20 -t 502' --threads 120 --loglevel DEBUG
/usr/local/bin/gatk -Xmx20g -Djava.io.tmpdir=/home/jhuang/tmp/tmp-assembly-refine_assembly-dx3dr73p -T RealignerTargetCreator -I /home/jhuang/tmp/tmp-assembly-refine_assembly-dx3dr73p/tmpwbzvjo9j.rmdup.bam -R /home/jhuang/tmp/tmp-assembly-refine_assembly-dx3dr73p/tmpxq4obe29.deambig.fasta -o /home/jhuang/tmp/tmp-assembly-refine_assembly-dx3dr73p/tmptkw8zcf3.intervals --num_threads 120
mamba install gatk=3.6
#IMPORTANT_REPLACE "sudo cp /home/jhuang/miniconda3/envs/viral-ngs4/bin/gatk3 /usr/local/bin/gatk"
#IMPORTANT_UPDATE jar_file in the file with "/home/jhuang/Tools/GenomeAnalysisTK-3.6/GenomeAnalysisTK.jar"
#IMPORTANT_SET /home/jhuang/Tools/GenomeAnalysisTK-3.6 as GATK_PATH in config.yaml
#IMPORTANT_CHECK if it works
# java -jar /home/jhuang/Tools/GenomeAnalysisTK-3.6/GenomeAnalysisTK.jar -T RealignerTargetCreator --help
# /usr/local/bin/gatk -T RealignerTargetCreator --help
#IMPORTANT_NOTE that the env viral-ngs4 cannot logined from the base env due to the python3-conflict!
# -- DEBUG_10 (if the sequencing is too shawlow, then seperate running) --
/home/jhuang/miniconda3/envs/viral-ngs4/bin/vphaser2 -i /tmp/tmp2jl4plhy.mapped-withdoublymappedremoved.bam -o /tmp/tmp1x6jsiu_vphaser2
[EXIT]: gather_alignments: Failed to set region for reference HSV-Klinik_S2-1 in file /tmp/tmp2jl4plhy.mapped-withdoublymappedremoved.bam
# Run seperate intrahost.py --> no error:
#342 reads
2024-11-08 14:27:33,575 - intrahost:223:compute_library_bias - DEBUG - LB:standard has 161068 reads in 1 read group(s) (HSV-Klinik_S2)
2024-11-08 14:27:34,875 - __init__:445:_attempt_install - DEBUG - Currently installed version of vphaser2: 2.0-h7a259b3_14
samtools index HSV1_S1.mapped.bam
samtools index HSV-Klinik_S2.mapped.bam
bin/intrahost.py vphaser_one_sample data/02_align_to_self/HSV1_S1.mapped.bam data/02_assembly/HSV1_S1.fasta data/04_intrahost/vphaser2.HSV1_S1.txt.gz --vphaserNumThreads 120 --removeDoublyMappedReads --minReadsEach 5 --maxBias 10 --loglevel DEBUG
bin/intrahost.py vphaser_one_sample data/02_align_to_self/HSV-Klinik_S2.mapped.bam data/02_assembly/HSV-Klinik_S2.fasta data/04_intrahost/vphaser2.HSV-Klinik_S2.txt.gz --minReadsEach 1 --maxBias 2 --loglevel DEBUG # --vphaserNumThreads 120 --removeDoublyMappedReads
/home/jhuang/miniconda3/envs/viral-ngs4/bin/vphaser2 -i data/02_align_to_self/HSV-Klinik_S2.mapped.bam -o /tmp/tmpgacpc6eqvphaser2
samtools idxstats data/02_align_to_self/HSV-Klinik_S2.mapped.bam
samtools index data/02_align_to_self/HSV-Klinik_S2.mapped.bam
samtools view -H data/02_align_to_self/HSV-Klinik_S2.mapped.bam
/home/jhuang/miniconda3/envs/viral-ngs4/bin/vphaser2 -i data/02_align_to_self/HSV-Klinik_S2.mapped.bam -o /tmp/output_dir
/home/jhuang/miniconda3/envs/viral-ngs4/bin/vphaser2 -i data/02_align_to_self/HSV-Klinik_S2.mapped.bam -o /tmp/tmpgacpc6eqvphaser2
samtools view -b data/02_align_to_self/HSV-Klinik_S2.mapped.bam "HSV-Klinik_S2-1" > subset.bam
samtools index subset.bam
@SQ SN:HSV-Klinik_S2-1 LN:141125 AS:tmp35_s3ghx.ref_copy.nix
samtools view -b subset.bam "HSV-Klinik_S2-1:1-10000" > small_subset.bam
samtools index small_subset.bam
/home/jhuang/miniconda3/envs/viral-ngs4/bin/vphaser2 -i small_subset.bam -o /tmp/output_dir
/home/jhuang/miniconda3/envs/viral-ngs4/bin/vphaser2 -i subset.bam -o vphaser2_out
# -- DEBUG_11 in step multi_align_mafft: aligned_1.fasta is always empty, we need generate it manually with mafft and mark it as complete --
#[Fri Nov 8 14:51:45 2024]
#rule multi_align_mafft:
# input: data/02_assembly/HSV1_S1.fasta, data/02_assembly/HSV-Klinik_S2.fasta, ref_genome/reference.fasta
# output: data/03_multialign_to_ref/sampleNameList.txt, data/03_multialign_to_ref/aligned_1.fasta, data/03_multialign_to_ref/aligned_2.fasta, ... data/03_multialign_to_ref/aligned_161.fasta
# jobid: 24
# resources: tmpdir=/tmp, mem=8, threads=120
bin/interhost.py multichr_mafft ref_genome/reference.fasta data/02_assembly/HSV1_S1.fasta data/02_assembly/HSV-Klinik_S2.fasta data/03_multialign_to_ref --ep 0.123 --maxiters 1000 --preservecase --localpair --outFilePrefix aligned --sampleNameListFile data/03_multialign_to_ref/sampleNameList.txt --threads 120 --loglevel DEBUG
#b'/home/jhuang/miniconda3/envs/viral-ngs4/bin/python\n'
#-------
#2024-11-08 14:51:46,324 - cmd:193:main_argparse - INFO - software version: 1522433800, python version: 3.6.7 | packaged by conda-forge | (default, #Feb 28 2019, 09:07:38)
#[GCC 7.3.0]
#2024-11-08 15:00:26,375 - cmd:195:main_argparse - INFO - command: bin/interhost.py multichr_mafft inFastas=['ref_genome/reference.fasta', 'data/02_assembly/HSV1_S1.fasta', 'data/02_assembly/HSV-Klinik_S2.fasta'] localpair=True globalpair=None preservecase=True reorder=None gapOpeningPenalty=1.53 ep=0.123 verbose=False outputAsClustal=None maxiters=1000 outDirectory=data/03_multialign_to_ref outFilePrefix=aligned sampleRelationFile=None sampleNameListFile=data/03_multialign_to_ref/sampleNameList.txt threads=120 loglevel=DEBUG tmp_dir=/tmp tmp_dirKeep=False
#2024-11-08 15:00:26,375 - cmd:209:main_argparse - DEBUG - using tempDir: /tmp/tmp-interhost-multichr_mafft-sw91_svl
#2024-11-08 15:00:27,718 - __init__:445:_attempt_install - DEBUG - Currently installed version of mafft: 7.221-0
#2024-11-08 15:00:27,719 - mafft:141:execute - DEBUG - /home/jhuang/miniconda3/envs/viral-ngs4/bin/mafft --thread 120 --localpair --preservecase --op 1.53 --ep 0.123 --quiet --maxiterate 1000 /tmp/tmp-interhost-multichr_mafft-sw91_svl/tmp68_ln_ha.fasta
snakemake --cleanup-metadata 03_multialign_to_ref --cores 4
(Optional)
GapFiller.pl -l libraries_p2564.txt -s data/02_assembly/p2564.fasta
#parainfluenza bwa /home/jhuang/DATA/Data_parainfluenza/trimmed/p2564_R1.fastq.gz /home/jhuang/DATA/Data_parainfluenza/trimmed/p2564_R2.fastq.gz 300 1.0 FR
#since HSV1 and HSV-Klinik_S2 has different regions covered --> multialign_to_ref is none!
bin/intrahost.py vphaser_one_sample data/02_align_to_self/HSV-Klinik_S2.mapped.bam data/02_assembly/HSV-Klinik_S2.fasta data/04_intrahost/vphaser2.HSV-Klinik_S2.txt.gz --vphaserNumThreads 120 --removeDoublyMappedReads --minReadsEach 5 --maxBias 10
samtools flagstat HSV1_S1.taxfilt.bam
45674 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
0 + 0 mapped (0.00% : N/A)
45674 + 0 paired in sequencing
22837 + 0 read1
22837 + 0 read2
0 + 0 properly paired (0.00% : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
342 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
342 + 0 mapped (100.00% : N/A)
342 + 0 paired in sequencing
163 + 0 read1
179 + 0 read2
258 + 0 properly paired (75.44% : N/A)
264 + 0 with itself and mate mapped
78 + 0 singletons (22.81% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
(viral-ngs4) jhuang@WS-2290C:~/DATA/Data_Nicole_CaptureProbeSequencing/data/02_align_to_self$ samtools flagstat HSV-Klinik_S2.mapped.bam
162156 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
162156 + 0 mapped (100.00% : N/A)
162156 + 0 paired in sequencing
81048 + 0 read1
81108 + 0 read2
161068 + 0 properly paired (99.33% : N/A)
161630 + 0 with itself and mate mapped
526 + 0 singletons (0.32% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
(viral-ngs4) jhuang@WS-2290C:~/DATA/Data_Nicole_CaptureProbeSequencing/data/01_per_sample$ samtools flagstat HSV-Klinik_S2.taxfilt.bam
800454 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
0 + 0 mapped (0.00% : N/A)
800454 + 0 paired in sequencing
400227 + 0 read1
400227 + 0 read2
0 + 0 properly paired (0.00% : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
(viral-ngs4) jhuang@WS-2290C:~/DATA/Data_Nicole_CaptureProbeSequencing/data/02_align_to_self$ samtools flagstat HSV-Klinik_S2.bam
885528 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
191932 + 0 duplicates
354088 + 0 mapped (39.99% : N/A)
885528 + 0 paired in sequencing
442764 + 0 read1
442764 + 0 read2
323502 + 0 properly paired (36.53% : N/A)
324284 + 0 with itself and mate mapped
29804 + 0 singletons (3.37% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
Summarize statistics from snakemake-output
samples-runs.txt
samtools flagstat data/02_align_to_self/838_S1.mapped.bam
samtools flagstat data/02_align_to_self/840_S2.mapped.bam
samtools flagstat data/02_align_to_self/820_S3.mapped.bam
samtools flagstat data/02_align_to_self/828_S4.mapped.bam
samtools flagstat data/02_align_to_self/815_S5.mapped.bam
samtools flagstat data/02_align_to_self/834_S6.mapped.bam
samtools flagstat data/02_align_to_self/808_S7.mapped.bam
samtools flagstat data/02_align_to_self/811_S8.mapped.bam
samtools flagstat data/02_align_to_self/837_S9.mapped.bam
samtools flagstat data/02_align_to_self/768_S10.mapped.bam
samtools flagstat data/02_align_to_self/773_S11.mapped.bam
samtools flagstat data/02_align_to_self/767_S12.mapped.bam
samtools flagstat data/02_align_to_self/810_S13.mapped.bam
samtools flagstat data/02_align_to_self/814_S14.mapped.bam
samtools flagstat data/02_align_to_self/10121-16_S15.mapped.bam --> 3c
Origin of hepatitis C virus genotype 3 in Africa as estimated
through an evolutionary analysis of the full-length genomes of nine
subtypes, including the newly sequenced 3d and 3e
samtools flagstat data/02_align_to_self/7510-15_S16.mapped.bam -->
samtools flagstat data/02_align_to_self/828-17_S17.mapped.bam
samtools flagstat data/02_align_to_self/8806-15_S18.mapped.bam
samtools flagstat data/02_align_to_self/9881-16_S19.mapped.bam
samtools flagstat data/02_align_to_self/8981-14_S20.mapped.bam
Consensus sequences of each and of all isolates
cp data/02_assembly/*.fasta ./
for sample in 838_S1 840_S2 820_S3 828_S4 815_S5 834_S6 808_S7 811_S8 837_S9 768_S10 773_S11 767_S12 810_S13 814_S14 10121-16_S15 7510-15_S16 828-17_S17 8806-15_S18 9881-16_S19 8981-14_S20; do
for sample in p953-84660-tsek p938-16972-nra p942-88507-nra p943-98523-nra p944-103323-nra p947-105565-nra p948-112830-nra; do \
mv ${sample}.fasta ${sample}.fa
cat all.fa ${sample}.fa >> all.fa
done
cat RSV_dedup.fa all.fa > RSV_all.fa
mafft --adjustdirection RSV_all.fa > RSV_all.aln
snp-sites RSV_all.aln -o RSV_all_.aln
Finding the next strain with Phylogenetics: send both HCV231_all.png and HCV231_all.pdf to the Nicole
#1, generate tree
cat SARS-CoV-2_len25000_w60_newheader.fa ~/rtpd_files/2029-AW_S5/idba_ud_assembly/gapped_contig.fa > CoV2_all.fa
mafft --adjustdirection CoV2_all.fa > CoV2_all.aln
snp-sites CoV2_all.aln -o CoV2_all_.aln
fasttree -gtr -nt RSV_all_.aln > RSV_all.tree
fasttree -gtr -nt Ortho_SNP_matrix_RAxML.fa > Ortho_SNP_matrix_RAxML.tree
raxml-ng --all --model GTR+G+ASC_LEWIS --prefix CoV2_all_raxml.aln --threads 1 --msa CoV2_all_.aln --bs-trees 1000 --redo
#raxml-ng --all --model GTR+G+ASC_LEWIS --prefix raxml-ng/snippy.core.aln --threads 1 --msa variants/snippy.core.aln --bs-trees 1000 --redo
#2, open tree on Dendroscope, from phylogenetic tree, get genotype-refs as follows,
1a: S10, S11, 814_S14(3-->1a?), S18 --> 1a_EF407457
1b: S12 --> 1b_M58335
2a: 815_S5(3-->2a?) --> 2a_D00944
2c: S20 --> 2c_D50409
3a: S3, S7, S8, S13, S15, S16, S19 --> 3c_KY620605
4d: S1, S2, S9 --> 4d_EU392172
4k: S4, S6 --> 4k_EU392173
--> KX249682.1
--> KX765935.1
--> KM517573.1
cd data/02_assembly/
cat p2.fasta p3e.fasta p4e.fasta p5e.fasta > all.fasta
sed -i -e 's/-1//g' all.fasta
#sed -i -e 's/e-1//g' all.fasta
mafft --adjustdirection --clustalout all.fasta > all.aln
# MANUALLY CORRECTION!
##POLISH the assembled contigs
#for sample in p953 p938 p942 p943 p944 p947 p948 p955 p954 p952 p951 p946 p945 p940; do
# rm ${sample}_polished.fa
# #seqtk sample ../../trimmed/${sample}_R1.fastq.gz 0.1 > ${sample}_0.1_R1.fastq
# #seqtk sample ../../trimmed/${sample}_R2.fastq.gz 0.1 > ${sample}_0.1_R2.fastq
# polish_viral_ref.sh -1 ../../trimmed/${sample}_R1.fastq.gz -2 ../../trimmed/${sample}_R2.fastq.gz -r ${sample}.fasta -o ${sample}_polished.fa -t 6
#done
for sample in p946 p954 p952 p948 p945 p947 p955 p943 p951 p942; do #all.aln
for sample in p944 p938 p953 p940; do #all2.aln
for sample in p2 p3 p4 p5; do
grep "${sample}" all.aln > REF${sample}.fasta
#cut -f2-2 -d$'\t' REF${sample}.fasta > REF${sample}.fast
sed -i -e "s/${sample} //g" REF${sample}.fasta
sed -i -e "s/${sample}-1 //g" REF${sample}.fasta
sed -i -e 's/-//g' REF${sample}.fasta
echo ">REF${sample}" > REF${sample}.header
cat REF${sample}.header REF${sample}.fasta > REF${sample}.fas
seqkit seq -u REF${sample}.fas -o REF${sample}.fa
cp REF${sample}.fa ${sample}.fa
mv REF${sample}.fa ../..
sed -i -e "s/REF//g" ${sample}.fa #still under data/02_assembly/
done
#ReferenceSeeker determines closely related reference genomes
#https://github.com/oschwengers/referenceseeker
(referenceseeker) jhuang@hamburg:~/DATA/Data_Holger_Efaecium$ ~/Tools/referenceseeker/bin/referenceseeker -v ~/REFs/bacteria-refseq/ shovill/noAB_wildtype/contigs.fasta
# Annotating the fasta using VAPiD
makeblastdb -in *.fasta -dbtype nucl
python ~/Tools/VAPiD/vapid3.py --db ~/REFs/all_virus/all_virus.fasta p946R.fa ~/REFs/template_Less.sbt
python ~/Tools/VAPiD/vapid3.py --db ~/REFs/all_virus/all_virus.fasta REFp944.fa ~/REFs/template_Less.sbt # KT581445.1 selected!
python ~/Tools/VAPiD/vapid3.py --db ~/REFs/all_virus/all_virus.fasta contigs_final.fasta ~/REFs/template_Amir.sbt
python ~/Tools/VAPiD/vapid3.py --online contigs_final.fasta ~/REFs/template_Amir.sbt
All packages under the viral-ngs4 env, note that novoalign is not installed. The used Novoalign path: /home/jhuang/Tools/novocraft_v3/novoalign; the used gatk: /usr/local/bin/gatk using /home/jhuang/Tools/GenomeAnalysisTK-3.6/GenomeAnalysisTK.jar (see the point 9).
jhuang@WS-2290C:~$ conda activate viral-ngs4
(viral-ngs4) jhuang@WS-2290C:~$ conda list
# packages in environment at /home/jhuang/miniconda3/envs/viral-ngs4:
#
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_gnu conda-forge
_r-mutex 1.0.1 anacondar_1 conda-forge
alsa-lib 1.2.3.2 h166bdaf_0 conda-forge
bamtools 2.5.2 hdcf5f25_5 bioconda
bedtools 2.31.1 hf5e1c6e_2 bioconda
binutils_impl_linux-64 2.43 h4bf12b8_2 conda-forge
binutils_linux-64 2.43 h4852527_2 conda-forge
biopython 1.79 py36h8f6f2f9_0 conda-forge
blast 2.6.0 boost1.64_2 bioconda
bmfilter 3.101 h4ac6f70_5 bioconda
bmtagger 3.101 h470a237_4 bioconda
bmtool 3.101 hdbdd923_5 bioconda
boost 1.64.0 py36_4 conda-forge
boost-cpp 1.64.0 1 conda-forge
bowtie 1.3.1 py36h769816f_3 bioconda
bowtie2 2.5.4 h7071971_4 bioconda
bwa 0.7.18 he4a0461_1 bioconda
bwidget 1.9.14 ha770c72_1 conda-forge
bzip2 1.0.8 h4bc722e_7 conda-forge
c-ares 1.34.2 heb4867d_0 conda-forge
ca-certificates 2024.9.24 h06a4308_0
cairo 1.16.0 h18b612c_1001 conda-forge
cd-hit 4.8.1 h43eeafb_10 bioconda
cd-hit-auxtools 4.8.1 h4ac6f70_3 bioconda
certifi 2021.5.30 py36h5fab9bb_0 conda-forge
curl 7.68.0 hf8cf82a_0 conda-forge
cycler 0.11.0 pyhd8ed1ab_0 conda-forge
dbus 1.13.6 hfdff14a_1 conda-forge
diamond 2.1.10 h43eeafb_2 bioconda
expat 2.6.4 h5888daf_0 conda-forge
extract_fullseq 3.101 h4ac6f70_5 bioconda
fastqc 0.12.1 hdfd78af_0 bioconda
font-ttf-dejavu-sans-mono 2.37 hab24e00_0 conda-forge
fontconfig 2.14.1 hef1e5e3_0
freetype 2.12.1 h267a509_2 conda-forge
fribidi 1.0.10 h36c2ea0_0 conda-forge
future 0.18.2 py36h5fab9bb_3 conda-forge
gap2seq 2.1 boost1.64_1 bioconda
gatk 3.6 hdfd78af_11 bioconda
gcc_impl_linux-64 14.2.0 h6b349bd_1 conda-forge
gcc_linux-64 14.2.0 h5910c8f_5 conda-forge
gettext 0.22.5 he02047a_3 conda-forge
gettext-tools 0.22.5 he02047a_3 conda-forge
gfortran_impl_linux-64 14.2.0 hc73f493_1 conda-forge
gfortran_linux-64 14.2.0 hda50785_5 conda-forge
giflib 5.2.2 hd590300_0 conda-forge
glib 2.66.3 h58526e2_0 conda-forge
graphite2 1.3.13 h59595ed_1003 conda-forge
gsl 2.4 h294904e_1006 conda-forge
gst-plugins-base 1.14.5 h0935bb2_2 conda-forge
gstreamer 1.14.5 h36ae1b5_2 conda-forge
gxx_impl_linux-64 14.2.0 h2c03514_1 conda-forge
gxx_linux-64 14.2.0 h9423afd_5 conda-forge
harfbuzz 2.4.0 h37c48d4_1 conda-forge
icu 58.2 hf484d3e_1000 conda-forge
jpeg 9e h0b41bf4_3 conda-forge
kernel-headers_linux-64 3.10.0 he073ed8_18 conda-forge
keyutils 1.6.1 h166bdaf_0 conda-forge
kiwisolver 1.3.1 py36h605e78d_1 conda-forge
kmer-jellyfish 2.3.1 h4ac6f70_2 bioconda
krb5 1.16.4 h2fd8d38_0 conda-forge
last 876 py36_0 bioconda
lcms2 2.12 hddcbb42_0 conda-forge
ld_impl_linux-64 2.43 h712a8e2_2 conda-forge
libasprintf 0.22.5 he8f35ee_3 conda-forge
libasprintf-devel 0.22.5 he8f35ee_3 conda-forge
libblas 3.9.0 25_linux64_openblas conda-forge
libcblas 3.9.0 25_linux64_openblas conda-forge
libcurl 7.68.0 hda55be3_0 conda-forge
libdeflate 1.21 h4bc722e_0 conda-forge
libedit 3.1.20191231 he28a2e2_2 conda-forge
libev 4.33 hd590300_2 conda-forge
libexpat 2.6.4 h5888daf_0 conda-forge
libffi 3.2.1 he1b5a44_1007 conda-forge
libgcc 14.2.0 h77fa898_1 conda-forge
libgcc-devel_linux-64 14.2.0 h41c2201_101 conda-forge
libgcc-ng 14.2.0 h69a702a_1 conda-forge
libgettextpo 0.22.5 he02047a_3 conda-forge
libgettextpo-devel 0.22.5 he02047a_3 conda-forge
libgfortran-ng 7.5.0 h14aa051_20 conda-forge
libgfortran4 7.5.0 h14aa051_20 conda-forge
libgfortran5 14.2.0 hd5240d6_1 conda-forge
libglib 2.66.3 hbe7bbb4_0 conda-forge
libgomp 14.2.0 h77fa898_1 conda-forge
libiconv 1.17 hd590300_2 conda-forge
libidn11 1.33 h7b6447c_0
liblapack 3.9.0 25_linux64_openblas conda-forge
libnghttp2 1.51.0 hdcd2b5c_0 conda-forge
libnsl 2.0.1 hd590300_0 conda-forge
libopenblas 0.3.28 pthreads_h94d23a6_0 conda-forge
libpng 1.6.43 h2797004_0 conda-forge
libsanitizer 14.2.0 h2a3dede_1 conda-forge
libsqlite 3.46.0 hde9e2c9_0 conda-forge
libssh2 1.10.0 haa6b8db_3 conda-forge
libstdcxx 14.2.0 hc0a3c3a_1 conda-forge
libstdcxx-devel_linux-64 14.2.0 h41c2201_101 conda-forge
libstdcxx-ng 14.2.0 h4852527_1 conda-forge
libtiff 4.2.0 hf544144_3 conda-forge
libuuid 1.0.3 h7f8727e_2
libwebp-base 1.4.0 hd590300_0 conda-forge
libxcb 1.17.0 h8a09558_0 conda-forge
libxcrypt 4.4.36 hd590300_1 conda-forge
libxml2 2.9.14 h74e7548_0
libzlib 1.2.13 h4ab18f5_6 conda-forge
llvm-openmp 8.0.1 hc9558a2_0 conda-forge
mafft 7.221 0 bioconda
make 4.4.1 hb9d3cd8_2 conda-forge
matplotlib 3.3.4 py36h5fab9bb_0 conda-forge
matplotlib-base 3.3.4 py36hd391965_0 conda-forge
mummer4 4.0.0rc1 pl5321hdbdd923_7 bioconda
muscle 3.8.1551 h7d875b9_6 bioconda
mvicuna 1.0 h4ac6f70_10 bioconda
ncurses 6.5 he02047a_1 conda-forge
numpy 1.19.5 py36hfc0c790_2 conda-forge
olefile 0.46 pyh9f0ad1d_1 conda-forge
openjdk 8.0.412 hd590300_1 conda-forge
openjpeg 2.4.0 hb52868f_1 conda-forge
openmp 8.0.1 0 conda-forge
openssl 1.1.1w hd590300_0 conda-forge
pandas 1.1.5 py36h284efc9_0 conda-forge
pango 1.42.4 h7062337_4 conda-forge
parallel 20240922 ha770c72_0 conda-forge
pcre 8.45 h9c3ff4c_0 conda-forge
perl 5.32.1 7_hd590300_perl5 conda-forge
picard 3.0.0 hdfd78af_0 bioconda
pigz 2.6 h27cfd23_0
pillow 8.2.0 py36ha6010c0_1 conda-forge
pip 21.3.1 pyhd8ed1ab_0 conda-forge
pixman 0.38.0 h516909a_1003 conda-forge
prinseq 0.20.4 hdfd78af_5 bioconda
pthread-stubs 0.4 hb9d3cd8_1002 conda-forge
pybedtools 0.9.0 py36h7281c5b_1 bioconda
pyparsing 3.1.4 pyhd8ed1ab_0 conda-forge
pyqt 5.9.2 py36hcca6a23_4 conda-forge
pysam 0.16.0 py36h873a209_0 bioconda
python 3.6.7 h381d211_1004 conda-forge
python-dateutil 2.8.2 pyhd8ed1ab_0 conda-forge
python_abi 3.6 2_cp36m conda-forge
pytz 2023.3.post1 pyhd8ed1ab_0 conda-forge
pyyaml 5.4.1 py36h8f6f2f9_1 conda-forge
qt 5.9.7 h52cfd70_2 conda-forge
r-assertthat 0.2.1 r36h6115d3f_2 conda-forge
r-backports 1.2.1 r36hcfec24a_0 conda-forge
r-base 3.6.1 h9bb98a2_1
r-bitops 1.0_7 r36hcfec24a_0 conda-forge
r-brio 1.1.2 r36hcfec24a_0 conda-forge
r-callr 3.7.0 r36hc72bb7e_0 conda-forge
r-catools 1.18.2 r36h03ef668_0 conda-forge
r-cli 2.5.0 r36hc72bb7e_0 conda-forge
r-colorspace 2.0_1 r36hcfec24a_0 conda-forge
r-crayon 1.4.1 r36hc72bb7e_0 conda-forge
r-desc 1.3.0 r36hc72bb7e_0 conda-forge
r-diffobj 0.3.4 r36hcfec24a_0 conda-forge
r-digest 0.6.27 r36h03ef668_0 conda-forge
r-ellipsis 0.3.2 r36hcfec24a_0 conda-forge
r-evaluate 0.14 r36h6115d3f_2 conda-forge
r-fansi 0.4.2 r36hcfec24a_0 conda-forge
r-farver 2.1.0 r36h03ef668_0 conda-forge
r-ggplot2 3.3.3 r36hc72bb7e_0 conda-forge
r-glue 1.4.2 r36hcfec24a_0 conda-forge
r-gplots 3.1.1 r36hc72bb7e_0 conda-forge
r-gsalib 2.1 r36_1002 conda-forge
r-gtable 0.3.0 r36h6115d3f_3 conda-forge
r-gtools 3.8.2 r36hcdcec82_1 conda-forge
r-isoband 0.2.4 r36h03ef668_0 conda-forge
r-jsonlite 1.7.2 r36hcfec24a_0 conda-forge
r-kernsmooth 2.23_20 r36h742201e_0 conda-forge
r-labeling 0.4.2 r36h142f84f_0 conda-forge
r-lattice 0.20_44 r36hcfec24a_0 conda-forge
r-lifecycle 1.0.0 r36hc72bb7e_0 conda-forge
r-magrittr 2.0.1 r36hcfec24a_1 conda-forge
r-mass 7.3_54 r36hcfec24a_0 conda-forge
r-matrix 1.3_3 r36he454529_0 conda-forge
r-mgcv 1.8_35 r36he454529_0 conda-forge
r-munsell 0.5.0 r36h6115d3f_1003 conda-forge
r-nlme 3.1_152 r36h859d828_0 conda-forge
r-pillar 1.6.1 r36hc72bb7e_0 conda-forge
r-pkgconfig 2.0.3 r36h6115d3f_1 conda-forge
r-pkgload 1.2.1 r36h03ef668_0 conda-forge
r-plyr 1.8.6 r36h0357c0b_1 conda-forge
r-praise 1.0.0 r36h6115d3f_1004 conda-forge
r-processx 3.5.2 r36hcfec24a_0 conda-forge
r-ps 1.6.0 r36hcfec24a_0 conda-forge
r-r6 2.5.0 r36hc72bb7e_0 conda-forge
r-rcolorbrewer 1.1_2 r36h6115d3f_1003 conda-forge
r-rcpp 1.0.6 r36h03ef668_0 conda-forge
r-rematch2 2.1.2 r36h6115d3f_1 conda-forge
r-reshape 0.8.8 r36hcdcec82_2 conda-forge
r-rlang 0.4.11 r36hcfec24a_0 conda-forge
r-rprojroot 2.0.2 r36hc72bb7e_0 conda-forge
r-rstudioapi 0.13 r36hc72bb7e_0 conda-forge
r-scales 1.1.1 r36h6115d3f_0 conda-forge
r-testthat 3.0.2 r36h03ef668_0 conda-forge
r-tibble 3.1.2 r36hcfec24a_0 conda-forge
r-utf8 1.2.1 r36hcfec24a_0 conda-forge
r-vctrs 0.3.8 r36hcfec24a_1 conda-forge
r-viridislite 0.4.0 r36hc72bb7e_0 conda-forge
r-waldo 0.2.5 r36hc72bb7e_0 conda-forge
r-withr 2.4.2 r36hc72bb7e_0 conda-forge
readline 7.0 hf8c457e_1001 conda-forge
salmon 0.14.2 ha0cc327_0 bioconda
samtools 1.6 h244ad75_5 bioconda
setuptools 58.0.4 py36h5fab9bb_2 conda-forge
sip 4.19.8 py36hf484d3e_1000 conda-forge
six 1.16.0 pyh6c4a22f_0 conda-forge
snpeff 4.1l hdfd78af_8 bioconda
spades 3.15.5 h95f258a_1 bioconda
sqlite 3.28.0 h8b20d00_0 conda-forge
srprism 2.4.24 h6a68c12_5 bioconda
sysroot_linux-64 2.17 h4a8ded7_18 conda-forge
tbb 2020.3 hfd86e86_0
tbl2asn 25.7 h9ee0642_1 bioconda
tk 8.6.13 noxft_h4845f30_101 conda-forge
tktable 2.10 h8bc8fbc_6 conda-forge
tornado 6.1 py36h8f6f2f9_1 conda-forge
trimmomatic 0.39 hdfd78af_2 bioconda
trinity 2.8.5 h8b12597_5 bioconda
tzdata 2024b hc8b5060_0 conda-forge
unzip 6.0 h611a1e1_0
vphaser2 2.0 h7a259b3_14 bioconda
wheel 0.37.1 pyhd8ed1ab_0 conda-forge
xorg-libice 1.0.10 h7f98852_0 conda-forge
xorg-libsm 1.2.2 h470a237_5 conda-forge
xorg-libx11 1.8.10 h4f16b4b_0 conda-forge
xorg-libxau 1.0.11 hb9d3cd8_1 conda-forge
xorg-libxdmcp 1.1.5 hb9d3cd8_0 conda-forge
xorg-libxext 1.3.6 hb9d3cd8_0 conda-forge
xorg-libxfixes 6.0.1 hb9d3cd8_0 conda-forge
xorg-libxi 1.8.2 hb9d3cd8_0 conda-forge
xorg-libxrender 0.9.11 hb9d3cd8_1 conda-forge
xorg-libxtst 1.2.5 hb9d3cd8_3 conda-forge
xorg-xorgproto 2024.1 hb9d3cd8_1 conda-forge
xz 5.2.6 h166bdaf_0 conda-forge
yaml 0.2.5 h7f98852_2 conda-forge
zlib 1.2.13 h4ab18f5_6 conda-forge
zstd 1.5.6 ha6fb4c9_0 conda-forge
点赞本文的读者
还没有人对此文章表态
没有评论
Typing of 81 S. epidermidis samples (Luise)
Co-Authorship Network Generator using scraped data from Google Scholar via SerpAPI
Configuring Mutt for Gmail: A Step-by-Step Guide (Todo)
Enriching Monkeypox virus using xGen™ Monkeypox Virus Amplicon Panel
© 2023 XGenes.com Impressum