-
General Information
Title: ChIP-seq of primary human macrophages uninfected or infected with Yersinia enterocolitica strains Description: Pathogenic bacteria Yersinia enterocolitica injects virulence plasmid-encoded effectors through the type three secretion system into macrophages to modulate gene expression. At this point it is not known whether epigenetic modifications play a role in Yersinia regulation of gene expression. To answer this question primary human macrophages were infected with mock, WAC (virulence plasmid-cured strain) or WA314 (wild type) and samples were subjected to ChIP-seq for H3K4me3, H3K4me1, H3K27ac and H3K27me3. The effect of effector proteins YopM and YopP on histone modifications in macrophages was analyzed using a wild type strain lacking either YopM or YopP and subsequent ChIP-seq analysis. Experiment Type: 4C, antigen profiling, ATAC-seq, Bisulfite-seq, Capture-C, ChIP-chip by array, ChIP-chip by SNP array, ChIP-chip by tiling array, ChIP-seq*, CLIP-seq, comparative genomic hybridization by array, CUT&RUN, DNA-seq, exome sequencing, FAIRE-seq, genotyping by array, genotyping by high throughput sequencing, GRO-seq, Hi-C, MBD-seq, MeDIP-seq, methylation profiling by array, methylation profiling by high throughput sequencing, microRNA profiling by array, microRNA profiling by high throughput sequencing, MNase-seq, MRE-seq, proteomic profiling by array, Ribo-seq, RIP-chip by array, RIP-seq, RNA-seq of coding RNA, RNA-seq of coding RNA from single cells, RNA-seq of non coding RNA, RNA-seq of non coding RNA from single cells, RNA-seq of total RNA, scATAC-seq, single nucleus RNA sequencing, spatial transcriptomics by high-throughput sequencing, tiling path by array, transcription profiling by array, transcription profiling by RT-PCR Experimental Designs: cell type comparison design; stimulus or stress design 2019-01-01
-
Contacts
xx x.x@uke.de Institute of Medical Microbiology, Virology and Hygiene, University Medical Center Hamburg-Eppendorf (UKE) Martinistraße 52, 20246 Hamburg, Germany data analyst experiment performer submitter yy y.y@uke.de investigator
-
Publications
-
Create samples, add attributes and experimental variables
#Header: Name Organism CellType Stimulus MateiralType Immunoprecipitate Sample 1 Homo sapiens macrophage Y. enterocolitica WA314deltaYopM DNA H3K27ac Sample 2 Homo sapiens macrophage Y. enterocolitica WA314deltaYopM DNA H3K27ac Sample 3 Homo sapiens macrophage Y. enterocolitica WA314deltaYopP DNA H3K27ac Sample 4 Homo sapiens macrophage Y. enterocolitica WA314deltaYopP DNA H3K27ac Sample 5 Homo sapiens macrophage mock DNA H3K27ac Sample 6 Homo sapiens macrophage mock DNA H3K27ac Sample 7 Homo sapiens macrophage Y. enterocolitica WAC DNA H3K27ac Sample 8 Homo sapiens macrophage Y. enterocolitica WAC DNA H3K27ac Sample 9 Homo sapiens macrophage Y. enterocolitica WA314 DNA H3K27ac Sample 10 Homo sapiens macrophage Y. enterocolitica WA314 DNA H3K27ac Sample 11 Homo sapiens macrophage mock DNA H3K27me3 Sample 12 Homo sapiens macrophage mock DNA H3K27me3 Sample 13 Homo sapiens macrophage Y. enterocolitica WAC DNA H3K27me3 Sample 14 Homo sapiens macrophage Y. enterocolitica WAC DNA H3K27me3 Sample 15 Homo sapiens macrophage Y. enterocolitica WAC DNA H3K27me3 Sample 16 Homo sapiens macrophage Y. enterocolitica WAC DNA H3K27me3 Sample 17 Homo sapiens macrophage Y. enterocolitica WA314 DNA H3K27me3 Sample 18 Homo sapiens macrophage Y. enterocolitica WA314 DNA H3K27me3 Sample 19 Homo sapiens macrophage Y. enterocolitica WA314 DNA H3K27me3 Sample 20 Homo sapiens macrophage Y. enterocolitica WA314 DNA H3K27me3 Sample 21 Homo sapiens macrophage mock DNA H3K4me1 Sample 22 Homo sapiens macrophage mock DNA H3K4me1 Sample 23 Homo sapiens macrophage Y. enterocolitica WAC DNA H3K4me1 Sample 24 Homo sapiens macrophage Y. enterocolitica WAC DNA H3K4me1 Sample 25 Homo sapiens macrophage Y. enterocolitica WA314 DNA H3K4me1 Sample 26 Homo sapiens macrophage Y. enterocolitica WA314 DNA H3K4me1 Sample 27 Homo sapiens macrophage mock DNA H3K4me3 Sample 28 Homo sapiens macrophage mock DNA H3K4me3 Sample 29 Homo sapiens macrophage Y. enterocolitica WAC DNA H3K4me3 Sample 30 Homo sapiens macrophage Y. enterocolitica WAC DNA H3K4me3 Sample 31 Homo sapiens macrophage Y. enterocolitica WAC DNA H3K4me3 Sample 32 Homo sapiens macrophage Y. enterocolitica WAC DNA H3K4me3 Sample 33 Homo sapiens macrophage Y. enterocolitica WA314 DNA H3K4me3 Sample 34 Homo sapiens macrophage Y. enterocolitica WA314 DNA H3K4me3 Sample 35 Homo sapiens macrophage Y. enterocolitica WA314 DNA H3K4me3 Sample 36 Homo sapiens macrophage Y. enterocolitica WA314 DNA H3K4me3 Sample 37 Homo sapiens macrophage Y. enterocolitica WA314deltaYopM DNA H3K4me3 Sample 38 Homo sapiens macrophage Y. enterocolitica WA314deltaYopM DNA H3K4me3 Sample 39 Homo sapiens macrophage Y. enterocolitica WA314deltaYopP DNA H3K4me3 Sample 40 Homo sapiens macrophage Y. enterocolitica WA314deltaYopP DNA H3K4me3 Sample 41 Homo sapiens macrophage mock DNA H3K4me3 Sample 42 Homo sapiens macrophage mock DNA H3K4me3 Sample 43 Homo sapiens macrophage Y. enterocolitica WAC DNA H3K4me3 Sample 44 Homo sapiens macrophage Y. enterocolitica WAC DNA H3K4me3 Sample 45 Homo sapiens macrophage Y. enterocolitica WA314 DNA H3K4me3 Sample 46 Homo sapiens macrophage Y. enterocolitica WA314 DNA H3K4me3
-
Assign ENA library information
#Header: NameResize Library Layout * Library Source * Library Strategy * Library Selection * Library Strand Nominal Length Nominal SDev Orientation Sample 1 SINGLE GENOMIC (Genomic DNA (includes PCR products from genomic DNA)) ChIP-Seq (Direct sequencing of chromatin immunoprecipitates) ChIP (Chromatin immunoprecipitation) Sample 2 SINGLE GENOMIC (Genomic DNA (includes PCR products from genomic DNA)) ChIP-Seq (Direct sequencing of chromatin immunoprecipitates) ChIP (Chromatin immunoprecipitation)
-
Describe protocols
#Header: Name Assign protocols to samples Type Description * Performer ** Hardware ** Software Protocol 1 Assign... Type: sample collection protocol Description: Human peripheral blood monocytes were isolated by centrifugation of heparinized blood in Ficoll. Monocytic cells were isolated with magnetic anti-CD14 antibody beads and an MS+ Separation Column (Miltenyi Biotec) according to the manufacturer’s instructions and seeded into 6-well plates at a density of 2 × 106 cells. Cells were cultured in RPMI containing 20% autologous serum at 37°C, 5% CO2, and 90% humidity. The medium was changed every three days until cells were differentiated into macrophages after 7 days. Macrophages were used for infection 1-2 weeks after the isolation Protocol 2 Assign... nucleic acid extraction protocol For the ChIP with formaldehyde crosslinking, macrophages (~3-10 x 106 cells per condition) were washed once with warm PBS and incubated for 30 min at 37 °C with accutase (eBioscience, USA) to detach the cells. ChIP protocol steps were performed as described in (Günther et al., 2016, PMID: 26855283), except that BSA-blocked ChIP grade protein A/G magnetic beads (Thermo Fisher Scientific, USA) were added to the chromatin and antibody mixture and incubated for 2 h at 4 °C rotating to bind chromatin-antibody complexes. Samples were incubated for ~3 min with a magnetic stand to ensure attachment of beads to the magnet and mixed by pipetting during the wash steps. Protocol 3 Assign... nucleic acid library construction protocol ChIP-seq libraries were constructed with 1-10 ng of ChIP DNA or input control as a starting material. Libraries were generated using the NEXTflex™ ChIP-Seq Kit (Bioo Scientific, USA) as per manufacturer’s recommendations. Concentrations of all samples were measured with a Qubit Fluorometer (Thermo Fisher Scientific, USA) and fragment length distribution of the final libraries was analysed with the DNA High Sensitivity Chip on an Agilent 2100 Bioanalyzer (Agilent Technologies, USA). Protocol 4 Assign... nucleic acid sequencing protocol All samples were normalized to 2 nM and pooled equimolar. The library pool was sequenced on the NextSeq500 (Illumina, USA) with 1 x 75 bp and total at least ~18 million reads per sample. Technology Platform Next Generation Sequencing Heinrich Pette Institute, Leibniz Institute for Experimental Virology Martinistraße 52 20251 Hamburg NextSeq 500
-
Assign data files
#Header: NameResize Raw Data File Sample 1 H3K27ac_dM_DoA.fastq.gz Sample 2 H3K27ac_dM_DoB.fastq.gz Sample 3 H3K27ac_dP_DoA.fastq.gz Sample 4 H3K27ac_dP_DoB.fastq.gz Sample 5 H3K27ac_mock_DoA.fastq.gz ...
Author Archives: gene_x
Variant calling for Data_Huang_Human_herpesvirus_3 using snippy+spandx+viralngs
-
Input files
mkdir raw_data; cd raw_data; # Note that the names must be ending with fastq.gz ln -s ../VZV/241121_VH00358_117_AAGFF7FM5_Dongdong/wb180/01_VZV_20S_S1_R1_001.fastq.gz VZV_20S_R1.fastq.gz ln -s ../VZV/241121_VH00358_117_AAGFF7FM5_Dongdong/wb180/01_VZV_20S_S1_R2_001.fastq.gz VZV_20S_R2.fastq.gz ln -s ../VZV/241121_VH00358_117_AAGFF7FM5_Dongdong/wb181/02_VZV_20c_S2_R1_001.fastq.gz VZV_20c_R1.fastq.gz ln -s ../VZV/241121_VH00358_117_AAGFF7FM5_Dongdong/wb181/02_VZV_20c_S2_R2_001.fastq.gz VZV_20c_R2.fastq.gz ln -s ../VZV/241121_VH00358_117_AAGFF7FM5_Dongdong/wb182/03_VZV_60S_S3_R1_001.fastq.gz VZV_60S_R1.fastq.gz ln -s ../VZV/241121_VH00358_117_AAGFF7FM5_Dongdong/wb182/03_VZV_60S_S3_R2_001.fastq.gz VZV_60S_R2.fastq.gz ln -s ../VZV/241121_VH00358_117_AAGFF7FM5_Dongdong/wb183/04_VZV_60c_S4_R1_001.fastq.gz VZV_60c_R1.fastq.gz ln -s ../VZV/241121_VH00358_117_AAGFF7FM5_Dongdong/wb183/04_VZV_60c_S4_R2_001.fastq.gz VZV_60c_R2.fastq.gz ln -s ../VZV/241121_VH00358_117_AAGFF7FM5_Dongdong/wb184/05_VZV_1451S_S5_R1_001.fastq.gz VZV_1451S_R1.fastq.gz ln -s ../VZV/241121_VH00358_117_AAGFF7FM5_Dongdong/wb184/05_VZV_1451S_S5_R2_001.fastq.gz VZV_1451S_R2.fastq.gz ln -s ../VZV/241121_VH00358_117_AAGFF7FM5_Dongdong/wb185/06_Pcc1_1451_S6_R1_001.fastq.gz Pcc1_1451_R1.fastq.gz ln -s ../VZV/241121_VH00358_117_AAGFF7FM5_Dongdong/wb185/06_Pcc1_1451_S6_R2_001.fastq.gz Pcc1_1451_R2.fastq.gz ln -s ../VZV/2024_081_wb_dongdong/wb190/PCC1_VZV_20_1_S36_R1_001.fastq.gz PCC1_VZV_20_1_R1.fastq.gz ln -s ../VZV/2024_081_wb_dongdong/wb190/PCC1_VZV_20_1_S36_R2_001.fastq.gz PCC1_VZV_20_1_R2.fastq.gz ln -s ../VZV/2024_081_wb_dongdong/wb191/PCC1_VZV_20_2_S37_R1_001.fastq.gz PCC1_VZV_20_2_R1.fastq.gz ln -s ../VZV/2024_081_wb_dongdong/wb191/PCC1_VZV_20_2_S37_R2_001.fastq.gz PCC1_VZV_20_2_R2.fastq.gz ln -s ../VZV/2024_081_wb_dongdong/wb192/PCC1_VZV_20_5_S38_R1_001.fastq.gz PCC1_VZV_20_5_R1.fastq.gz ln -s ../VZV/2024_081_wb_dongdong/wb192/PCC1_VZV_20_5_S38_R2_001.fastq.gz PCC1_VZV_20_5_R2.fastq.gz ln -s ../VZV/2024_081_wb_dongdong/wb193/PCC1_VZV_60_1_S39_R1_001.fastq.gz PCC1_VZV_60_1_R1.fastq.gz ln -s ../VZV/2024_081_wb_dongdong/wb193/PCC1_VZV_60_1_S39_R2_001.fastq.gz PCC1_VZV_60_1_R2.fastq.gz ln -s ../VZV/2024_081_wb_dongdong/wb194/PCC1_VZV_60_4_S40_R1_001.fastq.gz PCC1_VZV_60_4_R1.fastq.gz ln -s ../VZV/2024_081_wb_dongdong/wb194/PCC1_VZV_60_4_S40_R2_001.fastq.gz PCC1_VZV_60_4_R2.fastq.gz ln -s ../VZV/2024_081_wb_dongdong/wb195/PCC1_VZV_60_6_S41_R1_001.fastq.gz PCC1_VZV_60_6_R1.fastq.gz ln -s ../VZV/2024_081_wb_dongdong/wb195/PCC1_VZV_60_6_S41_R2_001.fastq.gz PCC1_VZV_60_6_R2.fastq.gz
-
Call variant calling using snippy
ln -s ~/Tools/bacto/db/ .; ln -s ~/Tools/bacto/envs/ .; ln -s ~/Tools/bacto/local/ .; cp ~/Tools/bacto/Snakefile .; cp ~/Tools/bacto/bacto-0.1.json .; cp ~/Tools/bacto/cluster.json .; #download CU459141.gb from GenBank mv ~/Downloads/sequence\(1\).gb db/NC_001348.gb = X04370.1 mv ~/Downloads/sequence\(2\).gb db/AB097932.gb #X04370 #setting the following in bacto-0.1.json "fastqc": false, "taxonomic_classifier": false, "assembly": true, "typing_ariba": false, "typing_mlst": true, "pangenome": true, "variants_calling": true, "phylogeny_fasttree": true, "phylogeny_raxml": true, "recombination": false, (due to gubbins-error set false) "genus": "Varicella-zoster virus", "kingdom": "Viruses", "species": "Varicella-zoster virus"(in both prokka and mykrobe) "reference": "db/NC_001348.gb" conda activate bengal3_ac3 (bengal3_ac3) /home/jhuang/miniconda3/envs/snakemake_4_3_1/bin/snakemake --printshellcmds
-
Summarize all SNPs and Indels from the snippy result directory.
#Output: snippy/summary_snps_indels.csv # IMPORTANT_ADAPT the array isolates = ["AYE-S", "AYE-Q", "AYE-WT on Tig4", "AYE-craA on Tig4", "AYE-craA-1 on Cm200", "AYE-craA-2 on Cm200"] python3 ~/Scripts/summarize_snippy_res.py snippy cd snippy grep -v "None,,,,,,None,None" summary_snps_indels.csv > summary_snps_indels_.csv
-
Using spandx calling variants (almost the same results to the one from viral-ngs!)
mamba activate /home/jhuang/miniconda3/envs/spandx mkdir ~/miniconda3/envs/spandx/share/snpeff-5.1-2/data/NC_001348 cp NC_001348.gb ~/miniconda3/envs/spandx/share/snpeff-5.1-2/data/NC_001348/genes.gbk vim ~/miniconda3/envs/spandx/share/snpeff-5.1-2/snpEff.config /home/jhuang/miniconda3/envs/spandx/bin/snpEff build NC_001348 #-d ~/Scripts/genbank2fasta.py NC_001348.gb mv NC_001348.gb_converted.fna NC_001348.fasta #rename "NC_001348.1 xxxxx" to "NC_001348" in the fasta-file ln -s /home/jhuang/Tools/spandx/ spandx (spandx) nextflow run spandx/main.nf --fastq "trimmed/*_P_{1,2}.fastq" --ref NC_001348.fasta --annotation --database NC_001348 -resume # Rerun SNP_matrix.sh due to the error ERROR_CHROMOSOME_NOT_FOUND in the variants annotation cd Outputs/Master_vcf (spandx) cp -r ../../snippy/VZV_20S/reference . (spandx) cp ../../spandx/bin/SNP_matrix.sh ./ #Note that ${variant_genome_path}=NC_001348 in the following command, but it was not used after command replacement. #Adapt "snpEff eff -no-downstream -no-intergenic -ud 100 -formatEff -v ${variant_genome_path} out.vcf > out.annotated.vcf" to "/home/jhuang/miniconda3/envs/bengal3_ac3/bin/snpEff eff -no-downstream -no-intergenic -ud 100 -formatEff -c reference/snpeff.config -dataDir . ref out.vcf > out.annotated.vcf" in SNP_matrix.sh (spandx) bash SNP_matrix.sh NC_001348 .
-
Calling inter-host variants by merging the results from snippy+spandx (Manually!)
# Inter-host variants(宿主间变异):一种病毒在两个人之间有不同的基因变异,这些变异可能与宿主的免疫反应、疾病表现或病毒传播的方式相关。 cp All_SNPs_indels_annotated.txt All_SNPs_indels_annotated_backup.txt vim All_SNPs_indels_annotated.txt
-
Calling intra-host variants using viral-ngs (http://xgenes.com/article/article-content/347/variant-calling-for-herpes-simplex-virus-1-from-patient-sample-using-capture-probe-sequencing/)
# Intra-host variants(宿主内变异):同一个人感染了某种病毒,但在其体内的不同细胞或器官中可能存在多个不同的病毒变异株。 mamba activate /home/jhuang/miniconda3/envs/viral-ngs4 mkdir viralngs ln -s ~/Tools/viral-ngs/Snakefile Snakefile ln -s ~/Tools/viral-ngs/bin bin cp ~/Tools/viral-ngs/refsel.acids refsel.acids cp ~/Tools/viral-ngs/lastal.acids lastal.acids cp ~/Tools/viral-ngs/config.yaml config.yaml cp ~/Tools/viral-ngs/samples-runs.txt samples-runs.txt cp ~/Tools/viral-ngs/samples-depletion.txt samples-depletion.txt cp ~/Tools/viral-ngs/samples-metagenomics.txt samples-metagenomics.txt cp ~/Tools/viral-ngs/samples-assembly.txt samples-assembly.txt cp ~/Tools/viral-ngs/samples-assembly-failures.txt samples-assembly-failures.txt mkdir data cd data mkdir 00_raw cd ../.. mkdir bams ref_fa="NC_001348.fasta"; for sample in VZV_20S VZV_20c VZV_60S VZV_60c PCC1_VZV_20_1 PCC1_VZV_20_2 PCC1_VZV_20_5 PCC1_VZV_60_1 PCC1_VZV_60_4 PCC1_VZV_60_6; do bwa index ${ref_fa}; \ bwa mem -M -t 16 ${ref_fa} trimmed/${sample}_trimmed_P_1.fastq trimmed/${sample}_trimmed_P_2.fastq | samtools view -bS - > bams/${sample}_genome_alignment.bam; \ done for sample in VZV_20S VZV_20c VZV_60S VZV_60c PCC1_VZV_20_1 PCC1_VZV_20_2 PCC1_VZV_20_5 PCC1_VZV_60_1 PCC1_VZV_60_4 PCC1_VZV_60_6; do picard AddOrReplaceReadGroups I=bams/${sample}_genome_alignment.bam O=viralngs/data/00_raw/${sample}.bam SORT_ORDER=coordinate CREATE_INDEX=true RGPL=illumina RGID=$sample RGSM=$sample RGLB=standard RGPU=$sample VALIDATION_STRINGENCY=LENIENT; \ done cd viralngs (viral-ngs4) snakemake --printshellcmds --cores 80 # -- DEBUG: If the env disappeared, reinstall the env viral-ngs4 -- # -- Running time hints -- #Note that novoalign is not installed. The used Novoalign path: /home/jhuang/Tools/novocraft_v3/novoalign; the used gatk: /usr/local/bin/gatk using /home/jhuang/Tools/GenomeAnalysisTK-3.6/GenomeAnalysisTK.jar. #Samtools path: #Why, the samtools in the env is v1.6? #Novoalign path: /home/jhuang/Tools/novocraft_v3/novoalign #GATK path: /usr/local/bin/gatk # jar_file in the file: jar_file = '/home/jhuang/Tools/GenomeAnalysisTK-3.6/GenomeAnalysisTK.jar' # -- in config.yaml -- #GATK_PATH: "/home/jhuang/Tools/GenomeAnalysisTK-3.6" #NOVOALIGN_PATH: "/home/jhuang/Tools/novocraft_v3" mamba create -n viral-ngs4 python=3.6 mamba activate viral-ngs4 mamba install blast=2.6.0 bmtagger biopython pysam pyyaml picard mvicuna pybedtools fastqc matplotlib spades last=876 -c conda-forge -c bioconda #mafft=7.221 --> mafft since └─ mafft 7.221** is not installable because it conflicts with any installable versions previously reported. mamba install cd-hit cd-hit-auxtools diamond gap2seq=2.1 mafft mummer4 muscle=3.8 parallel pigz prinseq samtools=1.6 tbl2asn trimmomatic trinity unzip vphaser2 bedtools -c r -c defaults -c conda-forge -c bioconda mamba install bwa mamba install vphaser2=2.0 # Sovle confilict between bowtie, bowtie2 and snpeff mamba remove bowtie mamba install bowtie2 mamba remove snpeff mamba install snpeff=4.1l #which snpEff mamba install gatk=3.6 #DEBUG if FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/bin/gatk': '/usr/local/bin/gatk' #IMPORTANT_UPDATE jar_file in the file /home/jhuang/mambaforge/envs/viral-ngs4/bin/gatk3 with "/home/jhuang/Tools/GenomeAnalysisTK-3.6/GenomeAnalysisTK.jar" #IMPORTANT_REPLACE "sudo cp /home/jhuang/mambaforge/envs/viral-ngs4/bin/gatk3 /usr/local/bin/gatk" #IMPORTANT_SET /home/jhuang/Tools/GenomeAnalysisTK-3.6 as GATK_PATH in config.yaml #IMPORTANT_CHECK if it works # java -jar /home/jhuang/Tools/GenomeAnalysisTK-3.6/GenomeAnalysisTK.jar -T RealignerTargetCreator --help # /usr/local/bin/gatk -T RealignerTargetCreator --help #IMPORTANT_NOTE that the env viral-ngs4 cannot logined from the base env due to the python3-conflict!
-
Merge intra- and inter-host variants, comparing the variants to the alignments of the assemblies to confirm its correctness.
cat NC_001348.fasta viralngs/data/02_assembly/VZV_20S.fasta viralngs/data/02_assembly/VZV_60S.fasta > aligned_1.fasta mafft --clustalout aligned_1.fasta > aligned_1.aln #~/Scripts/convert_fasta_to_clustal.py aligned_1.fasta_orig aligned_1.aln ~/Scripts/convert_clustal_to_clustal.py aligned_1.aln aligned_1_.aln #manully delete the postion with all or '-' in aligned_1_.aln ~/Scripts/check_sequence_differences.py aligned_1_.aln ~/Scripts/check_sequence_differences.py aligned_1_.aln > aligned_1.res grep -v " = n" aligned_1.res > aligned_1_.res cat NC_001348.fasta viralngs/tmp/02_assembly/VZV_20S.assembly4-refined.fasta viralngs/tmp/02_assembly/VZV_60S.assembly4-refined.fasta > aligned_1.fasta mafft --clustalout aligned_1.fasta > aligned_1.aln ~/Scripts/convert_clustal_to_clustal.py aligned_1.aln aligned_1_.aln ~/Scripts/check_sequence_differences.py aligned_1_.aln > aligned_1.res grep -v " = n" aligned_1.res > aligned_1_.res #Differences found at the following positions (150): Position 8956: OP297860.1 = A, HSV1_S1-1 = A, HSV-Klinik_S2-1 = G Position 8991: OP297860.1 = A, HSV1_S1-1 = A, HSV-Klinik_S2-1 = C Position 8992: OP297860.1 = T, HSV1_S1-1 = C, HSV-Klinik_S2-1 = C Position 8995: OP297860.1 = T, HSV1_S1-1 = T, HSV-Klinik_S2-1 = C Position 9190: OP297860.1 = T, HSV1_S1-1 = A, HSV-Klinik_S2-1 = T * Position 13659: OP297860.1 = G, HSV1_S1-1 = T, HSV-Klinik_S2-1 = G * Position 47969: OP297860.1 = C, HSV1_S1-1 = T, HSV-Klinik_S2-1 = C * Position 53691: OP297860.1 = G, HSV1_S1-1 = T, HSV-Klinik_S2-1 = G * Position 55501: OP297860.1 = T, HSV1_S1-1 = C, HSV-Klinik_S2-1 = C * Position 63248: OP297860.1 = G, HSV1_S1-1 = T, HSV-Klinik_S2-1 = G Position 63799: OP297860.1 = T, HSV1_S1-1 = C, HSV-Klinik_S2-1 = T * Position 64328: OP297860.1 = C, HSV1_S1-1 = A, HSV-Klinik_S2-1 = C Position 65179: OP297860.1 = T, HSV1_S1-1 = T, HSV-Klinik_S2-1 = C * Position 65225: OP297860.1 = G, HSV1_S1-1 = G, HSV-Klinik_S2-1 = A * Position 95302: OP297860.1 = C, HSV1_S1-1 = A, HSV-Klinik_S2-1 = C gunzip isnvs.annot.txt.gz ~/Scripts/filter_isnv.py isnvs.annot.txt 0.05 cut -d$'\t' filtered_isnvs.annot.txt -f1-7 chr pos sample patient time alleles iSNV_freq OP297860 13203 HSV1_S1 HSV1_S1 T,C,A 1.0 OP297860 13203 HSV-Klinik_S2 HSV-Klinik_S2 T,C,A 1.0 OP297860 13522 HSV1_S1 HSV1_S1 G,T 1.0 OP297860 13522 HSV-Klinik_S2 HSV-Klinik_S2 G,T 0.008905554253573941 OP297860 13659 HSV1_S1 HSV1_S1 G,T 1.0 OP297860 13659 HSV-Klinik_S2 HSV-Klinik_S2 G,T 0.008383233532934131 ~/Scripts/convert_clustal_to_fasta.py aligned_1_.aln aligned_1.fasta samtools faidx aligned_1.fasta samtools faidx aligned_1.fasta OP297860.1 > OP297860.1.fasta samtools faidx aligned_1.fasta HSV1_S1-1 > HSV1_S1-1.fasta samtools faidx aligned_1.fasta HSV-Klinik_S2-1 > HSV-Klinik_S2-1.fasta seqkit seq OP297860.1.fasta -w 70 > OP297860.1_w70.fasta diff OP297860.1_w70.fasta ../../refsel_db/refsel.fasta
-
Consensus sequences of each and of all isolates
cp data/02_assembly/*.fasta ./ for sample in 838_S1 840_S2 820_S3 828_S4 815_S5 834_S6 808_S7 811_S8 837_S9 768_S10 773_S11 767_S12 810_S13 814_S14 10121-16_S15 7510-15_S16 828-17_S17 8806-15_S18 9881-16_S19 8981-14_S20; do for sample in p953-84660-tsek p938-16972-nra p942-88507-nra p943-98523-nra p944-103323-nra p947-105565-nra p948-112830-nra; do \ mv ${sample}.fasta ${sample}.fa cat all.fa ${sample}.fa >> all.fa done cat RSV_dedup.fa all.fa > RSV_all.fa mafft --adjustdirection RSV_all.fa > RSV_all.aln snp-sites RSV_all.aln -o RSV_all_.aln
-
Download all Human alphaherpesvirus 3 (Varicella-zoster virus) genomes
Human alphaherpesvirus 3 acronym: HHV-3 VZV equivalent: Human herpes virus 3 Human alphaherpesvirus 3 (Varicella-zoster virus) * Human herpesvirus 3 strain Dumas * Human herpesvirus 3 strain Oka vaccine * Human herpesvirus 3 VZV-32 #Taxonomy ID: 10335 esearch -db nucleotide -query "txid10335[Organism:exp]" | efetch -format fasta -email j.huang@uke.de > genome_10335_ncbi.fasta python ~/Scripts/filter_fasta.py genome_10335_ncbi.fasta complete_genome_10335_ncbi.fasta #2041-->165 # ---- Download related genomes from ENA ---- https://www.ebi.ac.uk/ena/browser/view/10335 #Click "Sequence" and download "Counts" (2003) and "Taxon descendants count" (2005) if there is enough time! Downloading time points is 11.03.2025. python ~/Scripts/filter_fasta.py ena_10335_sequence.fasta complete_genome_10335_ena_taxon_descendants_count.fasta #2005-->153 #python ~/Scripts/filter_fasta.py ena_10335_sequence_Counts.fasta complete_genome_10335_ena_Counts.fasta #xxx, 5.8G
-
Run vrap
#replace --virus to the specific taxonomy (e.g. Acinetobacter baumannii) --> change virus_user_db --> specific_bacteria_user_db ln -s ~/Tools/vrap/ . mamba activate /home/jhuang/miniconda3/envs/vrap #!!!!! TODO: ignore the first parts! only take the virus genome in the vector-part! vrap/vrap.py -1 trimmed/VZV_20c_trimmed_P_1.fastq -2 trimmed/VZV_20c_trimmed_P_2.fastq -o vrap_VZV_20c --bt2idx=/home/jhuang/REFs/genome --host=/home/jhuang/REFs/genome.fa --virus=/home/jhuang/DATA/Data_Huang_Human_herpesvirus_3/complete_genome_10335_ncbi.fasta --nt=/mnt/nvme1n1p1/blast/nt --nr=/mnt/nvme1n1p1/blast/nr -t 100 -l 200 -g #-1 trimmed/VZV_20S_trimmed_P_1.fastq -2 trimmed/VZV_20S_trimmed_P_2.fastq #(vrap) vrap/vrap.py -1 trimmed/VZV_20S_trimmed_P_1.fastq -2 trimmed/VZV_20S_trimmed_P_2.fastq -o vrap_VZV_20S --bt2idx=/home/jhuang/REFs/genome --host=/home/jhuang/REFs/genome.fa --virus=/home/jhuang/DATA/Data_Huang_Human_herpesvirus_3/complete_genome_10335_ncbi.fasta --nt=/mnt/nvme1n1p1/blast/nt --nr=/mnt/nvme1n1p1/blast/nr -t 100 -l 200 -g vrap/vrap.py -1 trimmed/VZV_60c_trimmed_P_1.fastq -2 trimmed/VZV_60c_trimmed_P_2.fastq -o vrap_VZV_60c --bt2idx=/home/jhuang/REFs/genome --host=/home/jhuang/REFs/genome.fa --virus=/home/jhuang/DATA/Data_Huang_Human_herpesvirus_3/complete_genome_10335_ncbi.fasta --nt=/mnt/nvme1n1p1/blast/nt --nr=/mnt/nvme1n1p1/blast/nr -t 100 -l 200 -g vrap/vrap.py -1 trimmed/VZV_60S_trimmed_P_1.fastq -2 trimmed/VZV_60S_trimmed_P_2.fastq -o vrap_VZV_60S vrap/vrap.py -1 trimmed/VZV_1451S_trimmed_P_1.fastq -2 trimmed/VZV_1451S_trimmed_P_2.fastq -o vrap_VZV_1451S --bt2idx=/home/jhuang/REFs/genome --host=/home/jhuang/REFs/genome.fa --virus=/home/jhuang/DATA/Data_Huang_Human_herpesvirus_3/complete_genome_10335_ncbi.fasta --nt=/mnt/nvme1n1p1/blast/nt --nr=/mnt/nvme1n1p1/blast/nr -t 100 -l 200 -g vrap/vrap.py -1 trimmed/Pcc1_1451_trimmed_P_1.fastq -2 trimmed/Pcc1_1451_trimmed_P_2.fastq -o vrap_Pcc1_1451 --bt2idx=/home/jhuang/REFs/genome --host=/home/jhuang/REFs/genome.fa --virus=/home/jhuang/DATA/Data_Huang_Human_herpesvirus_3/complete_genome_10335_ncbi.fasta --nt=/mnt/nvme1n1p1/blast/nt --nr=/mnt/nvme1n1p1/blast/nr -t 100 -l 200 -g vrap/vrap.py -1 trimmed/PCC1_VZV_20_1_trimmed_P_1.fastq -2 trimmed/PCC1_VZV_20_1_trimmed_P_2.fastq -o vrap_PCC1_VZV_20_1 --bt2idx=/home/jhuang/REFs/genome --host=/home/jhuang/REFs/genome.fa --virus=/home/jhuang/DATA/Data_Huang_Human_herpesvirus_3/complete_genome_10335_ncbi.fasta --nt=/mnt/nvme1n1p1/blast/nt --nr=/mnt/nvme1n1p1/blast/nr -t 100 -l 200 -g vrap/vrap.py -1 trimmed/PCC1_VZV_20_2_trimmed_P_1.fastq -2 trimmed/PCC1_VZV_20_2_trimmed_P_2.fastq -o vrap_PCC1_VZV_20_2 --bt2idx=/home/jhuang/REFs/genome --host=/home/jhuang/REFs/genome.fa --virus=/home/jhuang/DATA/Data_Huang_Human_herpesvirus_3/complete_genome_10335_ncbi.fasta --nt=/mnt/nvme1n1p1/blast/nt --nr=/mnt/nvme1n1p1/blast/nr -t 100 -l 200 -g vrap/vrap.py -1 trimmed/PCC1_VZV_20_5_trimmed_P_1.fastq -2 trimmed/PCC1_VZV_20_5_trimmed_P_2.fastq -o vrap_PCC1_VZV_20_5 --bt2idx=/home/jhuang/REFs/genome --host=/home/jhuang/REFs/genome.fa --virus=/home/jhuang/DATA/Data_Huang_Human_herpesvirus_3/complete_genome_10335_ncbi.fasta --nt=/mnt/nvme1n1p1/blast/nt --nr=/mnt/nvme1n1p1/blast/nr -t 100 -l 200 -g vrap/vrap.py -1 trimmed/PCC1_VZV_60_1_trimmed_P_1.fastq -2 trimmed/PCC1_VZV_60_1_trimmed_P_2.fastq -o vrap_PCC1_VZV_60_1 --bt2idx=/home/jhuang/REFs/genome --host=/home/jhuang/REFs/genome.fa --virus=/home/jhuang/DATA/Data_Huang_Human_herpesvirus_3/complete_genome_10335_ncbi.fasta --nt=/mnt/nvme1n1p1/blast/nt --nr=/mnt/nvme1n1p1/blast/nr -t 100 -l 200 -g vrap/vrap.py -1 trimmed/PCC1_VZV_60_4_trimmed_P_1.fastq -2 trimmed/PCC1_VZV_60_4_trimmed_P_2.fastq -o vrap_PCC1_VZV_60_4 --bt2idx=/home/jhuang/REFs/genome --host=/home/jhuang/REFs/genome.fa --virus=/home/jhuang/DATA/Data_Huang_Human_herpesvirus_3/complete_genome_10335_ncbi.fasta --nt=/mnt/nvme1n1p1/blast/nt --nr=/mnt/nvme1n1p1/blast/nr -t 100 -l 200 -g vrap/vrap.py -1 trimmed/PCC1_VZV_60_6_trimmed_P_1.fastq -2 trimmed/PCC1_VZV_60_6_trimmed_P_2.fastq -o vrap_PCC1_VZV_60_6 --bt2idx=/home/jhuang/REFs/genome --host=/home/jhuang/REFs/genome.fa --virus=/home/jhuang/DATA/Data_Huang_Human_herpesvirus_3/complete_genome_10335_ncbi.fasta --nt=/mnt/nvme1n1p1/blast/nt --nr=/mnt/nvme1n1p1/blast/nr -t 100 -l 200 -g
http://xgenes.com/article/article-content/365/virus-genome-analysis-pipeline-hybrid-capture-damian-blastn-and-vrap-mapping-for-measles-ma-zhen-sample/ Draw the mapping figures on the reference, consensus reference!
-
Using the bowtie of vrap to map the reads on ref_genome/reference.fasta (The reference refers to the closest related genome found from the list generated by vrap)
(vrap) vrap/vrap.py -1 trimmed/VZV_20S_trimmed_P_1.fastq -2 trimmed/VZV_20S_trimmed_P_2.fastq -o VZV_20S_on_X04370 --host /home/jhuang/DATA/Data_Huang_Human_herpesvirus_3/X04370.fasta -t 100 -l 200 -g cd bowtie mv mapped mapped.sam samtools view -S -b mapped.sam > mapped.bam samtools sort mapped.bam -o mapped_sorted.bam samtools index mapped_sorted.bam samtools view -H mapped_sorted.bam samtools flagstat mapped_sorted.bam
-
Show the bw on IGV
Assembly correction tools: Polca, Pilon, and Medaka
1️⃣ Polca – A lightweight polishing tool from MaSuRCA that corrects small sequencing errors using short-read data. It efficiently fixes substitutions and small INDELs but is not ideal for large structural variations.
# Under the env (nextclade)
mamba install -c bioconda -c conda-forge masurca
#-- VZV_20S.assembly3-modify.fasta --
(nextclade) polca.sh -a ../viralngs/tmp/02_assembly/VZV_20S.assembly3-modify.fasta -r "VZV_20S_trimmed_P_1.fastq VZV_20S_trimmed_P_2.fastq" -t 40 -m 10G
#3
(nextclade) polca.sh -a VZV_20S.assembly3-modify.fasta.PolcaCorrected.fa -r "VZV_20S_trimmed_P_1.fastq VZV_20S_trimmed_P_2.fastq" -t 40 -m 10G
#0
2️⃣ Pilon – A more comprehensive short-read polishing tool that corrects SNPs, small INDELs, and some structural misassemblies. It works best with high-coverage Illumina reads and can iteratively improve assembly accuracy.
bwa index PCC1_VZV_20_2.assembly3-modify.fasta.PolcaCorrected.fa
bwa mem -t 40 PCC1_VZV_20_2.assembly3-modify.fasta.PolcaCorrected.fa \
PCC1_VZV_20_2_trimmed_P_1.fastq \
PCC1_VZV_20_2_trimmed_P_2.fastq \
> aln.sam
samtools view -bS aln.sam > aln.bam
samtools sort aln.bam aln.sorted
samtools index aln.sorted.bam
(nextclade) pilon --genome PCC1_VZV_20_2.assembly3-modify.fasta.PolcaCorrected.fa \
--bam aln.sorted.bam --output polished --threads 80 --changes --fix indels
3️⃣ Medaka – A polishing tool specifically designed for Nanopore sequencing data. It uses a neural network to refine base calls and correct systematic errors in long-read assemblies.
-
Quality check using QUAST
#mamba install -c bioconda quast #quast polished.fasta -r reference.fasta -o quast_output
-
Correcting assembly for Huang_Human_herpesvirus_3
./VZV_20S.fasta ./VZV_20c.fasta ./PCC1_VZV_20_1.fasta ./PCC1_VZV_20_2.fasta ./PCC1_VZV_20_5.fasta ./VZV_60S.fasta ./VZV_60c.fasta ./PCC1_VZV_60_1.fasta ./PCC1_VZV_60_4.fasta ./PCC1_VZV_60_6.fasta #find . -nma "*.assembly1-spades.fasta" | wc -l #find . -name "*.assembly2-gapfilled.fasta" | wc -l #find . -name "*.assembly3-modify.fasta" | wc -l #find . -name "*.assembly4-refined.fasta" | wc -l # Under the env (nextclade) and directory ~/DATA/Data_Huang_Human_herpesvirus_3/trimmed mamba install -c bioconda -c conda-forge masurca #-- VZV_20S.assembly3-modify.fasta -- (nextclade) polca.sh -a ../viralngs/tmp/02_assembly/VZV_20S.assembly3-modify.fasta -r "VZV_20S_trimmed_P_1.fastq VZV_20S_trimmed_P_2.fastq" -t 40 -m 10G #Stats BEFORE polishing: #Substitution Errors Found: 3 #Insertion/Deletion Errors Found: 0 #Assembly Size: 124870 #Consensus Quality Before Polishing: 99.9976 #Consensus QV Before Polishing: 46.19 (nextclade) polca.sh -a VZV_20S.assembly3-modify.fasta.PolcaCorrected.fa -r "VZV_20S_trimmed_P_1.fastq VZV_20S_trimmed_P_2.fastq" -t 40 -m 10G #Stats BEFORE polishing: #Substitution Errors Found: 0 #Insertion/Deletion Errors Found: 0 #Assembly Size: 124869 #Consensus Quality Before Polishing: 100 #Consensus QV Before Polishing: 100.00 #-- VZV_20c.assembly3-modify.fasta -- polca.sh -a ../viralngs/tmp/02_assembly/VZV_20c.assembly3-modify.fasta -r "VZV_20c_trimmed_P_1.fastq VZV_20c_trimmed_P_2.fastq" -t 40 -m 10G #Stats BEFORE polishing: #Substitution Errors Found: 2 #Insertion/Deletion Errors Found: 2 #Assembly Size: 124872 #Consensus Quality Before Polishing: 99.9968 #Consensus QV Before Polishing: 44.94 polca.sh -a VZV_20c.assembly3-modify.fasta.PolcaCorrected.fa -r "VZV_20c_trimmed_P_1.fastq VZV_20c_trimmed_P_2.fastq" -t 40 -m 10G #Stats BEFORE polishing: #Substitution Errors Found: 0 #Insertion/Deletion Errors Found: 0 #Assembly Size: 124873 #Consensus Quality Before Polishing: 100 #Consensus QV Before Polishing: 100.00 #-- PCC1_VZV_20_1.assembly3-modify.fasta -- polca.sh -a ../viralngs/tmp/02_assembly/PCC1_VZV_20_1.assembly3-modify.fasta -r "PCC1_VZV_20_1_trimmed_P_1.fastq PCC1_VZV_20_1_trimmed_P_2.fastq" -t 40 -m 10G #Stats BEFORE polishing: #Substitution Errors Found: 1 #Insertion/Deletion Errors Found: 1 #Assembly Size: 124873 #Consensus Quality Before Polishing: 99.9984 #Consensus QV Before Polishing: 47.95 polca.sh -a PCC1_VZV_20_1.assembly3-modify.fasta.PolcaCorrected.fa -r "PCC1_VZV_20_1_trimmed_P_1.fastq PCC1_VZV_20_1_trimmed_P_2.fastq" -t 40 -m 10G #Stats BEFORE polishing: #Substitution Errors Found: 1 #Insertion/Deletion Errors Found: 0 #Assembly Size: 124873 #Consensus Quality Before Polishing: 99.9992 #Consensus QV Before Polishing: 50.96 polca.sh -a PCC1_VZV_20_1.assembly3-modify.fasta.PolcaCorrected.fa.PolcaCorrected.fa -r "PCC1_VZV_20_1_trimmed_P_1.fastq PCC1_VZV_20_1_trimmed_P_2.fastq" -t 40 -m 10G #Stats BEFORE polishing: #Substitution Errors Found: 0 #Insertion/Deletion Errors Found: 0 #Assembly Size: 124873 #Consensus Quality Before Polishing: 100 #Consensus QV Before Polishing: 100.00 #-- PCC1_VZV_20_2.assembly3-modify.fasta -- polca.sh -a ../viralngs/tmp/02_assembly/PCC1_VZV_20_2.assembly2-gapfilled.fasta -r "PCC1_VZV_20_2_trimmed_P_1.fastq PCC1_VZV_20_2_trimmed_P_2.fastq" -t 40 -m 10G #Stats BEFORE polishing: #Substitution Errors Found: 1 #Insertion/Deletion Errors Found: 1 #Assembly Size: 124866 #Consensus Quality Before Polishing: 99.9984 #Consensus QV Before Polishing: 47.95 polca.sh -a PCC1_VZV_20_2.assembly2-gapfilled.fasta.PolcaCorrected.fa -r "PCC1_VZV_20_2_trimmed_P_1.fastq PCC1_VZV_20_2_trimmed_P_2.fastq" -t 40 -m 10G #Stats BEFORE polishing: #Substitution Errors Found: 0 #Insertion/Deletion Errors Found: 0 #Assembly Size: 124867 #Consensus Quality Before Polishing: 100 #Consensus QV Before Polishing: 100.00 #-- PCC1_VZV_20_5.assembly3-modify.fasta -- polca.sh -a ../viralngs/tmp/02_assembly/PCC1_VZV_20_5.assembly3-modify.fasta -r "PCC1_VZV_20_5_trimmed_P_1.fastq PCC1_VZV_20_5_trimmed_P_2.fastq" -t 40 -m 10G #Stats BEFORE polishing: #Substitution Errors Found: 1 #Insertion/Deletion Errors Found: 0 #Assembly Size: 124873 #Consensus Quality Before Polishing: 99.9992 #Consensus QV Before Polishing: 50.96 polca.sh -a PCC1_VZV_20_5.assembly3-modify.fasta.PolcaCorrected.fa -r "PCC1_VZV_20_5_trimmed_P_1.fastq PCC1_VZV_20_5_trimmed_P_2.fastq" -t 40 -m 10G #Stats BEFORE polishing: #Substitution Errors Found: 0 #Insertion/Deletion Errors Found: 0 #Assembly Size: 124874 #Consensus Quality Before Polishing: 100 #Consensus QV Before Polishing: 100.00 #-- VZV_60S.assembly3-modify.fasta -- polca.sh -a ../viralngs/tmp/02_assembly/VZV_60S.assembly3-modify.fasta -r "VZV_60S_trimmed_P_1.fastq VZV_60S_trimmed_P_2.fastq" -t 40 -m 10G #Stats BEFORE polishing: #Substitution Errors Found: 2 #Insertion/Deletion Errors Found: 0 #Assembly Size: 124873 #Consensus Quality Before Polishing: 99.9984 #Consensus QV Before Polishing: 47.95 polca.sh -a VZV_60S.assembly3-modify.fasta.PolcaCorrected.fa -r "VZV_60S_trimmed_P_1.fastq VZV_60S_trimmed_P_2.fastq" -t 40 -m 10G #Stats BEFORE polishing: #Substitution Errors Found: 1 #Insertion/Deletion Errors Found: 0 #Assembly Size: 124870 #Consensus Quality Before Polishing: 99.9992 #Consensus QV Before Polishing: 50.96 polca.sh -a VZV_60S.assembly3-modify.fasta.PolcaCorrected.fa.PolcaCorrected.fa -r "VZV_60S_trimmed_P_1.fastq VZV_60S_trimmed_P_2.fastq" -t 40 -m 10G #Stats BEFORE polishing: #Substitution Errors Found: 0 #Insertion/Deletion Errors Found: 0 #Assembly Size: 124870 #Consensus Quality Before Polishing: 100 #Consensus QV Before Polishing: 100.00 #-- VZV_60c.assembly2-gapfilled.fasta -- polca.sh -a ../viralngs/tmp/02_assembly/VZV_60c.assembly2-gapfilled.fasta -r "VZV_60c_trimmed_P_1.fastq VZV_60c_trimmed_P_2.fastq" -t 40 -m 10G #Stats BEFORE polishing: #Substitution Errors Found: 1 #Insertion/Deletion Errors Found: 0 #Assembly Size: 119660 #Consensus Quality Before Polishing: 99.9992 #Consensus QV Before Polishing: 50.78 polca.sh -a VZV_60c.assembly2-gapfilled.fasta.PolcaCorrected.fa -r "VZV_60c_trimmed_P_1.fastq VZV_60c_trimmed_P_2.fastq" -t 40 -m 10G #Stats BEFORE polishing: #Substitution Errors Found: 0 #Insertion/Deletion Errors Found: 0 #Assembly Size: 119660 #Consensus Quality Before Polishing: 100 #Consensus QV Before Polishing: 100.00 #-- PCC1_VZV_60_1.assembly3-modify.fasta -- polca.sh -a ../viralngs/tmp/02_assembly/PCC1_VZV_60_1.assembly3-modify.fasta -r "PCC1_VZV_60_1_trimmed_P_1.fastq PCC1_VZV_60_1_trimmed_P_2.fastq" -t 40 -m 10G #Stats BEFORE polishing: #Substitution Errors Found: 0 #Insertion/Deletion Errors Found: 0 #Assembly Size: 124843 #Consensus Quality Before Polishing: 100 #Consensus QV Before Polishing: 100.00 polca.sh -a PCC1_VZV_60_1.assembly3-modify.fasta.PolcaCorrected.fa -r "PCC1_VZV_60_1_trimmed_P_1.fastq PCC1_VZV_60_1_trimmed_P_2.fastq" -t 40 -m 10G #Stats BEFORE polishing: #Substitution Errors Found: 1 #Insertion/Deletion Errors Found: 0 #Assembly Size: 124839 #Consensus Quality Before Polishing: 99.9992 #Consensus QV Before Polishing: 50.96 polca.sh -a PCC1_VZV_60_1.assembly3-modify.fasta.PolcaCorrected.fa.PolcaCorrected.fa -r "PCC1_VZV_60_1_trimmed_P_1.fastq PCC1_VZV_60_1_trimmed_P_2.fastq" -t 40 -m 10G #Stats BEFORE polishing: #Substitution Errors Found: 0 #Insertion/Deletion Errors Found: 0 #Assembly Size: 124839 #Consensus Quality Before Polishing: 100 #Consensus QV Before Polishing: 100.00 #-- PCC1_VZV_60_4.assembly3-modify.fasta -- polca.sh -a ../viralngs/tmp/02_assembly/PCC1_VZV_60_4.assembly3-modify.fasta -r "PCC1_VZV_60_4_trimmed_P_1.fastq PCC1_VZV_60_4_trimmed_P_2.fastq" -t 40 -m 10G #Stats BEFORE polishing: #Substitution Errors Found: 1 #Insertion/Deletion Errors Found: 0 #Assembly Size: 124851 #Consensus Quality Before Polishing: 99.9992 #Consensus QV Before Polishing: 50.96 polca.sh -a PCC1_VZV_60_4.assembly3-modify.fasta.PolcaCorrected.fa -r "PCC1_VZV_60_4_trimmed_P_1.fastq PCC1_VZV_60_4_trimmed_P_2.fastq" -t 40 -m 10G #Stats BEFORE polishing: #Substitution Errors Found: 0 #Insertion/Deletion Errors Found: 0 #Assembly Size: 124851 #Consensus Quality Before Polishing: 100 #Consensus QV Before Polishing: 100.00 #-- PCC1_VZV_60_6.assembly3-modify.fasta -- polca.sh -a ../viralngs/tmp/02_assembly/PCC1_VZV_60_6.assembly3-modify.fasta -r "PCC1_VZV_60_6_trimmed_P_1.fastq PCC1_VZV_60_6_trimmed_P_2.fastq" -t 40 -m 10G #Stats BEFORE polishing: #Substitution Errors Found: 3 #Insertion/Deletion Errors Found: 0 #Assembly Size: 124873 #Consensus Quality Before Polishing: 99.9976 #Consensus QV Before Polishing: 46.19 polca.sh -a PCC1_VZV_60_6.assembly3-modify.fasta.PolcaCorrected.fa -r "PCC1_VZV_60_6_trimmed_P_1.fastq PCC1_VZV_60_6_trimmed_P_2.fastq" -t 40 -m 10G #Stats BEFORE polishing: #Substitution Errors Found: 0 #Insertion/Deletion Errors Found: 0 #Assembly Size: 124871 #Consensus Quality Before Polishing: 100 #Consensus QV Before Polishing: 100.00
-
Multiple alignment of all corrected assembly
cat ./20S_polished/VZV_20S.assembly3-modify.fasta.PolcaCorrected.fa.PolcaCorrected.fa ./20c_polished/VZV_20c.assembly3-modify.fasta.PolcaCorrected.fa.PolcaCorrected.fa ./20_1_polished/PCC1_VZV_20_1.assembly3-modify.fasta.PolcaCorrected.fa.PolcaCorrected.fa.PolcaCorrected.fa ./20_2_polished/PCC1_VZV_20_2.assembly2-gapfilled.fasta.PolcaCorrected.fa.PolcaCorrected.fa ./20_5_polished/PCC1_VZV_20_5.assembly3-modify.fasta.PolcaCorrected.fa.PolcaCorrected.fa > 20.fasta cat ./60S_polished/VZV_60S.assembly3-modify.fasta.PolcaCorrected.fa.PolcaCorrected.fa.PolcaCorrected.fa ./60c_polished/VZV_60c.assembly2-gapfilled.fasta.PolcaCorrected.fa.PolcaCorrected.fa ./60_1_polished/PCC1_VZV_60_1.assembly3-modify.fasta.PolcaCorrected.fa.PolcaCorrected.fa.PolcaCorrected.fa ./60_4_polished/PCC1_VZV_60_4.assembly3-modify.fasta.PolcaCorrected.fa.PolcaCorrected.fa ./60_6_polished/PCC1_VZV_60_6.assembly3-modify.fasta.PolcaCorrected.fa.PolcaCorrected.fa > 60.fasta mafft --clustalout 20.fasta > 20.aln mafft --clustalout 60.fasta > 60.aln grep "NC_001348.1_con" 20.aln > PCC1_VZV_20_2-1.fasta seqtk seq PCC1_VZV_20_2-1.fasta -l 60 > PCC1_VZV_20_2.fasta polca.sh -a PCC1_VZV_20_2.fasta -r "PCC1_VZV_20_2_trimmed_P_1.fastq PCC1_VZV_20_2_trimmed_P_2.fastq" -t 40 -m 10G #Stats BEFORE polishing: #Substitution Errors Found: 0 #Insertion/Deletion Errors Found: 0 #Assembly Size: 124866 #Consensus Quality Before Polishing: 100 #Consensus QV Before Polishing: 100.00 cat ./20S_polished/VZV_20S.assembly3-modify.fasta.PolcaCorrected.fa.PolcaCorrected.fa ./20c_polished/VZV_20c.assembly3-modify.fasta.PolcaCorrected.fa.PolcaCorrected.fa ./20_1_polished/PCC1_VZV_20_1.assembly3-modify.fasta.PolcaCorrected.fa.PolcaCorrected.fa.PolcaCorrected.fa ./PCC1_VZV_20_2.fasta.PolcaCorrected.fa ./20_5_polished/PCC1_VZV_20_5.assembly3-modify.fasta.PolcaCorrected.fa.PolcaCorrected.fa > 20_round2.fasta mafft --clustalout 20_round2.fasta > 20_round2.aln #--leavegappyregion #(Optional) Delete "-1", set 2 positions of the lines "************" python check_SNP_positions.py grep "NC_001348.1_con" 60.aln > VZV_60c-1.fasta seqtk seq VZV_60c-1.fasta -l 60 > VZV_60c.fasta polca.sh -a VZV_60c.fasta -r "VZV_60c_trimmed_P_1.fastq VZV_60c_trimmed_P_2.fastq" -t 40 -m 10G #Stats BEFORE polishing: #Substitution Errors Found: 0 #Insertion/Deletion Errors Found: 0 #Assembly Size: 119660 #Consensus Quality Before Polishing: 100 #Consensus QV Before Polishing: 100.00 cat ./60S_polished/VZV_60S.assembly3-modify.fasta.PolcaCorrected.fa.PolcaCorrected.fa.PolcaCorrected.fa VZV_60c.fasta.PolcaCorrected.fa ./60_1_polished/PCC1_VZV_60_1.assembly3-modify.fasta.PolcaCorrected.fa.PolcaCorrected.fa.PolcaCorrected.fa ./60_4_polished/PCC1_VZV_60_4.assembly3-modify.fasta.PolcaCorrected.fa.PolcaCorrected.fa ./60_6_polished/PCC1_VZV_60_6.assembly3-modify.fasta.PolcaCorrected.fa.PolcaCorrected.fa > 60_round2.fasta mafft --clustalout 60_round2.fasta > 60_round2.aln muscle -in 20_round2.fasta -out 20_round2.aln -clw #(Optional) Delete "-1", set 2 positions of the lines "************" python check_SNP_positions.py grep "VZV_60c-1" 60_round2.aln > VZV_60c-2.fasta seqtk seq VZV_60c-2.fasta -l 60 > VZV_60c-3.fasta polca.sh -a VZV_60c-3.fasta -r "VZV_60c_trimmed_P_1.fastq VZV_60c_trimmed_P_2.fastq" -t 40 -m 10G #Stats BEFORE polishing: #Substitution Errors Found: 0 #Insertion/Deletion Errors Found: 0 #Assembly Size: 119660 #Consensus Quality Before Polishing: 100 #Consensus QV Before Polishing: 100.00 polca.sh -a VZV_60c-3.fasta.PolcaCorrected.fa -r "VZV_60c_trimmed_P_1.fastq VZV_60c_trimmed_P_2.fastq" -t 40 -m 10G #Stats BEFORE polishing: #Substitution Errors Found: 0 #Insertion/Deletion Errors Found: 0 #Assembly Size: 124896 #Consensus Quality Before Polishing: 100 #Consensus QV Before Polishing: 100.00 cat ./60S_polished/VZV_60S.assembly3-modify.fasta.PolcaCorrected.fa.PolcaCorrected.fa.PolcaCorrected.fa VZV_60c-3.fasta.PolcaCorrected.fa.PolcaCorrected.fa ./60_1_polished/PCC1_VZV_60_1.assembly3-modify.fasta.PolcaCorrected.fa.PolcaCorrected.fa.PolcaCorrected.fa ./60_4_polished/PCC1_VZV_60_4.assembly3-modify.fasta.PolcaCorrected.fa.PolcaCorrected.fa ./60_6_polished/PCC1_VZV_60_6.assembly3-modify.fasta.PolcaCorrected.fa.PolcaCorrected.fa > 60_round3.fasta mafft --auto --op 3 --ep 0.1 --clustalout 60_round3.fasta > 60_round3.aln #--leavegappyregion ulimit -s unlimited muscle -in 60_round3.fasta -out 60_round3.aln -clw -maxiters 2 mamba install -c bioconda clustalo clustalo -i 60_round3.fasta -o output.fasta --auto mamba install -c bioconda t-coffee t_coffee -seq input.fasta -outfile output.fasta #(Optional) Delete "-1", set 2 positions of the lines "************" python check_SNP_positions.py 1. Try a Different MAFFT Mode The default mode (--auto) may favor gaps too much. Use --globalpair or --localpair instead. Command: mafft --globalpair --maxiterate 1000 input.fasta > output.fasta or mafft --localpair --maxiterate 1000 input.fasta > output.fasta --globalpair: More accurate for closely related sequences. --localpair: Better if your sequences have recombination or partial homology. 2. Increase Gap Open Penalty (--op) The default gap opening penalty is low, leading to excessive gaps. Try increasing it: mafft --auto --op 3 input.fasta > output.fasta Default is 1.53; higher values reduce gaps. 3. Reduce Gap Extension Penalty (--ep) MAFFT extends gaps too easily. Lowering --ep discourages long gaps. mafft --auto --op 3 --ep 0.1 input.fasta > output.fasta cp ./20S_polished/VZV_20S.assembly3-modify.fasta.PolcaCorrected.fa.PolcaCorrected.fa toSend cp ./20c_polished/VZV_20c.assembly3-modify.fasta.PolcaCorrected.fa.PolcaCorrected.fa toSend cp ./20_1_polished/PCC1_VZV_20_1.assembly3-modify.fasta.PolcaCorrected.fa.PolcaCorrected.fa.PolcaCorrected.fa toSend cp ./PCC1_VZV_20_2.fasta.PolcaCorrected.fa toSend cp ./20_5_polished/PCC1_VZV_20_5.assembly3-modify.fasta.PolcaCorrected.fa.PolcaCorrected.fa toSend cp ./60S_polished/VZV_60S.assembly3-modify.fasta.PolcaCorrected.fa.PolcaCorrected.fa.PolcaCorrected.fa toSend cp VZV_60c-3.fasta.PolcaCorrected.fa.PolcaCorrected.fa toSend cp ./60_1_polished/PCC1_VZV_60_1.assembly3-modify.fasta.PolcaCorrected.fa.PolcaCorrected.fa.PolcaCorrected.fa toSend cp ./60_4_polished/PCC1_VZV_60_4.assembly3-modify.fasta.PolcaCorrected.fa.PolcaCorrected.fa toSend cp ./60_6_polished/PCC1_VZV_60_6.assembly3-modify.fasta.PolcaCorrected.fa.PolcaCorrected.fa toSend
EPDM-Folie: Von der Reifenindustrie zum modernen Flachdachmaterial
EPDM(乙丙橡胶)和 Resitrix 都是用于屋顶和建筑防水的高性能防水材料,但它们在材质、施工方式和应用场景上有所不同。
- EPDM 薄膜(乙丙橡胶膜)
🔹 材质:EPDM 是一种合成橡胶(乙烯-丙烯-二烯单体),具有极高的耐候性和耐久性。 🔹 特点:
超长寿命(可达 50 年以上)
耐紫外线、耐臭氧、耐温差变化
高度弹性(可拉伸 300% 以上)
无焊接,使用胶水或热风拼接
适用于大面积屋顶
🔹 缺点:
需要专业施工,接缝处需额外密封
施工时对表面处理要求较高
某些类型的 EPDM 可能对特定化学物质不耐受
- Resitrix(增强型 EPDM)
🔹 材质:Resitrix 是一种改进型 EPDM,结合了EPDM 和高分子自粘防水层(如 SBS 改性沥青)。 🔹 特点:
自粘性,施工更简单,可用热风焊接,无需额外粘合剂
更高的机械强度,防穿刺性更好
耐化学腐蚀性更强
适用于复杂屋顶结构(如阳台、露台)
🔹 缺点:
价格比普通 EPDM 更贵
相对较重,施工时需要更多人力
- 总结:如何选择?
✅ EPDM 适合:大面积屋顶、预算有限、希望长期使用的项目。 ✅ Resitrix 适合:更复杂的屋面结构、需要高强度防护、希望更容易施工的项目。
👉 结论:如果预算有限且需要大面积施工,选择 EPDM;如果想要更好的施工便利性和耐久性,选择 Resitrix!
Das Material, das für ein einteiliges, durchgehendes und dichtes Flachdach verwendet wird, nennt man oft EPDM-Folie (Ethylen-Propylen-Dien-Monomer). Es handelt sich um eine elastische, wasserdichte Gummimembran, die häufig in einem Stück verlegt wird, um Fugen und somit potenzielle Schwachstellen zu vermeiden.
Andere ähnliche Materialien könnten sein:
- PVC-Dachfolie
- TPO-Folie (Thermoplastisches Polyolefin)
- Bitumenbahnen, wenn sie verschweißt werden.
EPDM ist jedoch besonders für seine Langlebigkeit und UV-Beständigkeit bekannt.
EPDM (Ethylen-Propylen-Dien-Kautschuk) hat tatsächlich eine enge Verbindung zur Reifenindustrie, da es ursprünglich als Nebenprodukt oder spezialisiertes Material für Gummianwendungen entwickelt wurde. Einer der größten Hersteller von EPDM-Materialien war Firestone, das ursprünglich ein Reifenunternehmen war.
Firestone Building Products, ein Geschäftsbereich von Firestone Tire and Rubber Company, wurde später ausgegliedert und spezialisierte sich auf Bauprodukte, einschließlich EPDM-Dachsysteme. Im Laufe der Jahre haben sich andere Unternehmen auf ähnliche Produkte konzentriert, nachdem sie sich von der Reifenproduktion getrennt haben, um die Bauindustrie zu bedienen.
Die Verbindung zwischen der Reifenindustrie und EPDM liegt in den Gemeinsamkeiten bei der Entwicklung von widerstandsfähigen und langlebigen Gummimaterialien.
What’s Happening?
In April 2021 Firestone Building Products was acquired by Holcim, global leader in sustainable construction.
Both Firestone & Holcim are well established multi-national companies in their own right with many years of expertise providing sustainable solutions to the construction sectors across the globe.
Today, Firestone, a premier provider of industry-leading roofing, wall and lining systems, is becoming Elevate™.
https://en.wikipedia.org/wiki/Elevate_(brand)
https://www.holcimelevate.com/dach-de/dachabdichtung/epdm/rubbercover-epdm
PCA Plot Created Using R for Data_Liu_PCA_plot
To make the circles (ellipses) more focused on the point clouds, we can adjust the level parameter in stat_ellipse() to a smaller value, which will create tighter circles around the clusters. Here’s the modified code with more focused circles:
library(ggrepel)
library(dplyr)
library(ggplot2)
library(readxl) # To read Excel files
# Load the data from the Excel file
merged_pca_data <- read_excel("PCA figure.xlsx")
# Prepare the data: select relevant columns and add a 'condition' variable
pca_data <- merged_pca_data %>%
select("Component 1", "Component 2", "Component 3", "Component 4", "Component 5",
"Component 6", "Component 7", "Component 8", "C: Grouping") %>%
rename(
PC1 = "Component 1",
PC2 = "Component 2",
PC3 = "Component 3",
PC4 = "Component 4",
PC5 = "Component 5",
PC6 = "Component 6",
PC7 = "Component 7",
PC8 = "Component 8",
condition = "C: Grouping"
) %>%
mutate(condition = factor(condition)) # Add 'condition' as a factor for coloring
# Prepare PCA plot data (data to be plotted)
plot_pca_df <- pca_data
# Save PCA plot as PNG
png("PCA_plot.png", width = 1000, height = 600, res = 150)
# Create PCA plot using ggplot with tighter circles around point clouds
ggplot(plot_pca_df, aes(x = PC1, y = PC2, color = condition)) +
geom_point(size = 4, alpha = 0.8) + # Plot points
stat_ellipse(
type = "norm",
level = 0.6, # Reduced from 0.9 to make circles tighter
size = 1,
linetype = 2,
alpha = 0.7 # Added transparency to circles
) +
labs(
title = "", # PCA Plot
x = "Principal Component 1",
y = "Principal Component 2"
) +
theme_minimal() +
scale_color_manual(values = c("blue", "red")) + # Customize colors for different conditions
theme(legend.title = element_blank()) # Optional: remove legend title
dev.off() # Close the PNG device and save the file
Key changes:
-
Reduced the level parameter from 0.9 to 0.6 to make the circles tighter around the point clouds
-
Added alpha = 0.7 to make the circles slightly transparent
-
Kept the dashed line style (linetype = 2) for better visibility
If the circles are still too large, you can try:
-
Decreasing the level further (e.g., 0.5 or 0.4)
-
Using type = “t” instead of “norm” for a more robust ellipse estimation
-
Adding segments = 100 to make the ellipses smoother
For perfect circles (if your data is properly scaled), we could also try:
stat_ellipse(type = "euclid", level = 0.6, size = 1, linetype = 2, alpha = 0.7)
Small RNA sequencing processing in the example of smallRNA_7 using exceRpt
-
Input data
mkdir ~/DATA/Data_Ute/Data_Ute_smallRNA_7/raw_data cd raw_data cp ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf930/01_0505_WaGa_wt_EV_RNA_S1_R1_001.fastq.gz 0505_WaGa_wt.fastq.gz cp ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf931/02_0505_WaGa_sT_DMSO_EV_RNA_S2_R1_001.fastq.gz 0505_WaGa_sT_DMSO.fastq.gz cp ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf932/03_0505_WaGa_sT_Dox_EV_RNA_S3_R1_001.fastq.gz 0505_WaGa_sT_Dox.fastq.gz cp ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf933/04_0505_WaGa_scr_DMSO_EV_RNA_S4_R1_001.fastq.gz 0505_WaGa_scr_DMSO.fastq.gz cp ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf934/05_0505_WaGa_scr_Dox_EV_RNA_S5_R1_001.fastq.gz 0505_WaGa_scr_Dox.fastq.gz cp ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf935/06_1905_WaGa_wt_EV_RNA_S6_R1_001.fastq.gz 1905_WaGa_wt.fastq.gz cp ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf936/07_1905_WaGa_sT_DMSO_EV_RNA_S7_R1_001.fastq.gz 1905_WaGa_sT_DMSO.fastq.gz cp ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf937/08_1905_WaGa_sT_Dox_EV_RNA_S8_R1_001.fastq.gz 1905_WaGa_sT_Dox.fastq.gz cp ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf938/09_1905_WaGa_scr_DMSO_EV_RNA_S9_R1_001.fastq.gz 1905_WaGa_scr_DMSO.fastq.gz cp ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf939/10_1905_WaGa_scr_Dox_EV_RNA_S10_R1_001.fastq.gz 1905_WaGa_scr_Dox.fastq.gz cp ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf940/11_control_MKL1_S11_R1_001.fastq.gz control_MKL1.fastq.gz cp ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf941/12_control_WaGa_S12_R1_001.fastq.gz control_WaGa.fastq.gz #END
-
Run cutadapt
some common adapter sequences from different kits for reference: - TruSeq Small RNA (Illumina): TGGAATTCTCGGGTGCCAAGG - Small RNA Kits V1 (Illumina): TCGTATGCCGTCTTCTGCTTGT - Small RNA Kits V1.5 (Illumina): ATCTCGTATGCCGTCTTCTGCTTG - NEXTflex Small RNA Sequencing Kit v3 for Illumina Platforms (Bioo Scientific): TGGAATTCTCGGGTGCCAAGG - LEXOGEN Small RNA-Seq Library Prep Kit (Illumina): TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC * mkdir trimmed; cd trimmed for sample in 0505_WaGa_wt 0505_WaGa_sT_DMSO 0505_WaGa_sT_Dox 0505_WaGa_scr_DMSO 0505_WaGa_scr_Dox 1905_WaGa_wt 1905_WaGa_sT_DMSO 1905_WaGa_sT_Dox 1905_WaGa_scr_DMSO 1905_WaGa_scr_Dox control_MKL1 control_WaGa; do cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 -o ${sample}_cutadapted.fastq.gz --minimum-length 5 --trim-n ../raw_data/${sample}.fastq.gz >> LOG done # -- check if it is necessary to remove adapter from 5'-end -- (Option_1) cutadapt -g TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -o /dev/null --report=minimal 0505_WaGa_wt_cutadapted.fastq.gz --> The trimming statistics in the output will show how often 5'-end adapters were removed. (Option 2) zcat your_sample.fastq.gz | grep 'TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC' | head -n 20 (Option 3) fastqc your_sample.fastq.gz #Open the generated HTML report and check: # The "Overrepresented sequences" section for adapter sequences. # The "Per base sequence content" plot to see if there are unexpected sequences at the start of reads. #(If check results shows both ends contain adapter) cutadapt -g TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 10 -o ${sample}_trimmed.fastq.gz ${sample}.fastq.gz >> LOG2 # -g → Trims 5'-end adapters # -a → Trims 3'-end adapters; -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC → Specifies the adapter sequence to be removed from the 3' end of the reads. The sequence provided is common in RNA-seq libraries (e.g., Illumina small RNA sequencing). # -q 20 → Performs quality trimming at both read ends, removing bases with a Phred quality score below 20.
-
Install exceRpt (https://github.gersteinlab.org/exceRpt/)
docker pull rkitchen/excerpt mkdir MyexceRptDatabase cd /mnt/nvme0n1p1/MyexceRptDatabase wget http://org.gersteinlab.excerpt.s3-website-us-east-1.amazonaws.com/exceRptDB_v4_hg38_lowmem.tgz tar -xvf exceRptDB_v4_hg38_lowmem.tgz #http://org.gersteinlab.excerpt.s3-website-us-east-1.amazonaws.com/exceRptDB_v4_hg19_lowmem.tgz #http://org.gersteinlab.excerpt.s3-website-us-east-1.amazonaws.com/exceRptDB_v4_hg38_lowmem.tgz #http://org.gersteinlab.excerpt.s3-website-us-east-1.amazonaws.com/exceRptDB_v4_mm10_lowmem.tgz wget http://org.gersteinlab.excerpt.s3-website-us-east-1.amazonaws.com/exceRptDB_v4_EXOmiRNArRNA.tgz tar -xvf exceRptDB_v4_EXOmiRNArRNA.tgz wget http://org.gersteinlab.excerpt.s3-website-us-east-1.amazonaws.com/exceRptDB_v4_EXOGenomes.tgz tar -xvf exceRptDB_v4_EXOGenomes.tgz
-
Run exceRpt
#[COMPLETE_DB] docker run -v /mnt/nvme0n1p1/MyInputSample:/exceRptInput \ -v /mnt/nvme0n1p1/MyResults:/exceRptOutput \ -v /mnt/nvme0n1p1/MyexceRptDatabase:/exceRpt_DB \ -t rkitchen/excerpt \ INPUT_FILE_PATH=/exceRptInput/0505_WaGa_wt_cutadapted.fastq.gz \ MAIN_ORGANISM_GENOME_ID=hg38 \ N_THREADS=50 \ JAVA_RAM='800G' #[SMALL_DB] docker run -v /mnt/nvme0n1p1/MyInputSample:/exceRptInput \ -v /mnt/nvme0n1p1/MyResults:/exceRptOutput \ -v /mnt/nvme0n1p1/MyexceRptDatabase/hg38:/exceRpt_DB/hg38 \ -t rkitchen/excerpt \ INPUT_FILE_PATH=/exceRptInput/${sample}_cutadapted.fastq.gz N_THREADS=50 \ JAVA_RAM='800G' #[REAL_RUNNING] mkdir results for sample in 0505_WaGa_wt 0505_WaGa_sT_DMSO 0505_WaGa_sT_Dox 0505_WaGa_scr_DMSO 0505_WaGa_scr_Dox 1905_WaGa_wt 1905_WaGa_sT_DMSO 1905_WaGa_sT_Dox 1905_WaGa_scr_DMSO 1905_WaGa_scr_Dox control_MKL1 control_WaGa; do docker run -v ~/DATA/Data_Ute/Data_Ute_smallRNA_7/trimmed:/exceRptInput \ -v ~/DATA/Data_Ute/Data_Ute_smallRNA_7/results:/exceRptOutput \ -v /mnt/nvme0n1p1/MyexceRptDatabase/hg38:/exceRpt_DB/hg38 \ -t rkitchen/excerpt \ INPUT_FILE_PATH=/exceRptInput/${sample}_cutadapted.fastq.gz MAIN_ORGANISM_GENOME_ID=hg38 N_THREADS=50 JAVA_RAM='200G' done mkdir results2 for sample in 0505_WaGa_wt 0505_WaGa_sT_DMSO 0505_WaGa_sT_Dox 0505_WaGa_scr_DMSO 0505_WaGa_scr_Dox 1905_WaGa_wt 1905_WaGa_sT_DMSO 1905_WaGa_sT_Dox 1905_WaGa_scr_DMSO 1905_WaGa_scr_Dox control_MKL1 control_WaGa; do docker run -v ~/DATA/Data_Ute/Data_Ute_smallRNA_7/raw_data:/exceRptInput \ -v ~/DATA/Data_Ute/Data_Ute_smallRNA_7/results2:/exceRptOutput \ -v /mnt/nvme0n1p1/MyexceRptDatabase/hg38:/exceRpt_DB/hg38 \ -t rkitchen/excerpt \ INPUT_FILE_PATH=/exceRptInput/${sample}.fastq.gz MAIN_ORGANISM_GENOME_ID=hg38 N_THREADS=50 JAVA_RAM='200G' done #Most of the Docker command is loading directories on your machine (the -v parameters) so that exceRpt can read from or write to them. The directory to the left of each : can obviously be whatever you want, but it is important to make sure the right side of each : is written as above or exceRpt will not be able to find/write the data it needs.
-
Analysis customisation options
There are a number of options available for customising the analysis that are specified using the command-line. These are a list of the most commonly-modified options and their default values.
Required OPTIONs:
* INPUT_FILE_PATH | Path to the input fastq/fasta/sra file
Main analysis OPTIONs:
* ADAPTER_SEQ | 'guessKnown'/'none'/
| [default: ‘guessKnown’] will attempt to guess the 3 adapter using known sequences. The actual adapter can be input here if known, or specify ‘none’ if the adapter is already removed * SAMPLE_NAME | | add an optional ID to the input file specified above * MAIN_ORGANISM_GENOME_ID | ‘hg38’/’hg19’/’mm10’ | [default: ‘hg38’] changes the organism/genome build used for alignment * CALIBRATOR_LIBRARY | | path to a bowtie2 index of calibrator oligos used for QC or normalisation * ENDOGENOUS_LIB_PRIORITY | | [default: ‘miRNA,tRNA,piRNA,gencode,circRNA’] choose the priority of each library during read assignment and quantification Additional analysis OPTIONs: * TRIM_N_BASES_5p | | [default: ‘0’] remove N bases from the 5′ end of every read * TRIM_N_BASES_3p | | [default: ‘0’] remove N bases from the 3′ end of every read * RANDOM_BARCODE_LENGTH | | [default: 0] identify and remove random barcodes of this number of nucleotides. For a Bioo prep with a 4N random barcode on both the 3′ and 5′ adapter, this value should be ‘4’. * RANDOM_BARCODE_LOCATION | ‘-5p -3p’/’-5p’/’-3p’ | [default: ‘-5p -3p’] specify where to look for the random barcode(s) * KEEP_RANDOM_BARCODE_STATS | ‘false’/’true’ | [default: ‘false’] specify whether or not to calculate overrepresentation statistics using the random barcodes (this may be slow and memory intensive!) * DOWNSAMPLE_RNA_READS | | [default: NULL] choose whether to downsample to this number of reads after assigning reads to the various transcriptome libraries (may be useful for normalising very different yields) Hardware-specific OPTIONs: * N_THREADS | | [default: 4] change the number of threads used in the alignments performed by exceRpt * JAVA_RAM | | [default: ’10G’] change the amount of memory (RAM) available to Java. This may need to be higher if crashes occur during quantification or random barcode stats calculation * REMOVE_LARGE_INTERMEDIATE_FILES | ‘false’/’true’ | [default: ‘false’] when exceRpt finishes, choose whether to remove the large alignment files that can take a lot of disk space Alignment/QC OPTIONs: * MIN_READ_LENGTH | | [default: 18] minimum read-length to use after adapter (+ random barcode) removal * QFILTER_MIN_QUAL | | [default: 20] minimum base-call quality of the read * QFILTER_MIN_READ_FRAC | | [default: 80] read must have base-calls higher than the value above for at least this fraction of its length * STAR_alignEndsType | ‘Local’/’EndToEnd’ | [default: Local] defines the alignment mode; local alignment is recommended to allow for isomiRs * STAR_outFilterMatchNmin | | [default: 18] minimum number of bases to include in the alignment (should match the minimum read length defined above) * STAR_outFilterMatchNminOverLread | | [default: 0.9] minimum fraction of the read that *must* remain following soft-clipping in a local alignment * STAR_outFilterMismatchNmax | | [default: 1] maximum allowed mismatched bases in the aligned portion of the read * MAX_MISMATCHES_EXOGENOUS | | [default: 0] maximum allowed mismatched bases in the *entire* read when aligning to exogenous sequences -
Understanding the exceRpt output contained in OUTPUT_DIR
A variety of output files are created for each sample as they are run through the exceRpt pipeline. At the highest level, 5 files and one directory are output to the OUTPUT_DIR:
[sampleID]/ | Directory containing the complete set of output files for this sample [sampleID]_CORE_RESULTS_v*.tgz | Archive containing only the most commonly used results for this sample [sampleID].err | Text file containing error logging information for this run [sampleID].log | Text file containing normal logging information for this run [sampleID].qcResult | Text file containing a variety of QC metrics for this sample [sampleID].stats | Text file containing a variety of alignment statistics for this sample
This archive ([sampleID]_CORE_RESULTS_v4.*.tgz) contains the most commonly used results for this sample and is the only file required to run the mergePipelineRuns.R script described below for processing the output from multiple runs of the exceRpt pipeline (i.e. for multiple samples). The contents of this archive are as follows:
[sampleID].log | Same as above [sampleID].stats | Same as above [sampleID].qcResult | Same as above [sampleID]/[sampleID].readCounts_*_sense.txt | Read counts of each annotated RNA using sense alignments [sampleID]/[sampleID].readCounts_*_sense.txt | Read counts of each annotated RNA using antisense alignments [sampleID]/[sampleID].*.coverage.txt | Contains read-depth across all gencode transcripts [sampleID]/[sampleID].*.CIGARstats.txt | Summary of the alignment characteristics for genome-mapped reads [sampleID]/[sampleID].*_fastqc.zip | FastQC output both before and after UniVec/rRNA contaminant removal [sampleID]/[sampleID].*.readLengths.txt | Counts of the number of reads of each length following adapter removal [sampleID]/[sampleID].*.counts | Read counts mapped to UniVec & rRNA (and calibrator oligo, if used) sequences [sampleID]/[sampleID].*.knownAdapterSeq | 3' adapter sequence guessed (from known adapters) in this sample [sampleID]/[sampleID].*.adapterSeq | 3' adapter used to clip the reads in this run [sampleID]/[sampleID].*.qualityEncoding | PHRED encoding guessed for the input sequence reads
The main results directory ([sampleID]/, e.g. control_MKL1_cutadapted.fastq/) contains all files above as well as the following:
Intermediate files containing reads ‘surviving’ each stage, in the following order of 1) 3’ adapter clipping, 2) 5’/3’ end trimming, 3) read-quality and homopolymer filtering, 4) UniVec contaminant removal, and 5) rRNA removal: [sampleID]/[sampleID].*.fastq.gz | Reads remaining after each QC / filtering / alignment step Reads aligned at each step of the pipeline in the following order 1) UniVec, 2) rRNA, 3) endogenous genome, 4) endogenous transcriptome: [sampleID]/filteringAlignments_*.bam | Alignments to the UniVec and rRNA sequences [sampleID]/endogenousAlignments_genome*.bam | Alignments (ungapped) to the endogenous genome [sampleID]/endogenousAlignments_genomeMapped_transcriptome*.bam | Transcriptome alignments (ungapped) of reads mapped to the genome [sampleID]/endogenousAlignments_genomeUnmapped_transcriptome*.bam | Transcriptome alignments (ungapped) of reads **not** mapped to the genome Alignment summary information obtained after invoking the library priority. In the default setting, this will choose a miRBase alignment over any other alignment, for example if it is aligned to both a miRNA in miRBase and a miRNA in Gencode, the miRBase alignment is kept and all others discarded. It is especially important for tRNAs to be chosen in favour of piRNAs, as the latter have quite a large number of mis-annotations to the former. [sampleID]/endogenousAlignments_Accepted.txt.gz | All compatible alignments against the transcriptome after invoking the library priority [sampleID]/endogenousAlignments_Accepted.dict | Contains the ID(s) of the RNA annotations indexed in the fifth column of the .txt.gz file above Finally, the quantifications are stored in the various readCounts_*.txt files. The format of these tab-delimited files is as follows: ReferenceID uniqueReadCount totalReadCount multimapAdjustedReadCount multimapAdjustedBarcodeCount hsa-miR-143-3p:MIMAT0000435:Homo:sapiens:miR-143-3p 1235 4147219 4147219.0 0.0 hsa-miR-10b-5p:MIMAT0000254:Homo:sapiens:miR-10b-5p 1430 2420500 2420241.0 0.0 hsa-miR-10a-5p:MIMAT0000253:Homo:sapiens:miR-10a-5p 1115 784863 784600.5 0.0 hsa-miR-192-5p:MIMAT0000222:Homo:sapiens:miR-192-5p 759 559068 558542.5 0.0 Where ReferenceID is the ID of this annotated RNA, uniqueReadCount is the number of unique insert sequences attributed to this annotated RNA, totalReadCount is the total number of reads attributable to this annotated RNA, multimapAdjustedReadCount is the count after adjusting for multi-mapped reads, and multimapAdjustedBarcodeCount (available only for samples prepped with randomly barcoded 5’ and/or 3’ adapters such as Bioo) is the number of unique N-mer barcodes adjusted for multimapping ambiguity in the insert sequence.
-
Processing exceRpt output from multiple samples
Also provided is a script to combine output from multiple samples run through the exceRpt pipeline. The script (mergePipelineRuns.R) will take as input a directory containing 1 or more subdirectories or zipfiles containing output from the makefile above. In this way, results from 1 or more smallRNA-seq samples can be combined, several QC plots are generated, and the read-counts are normalised ready for downstream analysis by clustering and/or differential expression.
Installation
This script is comparatively much simpler to install. Once the R software (http://cran.r-project.org/) is set up on your system the script should automatically identify and install all required dependencies. Again, this script is available on the Genboree Workbench (www.genboree.org) and is also free for academic use.
Using the script: On the command line
mamba activate r_env jhuang@WS-2290C:/mnt/nvme0n1p1/exceRpt-master$ Rscript mergePipelineRuns.R /home/jhuang/DATA/Data_Ute/Data_Ute_smallRNA_7/MyResults/ #OBSERVE the env of R: ~/mambaforge/envs/r_env/lib/R/library #which R: /home/jhuang/mambaforge/envs/r_env/bin/R #The env is nothing to do with "sudo chmod -R 777 /usr/lib/R/site-library" #ERROR: MyResults is not writable --> DEBUG: sudo chown -R jhuang:jhuang MyResults MyResults2 results results2
Using the script: Interactively in R
Alternatively in an interactive R session, the merge can be performed using the following two commands: #mkdir MySummaries (r_env) jhuang@WS-2290C:~/DATA/Data_Ute/Data_Ute_smallRNA_7/exceRpt-master$ R > source("mergePipelineRuns_functions.R") #DEBUG freetype-error #sudo apt-get install libfreetype6-dev mamba activate r_env mamba install -c conda-forge --force-reinstall freetype fontconfig pkg-config library(systemfonts) system_fonts() # Should return font list without errors > processSamplesInDir("../MyResults/", "../MySummaries") 2025-03-28 18:18:40.916167: Searching for valid exceRpt pipeline output in ../MyResults/ 2025-03-28 18:18:44.479166: Found 12 valid samples 2025-03-28 18:18:44.892834: Reading sample data... 2025-03-28 18:18:47.02125: [1/12] Added sample '0505_WaGa_scr_DMSO_cutadapted.fastq' 2025-03-28 18:18:49.314131: [2/12] Added sample '0505_WaGa_scr_Dox_cutadapted.fastq' 2025-03-28 18:18:52.701234: [3/12] Added sample '0505_WaGa_sT_DMSO_cutadapted.fastq' 2025-03-28 18:18:57.191507: [4/12] Added sample '0505_WaGa_sT_Dox_cutadapted.fastq' 2025-03-28 18:19:00.162267: [5/12] Added sample '0505_WaGa_wt_cutadapted.fastq' 2025-03-28 18:19:05.992193: [6/12] Added sample '1905_WaGa_scr_DMSO_cutadapted.fastq' 2025-03-28 18:19:11.061668: [7/12] Added sample '1905_WaGa_scr_Dox_cutadapted.fastq' 2025-03-28 18:19:16.101974: [8/12] Added sample '1905_WaGa_sT_DMSO_cutadapted.fastq' 2025-03-28 18:19:21.43279: [9/12] Added sample '1905_WaGa_sT_Dox_cutadapted.fastq' 2025-03-28 18:19:30.264677: [10/12] Added sample '1905_WaGa_wt_cutadapted.fastq' 2025-03-28 18:19:38.989424: [11/12] Added sample 'control_MKL1_cutadapted.fastq' 2025-03-28 18:19:47.058822: [12/12] Added sample 'control_WaGa_cutadapted.fastq' 2025-03-28 18:19:47.059524: Creating raw read-count matrices for available libraries 2025-03-28 18:19:47.122305: Saving raw data to disk [1] "Attempting to save to: ../MySummaries/exceRpt_smallRNAQuants_ReadCounts.RData" [1] "Directory exists? TRUE" [1] "Directory writable? TRUE" 2025-03-28 18:19:47.888117: Normalising to RPM 2025-03-28 18:19:47.906386: Saving normalised data to disk 2025-03-28 18:19:49.156846: Creating QC plots 2025-03-28 18:19:49.18454: Plotting read-length distributions 2025-03-28 18:19:50.033017: Plotting run-duration 2025-03-28 18:19:50.521018: Plotting # mapped reads 2025-03-28 18:19:50.525369: Plotting mapping stats heatmap (1/3) 2025-03-28 18:19:50.714444: Plotting mapping stats heatmap (2/3) 2025-03-28 18:19:50.909217: Plotting mapping stats heatmap (3/3) 2025-03-28 18:19:51.100313: Plotting QC result 2025-03-28 18:19:51.954369: Plotting biotype counts 2025-03-28 18:19:53.470085: Plotting miRNA expression distributions 2025-03-28 18:19:54.861385: All done! 2025-03-28 18:19:54.861712: Warning messages: Warning message: In install.packages(update[instlib == l, "Package"], l, ... : installation of package ‘systemfonts’ had non-zero exit status There were 27 warnings (use warnings() to see them) Apart from some status messages, warnings, or possibly errors, no R objects are output from this function. Instead several files are created that are described immediately below…
Script output
Several files are output by the script in the location of the input exceRpt results (or somewhere else if explicitly specified). All output files are prefixed with ‘exceRpt_’ and contain a variety of information regarding all samples input: File Name Description QC data: exceRpt_DiagnosticPlots.pdf All diagnostic plots automatically generated by the merge script exceRpt_readMappingSummary.txt Read-alignment summary including total counts for each library exceRpt_ReadLengths.txt Read-lengths (after 3’ adapters/barcodes are removed) Raw transcriptome quantifications: exceRpt_miRNA_ReadCounts.txt miRNA read-counts quantifications exceRpt_tRNA_ReadCounts.txt tRNA read-counts quantifications exceRpt_piRNA_ReadCounts.txt piRNA read-counts quantifications exceRpt_gencode_ReadCounts.txt gencode read-counts quantifications exceRpt_circularRNA_ReadCounts.txt circularRNA read-count quantifications Normalised transcriptome quantifications: exceRpt_miRNA_ReadsPerMillion.txt miRNA RPM quantifications exceRpt_tRNA_ReadsPerMillion.txt tRNA RPM quantifications exceRpt_piRNA_ReadsPerMillion.txt piRNA RPM quantifications exceRpt_gencode_ReadsPerMillion.txt gencode RPM quantifications exceRpt_circularRNA_ReadsPerMillion.txt circularRNA RPM quantifications R objects: exceRpt_smallRNAQuants_ReadCounts.RData All raw data (binary R object) exceRpt_smallRNAQuants_ReadsPerMillion.RData All normalised data (binary R object)
Comprehensive smallRNA-7 profiling using exceRpt pipeline with full reference databases
TODO_1: Update the image
-
Input data
mkdir ~/DATA/Data_Ute/Data_Ute_smallRNA_7/raw_data cd raw_data cp ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf930/01_0505_WaGa_wt_EV_RNA_S1_R1_001.fastq.gz 0505_WaGa_wt.fastq.gz cp ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf931/02_0505_WaGa_sT_DMSO_EV_RNA_S2_R1_001.fastq.gz 0505_WaGa_sT_DMSO.fastq.gz cp ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf932/03_0505_WaGa_sT_Dox_EV_RNA_S3_R1_001.fastq.gz 0505_WaGa_sT_Dox.fastq.gz cp ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf933/04_0505_WaGa_scr_DMSO_EV_RNA_S4_R1_001.fastq.gz 0505_WaGa_scr_DMSO.fastq.gz cp ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf934/05_0505_WaGa_scr_Dox_EV_RNA_S5_R1_001.fastq.gz 0505_WaGa_scr_Dox.fastq.gz cp ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf935/06_1905_WaGa_wt_EV_RNA_S6_R1_001.fastq.gz 1905_WaGa_wt.fastq.gz cp ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf936/07_1905_WaGa_sT_DMSO_EV_RNA_S7_R1_001.fastq.gz 1905_WaGa_sT_DMSO.fastq.gz cp ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf937/08_1905_WaGa_sT_Dox_EV_RNA_S8_R1_001.fastq.gz 1905_WaGa_sT_Dox.fastq.gz cp ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf938/09_1905_WaGa_scr_DMSO_EV_RNA_S9_R1_001.fastq.gz 1905_WaGa_scr_DMSO.fastq.gz cp ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf939/10_1905_WaGa_scr_Dox_EV_RNA_S10_R1_001.fastq.gz 1905_WaGa_scr_Dox.fastq.gz cp ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf940/11_control_MKL1_S11_R1_001.fastq.gz control_MKL1.fastq.gz cp ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf941/12_control_WaGa_S12_R1_001.fastq.gz control_WaGa.fastq.gz #END
-
Run cutadapt
some common adapter sequences from different kits for reference: - TruSeq Small RNA (Illumina): TGGAATTCTCGGGTGCCAAGG - Small RNA Kits V1 (Illumina): TCGTATGCCGTCTTCTGCTTGT - Small RNA Kits V1.5 (Illumina): ATCTCGTATGCCGTCTTCTGCTTG - NEXTflex Small RNA Sequencing Kit v3 for Illumina Platforms (Bioo Scientific): TGGAATTCTCGGGTGCCAAGG - LEXOGEN Small RNA-Seq Library Prep Kit (Illumina): TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC * mkdir trimmed; cd trimmed for sample in 0505_WaGa_wt 0505_WaGa_sT_DMSO 0505_WaGa_sT_Dox 0505_WaGa_scr_DMSO 0505_WaGa_scr_Dox 1905_WaGa_wt 1905_WaGa_sT_DMSO 1905_WaGa_sT_Dox 1905_WaGa_scr_DMSO 1905_WaGa_scr_Dox control_MKL1 control_WaGa; do cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 -o ${sample}_cutadapted.fastq.gz --minimum-length 5 --trim-n ../raw_data/${sample}.fastq.gz >> LOG done # -- check if it is necessary to remove adapter from 5'-end -- (Option_1) cutadapt -g TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -o /dev/null --report=minimal 0505_WaGa_wt_cutadapted.fastq.gz --> The trimming statistics in the output will show how often 5'-end adapters were removed. (Option 2) zcat your_sample.fastq.gz | grep 'TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC' | head -n 20 (Option 3) fastqc your_sample.fastq.gz #Open the generated HTML report and check: # The "Overrepresented sequences" section for adapter sequences. # The "Per base sequence content" plot to see if there are unexpected sequences at the start of reads. #(If check results shows both ends contain adapter) cutadapt -g TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 10 -o ${sample}_trimmed.fastq.gz ${sample}.fastq.gz >> LOG2 # -g → Trims 5'-end adapters # -a → Trims 3'-end adapters; -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC → Specifies the adapter sequence to be removed from the 3' end of the reads. The sequence provided is common in RNA-seq libraries (e.g., Illumina small RNA sequencing). # -q 20 → Performs quality trimming at both read ends, removing bases with a Phred quality score below 20.
-
Install exceRpt (https://github.gersteinlab.org/exceRpt/)
docker pull rkitchen/excerpt mkdir MyexceRptDatabase cd /mnt/nvme0n1p1/MyexceRptDatabase wget http://org.gersteinlab.excerpt.s3-website-us-east-1.amazonaws.com/exceRptDB_v4_hg38_lowmem.tgz tar -xvf exceRptDB_v4_hg38_lowmem.tgz #http://org.gersteinlab.excerpt.s3-website-us-east-1.amazonaws.com/exceRptDB_v4_hg19_lowmem.tgz #http://org.gersteinlab.excerpt.s3-website-us-east-1.amazonaws.com/exceRptDB_v4_hg38_lowmem.tgz #http://org.gersteinlab.excerpt.s3-website-us-east-1.amazonaws.com/exceRptDB_v4_mm10_lowmem.tgz wget http://org.gersteinlab.excerpt.s3-website-us-east-1.amazonaws.com/exceRptDB_v4_EXOmiRNArRNA.tgz tar -xvf exceRptDB_v4_EXOmiRNArRNA.tgz wget http://org.gersteinlab.excerpt.s3-website-us-east-1.amazonaws.com/exceRptDB_v4_EXOGenomes.tgz tar -xvf exceRptDB_v4_EXOGenomes.tgz
-
Run exceRpt
#[COMPLETE_DB] docker run -v /mnt/nvme0n1p1/MyInputSample:/exceRptInput \ -v /mnt/nvme0n1p1/MyResults:/exceRptOutput \ -v /mnt/nvme0n1p1/MyexceRptDatabase:/exceRpt_DB \ -t rkitchen/excerpt \ INPUT_FILE_PATH=/exceRptInput/0505_WaGa_wt_cutadapted.fastq.gz \ MAIN_ORGANISM_GENOME_ID=hg38 \ N_THREADS=50 \ JAVA_RAM='800G' #[SMALL_DB] docker run -v /mnt/nvme0n1p1/MyInputSample:/exceRptInput \ -v /mnt/nvme0n1p1/MyResults:/exceRptOutput \ -v /mnt/nvme0n1p1/MyexceRptDatabase/hg38:/exceRpt_DB/hg38 \ -t rkitchen/excerpt \ INPUT_FILE_PATH=/exceRptInput/${sample}_cutadapted.fastq.gz N_THREADS=50 \ JAVA_RAM='800G' #[REAL_RUNNING_SMALL_DB] mkdir results for sample in 0505_WaGa_wt 0505_WaGa_sT_DMSO 0505_WaGa_sT_Dox 0505_WaGa_scr_DMSO 0505_WaGa_scr_Dox 1905_WaGa_wt 1905_WaGa_sT_DMSO 1905_WaGa_sT_Dox 1905_WaGa_scr_DMSO 1905_WaGa_scr_Dox control_MKL1 control_WaGa; do docker run -v ~/DATA/Data_Ute/Data_Ute_smallRNA_7/trimmed:/exceRptInput \ -v ~/DATA/Data_Ute/Data_Ute_smallRNA_7/results:/exceRptOutput \ -v /mnt/nvme0n1p1/MyexceRptDatabase/hg38:/exceRpt_DB/hg38 \ -t rkitchen/excerpt \ INPUT_FILE_PATH=/exceRptInput/${sample}_cutadapted.fastq.gz MAIN_ORGANISM_GENOME_ID=hg38 N_THREADS=50 JAVA_RAM='200G' done mkdir results_exo2 for sample in 0505_WaGa_wt; do for sample in 0505_WaGa_sT_DMSO 0505_WaGa_sT_Dox 0505_WaGa_scr_DMSO 0505_WaGa_scr_Dox 1905_WaGa_wt 1905_WaGa_sT_DMSO 1905_WaGa_sT_Dox 1905_WaGa_scr_DMSO 1905_WaGa_scr_Dox control_MKL1 control_WaGa; do docker run -v ~/DATA/Data_Ute/Data_Ute_smallRNA_7/trimmed:/exceRptInput \ -v ~/DATA/Data_Ute/Data_Ute_smallRNA_7/results_exo2:/exceRptOutput \ -v /mnt/nvme0n1p1/MyexceRptDatabase/hg38:/exceRpt_DB/gh38 \ -v /mnt/nvme0n1p1/MyexceRptDatabase/miRBase:/exceRpt_DB/miRBase \ -v /mnt/nvme0n1p1/MyexceRptDatabase/NCBI_taxonomy_taxdump:/exceRpt_DB/NCBI_taxonomy_taxdump \ -v /mnt/nvme0n1p1/MyexceRptDatabase/Genomes_BacteriaFungiMammalPlantProtistVirus:/exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus \ -v /mnt/nvme0n1p1/MyexceRptDatabase/ribosomeDatabase:/exceRpt_DB/ribosomeDatabase \ -t rkitchen/excerpt \ INPUT_FILE_PATH=/exceRptInput/${sample}_cutadapted.fastq.gz MAIN_ORGANISM_GENOME_ID=hg38 N_THREADS=50 JAVA_RAM='200G' MAP_EXOGENOUS=on done #DEBUG_1 for ERROR: could not find adapters at path /exceRpt_DB/adapters/adapters.fa #The /exceRpt_DB/adapters/adapters.fa in the Docker environment will be overwritten when assigning a new directory as /exceRpt_DB. Therefore, we should create a new adapters.fa file in the new database environment jhuang@WS-2290C:/mnt/nvme0n1p1/MyexceRptDatabase$ cp -r ../exceRpt/exceRpt_coreDB/* ./ #DEBUG_2 for EXITING because of fatal input ERROR: could not open user-defined parameters file /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_Parameters_Exogenous.in #jhuang@WS-2290C:/mnt/nvme0n1p1/MyexceRptDatabase$ cp STAR_Parameters_Exogenous.in Genomes_BacteriaFungiMammalPlantProtistVirus/ #Debugging Tips # Verify Database Structure and Ensure your mounted /exceRpt_DB contains: # /exceRpt_DB # ├── hg38/ # Endogenous # ├── NCBI_taxonomy_taxdump/ # Taxonomy # └── Genomes_BacteriaFungi.../ # Exogenous references # Check Intermediate Files # Confirm that the endogenous step generates the expected input for exogenous processing (e.g., exogenous_alignments.sam). mkdir results_g results_exo4 results_exo5 docker run -v ~/DATA/Data_Ute/Data_Ute_smallRNA_7/results_exo4:/exceRptOutput \ -v /mnt/nvme0n1p1/MyexceRptDatabase:/exceRpt_DB \ -t rkitchen/excerpt \ INPUT_FILE_PATH=/exceRptInput/testData_human.fastq.gz MAIN_ORGANISM_GENOME_ID=hg38 N_THREADS=50 JAVA_RAM='200G' MAP_EXOGENOUS=on #NOTE that rkitchen/excerpt refers to exceRpt_shortRNA (bash script): The extra-cellular RNA processing toolkit (exceRpt) optimised for smallRNA analysis; This pipeline processes a single smallRNA sequence file from a single sample #TODO_3: how to call exceRpt_longRNA: The extra-cellular RNA processing toolkit (exceRpt) optimised for longRNA analysis; This pipeline processes a single longRNA sequence file from a single sample. # docker inspect rkitchen/excerpt:latest; docker history rkitchen/excerpt:latest; docker history --no-trunc rkitchen/excerpt:latest # "Entrypoint": [ # "make", # "-f", # "/exceRpt_bin/exceRpt_smallRNA", # "EXE_DIR=/exceRpt_bin", # "DATABASE_PATH=/exceRpt_DB", # "JAVA_EXE=java", # "OUTPUT_DIR=/exceRptOutput", # "MAP_EXOGENOUS=off", # "N_THREADS=4" # ] #[REAL_RUNNING_COMPLETE_DB] #NOTE that if not renamed in the input files, then have to RENAME all files recursively by removing "_cutadapted.fastq" in all names in _CORE_RESULTS_v4.6.3.tgz (first unzip, removing, then zip, mv to ../results_g). cd trimmed for file in *_cutadapted.fastq.gz; do echo "mv \"$file\" \"${file/_cutadapted.fastq/}\"" done mkdir results_exo5 for sample in 0505_WaGa_wt 0505_WaGa_sT_DMSO 0505_WaGa_sT_Dox 0505_WaGa_scr_DMSO 0505_WaGa_scr_Dox 1905_WaGa_wt 1905_WaGa_sT_DMSO 1905_WaGa_sT_Dox 1905_WaGa_scr_DMSO 1905_WaGa_scr_Dox control_MKL1 control_WaGa; do docker run -v ~/DATA/Data_Ute/Data_Ute_smallRNA_7/trimmed:/exceRptInput \ -v ~/DATA/Data_Ute/Data_Ute_smallRNA_7/results_exo5:/exceRptOutput \ -v /mnt/nvme0n1p1/MyexceRptDatabase:/exceRpt_DB \ -t rkitchen/excerpt \ INPUT_FILE_PATH=/exceRptInput/${sample}.gz MAIN_ORGANISM_GENOME_ID=hg38 N_THREADS=50 JAVA_RAM='200G' MAP_EXOGENOUS=on done #The running process: https://github.com/gersteinlab/exceRpt/blob/master/exceRpt_smallRNA (bash script) in docker, then call java scripts https://github.com/gersteinlab/exceRpt/blob/master/exceRpt_Tools/main/ExceRpt_Tools.java, ProcessEndogenousAlignments.java and ProcessExogenousAlignments.java. #NOTE that in exceRpt_smallRNA.sh ## Choose what kind of EXOGENOUS alignments to attempt: ## - off : none ## - miRNA : map only to exogenous miRNAs in miRbase ## - on : map to exogenous miRNAs in miRbase AND the genomes of all sequenced species in ensembl/NCBI #Most of the Docker command is loading directories on your machine (the -v parameters) so that exceRpt can read from or write to them. The directory to the left of each : can obviously be whatever you want, but it is important to make sure the right side of each : is written as above or exceRpt will not be able to find/write the data it needs.
-
Processing exceRpt output from multiple samples
Also provided is a script to combine output from multiple samples run through the exceRpt pipeline. The script (mergePipelineRuns.R) will take as input a directory containing 1 or more subdirectories or zipfiles containing output from the makefile above. In this way, results from 1 or more smallRNA-seq samples can be combined, several QC plots are generated, and the read-counts are normalised ready for downstream analysis by clustering and/or differential expression.
Installation
This script is comparatively much simpler to install. Once the R software (http://cran.r-project.org/) is set up on your system the script should automatically identify and install all required dependencies. Again, this script is available on the Genboree Workbench (www.genboree.org) and is also free for academic use.
Using the script: On the command line
mamba activate r_env jhuang@WS-2290C:/mnt/nvme0n1p1/exceRpt-master$ Rscript mergePipelineRuns.R /home/jhuang/DATA/Data_Ute/Data_Ute_smallRNA_7/MyResults/ #OBSERVE the env of R: ~/mambaforge/envs/r_env/lib/R/library #which R: /home/jhuang/mambaforge/envs/r_env/bin/R #The env is nothing to do with "sudo chmod -R 777 /usr/lib/R/site-library" #ERROR: MyResults is not writable --> DEBUG: sudo chown -R jhuang:jhuang MyResults MyResults2 results results2
— COUNTINE HERE after docker running –> Using the script: Interactively in R
#Alternatively in an interactive R session, the merge can be performed using the following two commands: mkdir summaries_g summaries_exo4 summaries_exo5 (r_env) jhuang@WS-2290C:~/DATA/Data_Ute/Data_Ute_smallRNA_7/exceRpt-master$ R #WARNING: need to reload the R-script after each change of the script. source("mergePipelineRuns_functions.R") # -- DEBUG freetype-error -- # #sudo apt-get install libfreetype6-dev # mamba activate r_env # mamba install -c conda-forge --force-reinstall freetype fontconfig pkg-config # library(systemfonts) # system_fonts() # Should return font list without errors getwd() [1] "/media/jhuang/Elements/Data_Ute/Data_Ute_smallRNA_7/exceRpt-master" processSamplesInDir("../results_g/", "../summaries_g") processSamplesInDir("../results_exo4/", "../summaries_exo4") processSamplesInDir("../results_exo5/", "../summaries_exo5") #~/Tools/csv2xls-0.4/csv_to_xls.py exceRpt_miRNA_ReadsPerMillion.txt exceRpt_tRNA_ReadsPerMillion.txt exceRpt_piRNA_ReadsPerMillion.txt -d$'\t' -o exceRpt_results_detailed.xls
Script output
Several files are output by the script in the location of the input exceRpt results (or somewhere else if explicitly specified). All output files are prefixed with ‘exceRpt_’ and contain a variety of information regarding all samples input: File Name Description QC data: exceRpt_DiagnosticPlots.pdf All diagnostic plots automatically generated by the merge script exceRpt_readMappingSummary.txt Read-alignment summary including total counts for each library exceRpt_ReadLengths.txt Read-lengths (after 3’ adapters/barcodes are removed) Raw transcriptome quantifications: exceRpt_miRNA_ReadCounts.txt miRNA read-counts quantifications exceRpt_tRNA_ReadCounts.txt tRNA read-counts quantifications exceRpt_piRNA_ReadCounts.txt piRNA read-counts quantifications exceRpt_gencode_ReadCounts.txt gencode read-counts quantifications exceRpt_circularRNA_ReadCounts.txt circularRNA read-count quantifications Normalised transcriptome quantifications: exceRpt_miRNA_ReadsPerMillion.txt miRNA RPM quantifications exceRpt_tRNA_ReadsPerMillion.txt tRNA RPM quantifications exceRpt_piRNA_ReadsPerMillion.txt piRNA RPM quantifications exceRpt_gencode_ReadsPerMillion.txt gencode RPM quantifications exceRpt_circularRNA_ReadsPerMillion.txt circularRNA RPM quantifications R objects: exceRpt_smallRNAQuants_ReadCounts.RData All raw data (binary R object) exceRpt_smallRNAQuants_ReadsPerMillion.RData All normalised data (binary R object)
-
Re-draw the heatmap plots
#genome 97.9% 98.3% 21.3% 44.9% 81.4% 78.3% 78.5% 79.3% 73.3% 69.2% 65.6% 71.9% #miRNA_sense 84.7% 85.6% 3.5% 7.1% 16.2% 14.7% 15.8% 15.3% 7.5% 7.0% 12.9% 14.6% #miRNA_antisense 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% # #miRNAprecursor_sense 0.1% 0.1% 0.0% 0.0% 0.1% 0.1% 0.0% 0.1% 0.0% 0.0% 0.0% 0.0% #miRNAprecursor_antisense 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% # #tRNA_sense 3.4% 1.8% 8.4% 25.3% 45.3% 41.4% 48.8% 47.3% 52.1% 49.0% 41.2% 33.9% #tRNA_antisense 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% # #piRNA_sense 0.6% 0.5% 0.1% 0.4% 0.3% 0.4% 0.5% 0.4% 0.4% 0.5% 0.4% 0.6% #piRNA_antisense 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% # #gencode_sense 7.0% 8.5% 6.7% 8.6% 15.7% 16.6% 10.8% 12.9% 11.2% 10.8% 8.5% 18.3% #gencode_antisense 0.1% 0.1% 0.7% 0.3% 0.2% 0.3% 0.2% 0.2% 0.2% 0.2% 0.2% 0.3% #gencode 7.10% 8.60% 7.40% 8.90% 15.90% 16.90% 11.00% 13.10% 11.40% 11.00% 8.70% 18.60% # #circularRNA_sense 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% #circularRNA_antisense 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% # #not_mapped_to_genome_or_libs 2.1% 1.7% 78.7% 55.1% 18.6% 21.7% 21.5% 20.7% 26.7% 30.8% 34.4% 28.1% import pandas as pd import numpy as np import seaborn as sns import matplotlib.pyplot as plt # Define data samples = [ "control MKL1", "control WaGa", "WaGa wildtype 0505", "WaGa wildtype 1905", "WaGa sT DMSO 0505", "WaGa sT DMSO 1905", "WaGa sT Dox 0505", "WaGa sT Dox 1905", "WaGa scr DMSO 0505", "WaGa scr DMSO 1905", "WaGa scr Dox 0505", "WaGa scr Dox 1905" ] #TODO_2: genome --> human_genome, not_mapped_to_genome_or_libs --> not_mapped_to_human_genome # send the new results including exogenous alignments to Ute! #categories = [ # "reads_used_for_alignment", "genome", "miRNA", "miRNAprecursor", "tRNA", "piRNA", # "gencode", "circularRNA", "not_mapped_to_genome_or_libs" #] categories = [ "reads_used_for_alignment", "human_genome", "miRNA", "miRNAprecursor", "tRNA", "piRNA", "gencode", "circularRNA", "not_mapped_to_human_genome" ] data = np.array([ [100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0], [97.9, 98.3, 44.9, 21.3, 65.6, 71.9, 78.5, 81.4, 73.3, 79.3, 69.2, 78.3], [84.7, 85.6, 7.1, 3.5, 12.9, 14.6, 15.8, 16.2, 7.5, 15.3, 7.0, 14.7], [0.1, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1, 0.0, 0.1, 0.0, 0.1], [3.4, 1.8, 25.3, 8.4, 41.2, 33.9, 48.8, 45.3, 52.1, 47.3, 49.0, 41.4], [0.6, 0.5, 0.4, 0.1, 0.4, 0.6, 0.5, 0.3, 0.4, 0.4, 0.5, 0.4], [7.1, 8.6, 8.9, 7.4, 8.7, 18.6, 11.0, 15.9, 11.4, 13.1, 11.0, 16.9], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [2.1, 1.7, 55.1, 78.7, 34.4, 28.1, 21.5, 18.6, 26.7, 20.7, 30.8, 21.7] ]) ## Load data from Excel file #file_path = "mapping_heatmap.xlsx" # ## Read Excel file, assuming first column is index (row labels) #df = pd.read_excel(file_path, index_col=0) # Convert percentages to decimals data = data / 100.0 # Create DataFrame df = pd.DataFrame(data, index=categories, columns=samples) # Plot heatmap plt.figure(figsize=(14, 6)) sns.heatmap(df, annot=True, cmap="coolwarm", fmt=".3f", linewidths=0.5, cbar_kws={'label': 'Fraction Aligned Reads'}) # Improve layout plt.title("Heatmap of Read Alignments by Category and Sample", fontsize=14) plt.xlabel("Sample", fontsize=12) plt.ylabel("Read Category", fontsize=12) plt.xticks(rotation=15, ha="right", fontsize=10) plt.yticks(rotation=0, fontsize=10) plt.tight_layout() # Save as PNG plt.savefig("mapping_heatmap.png", dpi=300, bbox_inches="tight") # Show plot plt.show()
-
Key steps of log: This log details the execution of a small RNA sequencing data analysis pipeline using the exceRpt tool (version 4.6.3) in a Docker container. The pipeline processes a human small RNA-seq dataset (testData_human.fastq.gz) with the following key steps:
-
Initial Setup
- Docker container launched with mounted volumes for input/output and reference databases.
- Parameters: hg38 genome, 50 threads, 200GB Java memory, exogenous mapping enabled.
- Docker container launched with input/output volume mounts
- 50 threads allocated with 200GB Java memory
- hg38 reference genome specified
-
Preprocessing
- Adapter detection and trimming using known adapter sequences.
- Quality filtering (Phred score ≥20, length ≥18nt).
- Removal of homopolymer-rich reads and low-quality sequences.
- Input FASTQ file decompressed (testData_human.fastq.gz)
- Adapter sequences identified using adapters.fa
- Quality encoding determined (Phred+33/64)
- Adapter clipping performed (TCGTATGCCGTCTTCTGCTTG)
- Quality filtering (Q20, p<80%)
- Homopolymer repeats filtered (max 66% single nt)
-
Contaminant Filtering
- Alignment against UniVec contaminants and ribosomal RNA (rRNA) databases.
- 322 reads processed, with statistics tracked at each step.
-
Endogenous RNA Analysis
- Alignment to human genome (hg38) and transcriptome.
- Quantification of small RNA types:
- miRNA (mature/precursor): Sense strands detected (antisense absent).
- tRNA, piRNA, gencode transcripts: Only sense strands reported.
- circRNA: Not detected in this dataset.
- Coverage and complexity metrics calculated.
-
Exogenous RNA Analysis
- Screened for microbial/viral RNAs:
- miRNA databases (miRBase).
- Ribosomal RNA databases.
- Comprehensive genomic databases (bacteria, plants, metazoa, fungi, viruses).
- Taxonomic classification of exogenous hits performed.
- Screened for microbial/viral RNAs:
-
QC & Results
- QC Result: PASS (based on transcriptome/genome ratio >0.5 and >100k transcriptome reads).
- Key Metrics:
- Input Reads: ~1.5 million (exact count not shown in log).
- Genome Mapped: Majority of reads.
- Transcriptome Complexity: Calculated ratio.
- Core results compressed into testData_human.fastq_CORE_RESULTS_v4.6.3.tgz.
-
Notable Observations:
- Antisense Reads: Absent for miRNA, tRNA, and piRNA (common in small RNA-seq).
- Potential Issues: Some files (e.g., antisense counts) were missing but did not disrupt pipeline.
- Resource Usage: High RAM (200GB) and multi-threading (50 cores) employed for efficiency.
-
Output Files:
- Quantified counts for endogenous RNAs (miRNA, tRNA, etc.).
- Exogenous RNA alignments with taxonomic annotations.
- QC report, adapter sequences, and alignment statistics.
-
-
Raw LOG of the pipeline providing a comprehensive small RNA profile, distinguishing host transcripts from contaminants and exogenous RNAs.
jhuang@WS-2290C:/media/jhuang/Elements/Data_Ute/Data_Ute_smallRNA_7$ docker run -v ~/DATA/Data_Ute/Data_Ute_smallRNA_7/results_exo4:/exceRptOutput -v /mnt/nvme0n1p1/MyexceRptDatabase:/exceRpt_DB -t rkitchen/excerpt INPUT_FILE_PATH=/exceRptInput/testData_human.fastq.gz MAIN_ORGANISM_GENOME_ID=hg38 N_THREADS=50 JAVA_RAM='200G' MAP_EXOGENOUS=on # mkdir -p /exceRptOutput/testData_human.fastq # gunzip -c /exceRptInput/testData_human.fastq.gz 2>> /exceRptOutput/testData_human.fastq.err | java -Xmx200G -jar /exceRpt_bin/exceRpt_Tools.jar FindAdapter -n 10000 -m 1000000 -s 4 -a /exceRpt_DB/adapters/adapters.fa - > /exceRptOutput/testData_human.fastq/testData_human.fastq.adapterSeq 2>> /exceRptOutput/testData_human.fastq.log # ## ASCII 84 is equal to Q20 (p<0.01) in Phred+64, so any file with max quals greater than this can reasonably assumed to be Phred+64 gunzip -c /exceRptInput/testData_human.fastq.gz | head -n 40000 | awk '{if(NR%4==0) printf("%s",$0);}' | od -A n -t u1 | grep -v "^\*" | awk 'BEGIN{min=100;max=0;}{for(i=1;i<=NF;i++) {if($i>max) max=$i; if($i<min) min=$i;}}END{if(max<84) print "33"; else print "64";}' > /exceRptOutput/testData_human.fastq/testData_human.fastq.qualityEncoding cat: /exceRptOutput/testData_human.fastq/testData_human.fastq.knownAdapterSeq: No such file or directory ## Run the SW alignment of known adapters regardless of user preference gunzip -c /exceRptInput/testData_human.fastq.gz 2>> /exceRptOutput/testData_human.fastq.err | java -Xmx200G -jar /exceRpt_bin/exceRpt_Tools.jar FindAdapter -n 1000 -m 100000 -s 4 -a /exceRpt_DB/adapters/adapters.fa - > /exceRptOutput/testData_human.fastq/testData_human.fastq.knownAdapterSeq 2>> /exceRptOutput/testData_human.fastq.log #@echo -e "`/bin/date "+%Y-%m-%d--%H:%M:%S"` exceRpt_smallRNA: Known adapter sequence: \n" >> /exceRptOutput/testData_human.fastq.log ## Carry on with the adapter provided / guessed gunzip -c /exceRptInput/testData_human.fastq.gz > /exceRptOutput/testData_human.fastq/testData_human.fastq.preClipped.fastq.tmp; /exceRpt_bin/fastx_0.0.14/bin/fastx_clipper -Q33 -a TCGTATGCCGTCTTCTGCTTG -l 18 -v -n -M 7 -i /exceRptOutput/testData_human.fastq/testData_human.fastq.preClipped.fastq.tmp -z -o /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.fastq.tmp.gz >> /exceRptOutput/testData_human.fastq.log 2>> /exceRptOutput/testData_human.fastq.err; rm /exceRptOutput/testData_human.fastq/testData_human.fastq.preClipped.fastq.tmp ## Count reads input to adapter clipping grep "Input: " /exceRptOutput/testData_human.fastq.log | awk '{print "input\t"$2}' >> /exceRptOutput/testData_human.fastq.stats ## Count reads output following adapter clipping grep "Output: " /exceRptOutput/testData_human.fastq.log | awk '{print "successfully_clipped\t"$2}' >> /exceRptOutput/testData_human.fastq.stats ## Remove random barcodes if there are any mv /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.fastq.tmp.gz /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.fastq.gz gunzip -c /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.fastq.gz | java -Xmx200G -jar /exceRpt_bin/exceRpt_Tools.jar TrimFastq -5p 0 -3p 0 | gzip -c > /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.trimmed.fastq.gz 2>>/exceRptOutput/testData_human.fastq.log gunzip -c /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.trimmed.fastq.gz | /exceRpt_bin/fastx_0.0.14/bin/fastq_quality_filter -v -Q33 -p 80 -q 20 > /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.trimmed.filtered.tmp 2>>/exceRptOutput/testData_human.fastq.log ## Count reads that failed the quality filter grep "low-quality reads" /exceRptOutput/testData_human.fastq.log | awk '{print "failed_quality_filter\t"$2}' >> /exceRptOutput/testData_human.fastq.stats # # Filter homopolymer reads (those that have too many single nt repeats) java -Xmx200G -jar /exceRpt_bin/exceRpt_Tools.jar RemoveHomopolymerRepeats --verbose -m 0.66 -i /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.trimmed.filtered.tmp -o /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.trimmed.filtered.fastq >> /exceRptOutput/testData_human.fastq.log 2>> /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.REMOVEDRepeatReads.fastq gzip /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.REMOVEDRepeatReads.fastq gzip /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.trimmed.filtered.fastq rm /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.trimmed.filtered.tmp ## Count homopolymer repeat reads that failed the quality filter grep "Done. Sequences removed" /exceRptOutput/testData_human.fastq.log | awk -F "=" '{print "failed_homopolymer_filter\t"$2}' >> /exceRptOutput/testData_human.fastq.stats gunzip -c /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.trimmed.filtered.fastq.gz > /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.trimmed.filtered.fastq java -Xmx200G -jar /exceRpt_bin/exceRpt_Tools.jar GetSequenceLengths /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.trimmed.filtered.fastq > /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.trimmed.filtered.readLengths.txt 2>> /exceRptOutput/testData_human.fastq.err rm /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.trimmed.filtered.fastq java -classpath /exceRpt_bin/FastQC_0.11.7:/exceRpt_bin/FastQC_0.11.7/sam-1.103.jar:/exceRpt_bin/FastQC_0.11.7/jbzip2-0.9.jar -Xmx200G -Dfastqc.threads=50 -Dfastqc.unzip=false -Dfastqc.output_dir=/exceRptOutput/testData_human.fastq/ uk/ac/babraham/FastQC/FastQCApplication /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.trimmed.filtered.fastq.gz >> /exceRptOutput/testData_human.fastq.log 2>> /exceRptOutput/testData_human.fastq.err ## Count calibrator oligo reads echo -e "calibrator\tNA" >> /exceRptOutput/testData_human.fastq.stats # /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/filteringAlignments_UniVec_ --genomeDir /exceRpt_DB/UniVec/STAR_INDEX_UniVec --readFilesIn /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.trimmed.filtered.fastq.gz --outReadsUnmapped Fastx --parametersFiles /exceRpt_DB/STAR_Parameters_Endogenous_smallRNA.in --alignEndsType Local --outFilterMatchNmin 18 --outFilterMatchNminOverLread 0.9 --outFilterMismatchNmax 1 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log 2>> /exceRptOutput/testData_human.fastq.err; /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/filteringAlignments_UniVec_Aligned.out.bam | awk '{print $3}' | sort -k 2,2 2>> /exceRptOutput/testData_human.fastq.err | uniq --count > /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.trimmed.filtered.uniVecContaminants.counts 2>> /exceRptOutput/testData_human.fastq.err; /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/filteringAlignments_UniVec_Aligned.out.bam | awk '{print $1}' | sort 2>> /exceRptOutput/testData_human.fastq.err | uniq -c | wc -l > /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.trimmed.filtered.uniVecContaminants.readCount 2>> /exceRptOutput/testData_human.fastq.err; gzip -c /exceRptOutput/testData_human.fastq/filteringAlignments_UniVec_Unmapped.out.mate1 > /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.trimmed.filtered.noUniVecContaminants.fastq.gz; rm /exceRptOutput/testData_human.fastq/filteringAlignments_UniVec_Unmapped.out.mate1 ## Count UniVec contaminant reads cat /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.trimmed.filtered.uniVecContaminants.readCount | awk '{print "UniVec_contaminants\t"$1}' >> /exceRptOutput/testData_human.fastq.stats /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/filteringAlignments_rRNA_ --genomeDir /exceRpt_DB/hg38/STAR_INDEX_rRNA --readFilesIn /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.trimmed.filtered.noUniVecContaminants.fastq.gz --outReadsUnmapped Fastx --parametersFiles /exceRpt_DB/STAR_Parameters_Endogenous_smallRNA.in --alignEndsType Local --outFilterMatchNmin 18 --outFilterMatchNminOverLread 0.9 --outFilterMismatchNmax 1 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log 2>> /exceRptOutput/testData_human.fastq.err; /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/filteringAlignments_rRNA_Aligned.out.bam | awk '{print $3}' | sort -k 2,2 2>> /exceRptOutput/testData_human.fastq.err | uniq -c > /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.trimmed.filtered.rRNA.counts 2>> /exceRptOutput/testData_human.fastq.err; /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/filteringAlignments_rRNA_Aligned.out.bam | awk '{print $1}' | sort 2>> /exceRptOutput/testData_human.fastq.err | uniq -c | wc -l > /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.trimmed.filtered.rRNA.readCount 2>> /exceRptOutput/testData_human.fastq.err; gzip -c /exceRptOutput/testData_human.fastq/filteringAlignments_rRNA_Unmapped.out.mate1 > /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.trimmed.filtered.noRiboRNA.fastq.gz; rm /exceRptOutput/testData_human.fastq/filteringAlignments_rRNA_Unmapped.out.mate1 ## Count rRNA reads cat /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.trimmed.filtered.rRNA.readCount | awk ' {print "rRNA\t"$1}' >> /exceRptOutput/testData_human.fastq.stats # /exceRpt_bin/samtools-1.7/samtools sort -@ 50 -m 2G -O bam -T /exceRptOutput/testData_human.fastq/tmp /exceRptOutput/testData_human.fastq/filteringAlignments_rRNA_Aligned.out.bam > /exceRptOutput/testData_human.fastq/filteringAlignments_rRNA_Aligned.out.sorted.bam /exceRpt_bin/samtools-1.7/samtools index /exceRptOutput/testData_human.fastq/filteringAlignments_rRNA_Aligned.out.sorted.bam rm /exceRptOutput/testData_human.fastq/filteringAlignments_rRNA_Aligned.out.bam java -classpath /exceRpt_bin/FastQC_0.11.7:/exceRpt_bin/FastQC_0.11.7/sam-1.103.jar:/exceRpt_bin/FastQC_0.11.7/jbzip2-0.9.jar -Xmx200G -Dfastqc.threads=50 -Dfastqc.unzip=false -Dfastqc.output_dir=/exceRptOutput/testData_human.fastq/ uk/ac/babraham/FastQC/FastQCApplication /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.trimmed.filtered.noRiboRNA.fastq.gz >> /exceRptOutput/testData_human.fastq.log 2>> /exceRptOutput/testData_human.fastq.err /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/endogenousAlignments_genome_ --genomeDir /exceRpt_DB/hg38/STAR_INDEX_genome --readFilesIn /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.trimmed.filtered.noRiboRNA.fastq.gz --outReadsUnmapped Fastx --parametersFiles /exceRpt_DB/STAR_Parameters_Endogenous_smallRNA.in --alignEndsType Local --outFilterMatchNmin 18 --outFilterMatchNminOverLread 0.9 --outFilterMismatchNmax 1 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log 2>> /exceRptOutput/testData_human.fastq.err # ## sort the alignments by ReadID just in case these are paired end reads in a single file? -- no, better to flag that this is an invalid file (ToDo) # ## v use this line when we start dealing with paired-end reads #/exceRpt_bin/samtools-1.7/samtools fastq -1 /exceRptOutput/testData_human.fastq/endogenousAlignments_genome_Mapped.out.mate1 -2 /exceRptOutput/testData_human.fastq/endogenousAlignments_genome_Mapped.out.mate2 /exceRptOutput/testData_human.fastq/endogenousAlignments_genome_Aligned.out.bam /exceRpt_bin/samtools-1.7/samtools fastq /exceRptOutput/testData_human.fastq/endogenousAlignments_genome_Aligned.out.bam > /exceRptOutput/testData_human.fastq/endogenousAlignments_genome_Mapped.out.mate1 [M::bam2fq_mainloop] discarded 0 singletons [M::bam2fq_mainloop] processed 322 reads # ## map ALL READS to the TRANSCRIPTOME (STAR ungapped) /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/endogenousAlignments_genomeMapped_transcriptome_ --readFilesIn /exceRptOutput/testData_human.fastq/endogenousAlignments_genome_Mapped.out.mate1 --genomeDir /exceRpt_DB/hg38/STAR_INDEX_transcriptome --parametersFiles /exceRpt_DB/STAR_Parameters_Endogenous_smallRNA.in --alignEndsType Local --outFilterMatchNmin 18 --outFilterMatchNminOverLread 0.9 --outFilterMismatchNmax 1 --outFilterMismatchNoverLmax 0.3 --readFilesCommand - >> /exceRptOutput/testData_human.fastq.log 2>> /exceRptOutput/testData_human.fastq.err gzip -c /exceRptOutput/testData_human.fastq/endogenousAlignments_genomeMapped_transcriptome_Unmapped.out.mate1 > /exceRptOutput/testData_human.fastq/endogenousAlignments_genomeMapped_transcriptome_Unmapped.R1.fastq.gz # /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/endogenousAlignments_genomeUnmapped_transcriptome_ --readFilesIn /exceRptOutput/testData_human.fastq/endogenousAlignments_genome_Unmapped.out.mate1 --outReadsUnmapped Fastx --genomeDir /exceRpt_DB/hg38/STAR_INDEX_transcriptome --parametersFiles /exceRpt_DB/STAR_Parameters_Endogenous_smallRNA.in --alignEndsType Local --outFilterMatchNmin 18 --outFilterMatchNminOverLread 0.9 --outFilterMismatchNmax 1 --outFilterMismatchNoverLmax 0.3 --readFilesCommand - >> /exceRptOutput/testData_human.fastq.log 2>> /exceRptOutput/testData_human.fastq.err gzip -c /exceRptOutput/testData_human.fastq/endogenousAlignments_genomeUnmapped_transcriptome_Unmapped.out.mate1 > /exceRptOutput/testData_human.fastq/endogenousAlignments_genomeUnmapped_transcriptome_Unmapped.R1.fastq.gz # ## Count # mapped reads cat /exceRptOutput/testData_human.fastq/endogenousAlignments_genome_Log.final.out | grep "Number of input reads" | awk -F "|\t" '{print "reads_used_for_alignment\t"$2}' >> /exceRptOutput/testData_human.fastq.stats cat /exceRptOutput/testData_human.fastq/endogenousAlignments_genome*apped_transcriptome_Log.final.out | grep "Number of input reads\|Uniquely mapped reads number\|Number of reads mapped to multiple loci" | sed '2,4d' | awk -F "|\t" '{SUM+=$2}END{print "genome\t"SUM}' >> /exceRptOutput/testData_human.fastq.stats # ## Compress STAR logs gzip /exceRptOutput/testData_human.fastq/endogenousAlignments_genomeMapped_transcriptome_Log.out gzip /exceRptOutput/testData_human.fastq/endogenousAlignments_genomeUnmapped_transcriptome_Log.out # ## Tidy up rm /exceRptOutput/testData_human.fastq/endogenousAlignments_genome_SJ.out.tab rm /exceRptOutput/testData_human.fastq/endogenousAlignments_genomeUnmapped_transcriptome_SJ.out.tab rm /exceRptOutput/testData_human.fastq/endogenousAlignments_genome_Mapped.out.mate1 rm /exceRptOutput/testData_human.fastq/endogenousAlignments_genome_Unmapped.out.mate1 rm /exceRptOutput/testData_human.fastq/endogenousAlignments_genomeMapped_transcriptome_Unmapped.out.mate1 rm /exceRptOutput/testData_human.fastq/endogenousAlignments_genomeUnmapped_transcriptome_Unmapped.out.mate1 # java -Xmx200G -jar /exceRpt_bin/exceRpt_Tools.jar CIGAR_2_PWM -f /exceRptOutput/testData_human.fastq/endogenousAlignments_genome_Aligned.out.bam > /exceRptOutput/testData_human.fastq/endogenousAlignments_genome_Aligned.out.bam.CIGARstats.txt 2>> /exceRptOutput/testData_human.fastq.log # /exceRpt_bin/samtools-1.7/samtools sort -n -@ 50 -m 2G -O bam -T /exceRptOutput/testData_human.fastq/tmp /exceRptOutput/testData_human.fastq/endogenousAlignments_genomeMapped_transcriptome_Aligned.out.bam > /exceRptOutput/testData_human.fastq/endogenousAlignments_genomeMapped_transcriptome_Aligned.out.sorted.bam 2>> /exceRptOutput/testData_human.fastq.log # java -Xmx200G -jar /exceRpt_bin/exceRpt_Tools.jar ReadCoverage -exceRpt -a /exceRpt_DB/hg38/gencodeAnnotation.gtf -f /exceRptOutput/testData_human.fastq/endogenousAlignments_genomeMapped_transcriptome_Aligned.out.sorted.bam 2>> /exceRptOutput/testData_human.fastq.log # rm /exceRptOutput/testData_human.fastq/endogenousAlignments_genomeMapped_transcriptome_Aligned.out.sorted.bam # ## Assign reads java -Xmx200G -jar /exceRpt_bin/exceRpt_Tools.jar ProcessEndogenousAlignments --libPriority miRNA,tRNA,piRNA,gencode,circRNA --genomeMappedReads /exceRptOutput/testData_human.fastq/endogenousAlignments_genomeMapped_transcriptome_Aligned.out.bam --transcriptomeMappedReads /exceRptOutput/testData_human.fastq/endogenousAlignments_genomeUnmapped_transcriptome_Aligned.out.bam --hairpin2genome /exceRpt_DB/hg38/miRNA_precursor2genome.sam --mature2hairpin /exceRpt_DB/hg38/miRNA_mature2precursor.sam --dict /exceRptOutput/testData_human.fastq/endogenousAlignments_Accepted.dict 2>> /exceRptOutput/testData_human.fastq.log | sort -k 2,2 -k 1,1 > /exceRptOutput/testData_human.fastq/endogenousAlignments_Accepted.txt # ## Do we want to downsample? # ## Quantify all annotated RNAs java -Xmx200G -jar /exceRpt_bin/exceRpt_Tools.jar QuantifyEndogenousAlignments --dict /exceRptOutput/testData_human.fastq/endogenousAlignments_Accepted.dict --acceptedAlignments /exceRptOutput/testData_human.fastq/endogenousAlignments_Accepted.txt --outputPath /exceRptOutput/testData_human.fastq 2>> /exceRptOutput/testData_human.fastq.log # ## Summarise alignment statistics cat /exceRptOutput/testData_human.fastq/readCounts_miRNAmature_sense.txt | awk '{SUM+=$4}END{printf "miRNA_sense\t%.0f\n",SUM}' >> /exceRptOutput/testData_human.fastq.stats cat /exceRptOutput/testData_human.fastq/readCounts_miRNAmature_antisense.txt | awk '{SUM+=$4}END{printf "miRNA_antisense\t%.0f\n",SUM}' >> /exceRptOutput/testData_human.fastq.stats cat: /exceRptOutput/testData_human.fastq/readCounts_miRNAmature_antisense.txt: No such file or directory cat /exceRptOutput/testData_human.fastq/readCounts_miRNAprecursor_sense.txt | awk '{SUM+=$4}END{printf "miRNAprecursor_sense\t%.0f\n",SUM}' >> /exceRptOutput/testData_human.fastq.stats cat /exceRptOutput/testData_human.fastq/readCounts_miRNAprecursor_antisense.txt | awk '{SUM+=$4}END{printf "miRNAprecursor_antisense\t%.0f\n",SUM}' >> /exceRptOutput/testData_human.fastq.stats cat: /exceRptOutput/testData_human.fastq/readCounts_miRNAprecursor_antisense.txt: No such file or directory cat /exceRptOutput/testData_human.fastq/readCounts_tRNA_sense.txt | awk '{SUM+=$4}END{printf "tRNA_sense\t%.0f\n",SUM}' >> /exceRptOutput/testData_human.fastq.stats cat /exceRptOutput/testData_human.fastq/readCounts_tRNA_antisense.txt | awk '{SUM+=$4}END{printf "tRNA_antisense\t%.0f\n",SUM}' >> /exceRptOutput/testData_human.fastq.stats cat: /exceRptOutput/testData_human.fastq/readCounts_tRNA_antisense.txt: No such file or directory cat /exceRptOutput/testData_human.fastq/readCounts_piRNA_sense.txt | awk '{SUM+=$4}END{printf "piRNA_sense\t%.0f\n",SUM}' >> /exceRptOutput/testData_human.fastq.stats cat /exceRptOutput/testData_human.fastq/readCounts_piRNA_antisense.txt | awk '{SUM+=$4}END{printf "piRNA_antisense\t%.0f\n",SUM}' >> /exceRptOutput/testData_human.fastq.stats cat: /exceRptOutput/testData_human.fastq/readCounts_piRNA_antisense.txt: No such file or directory cat /exceRptOutput/testData_human.fastq/readCounts_gencode_sense.txt | awk '{SUM+=$4}END{printf "gencode_sense\t%.0f\n",SUM}' >> /exceRptOutput/testData_human.fastq.stats cat /exceRptOutput/testData_human.fastq/readCounts_gencode_antisense.txt | awk '{SUM+=$4}END{printf "gencode_antisense\t%.0f\n",SUM}' >> /exceRptOutput/testData_human.fastq.stats cat /exceRptOutput/testData_human.fastq/readCounts_circRNA_sense.txt | awk '{SUM+=$4}END{printf "circularRNA_sense\t%.0f\n",SUM}' >> /exceRptOutput/testData_human.fastq.stats cat: /exceRptOutput/testData_human.fastq/readCounts_circRNA_sense.txt: No such file or directory cat /exceRptOutput/testData_human.fastq/readCounts_circRNA_antisense.txt | awk '{SUM+=$4}END{printf "circularRNA_antisense\t%.0f\n",SUM}' >> /exceRptOutput/testData_human.fastq.stats cat: /exceRptOutput/testData_human.fastq/readCounts_circRNA_antisense.txt: No such file or directory ## Count reads not mapping to the genome or to the libraries gunzip -c /exceRptOutput/testData_human.fastq/endogenousAlignments_genomeUnmapped_transcriptome_Unmapped.R1.fastq.gz | wc -l | awk '{print "not_mapped_to_genome_or_libs\t"($1/4)}' >> /exceRptOutput/testData_human.fastq.stats # ## Tidy up gzip -c /exceRptOutput/testData_human.fastq/endogenousAlignments_Accepted.txt > /exceRptOutput/testData_human.fastq/endogenousAlignments_Accepted.txt.gz rm /exceRptOutput/testData_human.fastq/endogenousAlignments_Accepted.txt /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/endogenousAlignments_repetitiveElements_ --genomeDir /exceRpt_DB/hg38/STAR_INDEX_repetitiveElements --readFilesIn /exceRptOutput/testData_human.fastq/endogenousAlignments_genomeUnmapped_transcriptome_Unmapped.R1.fastq.gz --outReadsUnmapped Fastx --parametersFiles /exceRpt_DB/STAR_Parameters_Endogenous_smallRNA.in --alignEndsType Local --outFilterMatchNmin 18 --outFilterMatchNminOverLread 0.9 --outFilterMismatchNmax 1 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log 2>> /exceRptOutput/testData_human.fastq.err ## Assigned non-redundantly to annotated REs /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/endogenousAlignments_repetitiveElements_Aligned.out.bam | grep -v "^@" | awk '{print $1}' | sort | uniq | wc -l | awk '{print "repetitiveElements\t"$0}' >> /exceRptOutput/testData_human.fastq.stats gzip -c /exceRptOutput/testData_human.fastq/endogenousAlignments_repetitiveElements_Unmapped.out.mate1 > /exceRptOutput/testData_human.fastq/endogenousAlignments_repetitiveElements_Unmapped.R1.fastq.gz /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/endogenousAlignments_genomeGapped_ --alignIntronMax 0 --alignIntronMin 21 --genomeDir /exceRpt_DB/hg38/STAR_INDEX_genome --readFilesIn /exceRptOutput/testData_human.fastq/endogenousAlignments_repetitiveElements_Unmapped.R1.fastq.gz --outReadsUnmapped Fastx --parametersFiles /exceRpt_DB/STAR_Parameters_Endogenous_smallRNA.in --alignEndsType Local --outFilterMatchNmin 18 --outFilterMatchNminOverLread 0.9 --outFilterMismatchNmax 1 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log 2>> /exceRptOutput/testData_human.fastq.err ## mapped to the genome with gaps /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/endogenousAlignments_genomeGapped_Aligned.out.bam | grep -v "^@" | awk '{print $1}' | sort | uniq | wc -l | awk '{print "endogenous_gapped\t"$0}' >> /exceRptOutput/testData_human.fastq.stats gzip -c /exceRptOutput/testData_human.fastq/endogenousAlignments_genomeGapped_Unmapped.out.mate1 > /exceRptOutput/testData_human.fastq/endogenousAlignments_genomeGapped_Unmapped.R1.fastq.gz mkdir -p /exceRptOutput/testData_human.fastq/EXOGENOUS_miRNA /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/EXOGENOUS_miRNA/exogenous_miRBase_ --genomeDir /exceRpt_DB/miRBase/STAR_INDEX_miRBaseAll --readFilesIn /exceRptOutput/testData_human.fastq/endogenousAlignments_genomeGapped_Unmapped.R1.fastq.gz --outReadsUnmapped Fastx --parametersFiles /exceRpt_DB/STAR_Parameters_Exogenous.in --outSAMtype BAM Unsorted --outSAMattributes Standard --alignEndsType EndToEnd --outFilterMatchNmin 18 --outFilterMatchNminOverLread 1.0 --outFilterMismatchNmax 0 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log 2>> /exceRptOutput/testData_human.fastq.err gzip -c /exceRptOutput/testData_human.fastq/EXOGENOUS_miRNA/exogenous_miRBase_Unmapped.out.mate1 > /exceRptOutput/testData_human.fastq/EXOGENOUS_miRNA/exogenous_miRBase_Unmapped.R1.fastq.gz rm /exceRptOutput/testData_human.fastq/EXOGENOUS_miRNA/exogenous_miRBase_Unmapped.out.mate1 # ## quantify read alignments using a slight hack of the endogenous alignment engine java -Xmx200G -jar /exceRpt_bin/exceRpt_Tools.jar ProcessEndogenousAlignments --forceLib miRNA --transcriptomeMappedReads /exceRptOutput/testData_human.fastq/EXOGENOUS_miRNA/exogenous_miRBase_Aligned.out.bam --hairpin2genome /exceRpt_DB/miRBase/miRNA_precursor2genome.sam --mature2hairpin /exceRpt_DB/miRBase/miRNA_mature2precursor.sam --dict /exceRptOutput/testData_human.fastq/EXOGENOUS_miRNA/exogenousAlignments_Accepted.dict 2>> /exceRptOutput/testData_human.fastq.log | sort -k 2,2 -k 1,1 > /exceRptOutput/testData_human.fastq/EXOGENOUS_miRNA/exogenousAlignments_Accepted.txt # java -Xmx200G -jar /exceRpt_bin/exceRpt_Tools.jar QuantifyEndogenousAlignments --dict /exceRptOutput/testData_human.fastq/EXOGENOUS_miRNA/exogenousAlignments_Accepted.dict --acceptedAlignments /exceRptOutput/testData_human.fastq/EXOGENOUS_miRNA/exogenousAlignments_Accepted.txt --outputPath /exceRptOutput/testData_human.fastq/EXOGENOUS_miRNA 2>> /exceRptOutput/testData_human.fastq.log # ## Tidy up: gzip -c /exceRptOutput/testData_human.fastq/EXOGENOUS_miRNA/exogenousAlignments_Accepted.txt > /exceRptOutput/testData_human.fastq/EXOGENOUS_miRNA/exogenousMiRNAAlignments_Accepted.txt.gz rm /exceRptOutput/testData_human.fastq/EXOGENOUS_miRNA/exogenousAlignments_Accepted.txt # ## Stats cat /exceRptOutput/testData_human.fastq/EXOGENOUS_miRNA/exogenous_miRBase_Log.final.out | grep "Number of input reads" | awk -F "|\t" '{print "input_to_exogenous_miRNA\t"$2}' >> /exceRptOutput/testData_human.fastq.stats cat /exceRptOutput/testData_human.fastq/EXOGENOUS_miRNA/exogenous_miRBase_Log.final.out | grep "Uniquely mapped reads number\|Number of reads mapped to multiple loci" | awk -F "|\t" '{SUM+=$2}END{print "exogenous_miRNA\t"SUM}' >> /exceRptOutput/testData_human.fastq.stats mkdir -p /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/exogenous_rRNA_ --genomeDir /exceRpt_DB/ribosomeDatabase/exogenous_rRNAs --readFilesIn /exceRptOutput/testData_human.fastq/EXOGENOUS_miRNA/exogenous_miRBase_Unmapped.R1.fastq.gz --outReadsUnmapped Fastx --parametersFiles /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_Parameters_Exogenous.in --outSAMtype BAM Unsorted --outSAMattributes Standard --alignEndsType EndToEnd --outFilterMatchNmin 18 --outFilterMatchNminOverLread 1.0 --outFilterMismatchNmax 0 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log ## Input to exogenous rRNA alignment grep "Number of input reads" /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/exogenous_rRNA_Log.final.out | tr '[:blank:]' ' ' | awk -F " \\\| " '{print "input_to_exogenous_rRNA\t"$2}' >> /exceRptOutput/testData_human.fastq.stats ## Assigned non-redundantly to annotated exogenous rRNAs cat /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/exogenous_rRNA_Log.final.out | grep "Uniquely mapped reads number\|Number of reads mapped to multiple loci" | awk -F "|\t" '{SUM+=$2}END{print "exogenous_rRNA\t"SUM}' >> /exceRptOutput/testData_human.fastq.stats #/exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/exogenous_rRNA_Aligned.out.bam | awk '{print $1}' | sort | uniq | wc -l | awk '{print "exogenous_rRNA\t"$0}' >> /exceRptOutput/testData_human.fastq.stats ## compress and tidy up gzip -c /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/exogenous_rRNA_Unmapped.out.mate1 > /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/unaligned.fq.gz rm /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/exogenous_rRNA_Unmapped.out.mate1 rm /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/exogenous_rRNA_Log.out /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/exogenous_rRNA_Aligned.out.bam | awk '{print $1,$3,$4,$6,$10}' | sort -k 1,1 -k 2,2 | uniq > /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/ExogenousRibosomalAlignments.txt 2>> /exceRptOutput/testData_human.fastq.log # java -Xmx200G -jar /exceRpt_bin/exceRpt_Tools.jar ProcessExogenousAlignments -taxonomyPath /exceRpt_DB/NCBI_taxonomy_taxdump -min 0.001 -frac 0.95 --minReads 3 -batchSize 20000 -alignments /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/ExogenousRibosomalAlignments.txt --rdp > /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/ExogenousRibosomalAlignments.tmp 2>> /exceRptOutput/testData_human.fastq.log # # Tidy up mv /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/ExogenousRibosomalAlignments.tmp /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/ExogenousRibosomalAlignments.result.taxaAnnotated.txt rm /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/ExogenousRibosomalAlignments.txt mkdir -p /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Bacteria1_ --genomeDir /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_GENOME_BACTERIA1 --readFilesIn /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/unaligned.fq.gz --parametersFiles /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_Parameters_Exogenous.in --outSAMtype BAM Unsorted --outSAMattributes Standard --alignEndsType EndToEnd --outFilterMatchNmin 18 --outFilterMatchNminOverLread 1.0 --outFilterMismatchNmax 0 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Bacteria2_ --genomeDir /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_GENOME_BACTERIA2 --readFilesIn /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/unaligned.fq.gz --parametersFiles /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_Parameters_Exogenous.in --outSAMtype BAM Unsorted --outSAMattributes Standard --alignEndsType EndToEnd --outFilterMatchNmin 18 --outFilterMatchNminOverLread 1.0 --outFilterMismatchNmax 0 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Bacteria3_ --genomeDir /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_GENOME_BACTERIA3 --readFilesIn /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/unaligned.fq.gz --parametersFiles /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_Parameters_Exogenous.in --outSAMtype BAM Unsorted --outSAMattributes Standard --alignEndsType EndToEnd --outFilterMatchNmin 18 --outFilterMatchNminOverLread 1.0 --outFilterMismatchNmax 0 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Bacteria4_ --genomeDir /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_GENOME_BACTERIA4 --readFilesIn /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/unaligned.fq.gz --parametersFiles /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_Parameters_Exogenous.in --outSAMtype BAM Unsorted --outSAMattributes Standard --alignEndsType EndToEnd --outFilterMatchNmin 18 --outFilterMatchNminOverLread 1.0 --outFilterMismatchNmax 0 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Bacteria5_ --genomeDir /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_GENOME_BACTERIA5 --readFilesIn /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/unaligned.fq.gz --parametersFiles /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_Parameters_Exogenous.in --outSAMtype BAM Unsorted --outSAMattributes Standard --alignEndsType EndToEnd --outFilterMatchNmin 18 --outFilterMatchNminOverLread 1.0 --outFilterMismatchNmax 0 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Bacteria6_ --genomeDir /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_GENOME_BACTERIA6 --readFilesIn /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/unaligned.fq.gz --parametersFiles /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_Parameters_Exogenous.in --outSAMtype BAM Unsorted --outSAMattributes Standard --alignEndsType EndToEnd --outFilterMatchNmin 18 --outFilterMatchNminOverLread 1.0 --outFilterMismatchNmax 0 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Bacteria7_ --genomeDir /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_GENOME_BACTERIA7 --readFilesIn /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/unaligned.fq.gz --parametersFiles /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_Parameters_Exogenous.in --outSAMtype BAM Unsorted --outSAMattributes Standard --alignEndsType EndToEnd --outFilterMatchNmin 18 --outFilterMatchNminOverLread 1.0 --outFilterMismatchNmax 0 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Bacteria8_ --genomeDir /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_GENOME_BACTERIA8 --readFilesIn /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/unaligned.fq.gz --parametersFiles /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_Parameters_Exogenous.in --outSAMtype BAM Unsorted --outSAMattributes Standard --alignEndsType EndToEnd --outFilterMatchNmin 18 --outFilterMatchNminOverLread 1.0 --outFilterMismatchNmax 0 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Bacteria9_ --genomeDir /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_GENOME_BACTERIA9 --readFilesIn /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/unaligned.fq.gz --parametersFiles /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_Parameters_Exogenous.in --outSAMtype BAM Unsorted --outSAMattributes Standard --alignEndsType EndToEnd --outFilterMatchNmin 18 --outFilterMatchNminOverLread 1.0 --outFilterMismatchNmax 0 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Bacteria10_ --genomeDir /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_GENOME_BACTERIA10 --readFilesIn /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/unaligned.fq.gz --parametersFiles /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_Parameters_Exogenous.in --outSAMtype BAM Unsorted --outSAMattributes Standard --alignEndsType EndToEnd --outFilterMatchNmin 18 --outFilterMatchNminOverLread 1.0 --outFilterMismatchNmax 0 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log mkdir -p /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Plants1_ --genomeDir /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_GENOME_PLANTS1 --readFilesIn /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/unaligned.fq.gz --parametersFiles /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_Parameters_Exogenous.in --outSAMtype BAM Unsorted --outSAMattributes Standard --alignEndsType EndToEnd --outFilterMatchNmin 18 --outFilterMatchNminOverLread 1.0 --outFilterMismatchNmax 0 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Plants2_ --genomeDir /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_GENOME_PLANTS2 --readFilesIn /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/unaligned.fq.gz --parametersFiles /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_Parameters_Exogenous.in --outSAMtype BAM Unsorted --outSAMattributes Standard --alignEndsType EndToEnd --outFilterMatchNmin 18 --outFilterMatchNminOverLread 1.0 --outFilterMismatchNmax 0 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Plants3_ --genomeDir /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_GENOME_PLANTS3 --readFilesIn /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/unaligned.fq.gz --parametersFiles /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_Parameters_Exogenous.in --outSAMtype BAM Unsorted --outSAMattributes Standard --alignEndsType EndToEnd --outFilterMatchNmin 18 --outFilterMatchNminOverLread 1.0 --outFilterMismatchNmax 0 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Plants4_ --genomeDir /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_GENOME_PLANTS4 --readFilesIn /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/unaligned.fq.gz --parametersFiles /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_Parameters_Exogenous.in --outSAMtype BAM Unsorted --outSAMattributes Standard --alignEndsType EndToEnd --outFilterMatchNmin 18 --outFilterMatchNminOverLread 1.0 --outFilterMismatchNmax 0 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Plants5_ --genomeDir /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_GENOME_PLANTS5 --readFilesIn /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/unaligned.fq.gz --parametersFiles /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_Parameters_Exogenous.in --outSAMtype BAM Unsorted --outSAMattributes Standard --alignEndsType EndToEnd --outFilterMatchNmin 18 --outFilterMatchNminOverLread 1.0 --outFilterMismatchNmax 0 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log mkdir -p /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Metazoa1_ --genomeDir /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_GENOME_METAZOA1 --readFilesIn /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/unaligned.fq.gz --parametersFiles /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_Parameters_Exogenous.in --outSAMtype BAM Unsorted --outSAMattributes Standard --alignEndsType EndToEnd --outFilterMatchNmin 18 --outFilterMatchNminOverLread 1.0 --outFilterMismatchNmax 0 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Metazoa2_ --genomeDir /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_GENOME_METAZOA2 --readFilesIn /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/unaligned.fq.gz --parametersFiles /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_Parameters_Exogenous.in --outSAMtype BAM Unsorted --outSAMattributes Standard --alignEndsType EndToEnd --outFilterMatchNmin 18 --outFilterMatchNminOverLread 1.0 --outFilterMismatchNmax 0 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Metazoa3_ --genomeDir /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_GENOME_METAZOA3 --readFilesIn /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/unaligned.fq.gz --parametersFiles /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_Parameters_Exogenous.in --outSAMtype BAM Unsorted --outSAMattributes Standard --alignEndsType EndToEnd --outFilterMatchNmin 18 --outFilterMatchNminOverLread 1.0 --outFilterMismatchNmax 0 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Metazoa4_ --genomeDir /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_GENOME_METAZOA4 --readFilesIn /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/unaligned.fq.gz --parametersFiles /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_Parameters_Exogenous.in --outSAMtype BAM Unsorted --outSAMattributes Standard --alignEndsType EndToEnd --outFilterMatchNmin 18 --outFilterMatchNminOverLread 1.0 --outFilterMismatchNmax 0 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Metazoa5_ --genomeDir /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_GENOME_METAZOA5 --readFilesIn /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/unaligned.fq.gz --parametersFiles /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_Parameters_Exogenous.in --outSAMtype BAM Unsorted --outSAMattributes Standard --alignEndsType EndToEnd --outFilterMatchNmin 18 --outFilterMatchNminOverLread 1.0 --outFilterMismatchNmax 0 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log mkdir -p /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/FungiProtistVirus_ --genomeDir /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_GENOME_FUNGI_PROTIST_VIRUS --readFilesIn /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/unaligned.fq.gz --parametersFiles /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_Parameters_Exogenous.in --outSAMtype BAM Unsorted --outSAMattributes Standard --alignEndsType EndToEnd --outFilterMatchNmin 18 --outFilterMatchNminOverLread 1.0 --outFilterMismatchNmax 0 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log mkdir -p /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Vertebrate1_ --genomeDir /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_GENOME_VERTEBRATE1 --readFilesIn /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/unaligned.fq.gz --parametersFiles /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_Parameters_Exogenous.in --outSAMtype BAM Unsorted --outSAMattributes Standard --alignEndsType EndToEnd --outFilterMatchNmin 18 --outFilterMatchNminOverLread 1.0 --outFilterMismatchNmax 0 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Vertebrate2_ --genomeDir /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_GENOME_VERTEBRATE2 --readFilesIn /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/unaligned.fq.gz --parametersFiles /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_Parameters_Exogenous.in --outSAMtype BAM Unsorted --outSAMattributes Standard --alignEndsType EndToEnd --outFilterMatchNmin 18 --outFilterMatchNminOverLread 1.0 --outFilterMismatchNmax 0 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Vertebrate3_ --genomeDir /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_GENOME_VERTEBRATE3 --readFilesIn /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/unaligned.fq.gz --parametersFiles /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_Parameters_Exogenous.in --outSAMtype BAM Unsorted --outSAMattributes Standard --alignEndsType EndToEnd --outFilterMatchNmin 18 --outFilterMatchNminOverLread 1.0 --outFilterMismatchNmax 0 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Vertebrate4_ --genomeDir /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_GENOME_VERTEBRATE4 --readFilesIn /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/unaligned.fq.gz --parametersFiles /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_Parameters_Exogenous.in --outSAMtype BAM Unsorted --outSAMattributes Standard --alignEndsType EndToEnd --outFilterMatchNmin 18 --outFilterMatchNminOverLread 1.0 --outFilterMismatchNmax 0 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log # /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Bacteria1_Aligned.out.bam | awk '{print $1,$3,$4,$6,$10}' | uniq | awk '{split($2,a,"[$:]");print $1"\tBacteria\t"a[1]"\t"a[7]"\t"$3"\t"$4"\t"$5}' > /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.txt /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Bacteria2_Aligned.out.bam | awk '{print $1,$3,$4,$6,$10}' | uniq | awk '{split($2,a,"[$:]");print $1"\tBacteria\t"a[1]"\t"a[7]"\t"$3"\t"$4"\t"$5}' >> /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.txt /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Bacteria3_Aligned.out.bam | awk '{print $1,$3,$4,$6,$10}' | uniq | awk '{split($2,a,"[$:]");print $1"\tBacteria\t"a[1]"\t"a[7]"\t"$3"\t"$4"\t"$5}' >> /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.txt /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Bacteria4_Aligned.out.bam | awk '{print $1,$3,$4,$6,$10}' | uniq | awk '{split($2,a,"[$:]");print $1"\tBacteria\t"a[1]"\t"a[7]"\t"$3"\t"$4"\t"$5}' >> /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.txt /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Bacteria5_Aligned.out.bam | awk '{print $1,$3,$4,$6,$10}' | uniq | awk '{split($2,a,"[$:]");print $1"\tBacteria\t"a[1]"\t"a[7]"\t"$3"\t"$4"\t"$5}' >> /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.txt /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Bacteria6_Aligned.out.bam | awk '{print $1,$3,$4,$6,$10}' | uniq | awk '{split($2,a,"[$:]");print $1"\tBacteria\t"a[1]"\t"a[7]"\t"$3"\t"$4"\t"$5}' >> /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.txt /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Bacteria7_Aligned.out.bam | awk '{print $1,$3,$4,$6,$10}' | uniq | awk '{split($2,a,"[$:]");print $1"\tBacteria\t"a[1]"\t"a[7]"\t"$3"\t"$4"\t"$5}' >> /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.txt /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Bacteria8_Aligned.out.bam | awk '{print $1,$3,$4,$6,$10}' | uniq | awk '{split($2,a,"[$:]");print $1"\tBacteria\t"a[1]"\t"a[7]"\t"$3"\t"$4"\t"$5}' >> /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.txt /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Bacteria9_Aligned.out.bam | awk '{print $1,$3,$4,$6,$10}' | uniq | awk '{split($2,a,"[$:]");print $1"\tBacteria\t"a[1]"\t"a[7]"\t"$3"\t"$4"\t"$5}' >> /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.txt /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Bacteria10_Aligned.out.bam | awk '{print $1,$3,$4,$6,$10}' | uniq | awk '{split($2,a,"[$:]");print $1"\tBacteria\t"a[1]"\t"a[7]"\t"$3"\t"$4"\t"$5}' >> /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.txt # /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Plants1_Aligned.out.bam | awk '{print $1,$3,$4,$6,$10}' | uniq | awk '{split($2,a,":");print $1"\t"a[1]"\t"a[2]"\t"a[3]"\t"$3"\t"$4"\t"$5}' >> /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.txt /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Plants2_Aligned.out.bam | awk '{print $1,$3,$4,$6,$10}' | uniq | awk '{split($2,a,":");print $1"\t"a[1]"\t"a[2]"\t"a[3]"\t"$3"\t"$4"\t"$5}' >> /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.txt /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Plants3_Aligned.out.bam | awk '{print $1,$3,$4,$6,$10}' | uniq | awk '{split($2,a,":");print $1"\t"a[1]"\t"a[2]"\t"a[3]"\t"$3"\t"$4"\t"$5}' >> /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.txt /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Plants4_Aligned.out.bam | awk '{print $1,$3,$4,$6,$10}' | uniq | awk '{split($2,a,":");print $1"\t"a[1]"\t"a[2]"\t"a[3]"\t"$3"\t"$4"\t"$5}' >> /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.txt /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Plants5_Aligned.out.bam | awk '{print $1,$3,$4,$6,$10}' | uniq | awk '{split($2,a,":");print $1"\t"a[1]"\t"a[2]"\t"a[3]"\t"$3"\t"$4"\t"$5}' >> /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.txt # /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Metazoa1_Aligned.out.bam | awk '{print $1,$3,$4,$6,$10}' | uniq | awk '{split($2,a,":");print $1"\t"a[1]"\t"a[2]"\t"a[3]"\t"$3"\t"$4"\t"$5}' >> /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.txt /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Metazoa2_Aligned.out.bam | awk '{print $1,$3,$4,$6,$10}' | uniq | awk '{split($2,a,":");print $1"\t"a[1]"\t"a[2]"\t"a[3]"\t"$3"\t"$4"\t"$5}' >> /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.txt /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Metazoa3_Aligned.out.bam | awk '{print $1,$3,$4,$6,$10}' | uniq | awk '{split($2,a,":");print $1"\t"a[1]"\t"a[2]"\t"a[3]"\t"$3"\t"$4"\t"$5}' >> /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.txt /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Metazoa4_Aligned.out.bam | awk '{print $1,$3,$4,$6,$10}' | uniq | awk '{split($2,a,":");print $1"\t"a[1]"\t"a[2]"\t"a[3]"\t"$3"\t"$4"\t"$5}' >> /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.txt /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Metazoa5_Aligned.out.bam | awk '{print $1,$3,$4,$6,$10}' | uniq | awk '{split($2,a,":");print $1"\t"a[1]"\t"a[2]"\t"a[3]"\t"$3"\t"$4"\t"$5}' >> /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.txt # /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/FungiProtistVirus_Aligned.out.bam | grep "Virus:" | awk '{print $1,$3,$4,$6,$10}' | uniq | awk '{split($2,a,"[:|]");print $1"\t"a[1]"\t"a[3]"\t"a[5]"\t"$3"\t"$4"\t"$5}' >> /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.txt /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/FungiProtistVirus_Aligned.out.bam | grep "Fungi:" | awk '{print $1,$3,$4,$6,$10}' | uniq | awk '{split($2,a,":");print $1"\t"a[1]"\t"a[2]"\t"a[3]"\t"$3"\t"$4"\t"$5}' >> /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.txt /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/FungiProtistVirus_Aligned.out.bam | grep "Protist:" | awk '{print $1,$3,$4,$6,$10}' | uniq | awk '{split($2,a,":");print $1"\t"a[1]"\t"a[2]"\t"a[3]"\t"$3"\t"$4"\t"$5}' >> /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.txt # /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Vertebrate1_Aligned.out.bam | awk '{print $1,$3,$4,$6,$10}' | uniq | awk '{split($2,a,":");print $1"\t"a[1]"\t"a[2]"\t"a[3]"\t"$3"\t"$4"\t"$5}' >> /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.txt /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Vertebrate2_Aligned.out.bam | awk '{print $1,$3,$4,$6,$10}' | uniq | awk '{split($2,a,":");print $1"\t"a[1]"\t"a[2]"\t"a[3]"\t"$3"\t"$4"\t"$5}' >> /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.txt /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Vertebrate3_Aligned.out.bam | awk '{print $1,$3,$4,$6,$10}' | uniq | awk '{split($2,a,":");print $1"\t"a[1]"\t"a[2]"\t"a[3]"\t"$3"\t"$4"\t"$5}' >> /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.txt /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Vertebrate4_Aligned.out.bam | awk '{print $1,$3,$4,$6,$10}' | uniq | awk '{split($2,a,":");print $1"\t"a[1]"\t"a[2]"\t"a[3]"\t"$3"\t"$4"\t"$5}' >> /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.txt # cat /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.txt | sort -k 1,1 | gzip -c > /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.sorted.txt.gz rm /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.txt # ## Input to exogenous genome alignment gunzip -c /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/unaligned.fq.gz | wc -l | awk '{print "input_to_exogenous_genomes\t"$1/4}' >> /exceRptOutput/testData_human.fastq.stats ## Count reads mapped to exogenous genomes: gunzip -c /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.sorted.txt.gz | awk '{print $1}' | uniq | wc -l | awk '{print "exogenous_genomes\t"$1}' >> /exceRptOutput/testData_human.fastq.stats # gunzip -c /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.sorted.txt.gz > /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.sorted.txt java -Xmx200G -jar /exceRpt_bin/exceRpt_Tools.jar ProcessExogenousAlignments -taxonomyPath /exceRpt_DB/NCBI_taxonomy_taxdump -min 0.001 -frac 0.95 -batchSize 500000 -minReads 3 -alignments /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.sorted.txt > /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.tmp 2>> /exceRptOutput/testData_human.fastq.log # Tidy up mv /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.tmp /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.result.taxaAnnotated.txt rm /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.sorted.txt ## Wrap up logging and stats files # ## Adapter confidence echo -e "known: " >> /exceRptOutput/testData_human.fastq.qctmp cat /exceRptOutput/testData_human.fastq/testData_human.fastq.knownAdapterSeq >> /exceRptOutput/testData_human.fastq.qctmp echo -e "used: " >> /exceRptOutput/testData_human.fastq.qctmp cat /exceRptOutput/testData_human.fastq/testData_human.fastq.adapterSeq >> /exceRptOutput/testData_human.fastq.qctmp cat /exceRptOutput/testData_human.fastq.qctmp | tr '\n' ' ' | awk -F ' ' '{if($2=="used:"){ if(NF==2){print "Adapter_confidence: LOW"}else{print "Adapter_confidence: WARN_unableToGuessAdapter_usingProvided("$3")"}}else{if($2==$4){print "Adapter_confidence: HIGH"}else{print "Adapter_confidence: WARN_providedAdapter("$4")DisagreesWithGuessed("$2")"}}}' > /exceRptOutput/testData_human.fastq.qcResult # ## Calculate QC result cat /exceRptOutput/testData_human.fastq.stats | grep "^input" | head -n 1 | awk '{print $2}' > /exceRptOutput/testData_human.fastq.qctmp cat /exceRptOutput/testData_human.fastq.stats | grep "^genome" | awk '{print $2}' >> /exceRptOutput/testData_human.fastq.qctmp cat /exceRptOutput/testData_human.fastq.stats | grep "sense" | awk '{SUM+=$2}END{print SUM}' >> /exceRptOutput/testData_human.fastq.qctmp cat /exceRptOutput/testData_human.fastq.qctmp | tr '\n' '\t' | awk '{result="FAIL"; ratio=0; if($2>0){ratio=$3/$2}; if(ratio>0.5 && $3>100000)result="PASS"}END{print "QC_result: "result"\nInputReads: "$1"\nGenomeReads: "$2"\nTranscriptomeReads: "$3"\nTranscriptomeGenomeRatio: "ratio}' >> /exceRptOutput/testData_human.fastq.qcResult gunzip -c /exceRptOutput/testData_human.fastq/endogenousAlignments_Accepted.txt.gz | wc -l > /exceRptOutput/testData_human.fastq.qctmp gunzip -c /exceRptOutput/testData_human.fastq/endogenousAlignments_Accepted.txt.gz | awk '{print $2}' | uniq | wc -l >> /exceRptOutput/testData_human.fastq.qctmp # cat /exceRptOutput/testData_human.fastq.qctmp | tr '\n' '\t' | awk '{if($1>0){print "TranscriptomeComplexity: "($2/$1)}else{print "TranscriptomeComplexity: 0"}}' >> /exceRptOutput/testData_human.fastq.qcResult rm /exceRptOutput/testData_human.fastq.qctmp # ## Compress core results files automatically ls -lh /exceRptOutput/testData_human.fastq | awk '{print $9}' | grep "readCounts_\|.readLengths.txt\|_fastqc.zip\|.counts\|.knownAdapterSeq\|.adapterSeq\|.qualityEncoding\|.CIGARstats.txt\|.coverage.txt" | awk '{print "testData_human.fastq/"$1}' > /exceRptOutput/testData_human.fastq_filesToCompress.txt; echo testData_human.fastq.log >> /exceRptOutput/testData_human.fastq_filesToCompress.txt; echo testData_human.fastq.stats >> /exceRptOutput/testData_human.fastq_filesToCompress.txt; echo testData_human.fastq.qcResult >> /exceRptOutput/testData_human.fastq_filesToCompress.txt; ls -lh /exceRptOutput/testData_human.fastq | awk '{print $9}' | grep "calibratormapped.counts" | awk '{print "testData_human.fastq/"$1}' >> /exceRptOutput/testData_human.fastq_filesToCompress.txt; ls -lh /exceRptOutput/testData_human.fastq/EXOGENOUS_miRNA | awk '{print $9}' | grep "readCounts_" | awk '{print "testData_human.fastq/EXOGENOUS_miRNA/"$1}' >> /exceRptOutput/testData_human.fastq_filesToCompress.txt; ls -lh /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA | awk '{print $9}' | grep "ExogenousRibosomalAlignments.result.taxaAnnotated.txt" | awk '{print "testData_human.fastq/EXOGENOUS_rRNA/"$1}' >> /exceRptOutput/testData_human.fastq_filesToCompress.txt; ls -lh /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes | awk '{print $9}' | grep "ExogenousGenomicAlignments.result.taxaAnnotated.txt" | awk '{print "testData_human.fastq/EXOGENOUS_genomes/"$1}' >> /exceRptOutput/testData_human.fastq_filesToCompress.txt tar -cvz -C /exceRptOutput -T /exceRptOutput/testData_human.fastq_filesToCompress.txt -f /exceRptOutput/testData_human.fastq_CORE_RESULTS_v4.6.3.tgz 2> /dev/null testData_human.fastq/endogenousAlignments_genomeMapped_transcriptome_Aligned.out.sorted.bam.coverage.txt testData_human.fastq/endogenousAlignments_genome_Aligned.out.bam.CIGARstats.txt testData_human.fastq/readCounts_gencode_antisense.txt testData_human.fastq/readCounts_gencode_antisense_geneLevel.txt testData_human.fastq/readCounts_gencode_sense.txt testData_human.fastq/readCounts_gencode_sense_geneLevel.txt testData_human.fastq/readCounts_miRNAmature_sense.txt testData_human.fastq/readCounts_miRNAprecursor_sense.txt testData_human.fastq/readCounts_piRNA_sense.txt testData_human.fastq/readCounts_tRNA_sense.txt testData_human.fastq/testData_human.fastq.adapterSeq testData_human.fastq/testData_human.fastq.clipped.trimmed.filtered.noRiboRNA_fastqc.zip testData_human.fastq/testData_human.fastq.clipped.trimmed.filtered.rRNA.counts testData_human.fastq/testData_human.fastq.clipped.trimmed.filtered.readLengths.txt testData_human.fastq/testData_human.fastq.clipped.trimmed.filtered.uniVecContaminants.counts testData_human.fastq/testData_human.fastq.clipped.trimmed.filtered_fastqc.zip testData_human.fastq/testData_human.fastq.knownAdapterSeq testData_human.fastq/testData_human.fastq.qualityEncoding testData_human.fastq.log testData_human.fastq.stats testData_human.fastq.qcResult testData_human.fastq/EXOGENOUS_rRNA/ExogenousRibosomalAlignments.result.taxaAnnotated.txt testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.result.taxaAnnotated.txt rm /exceRptOutput/testData_human.fastq_filesToCompress.txt ## END PIPELINE
Run the viral-ngs Snakemake pipelines inside a Docker environment
-
Pull the viral-ngs Docker Image
docker pull quay.io/broadinstitute/viral-ngs docker run -t quay.io/broadinstitute/viral-ngs # Without /bin/bash → May run and exit immediately docker run -it quay.io/broadinstitute/viral-ngs # With /bin/bash → Stays open for interaction docker run -it --entrypoint /bin/bash quay.io/broadinstitute/viral-ngs docker run -it quay.io/broadinstitute/viral-ngs docker attach
docker run -v /home/jhuang/DATA/Data_Huang_Human_herpesvirus_3/viralngs_docker/data:/user-data -it quay.io/broadinstitute/viral-ngs snakemake –printshellcmds –cores 80 # IMPORTANT_NOTE: we can have a look of the structure of env rkitchen/excerpt docker run -v ~/DATA/Data_Ute/Data_Ute_smallRNA_7/trimmed:/exceRptInput \ -v ~/DATA/Data_Ute/Data_Ute_smallRNA_7/results_exo5:/exceRptOutput \ -v /mnt/nvme0n1p1/MyexceRptDatabase:/exceRpt_DB \ –entrypoint /bin/bash -it rkitchen/excerpt #\ #INPUT_FILE_PATH=/exceRptInput/xxx.gz MAIN_ORGANISM_GENOME_ID=hg38 N_THREADS=50 JAVA_RAM=’200G’ MAP_EXOGENOUS=on docker inspect quay.io/broadinstitute/viral-ngs “Env”: [ “PATH=/opt/viral-ngs/source:/opt/miniconda/envs/viral-ngs-env/bin:/opt/miniconda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin”, “LANG=en_US.UTF-8”, “LANGUAGE=en_US:en”, “LC_ALL=en_US.UTF-8”, “MINICONDA_PATH=/opt/miniconda”, “INSTALL_PATH=/opt/viral-ngs”, “VIRAL_NGS_PATH=/opt/viral-ngs/source”, “CONDA_DEFAULT_ENV=viral-ngs-env”, “CONDA_PREFIX=/opt/miniconda/envs/viral-ngs-env”, “JAVA_HOME=/opt/miniconda”, “VIRAL_NGS_DOCKER_DATA_PATH=/user-data”, “NOVOALIGN_PATH=/novoalign”, “GATK_PATH=/gatk” ] -
Set Up the Analysis Directory Structure
The pipeline expects a specific directory structure. Inside your host machine (not the container), create: viralngs_docker/ │── config.yaml # Copy from viral-ngs/pipes/ │── Snakefile # Copy from viral-ngs/pipes/ │── data/ │ ├── 00_raw/ # Place input BAM files (e.g., `sample1.bam`) │ ├── 01_cleaned/ │ ├── 01_per_sample/ │ ├── 02_align_to_self/ │ ├── 02_assembly/ │ ├── 03_align_to_ref/ │ ├── 03_interhost/ │ ├── 04_intrahost/ │── log/ │── reports/ │── tmp/ │── samples-depletion.txt # List samples (one per line, e.g., `sample1`) │── samples-assembly.txt # List samples for assembly │── samples-runs.txt # List samples for interhost analysis │── samples-assembly-failures.txt # (Leave empty) Key Steps: Input BAMs: Place your .bam files in data/00_raw/ (e.g., sample1.bam). Sample Lists: samples-depletion.txt → Samples for depletion pipeline. samples-assembly.txt → Samples for assembly pipeline. samples-runs.txt → Samples for interhost analysis.
-
Run the Pipeline in Docker
Mount your analysis directory into the container and execute Snakemake: cd /mnt/md1/DATA/Data_Huang_Human_herpesvirus_3 docker run -it \ -v "$(pwd)/viralngs_docker:/opt/viral-ngs-analysis" \ -w /opt/viral-ngs-analysis \ quay.io/broadinstitute/viral-ngs \ snakemake --cores all --use-conda Flags Explained: Flag Purpose -v $(pwd)/viralngs_docker:/opt/viral-ngs-analysis Mounts your host directory into the container. -w /opt/viral-ngs-analysis Sets the working directory inside the container. --cores all Uses all available CPU cores. --use-conda Ensures Conda environments are used (if specified in rules).
-
Customize config.yaml
Edit the config.yaml file (copied from viral-ngs/pipes/) to match your project: # Example config.yaml adjustments: ref_genome: "path/to/reference.fasta" threads: 40 # Number of CPU threads
-
Monitor Pipeline Progress
Logs: Check log/ for detailed logs. Snakemake Options: snakemake -n → Dry run (simulate pipeline). snakemake --dag | dot -Tpng > dag.png → Generate a workflow graph.
-
Post-Run Outputs
Results will be organized in: data/02_assembly/ → Assembled genomes. data/03_interhost/ → Interhost variants. reports/ → Summary reports.
-
Troubleshooting
Issue: Missing Dependencies
If Snakemake fails due to missing tools, ensure Conda is available inside Docker: /user-data/viral_ngs_dbs/ docker run -it \ -v "$(pwd)/viralngs_docker:/opt/viral-ngs-analysis" \ -w /opt/viral-ngs-analysis \ quay.io/broadinstitute/viral-ngs \ bash -c "conda install -y snakemake && snakemake --cores 20 --use-conda" #NOTE that we can also install tools inside Docker!!!!
Issue: Permissions
Ensure the container can write to your mounted directory: chmod -R a+rwx viralngs_docker
-
Alternative: Run Inside an Interactive Container (FINAL RUNNABLE)
For debugging, start a shell and run Snakemake manually:
#docker run -it \ #-v "$(pwd)/viralngs_docker:/opt/viral-ngs-analysis" \ #-w /opt/viral-ngs-analysis \ #--entrypoint /bin/bash \ #quay.io/broadinstitute/viral-ngs docker run \ -v /mnt/md1/DATA/Data_Huang_Human_herpesvirus_3/viralngs_docker:/opt/viral-ngs-analysis \ -v /home/jhuang/REFs:/user-data \ -v /home/jhuang/Tools/novocraft_v3:/novoalign \ -v /home/jhuang/Tools/GenomeAnalysisTK-3.6:/gatk \ -w /opt/viral-ngs-analysis \ --entrypoint /bin/bash \ -it quay.io/broadinstitute/viral-ngs #Under viral-ngs-analysis ln -s /opt/viral-ngs/source bin # Inside the container: snakemake --cores 20 --use-conda
-
LOG
root@37a95bb989f3:/opt/viral-ngs-analysis# snakemake --cores 20 --use-conda Building DAG of jobs... Using shell: /bin/bash Provided cores: 20 Rules claiming more threads will be scaled down. Job counts: count jobs 1 all 10 assemble_spades 1 consolidate_fastqc_on_all_assemblies 2 consolidate_fastqc_on_all_runs 1 consolidate_spike_count 10 depletion 30 fastqc_report 10 filter_to_taxon 10 isnvs_per_sample 1 isnvs_vcf 10 map_reads_to_self 20 merge_one_per_sample 1 multi_align_mafft 10 orient_and_impute 10 refine_assembly_1 10 refine_assembly_2 10 spikein_report 147 [Thu Apr 3 11:52:51 2025] rule depletion: input: data/00_raw/PCC1_VZV_60_6.bam, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.bitmask, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.ssa, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.imp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.idx, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.map, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.ss, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.amp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.rmp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.pmp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.ssd, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.bitmask, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.ssa, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.imp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.idx, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.map, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.ss, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.amp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.rmp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.pmp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.ssd, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.bitmask, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.ssa, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.imp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.idx, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.map, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.ss, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.amp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.rmp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.pmp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.ssd, /user-data/viral_ngs_dbs/blast_dbs_remove/metag_v3.ncRNA.mRNA.mitRNA.consensus.nhr, /user-data/viral_ngs_dbs/blast_dbs_remove/metag_v3.ncRNA.mRNA.mitRNA.consensus.nsq, /user-data/viral_ngs_dbs/blast_dbs_remove/metag_v3.ncRNA.mRNA.mitRNA.consensus.nin, /user-data/viral_ngs_dbs/blast_dbs_remove/hybsel_probe_adapters.nhr, /user-data/viral_ngs_dbs/blast_dbs_remove/hybsel_probe_adapters.nsq, /user-data/viral_ngs_dbs/blast_dbs_remove/hybsel_probe_adapters.nin, sabeti-public-dbs/bwa/hg19.bwt, sabeti-public-dbs/bwa/hg19.amb, sabeti-public-dbs/bwa/hg19.ann, sabeti-public-dbs/bwa/hg19.pac, sabeti-public-dbs/bwa/hg19.sa output: tmp/01_cleaned/PCC1_VZV_60_6.bwa_depleted.bam, tmp/01_cleaned/PCC1_VZV_60_6.bmtagger_depleted.bam, tmp/01_cleaned/PCC1_VZV_60_6.rmdup.bam, data/01_cleaned/PCC1_VZV_60_6.cleaned.bam jobid: 128 wildcards: sample=PCC1_VZV_60_6 resources: mem_mb=15000, threads=15 [Thu Apr 3 11:52:51 2025] rule depletion: input: data/00_raw/VZV_60c.bam, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.bitmask, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.ssa, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.imp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.idx, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.map, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.ss, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.amp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.rmp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.pmp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.ssd, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.bitmask, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.ssa, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.imp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.idx, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.map, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.ss, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.amp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.rmp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.pmp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.ssd, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.bitmask, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.ssa, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.imp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.idx, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.map, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.ss, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.amp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.rmp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.pmp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.ssd, /user-data/viral_ngs_dbs/blast_dbs_remove/metag_v3.ncRNA.mRNA.mitRNA.consensus.nhr, /user-data/viral_ngs_dbs/blast_dbs_remove/metag_v3.ncRNA.mRNA.mitRNA.consensus.nsq, /user-data/viral_ngs_dbs/blast_dbs_remove/metag_v3.ncRNA.mRNA.mitRNA.consensus.nin, /user-data/viral_ngs_dbs/blast_dbs_remove/hybsel_probe_adapters.nhr, /user-data/viral_ngs_dbs/blast_dbs_remove/hybsel_probe_adapters.nsq, /user-data/viral_ngs_dbs/blast_dbs_remove/hybsel_probe_adapters.nin, sabeti-public-dbs/bwa/hg19.bwt, sabeti-public-dbs/bwa/hg19.amb, sabeti-public-dbs/bwa/hg19.ann, sabeti-public-dbs/bwa/hg19.pac, sabeti-public-dbs/bwa/hg19.sa output: tmp/01_cleaned/VZV_60c.bwa_depleted.bam, tmp/01_cleaned/VZV_60c.bmtagger_depleted.bam, tmp/01_cleaned/VZV_60c.rmdup.bam, data/01_cleaned/VZV_60c.cleaned.bam jobid: 116 wildcards: sample=VZV_60c resources: mem_mb=15000, threads=15 [Thu Apr 3 11:52:51 2025] rule depletion: input: data/00_raw/VZV_20c.bam, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.bitmask, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.ssa, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.imp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.idx, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.map, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.ss, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.amp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.rmp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.pmp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.ssd, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.bitmask, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.ssa, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.imp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.idx, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.map, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.ss, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.amp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.rmp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.pmp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.ssd, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.bitmask, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.ssa, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.imp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.idx, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.map, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.ss, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.amp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.rmp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.pmp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.ssd, /user-data/viral_ngs_dbs/blast_dbs_remove/metag_v3.ncRNA.mRNA.mitRNA.consensus.nhr, /user-data/viral_ngs_dbs/blast_dbs_remove/metag_v3.ncRNA.mRNA.mitRNA.consensus.nsq, /user-data/viral_ngs_dbs/blast_dbs_remove/metag_v3.ncRNA.mRNA.mitRNA.consensus.nin, /user-data/viral_ngs_dbs/blast_dbs_remove/hybsel_probe_adapters.nhr, /user-data/viral_ngs_dbs/blast_dbs_remove/hybsel_probe_adapters.nsq, /user-data/viral_ngs_dbs/blast_dbs_remove/hybsel_probe_adapters.nin, sabeti-public-dbs/bwa/hg19.bwt, sabeti-public-dbs/bwa/hg19.amb, sabeti-public-dbs/bwa/hg19.ann, sabeti-public-dbs/bwa/hg19.pac, sabeti-public-dbs/bwa/hg19.sa output: tmp/01_cleaned/VZV_20c.bwa_depleted.bam, tmp/01_cleaned/VZV_20c.bmtagger_depleted.bam, tmp/01_cleaned/VZV_20c.rmdup.bam, data/01_cleaned/VZV_20c.cleaned.bam jobid: 112 wildcards: sample=VZV_20c resources: mem_mb=15000, threads=15 [Thu Apr 3 11:52:51 2025] rule depletion: input: data/00_raw/PCC1_VZV_20_2.bam, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.bitmask, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.ssa, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.imp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.idx, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.map, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.ss, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.amp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.rmp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.pmp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.ssd, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.bitmask, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.ssa, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.imp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.idx, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.map, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.ss, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.amp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.rmp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.pmp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.ssd, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.bitmask, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.ssa, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.imp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.idx, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.map, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.ss, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.amp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.rmp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.pmp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.ssd, /user-data/viral_ngs_dbs/blast_dbs_remove/metag_v3.ncRNA.mRNA.mitRNA.consensus.nhr, /user-data/viral_ngs_dbs/blast_dbs_remove/metag_v3.ncRNA.mRNA.mitRNA.consensus.nsq, /user-data/viral_ngs_dbs/blast_dbs_remove/metag_v3.ncRNA.mRNA.mitRNA.consensus.nin, /user-data/viral_ngs_dbs/blast_dbs_remove/hybsel_probe_adapters.nhr, /user-data/viral_ngs_dbs/blast_dbs_remove/hybsel_probe_adapters.nsq, /user-data/viral_ngs_dbs/blast_dbs_remove/hybsel_probe_adapters.nin, sabeti-public-dbs/bwa/hg19.bwt, sabeti-public-dbs/bwa/hg19.amb, sabeti-public-dbs/bwa/hg19.ann, sabeti-public-dbs/bwa/hg19.pac, sabeti-public-dbs/bwa/hg19.sa output: tmp/01_cleaned/PCC1_VZV_20_2.bwa_depleted.bam, tmp/01_cleaned/PCC1_VZV_20_2.bmtagger_depleted.bam, tmp/01_cleaned/PCC1_VZV_20_2.rmdup.bam, data/01_cleaned/PCC1_VZV_20_2.cleaned.bam jobid: 120 wildcards: sample=PCC1_VZV_20_2 resources: mem_mb=15000, threads=15 [Thu Apr 3 11:52:51 2025] rule depletion: input: data/00_raw/PCC1_VZV_20_5.bam, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.bitmask, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.ssa, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.imp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.idx, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.map, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.ss, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.amp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.rmp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.pmp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.ssd, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.bitmask, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.ssa, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.imp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.idx, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.map, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.ss, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.amp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.rmp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.pmp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.ssd, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.bitmask, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.ssa, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.imp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.idx, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.map, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.ss, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.amp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.rmp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.pmp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.ssd, /user-data/viral_ngs_dbs/blast_dbs_remove/metag_v3.ncRNA.mRNA.mitRNA.consensus.nhr, /user-data/viral_ngs_dbs/blast_dbs_remove/metag_v3.ncRNA.mRNA.mitRNA.consensus.nsq, /user-data/viral_ngs_dbs/blast_dbs_remove/metag_v3.ncRNA.mRNA.mitRNA.consensus.nin, /user-data/viral_ngs_dbs/blast_dbs_remove/hybsel_probe_adapters.nhr, /user-data/viral_ngs_dbs/blast_dbs_remove/hybsel_probe_adapters.nsq, /user-data/viral_ngs_dbs/blast_dbs_remove/hybsel_probe_adapters.nin, sabeti-public-dbs/bwa/hg19.bwt, sabeti-public-dbs/bwa/hg19.amb, sabeti-public-dbs/bwa/hg19.ann, sabeti-public-dbs/bwa/hg19.pac, sabeti-public-dbs/bwa/hg19.sa output: tmp/01_cleaned/PCC1_VZV_20_5.bwa_depleted.bam, tmp/01_cleaned/PCC1_VZV_20_5.bmtagger_depleted.bam, tmp/01_cleaned/PCC1_VZV_20_5.rmdup.bam, data/01_cleaned/PCC1_VZV_20_5.cleaned.bam jobid: 122 wildcards: sample=PCC1_VZV_20_5 resources: mem_mb=15000, threads=15 [Thu Apr 3 11:52:51 2025] rule depletion: input: data/00_raw/PCC1_VZV_60_4.bam, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.bitmask, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.ssa, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.imp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.idx, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.map, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.ss, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.amp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.rmp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.pmp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.ssd, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.bitmask, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.ssa, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.imp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.idx, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.map, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.ss, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.amp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.rmp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.pmp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.ssd, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.bitmask, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.ssa, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.imp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.idx, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.map, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.ss, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.amp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.rmp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.pmp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.ssd, /user-data/viral_ngs_dbs/blast_dbs_remove/metag_v3.ncRNA.mRNA.mitRNA.consensus.nhr, /user-data/viral_ngs_dbs/blast_dbs_remove/metag_v3.ncRNA.mRNA.mitRNA.consensus.nsq, /user-data/viral_ngs_dbs/blast_dbs_remove/metag_v3.ncRNA.mRNA.mitRNA.consensus.nin, /user-data/viral_ngs_dbs/blast_dbs_remove/hybsel_probe_adapters.nhr, /user-data/viral_ngs_dbs/blast_dbs_remove/hybsel_probe_adapters.nsq, /user-data/viral_ngs_dbs/blast_dbs_remove/hybsel_probe_adapters.nin, sabeti-public-dbs/bwa/hg19.bwt, sabeti-public-dbs/bwa/hg19.amb, sabeti-public-dbs/bwa/hg19.ann, sabeti-public-dbs/bwa/hg19.pac, sabeti-public-dbs/bwa/hg19.sa output: tmp/01_cleaned/PCC1_VZV_60_4.bwa_depleted.bam, tmp/01_cleaned/PCC1_VZV_60_4.bmtagger_depleted.bam, tmp/01_cleaned/PCC1_VZV_60_4.rmdup.bam, data/01_cleaned/PCC1_VZV_60_4.cleaned.bam jobid: 126 wildcards: sample=PCC1_VZV_60_4 resources: mem_mb=15000, threads=15 [Thu Apr 3 11:52:51 2025] rule depletion: input: data/00_raw/VZV_60S.bam, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.bitmask, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.ssa, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.imp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.idx, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.map, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.ss, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.amp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.rmp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.pmp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.ssd, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.bitmask, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.ssa, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.imp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.idx, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.map, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.ss, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.amp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.rmp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.pmp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.ssd, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.bitmask, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.ssa, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.imp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.idx, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.map, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.ss, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.amp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.rmp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.pmp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.ssd, /user-data/viral_ngs_dbs/blast_dbs_remove/metag_v3.ncRNA.mRNA.mitRNA.consensus.nhr, /user-data/viral_ngs_dbs/blast_dbs_remove/metag_v3.ncRNA.mRNA.mitRNA.consensus.nsq, /user-data/viral_ngs_dbs/blast_dbs_remove/metag_v3.ncRNA.mRNA.mitRNA.consensus.nin, /user-data/viral_ngs_dbs/blast_dbs_remove/hybsel_probe_adapters.nhr, /user-data/viral_ngs_dbs/blast_dbs_remove/hybsel_probe_adapters.nsq, /user-data/viral_ngs_dbs/blast_dbs_remove/hybsel_probe_adapters.nin, sabeti-public-dbs/bwa/hg19.bwt, sabeti-public-dbs/bwa/hg19.amb, sabeti-public-dbs/bwa/hg19.ann, sabeti-public-dbs/bwa/hg19.pac, sabeti-public-dbs/bwa/hg19.sa output: tmp/01_cleaned/VZV_60S.bwa_depleted.bam, tmp/01_cleaned/VZV_60S.bmtagger_depleted.bam, tmp/01_cleaned/VZV_60S.rmdup.bam, data/01_cleaned/VZV_60S.cleaned.bam jobid: 114 wildcards: sample=VZV_60S resources: mem_mb=15000, threads=15 [Thu Apr 3 11:52:51 2025] rule depletion: input: data/00_raw/VZV_20S.bam, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.bitmask, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.ssa, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.imp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.idx, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.map, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.ss, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.amp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.rmp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.pmp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.ssd, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.bitmask, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.ssa, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.imp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.idx, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.map, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.ss, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.amp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.rmp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.pmp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.ssd, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.bitmask, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.ssa, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.imp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.idx, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.map, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.ss, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.amp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.rmp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.pmp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.ssd, /user-data/viral_ngs_dbs/blast_dbs_remove/metag_v3.ncRNA.mRNA.mitRNA.consensus.nhr, /user-data/viral_ngs_dbs/blast_dbs_remove/metag_v3.ncRNA.mRNA.mitRNA.consensus.nsq, /user-data/viral_ngs_dbs/blast_dbs_remove/metag_v3.ncRNA.mRNA.mitRNA.consensus.nin, /user-data/viral_ngs_dbs/blast_dbs_remove/hybsel_probe_adapters.nhr, /user-data/viral_ngs_dbs/blast_dbs_remove/hybsel_probe_adapters.nsq, /user-data/viral_ngs_dbs/blast_dbs_remove/hybsel_probe_adapters.nin, sabeti-public-dbs/bwa/hg19.bwt, sabeti-public-dbs/bwa/hg19.amb, sabeti-public-dbs/bwa/hg19.ann, sabeti-public-dbs/bwa/hg19.pac, sabeti-public-dbs/bwa/hg19.sa output: tmp/01_cleaned/VZV_20S.bwa_depleted.bam, tmp/01_cleaned/VZV_20S.bmtagger_depleted.bam, tmp/01_cleaned/VZV_20S.rmdup.bam, data/01_cleaned/VZV_20S.cleaned.bam jobid: 110 wildcards: sample=VZV_20S resources: mem_mb=15000, threads=15 [Thu Apr 3 11:52:51 2025] rule depletion: input: data/00_raw/PCC1_VZV_20_1.bam, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.bitmask, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.ssa, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.imp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.idx, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.map, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.ss, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.amp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.rmp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.pmp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.ssd, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.bitmask, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.ssa, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.imp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.idx, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.map, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.ss, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.amp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.rmp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.pmp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.ssd, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.bitmask, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.ssa, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.imp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.idx, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.map, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.ss, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.amp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.rmp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.pmp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.ssd, /user-data/viral_ngs_dbs/blast_dbs_remove/metag_v3.ncRNA.mRNA.mitRNA.consensus.nhr, /user-data/viral_ngs_dbs/blast_dbs_remove/metag_v3.ncRNA.mRNA.mitRNA.consensus.nsq, /user-data/viral_ngs_dbs/blast_dbs_remove/metag_v3.ncRNA.mRNA.mitRNA.consensus.nin, /user-data/viral_ngs_dbs/blast_dbs_remove/hybsel_probe_adapters.nhr, /user-data/viral_ngs_dbs/blast_dbs_remove/hybsel_probe_adapters.nsq, /user-data/viral_ngs_dbs/blast_dbs_remove/hybsel_probe_adapters.nin, sabeti-public-dbs/bwa/hg19.bwt, sabeti-public-dbs/bwa/hg19.amb, sabeti-public-dbs/bwa/hg19.ann, sabeti-public-dbs/bwa/hg19.pac, sabeti-public-dbs/bwa/hg19.sa output: tmp/01_cleaned/PCC1_VZV_20_1.bwa_depleted.bam, tmp/01_cleaned/PCC1_VZV_20_1.bmtagger_depleted.bam, tmp/01_cleaned/PCC1_VZV_20_1.rmdup.bam, data/01_cleaned/PCC1_VZV_20_1.cleaned.bam jobid: 118 wildcards: sample=PCC1_VZV_20_1 resources: mem_mb=15000, threads=15 [Thu Apr 3 11:52:51 2025] rule depletion: input: data/00_raw/PCC1_VZV_60_1.bam, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.bitmask, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.ssa, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.imp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.idx, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.map, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.ss, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.amp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.rmp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.pmp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/hg19.srprism.ssd, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.bitmask, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.ssa, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.imp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.idx, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.map, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.ss, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.amp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.rmp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.pmp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA.srprism.ssd, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.bitmask, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.ssa, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.imp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.idx, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.map, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.ss, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.amp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.rmp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.pmp, /user-data/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3.srprism.ssd, /user-data/viral_ngs_dbs/blast_dbs_remove/metag_v3.ncRNA.mRNA.mitRNA.consensus.nhr, /user-data/viral_ngs_dbs/blast_dbs_remove/metag_v3.ncRNA.mRNA.mitRNA.consensus.nsq, /user-data/viral_ngs_dbs/blast_dbs_remove/metag_v3.ncRNA.mRNA.mitRNA.consensus.nin, /user-data/viral_ngs_dbs/blast_dbs_remove/hybsel_probe_adapters.nhr, /user-data/viral_ngs_dbs/blast_dbs_remove/hybsel_probe_adapters.nsq, /user-data/viral_ngs_dbs/blast_dbs_remove/hybsel_probe_adapters.nin, sabeti-public-dbs/bwa/hg19.bwt, sabeti-public-dbs/bwa/hg19.amb, sabeti-public-dbs/bwa/hg19.ann, sabeti-public-dbs/bwa/hg19.pac, sabeti-public-dbs/bwa/hg19.sa output: tmp/01_cleaned/PCC1_VZV_60_1.bwa_depleted.bam, tmp/01_cleaned/PCC1_VZV_60_1.bmtagger_depleted.bam, tmp/01_cleaned/PCC1_VZV_60_1.rmdup.bam, data/01_cleaned/PCC1_VZV_60_1.cleaned.bam jobid: 124 wildcards: sample=PCC1_VZV_60_1 resources: mem_mb=15000, threads=15 Job counts: count jobs 1 depletion 1
H65.3 G and J35.2 G
Am 23. Juli um 9 Uhr
H65.3 G – Chronischer Seromukotympanon
- Das bedeutet: Chronischer Paukenerguss (Flüssigkeit hinter dem Trommelfell, oft ohne akute Entzündung).
- Kommt bei Kindern oft vor und kann das Hören beeinträchtigen.
- Häufige Ursache für die Notwendigkeit von Paukenröhrchen (TT).
J35.2 G – Hypertrophie der Tonsillen (vergrößerte Mandeln)
- Bedeutet: Die Mandeln sind vergrößert.
- Kann z. B. mit Atemproblemen, Schnarchen oder chronischen Infekten zusammenhängen.
- Oft wird in so einem Fall überlegt, ob die Mandeln entfernt werden sollen.
腺样体 (Adenoid) vs. 腭扁桃体 (Palatine Tonsils) 对比解析
解剖位置结构 位置 可视化难度
腺样体 鼻咽顶部(鼻腔后方) 不可直视,需内窥镜或影像检查
腭扁桃体 口咽两侧(张嘴可见) 直接可见(如咽部检查)
-
生理功能
共同点:
均属「Waldeyer淋巴环」组成部分,是儿童免疫防御的第一道防线(捕获吸入/摄入的病原体)。 青春期后逐渐萎缩(成人通常功能退化)。
差异:
腺样体:侧重过滤空气传播病原体(如细菌、病毒)。 腭扁桃体:更多接触食物/飞沫中的病原体。
-
常见病变与症状
问题 腺样体肥大 腭扁桃体肥大/炎症
典型症状 鼻塞、口呼吸、睡眠打鼾/呼吸暂停 咽痛、吞咽困难、发热(急性期)
并发症 中耳炎(咽鼓管阻塞)、腺样体面容 扁桃体周围脓肿、肾炎(链球菌感染)
手术指征 阻塞性睡眠呼吸暂停(OSA) 反复感染(年≥5次)或OSA
-
手术干预
腺样体切除术(Adenoidectomy):
经口腔或鼻腔内镜操作,无外部伤口。 恢复期约1周,术后可能鼻音暂时性加重。
扁桃体切除术(Tonsillectomy):
全身麻醉下经口切除,术后咽痛明显(持续7-10天)。 需警惕术后出血风险(尤其7天内)。
-
影像学对比
腺样体:需侧位X光或鼻咽镜评估肥大程度(A/N比值>0.6为病理性)。
腭扁桃体:临床视诊即可分级(如Brodsky评分)。
注:两者可同时肥大(称「腺样体-扁桃体肥大」),需综合评估手术必要性。
Mandelklappung 具体指哪个结构?
答案: 在德语医学术语中,Mandelklappung(扁桃体拍击音)特指 腭扁桃体(Gaumenmandeln) 的异常振动,而非腺样体(Rachenmandel)。
详细解释
-
解剖定位
涉及的扁桃体:
腭扁桃体(Gaumenmandeln):位于口咽两侧,张嘴可见(如下图👇)。 腺样体(Rachenmandel):位于鼻咽顶部,无法直接观察。 Mandelklappung 的振动源:肥大的 腭扁桃体 在呼吸时相互碰撞或拍击咽后壁。
-
为何不是腺样体?
腺样体位置隐蔽:位于鼻咽部,不与对侧结构直接接触,无法产生拍击音。
症状差异:
腺样体肥大 → 鼻塞、口呼吸(无声音特征)。 腭扁桃体肥大 → 吞咽障碍、Mandelklappung(可闻及振动音)。
-
临床意义
提示严重肥大:需评估是否需 扁桃体切除术(Tonsillektomie)。
相关诊断:
阻塞性睡眠呼吸暂停(OSA) 慢性扁桃体炎(chronische Tonsillitis)
-
与其他术语的区分
德语术语 对应结构 是否涉及 Mandelklappung?
Gaumenmandeln 腭扁桃体 ✅ 是(主要来源)
Rachenmandel 腺样体 ❌ 否
Zungengrundmandeln 舌根扁桃体 ❌ 否
示例(医生记录)
"Patient zeigt deutliche Mandelklappung bei Inspiration, bedingt durch hypertrophe Gaumenmandeln. Tonsillektomie empfohlen."
(患者吸气时出现明显扁桃体拍击音,由腭扁桃体肥大引起,建议行扁桃体切除术。)
总结:若听到「Mandelklappung」,一定是 腭扁桃体(Gaumenmandeln) 的问题,需结合喉镜或睡眠监测进一步评估!
Sollen zwei key words erscheinen: Mandelklappung oder TT
Unter dem Punkt Diagnose steht zwei Zeichen. Ich habe gefragt, was die zwei Zeichen bedeuten. Das eine ist der chronische Paukenerguss und das andere die vergrößerten Mandeln. Für den chronischen Paukenerguss braucht man Paukenröhrchen, also eine TT-Operation.
Die „TT-Operation“ bezieht sich auf eine „Tympanostomie-Tuben-Operation“. Dabei handelt es sich um einen chirurgischen Eingriff, bei dem kleine Röhrchen (Paukenröhrchen) in das Trommelfell eingesetzt werden, um den Druck im Mittelohr auszugleichen und Flüssigkeit abfließen zu lassen, um chronischen Paukenerguss zu behandeln. Wenn du diesen Begriff in einem Gespräch verwendest, könntest du vielleicht auch einfach sagen: „Für den chronischen Paukenerguss braucht man eine TT-Operation, bei der Paukenröhrchen ins Ohr eingesetzt werden.“ Das macht es für jemanden, der mit medizinischen Begriffen weniger vertraut ist, verständlicher.
“TT手术”(Tympanostomy Tube Surgery,鼓室引流管手术)是一种用于治疗慢性积液性中耳炎(即慢性耳朵积液)的手术。在这个手术中,医生会在耳膜上开一个小孔,然后放入一个小的管子(称为鼓室引流管),以帮助排出中耳中的积液,并使耳朵内的压力恢复正常。 这种手术通常用于那些耳朵反复感染或有持续性耳液积聚的儿童,以减轻症状并预防听力问题。通过这项手术,液体能够流出耳朵,避免积液造成的耳部感染。
Mantelkappung
Gaumenmandeln
Mantelkappenröhrchen oder Form von TT
Mandeloperation / (Teil)Entfernung der Gaumenmandeln ©Monks – Ärzte im Netz GmbH
Es gibt zwei Formen der Mandeloperation: Zum einen kann man die Gaumenmandeln komplett entfernen (Tonsillektomie), eine Maßnahme, die vor allem bei Kleinkindern mit chronischen Mandelentzündungen sowie bei älteren Kindern, Jugendlichen und Erwachsenen zum Einsatz kommt. Zum anderen besteht die Möglichkeit einer Teilentfernung der Gaumenmandeln (Tonsillotomie), die bei Kindern unter 6 Jahren in vielen Fällen sinnvoll ist. Vollständige Entfernung der Gaumenmandeln (Tonsillektomie)
Bei einer Tonsillektomie werden die beiden Gaumenmandeln mit Hilfe chirurgischer Instrumente aus ihrem Bett geschält. Die Operation wird i.d.R. unter kurzer Vollnarkose ambulant oder stationär durchgeführt. Eine Tonsillektomie ist ein Routineeingriff im HNO-Bereich und insgesamt eine der am meisten durchgeführten Operationen überhaupt.
Eine relativ häufige Komplikation (1-6%) sind Nachblutungen, die entweder am Operationstag bzw. einen Tag danach oder nach etwa einer Woche auftreten, wenn sich der Schorf ablöst. Um speziell Nachblutungen zu vermeiden, empfehlen sich in den Tagen nach der Operation kalte Getränke und weiche Nahrung. Körperliche Anstrengung und heißes Baden oder Duschen sind für mindestens zwei Wochen tabu. Nachblutungen gehören in sofortige ärztliche Behandlung!
Im Gegensatz zu Kindern, die Mandel-Operation erstaunlich gut „wegstecken”, empfinden Erwachsene eine Tonsillektomie oft als sehr schmerzhaft und leiden mitunter 2 bis 4 Wochen an starken Schmerzen. Teilentfernung der Gaumenmandeln (Tonsillotomie)
Neben einer vollständigen Entfernung der Gaumenmandeln gibt es auch die Möglichkeit einer Teilentfernung, einer so genannten Tonsillotomie. Diese Operation wird vor allem bei Kindern zwischen dem 3. und 6. Lebensjahr durchgeführt, wenn die Mandeln wegen ihrer abnormen Größe zu Atem- und Schluckbeschwerden führen, aber kein chronischer Entzündungsherd sind. Bei einer Tonsillotomie wird ein Teil der Gaumenmandeln ambulant (z.B. per Laser oder mittels Radiofrequenz) abgetragen.
Neben dem Vorteil, dass die vor allem für Kinder wichtige Abwehrfunktion der Mandeln erhalten bleibt, zeichnet sich diese Methode auch durch deutlich weniger Schmerzen und Nachblutungen aus. Im Falle einer chronischen Mandelentzündung macht die Tonsillotomie i.d.R. keinen Sinn, da ein chronisch entzündetes Organ im Körper bliebe.
Gesetzlich Versicherte sollten die Kostenübernahme für eine Tonsillotomie vorab mit dem Operateur und ihrer Krankenkasse klären.
扁桃体手术 / (部分)切除腭扁桃体
©Monks – Ärzte im Netz GmbH
扁桃体手术有两种形式: 一种是完全切除腭扁桃体(称为扁桃体切除术,Tonsillektomie),这种方法主要适用于患有慢性扁桃体炎的幼儿,以及年长儿童、青少年和成人。 另一种是部分切除腭扁桃体(称为扁桃体部分切除术,Tonsillotomie),在6岁以下儿童中,很多情况下这种方式是合理且有效的。
🔹 完全切除腭扁桃体(Tonsillektomie)
在扁桃体切除术中,医生会用外科手术工具将两侧的腭扁桃体从组织床中剥离出来。 手术通常在短时间的全身麻醉下进行,可作为门诊手术或住院手术完成。 这是耳鼻喉科中一种常规操作,甚至是最常见的外科手术之一。
一个相对常见的并发症是术后出血(发生率约为1%-6%),可能在手术当天或次日,也可能在术后一周左右(当痂脱落时)发生。 为避免术后出血,术后几天建议饮用冷饮、进食软食,并且避免剧烈运动和热水澡,持续至少两周。 ⚠️ 一旦出现出血,必须立即就医!
与孩子通常恢复较快不同,成人往往会觉得扁桃体切除术非常疼痛,而且术后可能持续2至4周感到严重不适。
🔹 部分切除腭扁桃体(Tonsillotomie)
除了完全切除腭扁桃体之外,还有一种选择是部分切除,即扁桃体部分切除术(Tonsillotomie)。 这种手术主要用于3至6岁之间的儿童,当孩子因扁桃体体积过大而出现呼吸或吞咽困难,但没有慢性感染时适用。 手术通常在门诊进行,可通过激光或射频技术将部分腭扁桃体组织去除。
这种方法的优点是:
-
可以保留扁桃体的免疫功能(对儿童尤其重要);
-
术后疼痛和出血的风险较低。
⚠️ 如果存在慢性扁桃体炎,Tonsillotomie一般不适用,因为体内仍会保留一个慢性感染灶。
Mapping of reads to selected viruses in DAMIAN results (version 2)
-
Prepare input raw data
# -- Ringversuch -- ~/DATA/Data_Damian/241213_VH00358_120_AAG523FM5_Ringversuch ln ../241213_VH00358_120_AAG523FM5_Ringversuch/p20579/01_RV1_DNA_S1_R1_001.fastq.gz RV1_DNA_R1.fastq.gz ln ../241213_VH00358_120_AAG523FM5_Ringversuch/p20579/01_RV1_DNA_S1_R2_001.fastq.gz RV1_DNA_R2.fastq.gz ln ../241213_VH00358_120_AAG523FM5_Ringversuch/p20580/02_RV2_DNA_S2_R1_001.fastq.gz RV2_DNA_R1.fastq.gz ln ../241213_VH00358_120_AAG523FM5_Ringversuch/p20580/02_RV2_DNA_S2_R2_001.fastq.gz RV2_DNA_R2.fastq.gz ln ../241213_VH00358_120_AAG523FM5_Ringversuch/p20581/03_RV3_DNA_S3_R1_001.fastq.gz RV3_DNA_R1.fastq.gz ln ../241213_VH00358_120_AAG523FM5_Ringversuch/p20581/03_RV3_DNA_S3_R2_001.fastq.gz RV3_DNA_R2.fastq.gz ln ../241213_VH00358_120_AAG523FM5_Ringversuch/p20582/04_RV4_DNA_S4_R1_001.fastq.gz RV4_DNA_R1.fastq.gz ln ../241213_VH00358_120_AAG523FM5_Ringversuch/p20582/04_RV4_DNA_S4_R2_001.fastq.gz RV4_DNA_R2.fastq.gz ln ../241213_VH00358_120_AAG523FM5_Ringversuch/p20583/05_RV5_DNA_S5_R1_001.fastq.gz RV5_DNA_R1.fastq.gz ln ../241213_VH00358_120_AAG523FM5_Ringversuch/p20583/05_RV5_DNA_S5_R2_001.fastq.gz RV5_DNA_R2.fastq.gz ln ../241213_VH00358_120_AAG523FM5_Ringversuch/p20584/06_RV6_DNA_S6_R1_001.fastq.gz RV6_DNA_R1.fastq.gz ln ../241213_VH00358_120_AAG523FM5_Ringversuch/p20584/06_RV6_DNA_S6_R2_001.fastq.gz RV6_DNA_R2.fastq.gz ln ../241213_VH00358_120_AAG523FM5_Ringversuch/p20585/07_RV1_RNA_S7_R1_001.fastq.gz RV1_RNA_R1.fastq.gz ln ../241213_VH00358_120_AAG523FM5_Ringversuch/p20585/07_RV1_RNA_S7_R2_001.fastq.gz RV1_RNA_R2.fastq.gz ln ../241213_VH00358_120_AAG523FM5_Ringversuch/p20586/08_RV2_RNA_S8_R1_001.fastq.gz RV2_RNA_R1.fastq.gz ln ../241213_VH00358_120_AAG523FM5_Ringversuch/p20586/08_RV2_RNA_S8_R2_001.fastq.gz RV2_RNA_R2.fastq.gz ln ../241213_VH00358_120_AAG523FM5_Ringversuch/p20587/09_RV3_RNA_S9_R1_001.fastq.gz RV3_RNA_R1.fastq.gz ln ../241213_VH00358_120_AAG523FM5_Ringversuch/p20587/09_RV3_RNA_S9_R2_001.fastq.gz RV3_RNA_R2.fastq.gz ln ../241213_VH00358_120_AAG523FM5_Ringversuch/p20588/10_RV4_RNA_S10_R1_001.fastq.gz RV4_RNA_R1.fastq.gz ln ../241213_VH00358_120_AAG523FM5_Ringversuch/p20588/10_RV4_RNA_S10_R2_001.fastq.gz RV4_RNA_R2.fastq.gz ln ../241213_VH00358_120_AAG523FM5_Ringversuch/p20589/11_RV5_RNA_S11_R1_001.fastq.gz RV5_RNA_R1.fastq.gz ln ../241213_VH00358_120_AAG523FM5_Ringversuch/p20589/11_RV5_RNA_S11_R2_001.fastq.gz RV5_RNA_R2.fastq.gz ln ../241213_VH00358_120_AAG523FM5_Ringversuch/p20590/12_RV6_RNA_S12_R1_001.fastq.gz RV6_RNA_R1.fastq.gz ln ../241213_VH00358_120_AAG523FM5_Ringversuch/p20590/12_RV6_RNA_S12_R2_001.fastq.gz RV6_RNA_R2.fastq.gz
-
Prepare virus database and select 8 representatives for the eight given viruses from the database
# -- Download all genomes -- # enterovirus D68 # HSV-1 # HSV-2 # Influenza A H1N1 # Cytomegalovirus AD169 (The genome size of Human herpesvirus 5 (HHV-5) — more commonly known as Cytomegalovirus (CMV)) # Influenza A H3N2 # Monkeypox # HIV-1 esearch -db nucleotide -query "txid42789[Organism:exp]" | efetch -format fasta -email j.huang@uke.de > genome_42789_ncbi.fasta python ~/Scripts/filter_fasta.py genome_42789_ncbi.fasta complete_42789_ncbi.fasta #899 esearch -db nucleotide -query "txid10298[Organism:exp]" | efetch -format fasta -email j.huang@uke.de > genome_10298_ncbi.fasta python ~/Scripts/filter_fasta.py genome_10298_ncbi.fasta complete_10298_ncbi.fasta #162 esearch -db nucleotide -query "txid10310[Organism:exp]" | efetch -format fasta -email j.huang@uke.de > genome_10310_ncbi.fasta python ~/Scripts/filter_fasta.py genome_10310_ncbi.fasta complete_10310_ncbi.fasta #33 esearch -db nucleotide -query "txid1323429[Organism:exp]" | efetch -format fasta -email j.huang@uke.de > genome_1323429_ncbi.fasta python ~/Scripts/filter_fasta2.py genome_1323429_ncbi.fasta complete_1323429_ncbi.fasta #465 esearch -db nucleotide -query "txid10360[Organism:exp]" | efetch -format fasta -email j.huang@uke.de > genome_10360_ncbi.fasta python ~/Scripts/filter_fasta2.py genome_10360_ncbi.fasta complete_10360_ncbi.fasta #1 esearch -db nucleotide -query "txid41857[Organism:exp]" | efetch -format fasta -email j.huang@uke.de > genome_41857_ncbi.fasta python ~/Scripts/filter_fasta2.py genome_41857_ncbi.fasta complete_41857_ncbi.fasta #120 esearch -db nucleotide -query "txid10244[Organism:exp]" | efetch -format fasta -email j.huang@uke.de > genome_10244_ncbi.fasta python ~/Scripts/filter_fasta.py genome_10244_ncbi.fasta complete_10244_ncbi.fasta #2525 esearch -db nucleotide -query "txid11676[Organism:exp]" | efetch -format fasta -email j.huang@uke.de > genome_11676_ncbi.fasta python ~/Scripts/filter_fasta.py genome_11676_ncbi.fasta complete_11676_ncbi.fasta #485995-->7416 # ---- Alternatively, using ENA instead to download the genomes ---- # https://www.ebi.ac.uk/ena/browser/view/11676 (1138065 records) # #Click "Sequence" and download "Counts" (1132648) and "Taxon descendants count" (1138065) if there is enough time! Downloading time points is 09.04.2025. # python ~/Scripts/filter_fasta.py ena_11676_sequence.fasta complete_11676_ena.fasta #1138065-->???? # Virus Name NCBI TaxID # ------------------------ # Enterovirus D68 42789 >PQ895337.1 Enterovirus D68 isolate SH2024-25870 # HSV-1 (Herpes Simplex Virus 1) 10298 >PQ569920.1 Human alphaherpesvirus 1 isolate MacIntyre, complete genome # HSV-2 (Herpes Simplex Virus 2) 10310 >OM370995.1 Human alphaherpesvirus 2 strain G, complete genome samtools faidx complete_42789_ncbi.fasta PQ895337.1 > Enterovirus_D68_isolate_SH2024-25870.fasta samtools faidx complete_10298_ncbi.fasta PQ569920.1 > HSV-1_isolate_MacIntyre.fasta samtools faidx complete_10310_ncbi.fasta OM370995.1 > HSV-2_strain_G.fasta # Influenza A virus (H1N1) 1323429 # The Influenza A virus (H1N1) genome is composed of eight single-stranded negative-sense RNA segments, and the total genome size is approximately 13,500 nucleotides (13.5 kb). # Segment Gene Protein Product(s) Approx. Length (nt) # 1 PB2 Polymerase basic 2 ~2,341 # 2 PB1 Polymerase basic 1, PB1-F2 ~2,341 # 3 PA Polymerase acidic ~2,233 # 4 HA Hemagglutinin ~1,778 # 5 NP Nucleoprotein ~1,565 # 6 NA Neuraminidase ~1,413 # 7 M Matrix proteins (M1, M2) ~1,027 # 8 NS Nonstructural (NS1, NS2) ~890 # >LC662544.1 Influenza A virus (H1N1) A/PR/8/34 NEP, NS1 genes for nonstructural protein 2, nonstructural protein 1, complete cds # >LC662543.1 Influenza A virus (H1N1) A/PR/8/34 M2, M1 genes for matrix protein 2, matrix protein 1, complete cds # >LC662542.1 Influenza A virus (H1N1) A/PR/8/34 NA gene for neuraminidase, complete cds # >LC662541.1 Influenza A virus (H1N1) A/PR/8/34 NP gene for nucleoprotein, complete cds # >LC662540.1 Influenza A virus (H1N1) A/PR/8/34 HA gene for haemagglutinin, complete cds # >LC662539.1 Influenza A virus (H1N1) A/PR/8/34 PA, PA-X genes for polymerase PA, PA-X protein, complete cds # >LC662538.1 Influenza A virus (H1N1) A/PR/8/34 PB1, PB1-F2 genes for polymerase PB1, PB1-F2 protein, complete cds # >LC662537.1 Influenza A virus (H1N1) A/PR/8/34 PB2 gene for polymerase PB2, complete cds samtools faidx complete_1323429_ncbi.fasta LC662537.1 > H1N1_A-PR-8-34_PB2.fasta samtools faidx complete_1323429_ncbi.fasta LC662538.1 > H1N1_A-PR-8-34_PB1.fasta samtools faidx complete_1323429_ncbi.fasta LC662539.1 > H1N1_A-PR-8-34_PA.fasta samtools faidx complete_1323429_ncbi.fasta LC662540.1 > H1N1_A-PR-8-34_HA.fasta samtools faidx complete_1323429_ncbi.fasta LC662541.1 > H1N1_A-PR-8-34_NP.fasta samtools faidx complete_1323429_ncbi.fasta LC662542.1 > H1N1_A-PR-8-34_NA.fasta samtools faidx complete_1323429_ncbi.fasta LC662543.1 > H1N1_A-PR-8-34_M.fasta samtools faidx complete_1323429_ncbi.fasta LC662544.1 > H1N1_A-PR-8-34_NS.fasta # Human cytomegalovirus AD169 10360 # Influenza A virus (H3N2) 41857 # >LC817411.1 Influenza A virus H3N2 A_Fukushima_OR808_2023 RNA, seqment 8, complete sequence # >LC817410.1 Influenza A virus H3N2 A_Fukushima_OR808_2023 RNA, seqment 7, complete sequence # >LC817409.1 Influenza A virus H3N2 A_Fukushima_OR808_2023 RNA, seqment 6, complete sequence # >LC817408.1 Influenza A virus H3N2 A_Fukushima_OR808_2023 RNA, seqment 5, complete sequence # >LC817407.1 Influenza A virus H3N2 A_Fukushima_OR808_2023 RNA, seqment 4, complete sequence # >LC817406.1 Influenza A virus H3N2 A_Fukushima_OR808_2023 RNA, seqment 3, complete sequence # >LC817405.1 Influenza A virus H3N2 A_Fukushima_OR808_2023 RNA, seqment 2, complete sequence # >LC817404.1 Influenza A virus H3N2 A_Fukushima_OR808_2023 RNA, seqment 1, complete sequence samtools faidx complete_41857_ncbi.fasta LC817404.1 > H3N2_A-Fukushima-OR808-2023_PB2.fasta samtools faidx complete_41857_ncbi.fasta LC817405.1 > H3N2_A-Fukushima-OR808-2023_PB1.fasta samtools faidx complete_41857_ncbi.fasta LC817406.1 > H3N2_A-Fukushima-OR808-2023_PA.fasta samtools faidx complete_41857_ncbi.fasta LC817407.1 > H3N2_A-Fukushima-OR808-2023_HA.fasta samtools faidx complete_41857_ncbi.fasta LC817408.1 > H3N2_A-Fukushima-OR808-2023_NP.fasta samtools faidx complete_41857_ncbi.fasta LC817409.1 > H3N2_A-Fukushima-OR808-2023_NA.fasta samtools faidx complete_41857_ncbi.fasta LC817410.1 > H3N2_A-Fukushima-OR808-2023_M.fasta samtools faidx complete_41857_ncbi.fasta LC817411.1 > H3N2_A-Fukushima-OR808-2023_NS.fasta # Monkeypox virus 10244: >OP689666.1 Monkeypox virus isolate MPXV/Germany/2022/RKI513, complete genome samtools faidx complete_10244_ncbi.fasta OP689666.1 > Monkeypox_isolate_MPXV-Germany-2022-RKI513.fasta # Human immunodeficiency virus 1 11676: >AJ866558.1 Human immunodeficiency virus 1 complete genome, isolate 01IC-PCI127 samtools faidx complete_11676_ncbi.fasta AJ866558.1 > HIV-1_isolate_01IC-PCI127.fasta # -- Selected genomes saved in the fasta-files -- # Enterovirus_D68_isolate_SH2024-25870.fasta (7391 nt) # HSV-1_isolate_MacIntyre.fasta (151817 nt) # HSV-2_strain_G.fasta (155498 nt) # H1N1_A-PR-8-34_PB2.fasta (2341 nt) # H1N1_A-PR-8-34_PB1.fasta (2341 nt) # H1N1_A-PR-8-34_PA.fasta (2233 nt) # H1N1_A-PR-8-34_HA.fasta (1775 nt) # H1N1_A-PR-8-34_NP.fasta (1565 nt) # H1N1_A-PR-8-34_NA.fasta (1413 nt) # H1N1_A-PR-8-34_M.fasta (1027 nt) # H1N1_A-PR-8-34_NS.fasta (890 nt) # Human_cytomegalovirus_strain_AD169.fasta (229354 nt) # H3N2_A-Fukushima-OR808-2023_PB2.fasta (2301 nt) # H3N2_A-Fukushima-OR808-2023_PB1.fasta (2316 nt) # H3N2_A-Fukushima-OR808-2023_PA.fasta (2208 nt) # H3N2_A-Fukushima-OR808-2023_HA.fasta (1722 nt) # H3N2_A-Fukushima-OR808-2023_NP.fasta (1536 nt) # H3N2_A-Fukushima-OR808-2023_NA.fasta (1440 nt) # H3N2_A-Fukushima-OR808-2023_M.fasta (1002 nt) # H3N2_A-Fukushima-OR808-2023_NS.fasta (865 nt) # Monkeypox_isolate_MPXV-Germany-2022-RKI513.fasta (197140 nt) # HIV-1_isolate_01IC-PCI127.fasta (9752 nt)
-
(Optional) Run the first round of vrap (–virus==viruses_selected.fasta)
ln -s ~/Tools/vrap/ . mamba activate /home/jhuang/miniconda3/envs/vrap cd ~/DATA/Data_Damian/vrap_Ringversuch cat complete_10244_ncbi.fasta complete_10298_ncbi.fasta complete_10310_ncbi.fasta complete_1323429_ncbi.fasta complete_10360_ncbi.fasta complete_41857_ncbi.fasta complete_10244_ncbi.fasta complete_11676_ncbi.fasta > viruses_selected.fasta #Run vrap (first round): replace --virus to the specific taxonomy (e.g. viruses_selected.fasta) --> change virus_user_db --> specific_bacteria_user_db (vrap) for sample in RV1_DNA RV2_DNA RV3_DNA RV4_DNA RV5_DNA RV6_DNA RV1_RNA RV2_RNA RV3_RNA RV4_RNA RV5_RNA RV6_RNA; do vrap/vrap.py -1 ${sample}_R1.fastq.gz -2 ${sample}_R2.fastq.gz -o vrap_${sample} --bt2idx=/home/jhuang/REFs/genome --host=/home/jhuang/REFs/genome.fa --virus=/home/jhuang/DATA/Data_Damian/vrap_Ringversuch/viruses_selected.fasta --nt=/mnt/nvme1n1p1/blast/nt --nr=/mnt/nvme1n1p1/blast/nr -t 100 -l 200 -g done
-
Run the second round of vrap (–host==${virus}.fasta)
cat Enterovirus_D68_isolate_SH2024-25870.fasta HSV-1_isolate_MacIntyre.fasta HSV-2_strain_G.fasta H1N1_A-PR-8-34_PB2.fasta H1N1_A-PR-8-34_PB1.fasta H1N1_A-PR-8-34_PA.fasta H1N1_A-PR-8-34_HA.fasta H1N1_A-PR-8-34_NP.fasta H1N1_A-PR-8-34_NA.fasta H1N1_A-PR-8-34_M.fasta H1N1_A-PR-8-34_NS.fasta Human_cytomegalovirus_strain_AD169.fasta H3N2_A-Fukushima-OR808-2023_PB2.fasta H3N2_A-Fukushima-OR808-2023_PB1.fasta H3N2_A-Fukushima-OR808-2023_PA.fasta H3N2_A-Fukushima-OR808-2023_HA.fasta H3N2_A-Fukushima-OR808-2023_NP.fasta H3N2_A-Fukushima-OR808-2023_NA.fasta H3N2_A-Fukushima-OR808-2023_M.fasta H3N2_A-Fukushima-OR808-2023_NS.fasta Monkeypox_isolate_MPXV-Germany-2022-RKI513.fasta HIV-1_isolate_01IC-PCI127.fasta > viruses_representative.fasta # Run vrap (second round): selecte some representative viruses from the generated Excel-files generated by the last step as --host (vrap) for virus in Enterovirus_D68_isolate_SH2024-25870 HSV-1_isolate_MacIntyre HSV-2_strain_G H1N1_A-PR-8-34_PB2 H1N1_A-PR-8-34_PB1 H1N1_A-PR-8-34_PA H1N1_A-PR-8-34_HA H1N1_A-PR-8-34_NP H1N1_A-PR-8-34_NA H1N1_A-PR-8-34_M H1N1_A-PR-8-34_NS Human_cytomegalovirus_strain_AD169 H3N2_A-Fukushima-OR808-2023_PB2 H3N2_A-Fukushima-OR808-2023_PB1 H3N2_A-Fukushima-OR808-2023_PA H3N2_A-Fukushima-OR808-2023_HA H3N2_A-Fukushima-OR808-2023_NP H3N2_A-Fukushima-OR808-2023_NA H3N2_A-Fukushima-OR808-2023_M H3N2_A-Fukushima-OR808-2023_NS Monkeypox_isolate_MPXV-Germany-2022-RKI513 HIV-1_isolate_01IC-PCI127; do for sample in RV1_DNA RV2_DNA RV3_DNA RV4_DNA RV5_DNA RV6_DNA RV1_RNA RV2_RNA RV3_RNA RV4_RNA RV5_RNA RV6_RNA; do vrap/vrap_until_bowtie2.py -1 ${sample}_R1.fastq.gz -2 ${sample}_R2.fastq.gz -o vrap_${sample}_on_${virus} --host /home/jhuang/DATA/Data_Damian/vrap_Ringversuch/${virus}.fasta -t 100 -l 200 --gbt2 --noblast done done
-
Generate the mapping statistics for the sam-files generated from last step
#Enterovirus_D68_isolate_SH2024-25870 #for virus in HSV-1_isolate_MacIntyre HSV-2_strain_G H1N1_A-PR-8-34_PB2 H1N1_A-PR-8-34_PB1 H1N1_A-PR-8-34_PA H1N1_A-PR-8-34_HA H1N1_A-PR-8-34_NP H1N1_A-PR-8-34_NA H1N1_A-PR-8-34_M H1N1_A-PR-8-34_NS Human_cytomegalovirus_strain_AD169; do for virus in H3N2_A-Fukushima-OR808-2023_PB2 H3N2_A-Fukushima-OR808-2023_PB1 H3N2_A-Fukushima-OR808-2023_PA H3N2_A-Fukushima-OR808-2023_HA H3N2_A-Fukushima-OR808-2023_NP H3N2_A-Fukushima-OR808-2023_NA H3N2_A-Fukushima-OR808-2023_M H3N2_A-Fukushima-OR808-2023_NS Monkeypox_isolate_MPXV-Germany-2022-RKI513 HIV-1_isolate_01IC-PCI127; do for sample in RV1_DNA RV2_DNA RV3_DNA RV4_DNA RV5_DNA RV6_DNA RV1_RNA RV2_RNA RV3_RNA RV4_RNA RV5_RNA RV6_RNA; do echo "-----${sample}_on_${virus}------" >> LOG_mapping cd vrap_${sample}_on_${virus}/bowtie # Rename and convert SAM to BAM mv mapped mapped.sam 2>> ../../LOG_mapping samtools view -S -b mapped.sam > mapped.bam 2>> ../../LOG_mapping samtools sort mapped.bam -o mapped_sorted.bam 2>> ../../LOG_mapping samtools index mapped_sorted.bam 2>> ../../LOG_mapping # Write flagstat output to log (go up two levels to write correctly) samtools flagstat mapped_sorted.bam >> ../../LOG_mapping 2>&1 cd ../.. done done #draw some plots for some representative isolates which found in the first round (see Excel-file). samtools depth -m 0 -a mapped_sorted.bam > coverage.txt grep "PQ895337.1" coverage.txt > PQ895337_coverage.txt import pandas as pd import matplotlib.pyplot as plt import sys import os import re # Check for required arguments if len(sys.argv) != 3: print("Usage: python script.py
“) sys.exit(1) # Parse arguments coverage_file = sys.argv[1] genome_length = int(sys.argv[2]) # Extract accession from file name (e.g., “PQ895337” from “PQ895337_coverage.txt”) file_name = os.path.basename(coverage_file) accession_match = re.match(r”([A-Z0-9]+)_coverage\.txt”, file_name) accession = accession_match.group(1) if accession_match else “” # Extract sample name from the grandparent directory of the file sample_dir = os.path.basename(os.path.dirname(os.path.dirname(coverage_file))) sample_name = re.sub(r’^vrap_’, ”, sample_dir).replace(‘_’, ‘ ‘) # Create title and filename plot_title = f”{sample_name} ({accession})” output_filename = plot_title.replace(” “, “_”) + “.png” # Load coverage data df = pd.read_csv( coverage_file, sep=”\t”, header=None, names=[“chr”, “pos”, “coverage”] ) # Create a full genome position index full_index = pd.DataFrame({‘pos’: range(1, genome_length + 1)}) # Merge coverage data with full index df_full = pd.merge(full_index, df[[‘pos’, ‘coverage’]], on=’pos’, how=’left’) df_full[‘coverage’].fillna(0, inplace=True) # Plot plt.figure(figsize=(10, 4)) plt.plot(df_full[“pos”], df_full[“coverage”], color=”blue”, linewidth=0.5) plt.xlabel(“Genomic Position”) plt.ylabel(“Coverage Depth”) plt.title(plot_title) plt.tight_layout() # Save plot to file plt.savefig(output_filename, dpi=150) print(f”Plot saved to {output_filename}”) # Optionally show the plot # plt.show() -
Report
Subject: Mapping Results and Selected Reference Genomes
Dear XXXX,
Please find below the results of the mapping analysis. For each virus you provided, I have selected a representative reference isolate, listed as follows:
Selected Reference Isolates
Enterovirus D68
• PQ895337.1 – Enterovirus D68 isolate SH2024-25870
HSV-1 (Herpes Simplex Virus 1)
• PQ569920.1 – Human alphaherpesvirus 1 isolate MacIntyre, complete genome
HSV-2 (Herpes Simplex Virus 2)
• OM370995.1 – Human alphaherpesvirus 2 strain G, complete genome
Influenza A Virus (H1N1)
• LC662537.1 – PB2 gene, complete CDS
• LC662538.1 – PB1 and PB1-F2 genes, complete CDS
• LC662539.1 – PA and PA-X genes, complete CDS
• LC662540.1 – HA gene, complete CDS
• LC662541.1 – NP gene, complete CDS
• LC662542.1 – NA gene, complete CDS
• LC662543.1 – M1 and M2 genes, complete CDS
• LC662544.1 – NS1 and NEP genes, complete CDS
Cytomegalovirus (strain AD169)
• X17403.1 – Human cytomegalovirus strain AD169, complete genome
Influenza A Virus (H3N2)
• LC817404.1 – PB2 gene
• LC817405.1 – PB1 genes
• LC817406.1 – PA genes
• LC817407.1 – HA gene
• LC817408.1 – NP gene
• LC817409.1 – NA gene
• LC817410.1 – M genes
• LC817411.1 – NS genes
Monkeypox Virus
• OP689666.1 – Isolate MPXV/Germany/2022/RKI513, complete genome
Human Immunodeficiency Virus 1 (HIV-1)
• AJ866558.1 – Isolate 01IC-PCI127, complete genome
Mapping Results
We mapped paired-end reads from 12 Ringversuch project samples against the selected reference genomes.
Below are the mapping statistics for Enterovirus D68, HSV-1, HSV-2, and H1N1. Coverage plots are attached for all cases where the percentage of reads mapping to the reference genome is greater than 0.00%. Results for the remaining viruses will follow next week.
(* An asterisk indicates cases with non-zero mapping percentages.)
Mapping Statistics
Enterovirus D68 (SH2024-25870):
RV1_DNA: 0 (0.00%)
RV2_DNA: 0 (0.00%)
RV3_DNA: 0 (0.00%)
RV4_DNA: 0 (0.00%)
RV5_DNA: 0 (0.00%)
RV6_DNA: 0 (0.00%)
RV1_RNA: 66 (0.00%)
RV2_RNA: 55 (0.00%)
RV3_RNA: 15 (0.00%)
RV4_RNA: 1701 (0.02%) *
RV5_RNA: 26 (0.00%)
RV6_RNA: 35 (0.00%)
HSV-1 (isolate MacIntyre):
RV1_DNA: 387 (0.02%) *
RV2_DNA: 6232 (0.26%) *
RV3_DNA: 0 (0.00%)
RV4_DNA: 1443 (0.03%) *
RV5_DNA: 2 (0.00%)
RV6_DNA: 0 (0.00%)
RV1_RNA: 6 (0.00%)
RV2_RNA: 32 (0.00%)
RV3_RNA: 4 (0.00%)
RV4_RNA: 13 (0.00%)
RV5_RNA: 4 (0.00%)
RV6_RNA: 10 (0.00%)
HSV-2 (strain G):
RV1_DNA: 201 (0.01%) *
RV2_DNA: 376 (0.02%) *
RV3_DNA: 0 (0.00%)
RV4_DNA: 19670 (0.46%) *
RV5_DNA: 0 (0.00%)
RV6_DNA: 0 (0.00%)
RV1_RNA: 0 (0.00%)
RV2_RNA: 3 (0.00%)
RV3_RNA: 0 (0.00%)
RV4_RNA: 25 (0.00%)
RV5_RNA: 5 (0.00%)
RV6_RNA: 24 (0.00%)
Influenza A Virus (H1N1, A/PR/8/34):
RV1_DNA: 0 (0.00%)
RV2_DNA: 0 (0.00%)
RV3_DNA: 0 (0.00%)
RV4_DNA: 0 (0.00%)
RV5_DNA: 0 (0.00%)
RV6_DNA: 0 (0.00%)
RV1_RNA: 0 (0.00%)
RV2_RNA: 0 (0.00%)
RV3_RNA: 0 (0.00%)
RV4_RNA: 13 + 354 (0.00%)
RV5_RNA: 0 (0.00%)
RV6_RNA: 0 (0.00%)