Mapping of reads to selected viruses in DAMIAN results (version 2)

Tags: pipeline

[Figure: coverage plot of RV4_DNA reads mapped to HSV-2 strain G (OM370995)]

  1. Prepare the raw input data

    # -- Ringversuch --
    # Raw reads are located in ~/DATA/Data_Damian/241213_VH00358_120_AAG523FM5_Ringversuch;
    # hard-link them with short names into a sibling working directory (e.g. ~/DATA/Data_Damian/vrap_Ringversuch)
    ln ../241213_VH00358_120_AAG523FM5_Ringversuch/p20579/01_RV1_DNA_S1_R1_001.fastq.gz RV1_DNA_R1.fastq.gz
    ln ../241213_VH00358_120_AAG523FM5_Ringversuch/p20579/01_RV1_DNA_S1_R2_001.fastq.gz RV1_DNA_R2.fastq.gz
    ln ../241213_VH00358_120_AAG523FM5_Ringversuch/p20580/02_RV2_DNA_S2_R1_001.fastq.gz RV2_DNA_R1.fastq.gz
    ln ../241213_VH00358_120_AAG523FM5_Ringversuch/p20580/02_RV2_DNA_S2_R2_001.fastq.gz RV2_DNA_R2.fastq.gz
    ln ../241213_VH00358_120_AAG523FM5_Ringversuch/p20581/03_RV3_DNA_S3_R1_001.fastq.gz RV3_DNA_R1.fastq.gz
    ln ../241213_VH00358_120_AAG523FM5_Ringversuch/p20581/03_RV3_DNA_S3_R2_001.fastq.gz RV3_DNA_R2.fastq.gz
    ln ../241213_VH00358_120_AAG523FM5_Ringversuch/p20582/04_RV4_DNA_S4_R1_001.fastq.gz RV4_DNA_R1.fastq.gz
    ln ../241213_VH00358_120_AAG523FM5_Ringversuch/p20582/04_RV4_DNA_S4_R2_001.fastq.gz RV4_DNA_R2.fastq.gz
    ln ../241213_VH00358_120_AAG523FM5_Ringversuch/p20583/05_RV5_DNA_S5_R1_001.fastq.gz RV5_DNA_R1.fastq.gz
    ln ../241213_VH00358_120_AAG523FM5_Ringversuch/p20583/05_RV5_DNA_S5_R2_001.fastq.gz RV5_DNA_R2.fastq.gz
    ln ../241213_VH00358_120_AAG523FM5_Ringversuch/p20584/06_RV6_DNA_S6_R1_001.fastq.gz RV6_DNA_R1.fastq.gz
    ln ../241213_VH00358_120_AAG523FM5_Ringversuch/p20584/06_RV6_DNA_S6_R2_001.fastq.gz RV6_DNA_R2.fastq.gz
    ln ../241213_VH00358_120_AAG523FM5_Ringversuch/p20585/07_RV1_RNA_S7_R1_001.fastq.gz RV1_RNA_R1.fastq.gz
    ln ../241213_VH00358_120_AAG523FM5_Ringversuch/p20585/07_RV1_RNA_S7_R2_001.fastq.gz RV1_RNA_R2.fastq.gz
    ln ../241213_VH00358_120_AAG523FM5_Ringversuch/p20586/08_RV2_RNA_S8_R1_001.fastq.gz RV2_RNA_R1.fastq.gz
    ln ../241213_VH00358_120_AAG523FM5_Ringversuch/p20586/08_RV2_RNA_S8_R2_001.fastq.gz RV2_RNA_R2.fastq.gz
    ln ../241213_VH00358_120_AAG523FM5_Ringversuch/p20587/09_RV3_RNA_S9_R1_001.fastq.gz RV3_RNA_R1.fastq.gz
    ln ../241213_VH00358_120_AAG523FM5_Ringversuch/p20587/09_RV3_RNA_S9_R2_001.fastq.gz RV3_RNA_R2.fastq.gz
    ln ../241213_VH00358_120_AAG523FM5_Ringversuch/p20588/10_RV4_RNA_S10_R1_001.fastq.gz RV4_RNA_R1.fastq.gz
    ln ../241213_VH00358_120_AAG523FM5_Ringversuch/p20588/10_RV4_RNA_S10_R2_001.fastq.gz RV4_RNA_R2.fastq.gz
    ln ../241213_VH00358_120_AAG523FM5_Ringversuch/p20589/11_RV5_RNA_S11_R1_001.fastq.gz RV5_RNA_R1.fastq.gz
    ln ../241213_VH00358_120_AAG523FM5_Ringversuch/p20589/11_RV5_RNA_S11_R2_001.fastq.gz RV5_RNA_R2.fastq.gz
    ln ../241213_VH00358_120_AAG523FM5_Ringversuch/p20590/12_RV6_RNA_S12_R1_001.fastq.gz RV6_RNA_R1.fastq.gz
    ln ../241213_VH00358_120_AAG523FM5_Ringversuch/p20590/12_RV6_RNA_S12_R2_001.fastq.gz RV6_RNA_R2.fastq.gz
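The 24 `ln` lines follow a single renaming rule. A minimal Python sketch of that rule, assuming every raw file matches the `<NN>_<RVn>_<DNA|RNA>_S<n>_R<1|2>_001.fastq.gz` pattern seen above; the helper names `short_name` and `link_all` are hypothetical. It uses `os.link`, which creates a hard link just like a plain `ln` without `-s`.

```python
import os
import re
from pathlib import Path

# Raw file names in the run directory, e.g. "04_RV4_DNA_S4_R1_001.fastq.gz".
RAW_NAME = re.compile(r"^\d+_(RV\d+_(?:DNA|RNA))_S\d+_(R[12])_001\.fastq\.gz$")


def short_name(raw_name: str) -> str:
    """Map a raw FASTQ name to the short name used by vrap, e.g. RV4_DNA_R1.fastq.gz."""
    match = RAW_NAME.match(raw_name)
    if match is None:
        raise ValueError(f"unexpected FASTQ name: {raw_name}")
    sample, read = match.groups()
    return f"{sample}_{read}.fastq.gz"


def link_all(run_dir: Path, dest_dir: Path) -> list:
    """Hard-link every p*/*_R1/R2_001.fastq.gz file under run_dir into dest_dir."""
    links = []
    for src in sorted(run_dir.glob("p*/*_R[12]_001.fastq.gz")):
        dst = dest_dir / short_name(src.name)
        if not dst.exists():   # unlike `ln`, silently skip links that already exist
            os.link(src, dst)  # same as `ln src dst` (hard link, not a symlink)
        links.append(dst)
    return links
```

Usage, from the working directory: `link_all(Path("../241213_VH00358_120_AAG523FM5_Ringversuch"), Path("."))`. An unexpected file name raises an error instead of producing a wrongly named link.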
  2. Prepare the virus database and select one representative for each of the eight given viruses

    # -- Download all genomes --
    # Enterovirus D68
    # HSV-1
    # HSV-2
    # Influenza A H1N1
    # Cytomegalovirus AD169 (Human herpesvirus 5, HHV-5, more commonly known as cytomegalovirus, CMV)
    # Influenza A H3N2
    # Monkeypox
    # HIV-1
    esearch -db nucleotide -query "txid42789[Organism:exp]" | efetch -format fasta -email j.huang@uke.de > genome_42789_ncbi.fasta
    python ~/Scripts/filter_fasta.py genome_42789_ncbi.fasta complete_42789_ncbi.fasta #899
    esearch -db nucleotide -query "txid10298[Organism:exp]" | efetch -format fasta -email j.huang@uke.de > genome_10298_ncbi.fasta
    python ~/Scripts/filter_fasta.py genome_10298_ncbi.fasta complete_10298_ncbi.fasta #162
    esearch -db nucleotide -query "txid10310[Organism:exp]" | efetch -format fasta -email j.huang@uke.de > genome_10310_ncbi.fasta
    python ~/Scripts/filter_fasta.py genome_10310_ncbi.fasta complete_10310_ncbi.fasta #33
    esearch -db nucleotide -query "txid1323429[Organism:exp]" | efetch -format fasta -email j.huang@uke.de > genome_1323429_ncbi.fasta
    python ~/Scripts/filter_fasta2.py genome_1323429_ncbi.fasta complete_1323429_ncbi.fasta #465
    esearch -db nucleotide -query "txid10360[Organism:exp]" | efetch -format fasta -email j.huang@uke.de > genome_10360_ncbi.fasta
    python ~/Scripts/filter_fasta2.py genome_10360_ncbi.fasta complete_10360_ncbi.fasta #1
    esearch -db nucleotide -query "txid41857[Organism:exp]" | efetch -format fasta -email j.huang@uke.de > genome_41857_ncbi.fasta
    python ~/Scripts/filter_fasta2.py genome_41857_ncbi.fasta complete_41857_ncbi.fasta #120
    esearch -db nucleotide -query "txid10244[Organism:exp]" | efetch -format fasta -email j.huang@uke.de > genome_10244_ncbi.fasta
    python ~/Scripts/filter_fasta.py genome_10244_ncbi.fasta complete_10244_ncbi.fasta #2525
    esearch -db nucleotide -query "txid11676[Organism:exp]" | efetch -format fasta -email j.huang@uke.de > genome_11676_ncbi.fasta
    python ~/Scripts/filter_fasta.py genome_11676_ncbi.fasta complete_11676_ncbi.fasta #485995-->7416
    # ---- Alternatively, use ENA instead of NCBI to download the genomes ----
    # https://www.ebi.ac.uk/ena/browser/view/11676 (1138065 records)
    # Click "Sequence" and download "Counts" (1132648) or "Taxon descendants count" (1138065) if there is enough time. Download date: 09.04.2025.
    # python ~/Scripts/filter_fasta.py ena_11676_sequence.fasta complete_11676_ena.fasta #1138065-->????
    # Virus name | NCBI TaxID | selected representative
    # ------------------------
    # Enterovirus D68 | 42789 | >PQ895337.1 Enterovirus D68 isolate SH2024-25870
    # HSV-1 (Herpes Simplex Virus 1) | 10298 | >PQ569920.1 Human alphaherpesvirus 1 isolate MacIntyre, complete genome
    # HSV-2 (Herpes Simplex Virus 2) | 10310 | >OM370995.1 Human alphaherpesvirus 2 strain G, complete genome
    samtools faidx complete_42789_ncbi.fasta PQ895337.1 > Enterovirus_D68_isolate_SH2024-25870.fasta
    samtools faidx complete_10298_ncbi.fasta PQ569920.1 > HSV-1_isolate_MacIntyre.fasta
    samtools faidx complete_10310_ncbi.fasta OM370995.1 > HSV-2_strain_G.fasta
    # Influenza A virus (H1N1) 1323429
    # The Influenza A virus (H1N1) genome consists of eight single-stranded, negative-sense RNA segments with a total size of approximately 13,500 nucleotides (13.5 kb).
    # Segment | Gene | Protein product(s) | Approx. length (nt)
    # 1 | PB2 | Polymerase basic 2 | ~2,341
    # 2 | PB1 | Polymerase basic 1, PB1-F2 | ~2,341
    # 3 | PA | Polymerase acidic | ~2,233
    # 4 | HA | Hemagglutinin | ~1,778
    # 5 | NP | Nucleoprotein | ~1,565
    # 6 | NA | Neuraminidase | ~1,413
    # 7 | M | Matrix proteins (M1, M2) | ~1,027
    # 8 | NS | Nonstructural (NS1, NS2) | ~890
    # >LC662544.1 Influenza A virus (H1N1) A/PR/8/34 NEP, NS1 genes for nonstructural protein 2, nonstructural protein 1, complete cds
    # >LC662543.1 Influenza A virus (H1N1) A/PR/8/34 M2, M1 genes for matrix protein 2, matrix protein 1, complete cds
    # >LC662542.1 Influenza A virus (H1N1) A/PR/8/34 NA gene for neuraminidase, complete cds
    # >LC662541.1 Influenza A virus (H1N1) A/PR/8/34 NP gene for nucleoprotein, complete cds
    # >LC662540.1 Influenza A virus (H1N1) A/PR/8/34 HA gene for haemagglutinin, complete cds
    # >LC662539.1 Influenza A virus (H1N1) A/PR/8/34 PA, PA-X genes for polymerase PA, PA-X protein, complete cds
    # >LC662538.1 Influenza A virus (H1N1) A/PR/8/34 PB1, PB1-F2 genes for polymerase PB1, PB1-F2 protein, complete cds
    # >LC662537.1 Influenza A virus (H1N1) A/PR/8/34 PB2 gene for polymerase PB2, complete cds
    samtools faidx complete_1323429_ncbi.fasta LC662537.1 > H1N1_A-PR-8-34_PB2.fasta
    samtools faidx complete_1323429_ncbi.fasta LC662538.1 > H1N1_A-PR-8-34_PB1.fasta
    samtools faidx complete_1323429_ncbi.fasta LC662539.1 > H1N1_A-PR-8-34_PA.fasta
    samtools faidx complete_1323429_ncbi.fasta LC662540.1 > H1N1_A-PR-8-34_HA.fasta
    samtools faidx complete_1323429_ncbi.fasta LC662541.1 > H1N1_A-PR-8-34_NP.fasta
    samtools faidx complete_1323429_ncbi.fasta LC662542.1 > H1N1_A-PR-8-34_NA.fasta
    samtools faidx complete_1323429_ncbi.fasta LC662543.1 > H1N1_A-PR-8-34_M.fasta
    samtools faidx complete_1323429_ncbi.fasta LC662544.1 > H1N1_A-PR-8-34_NS.fasta
    # Human cytomegalovirus AD169 10360: >X17403.1 Human cytomegalovirus strain AD169, complete genome
    samtools faidx complete_10360_ncbi.fasta X17403.1 > Human_cytomegalovirus_strain_AD169.fasta
    # Influenza A virus (H3N2) 41857
    # >LC817411.1 Influenza A virus H3N2 A_Fukushima_OR808_2023 RNA, seqment 8, complete sequence
    # >LC817410.1 Influenza A virus H3N2 A_Fukushima_OR808_2023 RNA, seqment 7, complete sequence
    # >LC817409.1 Influenza A virus H3N2 A_Fukushima_OR808_2023 RNA, seqment 6, complete sequence
    # >LC817408.1 Influenza A virus H3N2 A_Fukushima_OR808_2023 RNA, seqment 5, complete sequence
    # >LC817407.1 Influenza A virus H3N2 A_Fukushima_OR808_2023 RNA, seqment 4, complete sequence
    # >LC817406.1 Influenza A virus H3N2 A_Fukushima_OR808_2023 RNA, seqment 3, complete sequence
    # >LC817405.1 Influenza A virus H3N2 A_Fukushima_OR808_2023 RNA, seqment 2, complete sequence
    # >LC817404.1 Influenza A virus H3N2 A_Fukushima_OR808_2023 RNA, seqment 1, complete sequence
    samtools faidx complete_41857_ncbi.fasta LC817404.1 > H3N2_A-Fukushima-OR808-2023_PB2.fasta
    samtools faidx complete_41857_ncbi.fasta LC817405.1 > H3N2_A-Fukushima-OR808-2023_PB1.fasta
    samtools faidx complete_41857_ncbi.fasta LC817406.1 > H3N2_A-Fukushima-OR808-2023_PA.fasta
    samtools faidx complete_41857_ncbi.fasta LC817407.1 > H3N2_A-Fukushima-OR808-2023_HA.fasta
    samtools faidx complete_41857_ncbi.fasta LC817408.1 > H3N2_A-Fukushima-OR808-2023_NP.fasta
    samtools faidx complete_41857_ncbi.fasta LC817409.1 > H3N2_A-Fukushima-OR808-2023_NA.fasta
    samtools faidx complete_41857_ncbi.fasta LC817410.1 > H3N2_A-Fukushima-OR808-2023_M.fasta
    samtools faidx complete_41857_ncbi.fasta LC817411.1 > H3N2_A-Fukushima-OR808-2023_NS.fasta
    # Monkeypox virus 10244: >OP689666.1 Monkeypox virus isolate MPXV/Germany/2022/RKI513, complete genome
    samtools faidx complete_10244_ncbi.fasta OP689666.1 > Monkeypox_isolate_MPXV-Germany-2022-RKI513.fasta
    # Human immunodeficiency virus 1 11676: >AJ866558.1 Human immunodeficiency virus 1 complete genome, isolate 01IC-PCI127
    samtools faidx complete_11676_ncbi.fasta AJ866558.1 > HIV-1_isolate_01IC-PCI127.fasta
    # -- Selected genomes saved in the FASTA files --
    # Enterovirus_D68_isolate_SH2024-25870.fasta (7391 nt)
    # HSV-1_isolate_MacIntyre.fasta (151817 nt)
    # HSV-2_strain_G.fasta (155498 nt)
    # H1N1_A-PR-8-34_PB2.fasta (2341 nt)
    # H1N1_A-PR-8-34_PB1.fasta (2341 nt)
    # H1N1_A-PR-8-34_PA.fasta (2233 nt)
    # H1N1_A-PR-8-34_HA.fasta (1775 nt)
    # H1N1_A-PR-8-34_NP.fasta (1565 nt)
    # H1N1_A-PR-8-34_NA.fasta (1413 nt)
    # H1N1_A-PR-8-34_M.fasta (1027 nt)
    # H1N1_A-PR-8-34_NS.fasta (890 nt)
    # Human_cytomegalovirus_strain_AD169.fasta (229354 nt)
    # H3N2_A-Fukushima-OR808-2023_PB2.fasta (2301 nt)
    # H3N2_A-Fukushima-OR808-2023_PB1.fasta (2316 nt)
    # H3N2_A-Fukushima-OR808-2023_PA.fasta (2208 nt)
    # H3N2_A-Fukushima-OR808-2023_HA.fasta (1722 nt)
    # H3N2_A-Fukushima-OR808-2023_NP.fasta (1536 nt)
    # H3N2_A-Fukushima-OR808-2023_NA.fasta (1440 nt)
    # H3N2_A-Fukushima-OR808-2023_M.fasta (1002 nt)
    # H3N2_A-Fukushima-OR808-2023_NS.fasta (865 nt)
    # Monkeypox_isolate_MPXV-Germany-2022-RKI513.fasta (197140 nt)
    # HIV-1_isolate_01IC-PCI127.fasta (9752 nt)
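The filtering scripts `~/Scripts/filter_fasta.py` and `filter_fasta2.py` are not shown in this post. The following is only a guess at what such a filter could look like: it keeps records whose FASTA header contains a keyword such as "complete genome" (an assumed criterion) and reports sequence lengths, which is one way to reproduce the nt counts listed above. All function names are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def read_fasta(text: str) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA text; multi-line sequences are joined."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_header(records: Iterable[Record], keywords=("complete genome",)) -> List[Record]:
    """Keep records whose header contains any keyword (case-insensitive).

    Hypothetical criterion: the real filter_fasta.py / filter_fasta2.py may differ.
    """
    wanted = [k.lower() for k in keywords]
    return [(h, s) for h, s in records if any(k in h.lower() for k in wanted)]


def sequence_lengths(records: Iterable[Record]) -> Dict[str, int]:
    """Accession (first word of the header) -> length in nt, like column 2 of a .fai index."""
    return {h.split()[0]: len(s) for h, s in records}
```

For the segmented influenza genomes, "complete genome" would not match; keywords such as "complete sequence" or "complete cds" (wording that appears in the selected headers) would be needed instead.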
  3. (Optional) Run the first round of vrap (--virus=viruses_selected.fasta)

    cd ~/DATA/Data_Damian/vrap_Ringversuch
    ln -s ~/Tools/vrap/ .
    mamba activate /home/jhuang/miniconda3/envs/vrap
    cat complete_42789_ncbi.fasta complete_10298_ncbi.fasta complete_10310_ncbi.fasta complete_1323429_ncbi.fasta complete_10360_ncbi.fasta complete_41857_ncbi.fasta complete_10244_ncbi.fasta complete_11676_ncbi.fasta > viruses_selected.fasta
    # Run vrap (first round): point --virus to the specific taxa (here viruses_selected.fasta) --> changes virus_user_db --> specific_bacteria_user_db
    for sample in RV1_DNA RV2_DNA RV3_DNA RV4_DNA RV5_DNA RV6_DNA RV1_RNA RV2_RNA RV3_RNA RV4_RNA RV5_RNA RV6_RNA; do
        vrap/vrap.py -1 ${sample}_R1.fastq.gz -2 ${sample}_R2.fastq.gz -o vrap_${sample} --bt2idx=/home/jhuang/REFs/genome --host=/home/jhuang/REFs/genome.fa --virus=/home/jhuang/DATA/Data_Damian/vrap_Ringversuch/viruses_selected.fasta --nt=/mnt/nvme1n1p1/blast/nt --nr=/mnt/nvme1n1p1/blast/nr -t 100 -l 200 -g
    done
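A repeated or missing file in a long `cat` line is easy to overlook. This hypothetical Python sketch expects one `complete_<taxid>_ncbi.fasta` file per TaxID queried in step 2 and refuses to concatenate if any file is missing or listed twice; the function names and the check itself are illustrative, not part of vrap.

```python
from collections import Counter
from pathlib import Path

# TaxIDs of the eight viruses, taken from the esearch commands in step 2.
TAXIDS = ["42789", "10298", "10310", "1323429", "10360", "41857", "10244", "11676"]


def check_inputs(inputs, taxids=TAXIDS):
    """Return (missing, duplicated) file names, comparing against complete_<taxid>_ncbi.fasta."""
    expected = [f"complete_{t}_ncbi.fasta" for t in taxids]
    counts = Counter(inputs)
    missing = [f for f in expected if f not in counts]
    duplicated = [f for f, n in counts.items() if n > 1]
    return missing, duplicated


def concat_fasta(inputs, output, taxids=TAXIDS):
    """Concatenate FASTA files like `cat ... > output`, but only if the input list is complete."""
    missing, duplicated = check_inputs(inputs, taxids)
    if missing or duplicated:
        raise ValueError(f"missing={missing} duplicated={duplicated}")
    with open(output, "w") as out:
        for name in inputs:
            text = Path(name).read_text()
            # Ensure a trailing newline so two records are never glued onto one line.
            if text and not text.endswith("\n"):
                text += "\n"
            out.write(text)
```

For example, a list that names `complete_10244_ncbi.fasta` twice and omits `complete_42789_ncbi.fasta` is rejected with both problems reported.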
  4. Run the second round of vrap (--host=${virus}.fasta)

    cat Enterovirus_D68_isolate_SH2024-25870.fasta HSV-1_isolate_MacIntyre.fasta HSV-2_strain_G.fasta H1N1_A-PR-8-34_PB2.fasta H1N1_A-PR-8-34_PB1.fasta H1N1_A-PR-8-34_PA.fasta H1N1_A-PR-8-34_HA.fasta H1N1_A-PR-8-34_NP.fasta H1N1_A-PR-8-34_NA.fasta H1N1_A-PR-8-34_M.fasta H1N1_A-PR-8-34_NS.fasta Human_cytomegalovirus_strain_AD169.fasta H3N2_A-Fukushima-OR808-2023_PB2.fasta H3N2_A-Fukushima-OR808-2023_PB1.fasta H3N2_A-Fukushima-OR808-2023_PA.fasta H3N2_A-Fukushima-OR808-2023_HA.fasta H3N2_A-Fukushima-OR808-2023_NP.fasta H3N2_A-Fukushima-OR808-2023_NA.fasta H3N2_A-Fukushima-OR808-2023_M.fasta H3N2_A-Fukushima-OR808-2023_NS.fasta Monkeypox_isolate_MPXV-Germany-2022-RKI513.fasta HIV-1_isolate_01IC-PCI127.fasta > viruses_representative.fasta
    # Run vrap (second round): use each representative virus, selected from the Excel files generated in the previous step, as --host
    for virus in Enterovirus_D68_isolate_SH2024-25870 HSV-1_isolate_MacIntyre HSV-2_strain_G H1N1_A-PR-8-34_PB2 H1N1_A-PR-8-34_PB1 H1N1_A-PR-8-34_PA H1N1_A-PR-8-34_HA H1N1_A-PR-8-34_NP H1N1_A-PR-8-34_NA H1N1_A-PR-8-34_M H1N1_A-PR-8-34_NS Human_cytomegalovirus_strain_AD169 H3N2_A-Fukushima-OR808-2023_PB2 H3N2_A-Fukushima-OR808-2023_PB1 H3N2_A-Fukushima-OR808-2023_PA H3N2_A-Fukushima-OR808-2023_HA H3N2_A-Fukushima-OR808-2023_NP H3N2_A-Fukushima-OR808-2023_NA H3N2_A-Fukushima-OR808-2023_M H3N2_A-Fukushima-OR808-2023_NS Monkeypox_isolate_MPXV-Germany-2022-RKI513 HIV-1_isolate_01IC-PCI127; do
        for sample in RV1_DNA RV2_DNA RV3_DNA RV4_DNA RV5_DNA RV6_DNA RV1_RNA RV2_RNA RV3_RNA RV4_RNA RV5_RNA RV6_RNA; do
            vrap/vrap_until_bowtie2.py -1 ${sample}_R1.fastq.gz -2 ${sample}_R2.fastq.gz -o vrap_${sample}_on_${virus} --host /home/jhuang/DATA/Data_Damian/vrap_Ringversuch/${virus}.fasta -t 100 -l 200 --gbt2 --noblast
        done
    done
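The nested loop launches one `vrap_until_bowtie2.py` run per reference/sample pair (22 reference files × 12 samples = 264 runs). The sketch below only builds the same command lines in Python, for a dry run or for handing to a job runner; it executes nothing. The flags are copied from the loop above; `vrap_commands` and the list constants are hypothetical names.

```python
import itertools
import shlex

SEGMENTS = ["PB2", "PB1", "PA", "HA", "NP", "NA", "M", "NS"]
# Same order as the `for virus in ...` loop above.
VIRUSES = (
    ["Enterovirus_D68_isolate_SH2024-25870", "HSV-1_isolate_MacIntyre", "HSV-2_strain_G"]
    + [f"H1N1_A-PR-8-34_{s}" for s in SEGMENTS]
    + ["Human_cytomegalovirus_strain_AD169"]
    + [f"H3N2_A-Fukushima-OR808-2023_{s}" for s in SEGMENTS]
    + ["Monkeypox_isolate_MPXV-Germany-2022-RKI513", "HIV-1_isolate_01IC-PCI127"]
)
# RV1_DNA ... RV6_DNA, then RV1_RNA ... RV6_RNA
SAMPLES = [f"RV{i}_{kind}" for kind in ("DNA", "RNA") for i in range(1, 7)]
BASE = "/home/jhuang/DATA/Data_Damian/vrap_Ringversuch"


def vrap_commands(viruses, samples, base=BASE):
    """Yield one shell-quoted vrap_until_bowtie2.py command per (virus, sample) pair."""
    for virus, sample in itertools.product(viruses, samples):
        args = [
            "vrap/vrap_until_bowtie2.py",
            "-1", f"{sample}_R1.fastq.gz", "-2", f"{sample}_R2.fastq.gz",
            "-o", f"vrap_{sample}_on_{virus}",
            "--host", f"{base}/{virus}.fasta",
            "-t", "100", "-l", "200", "--gbt2", "--noblast",
        ]
        yield shlex.join(args)
```

Writing `"\n".join(vrap_commands(VIRUSES, SAMPLES))` to a file gives a reviewable list of all runs before anything is started.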
  5. Generate mapping statistics for the SAM files produced in the previous step

    # The viruses were processed in batches; the commented-out lines are earlier batches.
    #Enterovirus_D68_isolate_SH2024-25870
    #for virus in HSV-1_isolate_MacIntyre HSV-2_strain_G H1N1_A-PR-8-34_PB2 H1N1_A-PR-8-34_PB1 H1N1_A-PR-8-34_PA H1N1_A-PR-8-34_HA H1N1_A-PR-8-34_NP H1N1_A-PR-8-34_NA H1N1_A-PR-8-34_M H1N1_A-PR-8-34_NS Human_cytomegalovirus_strain_AD169; do
    for virus in H3N2_A-Fukushima-OR808-2023_PB2 H3N2_A-Fukushima-OR808-2023_PB1 H3N2_A-Fukushima-OR808-2023_PA H3N2_A-Fukushima-OR808-2023_HA H3N2_A-Fukushima-OR808-2023_NP H3N2_A-Fukushima-OR808-2023_NA H3N2_A-Fukushima-OR808-2023_M H3N2_A-Fukushima-OR808-2023_NS Monkeypox_isolate_MPXV-Germany-2022-RKI513 HIV-1_isolate_01IC-PCI127; do
        for sample in RV1_DNA RV2_DNA RV3_DNA RV4_DNA RV5_DNA RV6_DNA RV1_RNA RV2_RNA RV3_RNA RV4_RNA RV5_RNA RV6_RNA; do
            echo "-----${sample}_on_${virus}------" >> LOG_mapping
            # Skip this sample if the output directory is missing (otherwise "cd ../.." would leave the working directory)
            cd vrap_${sample}_on_${virus}/bowtie || continue
            # Rename and convert SAM to BAM
            mv mapped mapped.sam 2>> ../../LOG_mapping
            samtools view -S -b mapped.sam > mapped.bam 2>> ../../LOG_mapping
            samtools sort mapped.bam -o mapped_sorted.bam 2>> ../../LOG_mapping
            samtools index mapped_sorted.bam 2>> ../../LOG_mapping
            # Append flagstat output to the log in the working directory (two levels up)
            samtools flagstat mapped_sorted.bam >> ../../LOG_mapping 2>&1
            cd ../..
        done
    done
    # Draw coverage plots for the representative isolates found in the first round (see Excel file).
    # Run inside vrap_${sample}_on_${virus}/bowtie, e.g. for Enterovirus D68 (PQ895337.1):
    samtools depth -m 0 -a mapped_sorted.bam > coverage.txt
    grep "PQ895337.1" coverage.txt > PQ895337_coverage.txt
    # Plot with the following Python script (usage: python script.py <coverage_file> <genome_length>):
    import os
    import re
    import sys

    import matplotlib.pyplot as plt
    import pandas as pd

    # Check for required arguments
    if len(sys.argv) != 3:
        print("Usage: python script.py <coverage_file> <genome_length>")
        sys.exit(1)

    # Parse arguments
    coverage_file = sys.argv[1]
    genome_length = int(sys.argv[2])

    # Extract accession from file name (e.g., "PQ895337" from "PQ895337_coverage.txt")
    file_name = os.path.basename(coverage_file)
    accession_match = re.match(r"([A-Z0-9]+)_coverage\.txt", file_name)
    accession = accession_match.group(1) if accession_match else ""

    # Extract sample name from the grandparent directory of the file, e.g.
    # vrap_RV4_DNA_on_HSV-2_strain_G/bowtie/OM370995_coverage.txt -> "RV4 DNA on HSV-2 strain G";
    # abspath() makes this work when the script is called from inside the bowtie directory
    sample_dir = os.path.basename(os.path.dirname(os.path.dirname(os.path.abspath(coverage_file))))
    sample_name = re.sub(r"^vrap_", "", sample_dir).replace("_", " ")

    # Create title and output file name
    plot_title = f"{sample_name} ({accession})"
    output_filename = plot_title.replace(" ", "_") + ".png"

    # Load coverage data (columns: reference, position, depth)
    df = pd.read_csv(coverage_file, sep="\t", header=None, names=["chr", "pos", "coverage"])

    # Merge with a full genome position index so that missing positions get depth 0
    full_index = pd.DataFrame({"pos": range(1, genome_length + 1)})
    df_full = pd.merge(full_index, df[["pos", "coverage"]], on="pos", how="left")
    df_full["coverage"] = df_full["coverage"].fillna(0)  # assign back; inplace on a column is deprecated

    # Plot
    plt.figure(figsize=(10, 4))
    plt.plot(df_full["pos"], df_full["coverage"], color="blue", linewidth=0.5)
    plt.xlabel("Genomic Position")
    plt.ylabel("Coverage Depth")
    plt.title(plot_title)
    plt.tight_layout()

    # Save plot to file
    plt.savefig(output_filename, dpi=150)
    print(f"Plot saved to {output_filename}")
    # Optionally show the plot
    # plt.show()
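Besides the plot, breadth of coverage (the share of reference positions covered by at least one read) and mean depth are useful numbers for the report. A minimal pure-Python sketch, assuming the tab-separated `reference position depth` output of `samtools depth` produced above; like the plotting script, positions absent from the file count as depth 0. The function names are hypothetical, and the summary is meant for one reference at a time.

```python
import csv
from typing import Dict, Optional


def read_depth(path: str, accession: Optional[str] = None) -> Dict[int, int]:
    """Read `samtools depth` output into {position: depth} for a single reference.

    If accession is given, keep only rows whose reference name starts with it
    (similar to the grep step above).
    """
    depth = {}
    with open(path, newline="") as fh:
        for chrom, pos, cov in csv.reader(fh, delimiter="\t"):
            if accession is None or chrom.startswith(accession):
                depth[int(pos)] = int(cov)
    return depth


def summarise_depth(depth: Dict[int, int], genome_length: int, min_depth: int = 1) -> dict:
    """Breadth and depth statistics over positions 1..genome_length."""
    values = [depth.get(pos, 0) for pos in range(1, genome_length + 1)]
    covered = sum(v >= min_depth for v in values)
    return {
        "covered_bases": covered,
        "breadth_pct": 100.0 * covered / genome_length,
        "mean_depth": sum(values) / genome_length,
        "max_depth": max(values),
    }
```

Usage, for example for HSV-2 strain G (155498 nt): `summarise_depth(read_depth("coverage.txt", "OM370995"), 155498)`.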
  6. Report

Subject: Mapping Results and Selected Reference Genomes

Dear XXXX,

Please find below the results of the mapping analysis. For each virus you provided, I have selected a representative reference isolate, listed as follows:

Selected Reference Isolates

  1. Enterovirus D68
      PQ895337.1 Enterovirus D68 isolate SH2024-25870
  2. HSV-1 (Herpes Simplex Virus 1)
      PQ569920.1 Human alphaherpesvirus 1 isolate MacIntyre, complete genome
  3. HSV-2 (Herpes Simplex Virus 2)
      OM370995.1 Human alphaherpesvirus 2 strain G, complete genome
  4. Influenza A Virus (H1N1, A/PR/8/34)
      LC662537.1 PB2 gene, complete CDS
      LC662538.1 PB1 and PB1-F2 genes, complete CDS
      LC662539.1 PA and PA-X genes, complete CDS
      LC662540.1 HA gene, complete CDS
      LC662541.1 NP gene, complete CDS
      LC662542.1 NA gene, complete CDS
      LC662543.1 M1 and M2 genes, complete CDS
      LC662544.1 NS1 and NEP genes, complete CDS
  5. Cytomegalovirus (strain AD169)
      X17403.1 Human cytomegalovirus strain AD169, complete genome
  6. Influenza A Virus (H3N2, A/Fukushima/OR808/2023)
      LC817404.1 segment 1 (PB2), complete sequence
      LC817405.1 segment 2 (PB1), complete sequence
      LC817406.1 segment 3 (PA), complete sequence
      LC817407.1 segment 4 (HA), complete sequence
      LC817408.1 segment 5 (NP), complete sequence
      LC817409.1 segment 6 (NA), complete sequence
      LC817410.1 segment 7 (M), complete sequence
      LC817411.1 segment 8 (NS), complete sequence
  7. Monkeypox Virus
      OP689666.1 Isolate MPXV/Germany/2022/RKI513, complete genome
  8. Human Immunodeficiency Virus 1 (HIV-1)
      AJ866558.1 Isolate 01IC-PCI127, complete genome

Mapping Results

  1. We mapped the paired-end reads of the 12 Ringversuch (interlaboratory ring trial) samples, RV1 to RV6 for both DNA and RNA, against the selected reference genomes.
  2. Below are the mapping statistics for Enterovirus D68, HSV-1, HSV-2, and H1N1. Coverage plots are attached for all cases in which the percentage of reads mapped to the reference genome is greater than 0.00%. Results for the remaining viruses will follow next week.
  3. (An asterisk (*) marks cases with a mapping percentage greater than 0.00%.)

Mapping Statistics (sample: number of mapped reads (percentage of reads mapped))

  1. Enterovirus D68 (SH2024-25870):
      RV1_DNA: 0 (0.00%)
      RV2_DNA: 0 (0.00%)
      RV3_DNA: 0 (0.00%)
      RV4_DNA: 0 (0.00%)
      RV5_DNA: 0 (0.00%)
      RV6_DNA: 0 (0.00%)
      RV1_RNA: 66 (0.00%)
      RV2_RNA: 55 (0.00%)
      RV3_RNA: 15 (0.00%)
      RV4_RNA: 1701 (0.02%) *
      RV5_RNA: 26 (0.00%)
      RV6_RNA: 35 (0.00%)
  2. HSV-1 (isolate MacIntyre):
      RV1_DNA: 387 (0.02%) *
      RV2_DNA: 6232 (0.26%) *
      RV3_DNA: 0 (0.00%)
      RV4_DNA: 1443 (0.03%) *
      RV5_DNA: 2 (0.00%)
      RV6_DNA: 0 (0.00%)
      RV1_RNA: 6 (0.00%)
      RV2_RNA: 32 (0.00%)
      RV3_RNA: 4 (0.00%)
      RV4_RNA: 13 (0.00%)
      RV5_RNA: 4 (0.00%)
      RV6_RNA: 10 (0.00%)
  3. HSV-2 (strain G):
      RV1_DNA: 201 (0.01%) *
      RV2_DNA: 376 (0.02%) *
      RV3_DNA: 0 (0.00%)
      RV4_DNA: 19670 (0.46%) *
      RV5_DNA: 0 (0.00%)
      RV6_DNA: 0 (0.00%)
      RV1_RNA: 0 (0.00%)
      RV2_RNA: 3 (0.00%)
      RV3_RNA: 0 (0.00%)
      RV4_RNA: 25 (0.00%)
      RV5_RNA: 5 (0.00%)
      RV6_RNA: 24 (0.00%)
  4. Influenza A Virus (H1N1, A/PR/8/34):
      RV1_DNA: 0 (0.00%)
      RV2_DNA: 0 (0.00%)
      RV3_DNA: 0 (0.00%)
      RV4_DNA: 0 (0.00%)
      RV5_DNA: 0 (0.00%)
      RV6_DNA: 0 (0.00%)
      RV1_RNA: 0 (0.00%)
      RV2_RNA: 0 (0.00%)
      RV3_RNA: 0 (0.00%)
      RV4_RNA: 13 + 354 (0.00%)
      RV5_RNA: 0 (0.00%)
      RV6_RNA: 0 (0.00%)
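Assuming the counts above are taken from the `N + M mapped (P% : N/A)` line that `samtools flagstat` appends to LOG_mapping in step 5, the report lines could be produced from the log rather than copied by hand. A hypothetical Python sketch (`parse_log` and `report_line` are illustrative names); if a virus/sample block appears twice in the log, for example after a re-run, the last one wins.

```python
import re
from typing import Dict, Optional, Tuple

# Block headers written by the loop in step 5, e.g. "-----RV4_DNA_on_HSV-2_strain_G------"
HEADER = re.compile(r"^-+(?P<sample>RV\d+_(?:DNA|RNA))_on_(?P<virus>.+?)-+$")
# The "mapped" line of samtools flagstat, e.g. "19670 + 0 mapped (0.46% : N/A)";
# "primary mapped" lines do not match because a different word follows the counts.
MAPPED = re.compile(r"^(?P<n>\d+) \+ \d+ mapped \((?P<pct>[^ :)]+)")


def parse_log(text: str) -> Dict[Tuple[str, str], Tuple[int, Optional[float]]]:
    """Return {(virus, sample): (mapped_reads, mapped_percent or None)}."""
    results = {}
    key = None
    for line in text.splitlines():
        line = line.strip()
        header = HEADER.match(line)
        if header:
            key = (header["virus"], header["sample"])
            continue
        mapped = MAPPED.match(line)
        if key is not None and mapped:
            pct = mapped["pct"].rstrip("%")
            results[key] = (int(mapped["n"]), None if pct == "N/A" else float(pct))
    return results


def report_line(sample: str, n: int, pct: Optional[float]) -> str:
    """Format one line like the report; '*' marks percentages above 0.00%."""
    shown = 0.0 if pct is None else pct
    star = " *" if round(shown, 2) > 0 else ""
    return f"{sample}: {n} ({shown:.2f}%){star}"
```

Usage: `for (virus, sample), (n, pct) in parse_log(open("LOG_mapping").read()).items(): print(virus, report_line(sample, n, pct))`.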

© 2023 XGenes.com Impressum