Run viral_ngs (Data_Pietschmann_229ECoronavirus_Mutations_2026)

http://xgenes.com/article/article-content/388/variant-calling-for-data-huang-human-herpesvirus-3-using-snippy-spandx-viralngs/

Calling inter-host variants by merging the results from snippy+spandx (manually!).
Calling intra-host variants using viral-ngs (http://xgenes.com/article/article-content/347/variant-calling-for-herpes-simplex-virus-1-from-patient-sample-using-capture-probe-sequencing/).
#TODO: How? Merge intra- and inter-host variants and compare them to the alignments of the assemblies to confirm their correctness.

#TODO: If the results from 2024 contain only intra-host variants, explain that this time I also provide the inter-host variants!

Variant calling (inter-host + intra-host) for Data_Pietschmann_229ECoronavirus_Mutations_2024+2025+2026 (via docker own_viral_ngs) v2

  1. Input data:

     # ---- Datasets 2024 (in total 4) ----
     ln -s ../../Data_Pietschmann_229ECoronavirus_Mutations_2025/raw_data_2024/hCoV229E_Rluc_R1.fastq.gz hCoV229E_Rluc_R1.fastq.gz
     ln -s ../../Data_Pietschmann_229ECoronavirus_Mutations_2025/raw_data_2024/hCoV229E_Rluc_R2.fastq.gz hCoV229E_Rluc_R2.fastq.gz
     ln -s ../../Data_Pietschmann_229ECoronavirus_Mutations_2025/raw_data_2024/p10_DMSO_R1.fastq.gz DMSO_p10_R1.fastq.gz
     ln -s ../../Data_Pietschmann_229ECoronavirus_Mutations_2025/raw_data_2024/p10_DMSO_R2.fastq.gz DMSO_p10_R2.fastq.gz
     ln -s ../../Data_Pietschmann_229ECoronavirus_Mutations_2025/raw_data_2024/p10_K22_R1.fastq.gz K22_p10_R1.fastq.gz
     ln -s ../../Data_Pietschmann_229ECoronavirus_Mutations_2025/raw_data_2024/p10_K22_R2.fastq.gz K22_p10_R2.fastq.gz
     ln -s ../../Data_Pietschmann_229ECoronavirus_Mutations_2025/raw_data_2024/p10_K7523_R1.fastq.gz X7523_p10_R1.fastq.gz
     ln -s ../../Data_Pietschmann_229ECoronavirus_Mutations_2025/raw_data_2024/p10_K7523_R2.fastq.gz X7523_p10_R2.fastq.gz
    
     # ---- Datasets 2025 (in total 3) ----
     ln -s ../../Data_Pietschmann_229ECoronavirus_Mutations_2025/raw_data_2025/250506_VH00358_136_AAG3YJ5M5/p20606/p16_DMSO_S29_R1_001.fastq.gz DMSO_p16_R1.fastq.gz
     ln -s ../../Data_Pietschmann_229ECoronavirus_Mutations_2025/raw_data_2025/250506_VH00358_136_AAG3YJ5M5/p20606/p16_DMSO_S29_R2_001.fastq.gz DMSO_p16_R2.fastq.gz
     ln -s ../../Data_Pietschmann_229ECoronavirus_Mutations_2025/raw_data_2025/250506_VH00358_136_AAG3YJ5M5/p20607/p16_K22_S30_R1_001.fastq.gz K22_p16_R1.fastq.gz
     ln -s ../../Data_Pietschmann_229ECoronavirus_Mutations_2025/raw_data_2025/250506_VH00358_136_AAG3YJ5M5/p20607/p16_K22_S30_R2_001.fastq.gz K22_p16_R2.fastq.gz
     ln -s ../../Data_Pietschmann_229ECoronavirus_Mutations_2025/raw_data_2025/250506_VH00358_136_AAG3YJ5M5/p20608/p16_X7523_S31_R1_001.fastq.gz X7523_p16_R1.fastq.gz
     ln -s ../../Data_Pietschmann_229ECoronavirus_Mutations_2025/raw_data_2025/250506_VH00358_136_AAG3YJ5M5/p20608/p16_X7523_S31_R2_001.fastq.gz X7523_p16_R2.fastq.gz
    
     # ---- Datasets 2026 (in total 3) ----
     ln -s ../raw_data_2026/20260212_AV243904_0054_B/02_DMSO_p26/02_DMSO_p26_R1.fastq.gz DMSO_p26_R1.fastq.gz
     ln -s ../raw_data_2026/20260212_AV243904_0054_B/02_DMSO_p26/02_DMSO_p26_R2.fastq.gz DMSO_p26_R2.fastq.gz
     ln -s ../raw_data_2026/20260212_AV243904_0054_B/01_K22_p26/01_K22_p26_R1.fastq.gz K22_p26_R1.fastq.gz
     ln -s ../raw_data_2026/20260212_AV243904_0054_B/01_K22_p26/01_K22_p26_R2.fastq.gz K22_p26_R2.fastq.gz
     ln -s ../raw_data_2026/20260212_AV243904_0054_B/03_X723_p26/03_X723_p26_R1.fastq.gz X7523_p26_R1.fastq.gz
     ln -s ../raw_data_2026/20260212_AV243904_0054_B/03_X723_p26/03_X723_p26_R2.fastq.gz X7523_p26_R2.fastq.gz
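The repeated `ln -s` pairs above could also be generated by a small helper. This is only a sketch: `link_pair` is a hypothetical function name, not part of the original workflow, and it assumes each sample has exactly one R1/R2 file pair whose names differ only in that token. Unlike plain `ln -s`, it overwrites an existing link of the same name.

```shell
# Hypothetical helper (not part of the original workflow): link both reads of
# one paired-end sample into the current directory under a normalized name.
#   $1  source path prefix, i.e. everything before "_R1"/"_R2"
#   $2  normalized sample name, e.g. DMSO_p10
#   $3  source suffix after "_R1"/"_R2" (default: .fastq.gz)
link_pair() {
    src_prefix=$1
    dst_name=$2
    src_suffix=${3:-.fastq.gz}
    for read in R1 R2; do
        # -f replaces an existing link, -n avoids following an existing symlink
        ln -sfn "${src_prefix}_${read}${src_suffix}" "${dst_name}_${read}.fastq.gz"
    done
}

# Usage, mirroring two of the link pairs above:
#   link_pair ../../Data_Pietschmann_229ECoronavirus_Mutations_2025/raw_data_2024/p10_DMSO DMSO_p10
#   link_pair ../../Data_Pietschmann_229ECoronavirus_Mutations_2025/raw_data_2025/250506_VH00358_136_AAG3YJ5M5/p20606/p16_DMSO_S29 DMSO_p16 _001.fastq.gz
```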
  2. Call variants using snippy

     ln -s ~/Tools/bacto/db/ .;
     ln -s ~/Tools/bacto/envs/ .;
     ln -s ~/Tools/bacto/local/ .;
     cp ~/Tools/bacto/Snakefile .;
     cp ~/Tools/bacto/bacto-0.1.json .;
     cp ~/Tools/bacto/cluster.json .;
    
     #download PP810610.gb from GenBank
     mv ~/Downloads/sequence\(2\).gb db/PP810610.gb
    
     #setting the following in bacto-0.1.json
         "fastqc": false,
         "taxonomic_classifier": false,
         "assembly": true,
         "typing_ariba": false,
         "typing_mlst": true,
         "pangenome": true,
         "variants_calling": true,
         "phylogeny_fasttree": true,
         "phylogeny_raxml": true,
         "recombination": false, (due to gubbins-error set false)
         "genus": "Alphacoronavirus",
         "kingdom": "Viruses",
         "species": "Human coronavirus 229E",
         "mykrobe": {
             "species": "corona"
         },
         "reference": "db/PP810610.gb"
    
     mamba activate /home/jhuang/miniconda3/envs/bengal3_ac3
     (bengal3_ac3) /home/jhuang/miniconda3/envs/snakemake_4_3_1/bin/snakemake --printshellcmds
  3. Summarize all SNPs and Indels from the snippy result directory.

     cp ~/Scripts/summarize_snippy_res_ordered.py .
     # IMPORTANT_ADAPT the array isolates = ["hCoV229E_Rluc", "DMSO_p10", "K22_p10", "X7523_p10", "DMSO_p16", "K22_p16", "X7523_p16", "DMSO_p26", "K22_p26", "X7523_p26"]
     mamba activate plot-numpy1
     python3 ./summarize_snippy_res_ordered.py snippy
     #--> Summary CSV file created successfully at: snippy/summary_snps_indels.csv
     cd snippy
     #REMOVE_the_line? I don't see the point of this line:    grep -v "None,,,,,,None,None" summary_snps_indels.csv > summary_snps_indels_.csv
  4. Call variants using spandx (results almost identical to those from viral-ngs!)

     mamba deactivate
     mamba activate /home/jhuang/miniconda3/envs/spandx
     mkdir ~/miniconda3/envs/spandx/share/snpeff-5.1-2/data/PP810610
     cp PP810610.gb  ~/miniconda3/envs/spandx/share/snpeff-5.1-2/data/PP810610/genes.gbk
     vim ~/miniconda3/envs/spandx/share/snpeff-5.1-2/snpEff.config
     /home/jhuang/miniconda3/envs/spandx/bin/snpEff build PP810610    #-d
     ~/Scripts/genbank2fasta.py PP810610.gb
     mv PP810610.gb_converted.fna PP810610.fasta    #rename "PP810610.1 xxxxx" to "PP810610" in the fasta-file
     ln -s /home/jhuang/Tools/spandx/ spandx
     (spandx) nextflow run spandx/main.nf --fastq "trimmed/*_P_{1,2}.fastq" --ref PP810610.fasta --annotation --database PP810610 -resume
    
     # Rerun SNP_matrix.sh due to the error ERROR_CHROMOSOME_NOT_FOUND in the variants annotation
     cd Outputs/Master_vcf
     (spandx) cp -r ../../snippy/hCoV229E_Rluc/reference .
     (spandx) cp ../../spandx/bin/SNP_matrix.sh ./
     #Note that ${variant_genome_path}=PP810610 in the following command, but it is no longer used after the command replacement.
     #In SNP_matrix.sh, adapt
     #  "snpEff eff -no-downstream -no-intergenic -ud 100 -formatEff -v ${variant_genome_path} out.vcf > out.annotated.vcf"
     #to
     #  "/home/jhuang/miniconda3/envs/bengal3_ac3/bin/snpEff eff -no-downstream -no-intergenic -ud 100 -formatEff -c reference/snpeff.config -dataDir . ref out.vcf > out.annotated.vcf"
     (spandx) bash SNP_matrix.sh PP810610 .
  5. Calling inter-host variants by merging the results from snippy+spandx (Manually!)

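     # Inter-host variants: a virus carries different genetic variants in two different hosts; these variants may be related to the host's immune response, the disease presentation, or the mode of viral transmission.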
     cp All_SNPs_indels_annotated.txt All_SNPs_indels_annotated_backup.txt
     vim All_SNPs_indels_annotated.txt
    
     #in the file ids: grep "$(echo -e '\t')353$(echo -e '\t')" All_SNPs_indels_annotated.txt >> All_SNPs_indels_annotated_.txt
     #Replace \n with " All_SNPs_indels_annotated.txt >> All_SNPs_indels_annotated_.txt\ngrep "
     #Replace grep " --> grep "$(echo -e '\t')
     #Replace " All_ --> $(echo -e '\t')" All_
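The vim search-and-replace recipe above builds one `grep` per position. As an alternative sketch, the same selection can be done in a single `awk` pass. The function name `filter_positions` and the file name `ids.txt` are hypothetical, and the sketch assumes a tab-delimited table with POS in column 2 and a non-empty positions file with one position per line. Unlike the `grep` variant, which matches the value in any column, this one only compares against the POS column.

```shell
# Hypothetical helper: print the header plus every row whose POS (column 2)
# is listed in a positions file (one position per line, must not be empty).
#   $1  positions file, e.g. ids.txt
#   $2  tab-delimited table, e.g. All_SNPs_indels_annotated.txt
filter_positions() {
    awk -F'\t' '
        NR == FNR { keep[$1] = 1; next }   # first file: remember positions
        FNR == 1 || ($2 in keep)           # second file: header + matching rows
    ' "$1" "$2"
}

# Usage:
#   filter_positions ids.txt All_SNPs_indels_annotated.txt > All_SNPs_indels_annotated_.txt
```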
    
     # Potential intra-host variants: 10871, 19289, 23435.
     CHROM   POS     REF     ALT     TYPE    hCoV229E_Rluc_trimmed   p10_DMSO_trimmed        p10_K22_trimmed p10_K7523_trimmed       p16_DMSO_trimmed        p16_K22_trimmed p16_X7523_trimmed       Effect  Impact  Functional_Class        Codon_change    Protein_and_nucleotide_change   Amino_Acid_Length       Gene_name       Biotype
     PP810610        1464    T       C       SNP     C       C       C       C       C       C       C       missense_variant        MODERATE        MISSENSE        gTt/gCt p.Val416Ala/c.1247T>C   6757    CDS_1   protein_coding
     PP810610        1699    C       T       SNP     T       T       T       T       T       T       T       synonymous_variant      LOW     SILENT  gtC/gtT p.Val494Val/c.1482C>T   6757    CDS_1   protein_coding
     PP810610        6691    C       T       SNP     T       T       T       T       T       T       T       synonymous_variant      LOW     SILENT  tgC/tgT p.Cys2158Cys/c.6474C>T  6757    CDS_1   protein_coding
     PP810610        6919    C       G       SNP     G       G       G       G       G       G       G       synonymous_variant      LOW     SILENT  ggC/ggG p.Gly2234Gly/c.6702C>G  6757    CDS_1   protein_coding
     PP810610        7294    T       A       SNP     A       A       A       A       A       A       A       missense_variant        MODERATE        MISSENSE        agT/agA p.Ser2359Arg/c.7077T>A  6757    CDS_1   protein_coding
     * PP810610       10871   C       T       SNP     C       C/T     T       C/T     C/T     T       C/T     missense_variant        MODERATE        MISSENSE        Ctt/Ttt p.Leu3552Phe/c.10654C>T 6757    CDS_1   protein_coding
     PP810610        14472   T       C       SNP     C       C       C       C       C       C       C       missense_variant        MODERATE        MISSENSE        aTg/aCg p.Met4752Thr/c.14255T>C 6757    CDS_1   protein_coding
     PP810610        15458   T       C       SNP     C       C       C       C       C       C       C       synonymous_variant      LOW     SILENT  Ttg/Ctg p.Leu5081Leu/c.15241T>C 6757    CDS_1   protein_coding
     PP810610        16035   C       A       SNP     A       A       A       A       A       A       A       stop_gained     HIGH    NONSENSE        tCa/tAa p.Ser5273*/c.15818C>A   6757    CDS_1   protein_coding
     PP810610        17430   T       C       SNP     C       C       C       C       C       C       C       missense_variant        MODERATE        MISSENSE        tTa/tCa p.Leu5738Ser/c.17213T>C 6757    CDS_1   protein_coding
     * PP810610       19289   G       T       SNP     G       G       T       G       G       G/T     G       missense_variant        MODERATE        MISSENSE        Gtt/Ttt p.Val6358Phe/c.19072G>T 6757    CDS_1   protein_coding
     PP810610        21183   T       G       SNP     G       G       G       G       G       G       G       missense_variant        MODERATE        MISSENSE        tTt/tGt p.Phe230Cys/c.689T>G    1173    CDS_2   protein_coding
     PP810610        22636   T       G       SNP     G       G       G       G       G       G       G       missense_variant        MODERATE        MISSENSE        aaT/aaG p.Asn714Lys/c.2142T>G   1173    CDS_2   protein_coding
     PP810610        23022   T       C       SNP     C       C       C       C       C       C       C       missense_variant        MODERATE        MISSENSE        tTa/tCa p.Leu843Ser/c.2528T>C   1173    CDS_2   protein_coding
     * PP810610       23435   C       T       SNP     C       C       T       C/T     C       C/T     C/T     missense_variant        MODERATE        MISSENSE        Ctt/Ttt p.Leu981Phe/c.2941C>T   1173    CDS_2   protein_coding
     PP810610        24512   C       T       SNP     T       T       T       T       T       T       T       missense_variant        MODERATE        MISSENSE        Ctc/Ttc p.Leu36Phe/c.106C>T     88      CDS_4   protein_coding
     PP810610        24781   C       T       SNP     T       T       T       T       T       T       T       missense_variant        MODERATE        MISSENSE        aCt/aTt p.Thr36Ile/c.107C>T     77      CDS_5   protein_coding
     PP810610        25163   C       T       SNP     T       T       T       T       T       T       T       missense_variant        MODERATE        MISSENSE        Ctt/Ttt p.Leu82Phe/c.244C>T     225     CDS_6   protein_coding
     PP810610        25264   C       T       SNP     T       T       T       T       T       T       T       synonymous_variant      LOW     SILENT  gtC/gtT p.Val115Val/c.345C>T    225     CDS_6   protein_coding
     PP810610        26838   G       T       SNP     T       T       T       T       T       T       T
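The starred rows in the table are the ones where at least one sample shows a mixed call such as `C/T`. A minimal sketch for listing such positions automatically follows. It assumes the unedited, tab-delimited spandx table, with the per-sample genotype columns starting at column 6 and ending just before the `Effect` column; `mixed_sites` is a hypothetical name. The `Codon_change` column also contains slashes (e.g. `gTt/gCt`), which is why only the genotype columns are inspected.

```shell
# Hypothetical helper: print the POS of every row in which at least one
# per-sample genotype column contains a mixed call such as "C/T".
#   $1  tab-delimited table, e.g. All_SNPs_indels_annotated.txt
mixed_sites() {
    awk -F'\t' '
        NR == 1 {
            # genotype columns run from column 6 up to the one before "Effect"
            last = NF
            for (i = 6; i <= NF; i++) {
                if ($i == "Effect") { last = i - 1; break }
            }
            next
        }
        {
            for (i = 6; i <= last && i <= NF; i++) {
                if (index($i, "/") > 0) { print $2; break }
            }
        }
    ' "$1"
}

# Usage:
#   mixed_sites All_SNPs_indels_annotated.txt
```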
  6. Calling intra-host variants using viral-ngs

     # Intra-host variants: the same host is infected with one virus, but several different viral variants may coexist in different cells or organs of that host.
    
     #How to run and debug the viral-ngs docker?
     # ---- DEBUG_2026_1: using docker instead ----
     mkdir viralngs; cd viralngs
     ln -s ~/Tools/viral-ngs_docker/Snakefile Snakefile
     ln -s  ~/Tools/viral-ngs_docker/bin bin
     cp  ~/DATA_D/Data_Pietschmann_229ECoronavirus_Mutations_2024/refsel.acids refsel.acids
     cp  ~/DATA_D/Data_Pietschmann_229ECoronavirus_Mutations_2024/lastal.acids lastal.acids
     cp  ~/DATA_D/Data_Pietschmann_229ECoronavirus_Mutations_2024/config.yaml config.yaml
     cp  ~/DATA_D/Data_Pietschmann_229ECoronavirus_Mutations_2024/samples-runs.txt samples-runs.txt
     cp  ~/DATA_D/Data_Pietschmann_229ECoronavirus_Mutations_2024/samples-depletion.txt samples-depletion.txt
     cp  ~/DATA_D/Data_Pietschmann_229ECoronavirus_Mutations_2024/samples-metagenomics.txt samples-metagenomics.txt
     cp  ~/DATA_D/Data_Pietschmann_229ECoronavirus_Mutations_2024/samples-assembly.txt samples-assembly.txt
     cp  ~/DATA_D/Data_Pietschmann_229ECoronavirus_Mutations_2024/samples-assembly-failures.txt samples-assembly-failures.txt
     # Adapt the samples-*.txt files
    
     # Go back to the project directory (the one containing trimmed/ and viralngs/)
     cd ..
     mkdir -p viralngs/data/00_raw
    
     mkdir bams
     ref_fa="PP810610.fasta";
     bwa index ${ref_fa}    # index the reference once, before the loop
     #for sample in hCoV229E_Rluc p10_DMSO p10_K22; do
     #for sample in p10_K7523 p16_DMSO p16_K22 p16_X7523; do
     for sample in hCoV229E_Rluc DMSO_p10 K22_p10 X7523_p10 DMSO_p16 K22_p16 X7523_p16 DMSO_p26 K22_p26 X7523_p26; do
         bwa mem -M -t 16 ${ref_fa} trimmed/${sample}_trimmed_P_1.fastq trimmed/${sample}_trimmed_P_2.fastq | samtools view -bS - > bams/${sample}_genome_alignment.bam
     done
    
     conda activate viral-ngs4
     #for sample in hCoV229E_Rluc p10_DMSO p10_K22; do
     #for sample in p10_K7523 p16_DMSO p16_K22 p16_X7523; do
     for sample in hCoV229E_Rluc DMSO_p10 K22_p10 X7523_p10 DMSO_p16 K22_p16 X7523_p16 DMSO_p26 K22_p26 X7523_p26; do
         picard AddOrReplaceReadGroups I=bams/${sample}_genome_alignment.bam O=~/DATA_D/Data_Pietschmann_229ECoronavirus_Mutations_2026/viralngs/data/00_raw/${sample}.bam SORT_ORDER=coordinate CREATE_INDEX=true RGPL=illumina RGID=$sample RGSM=$sample RGLB=standard RGPU=$sample VALIDATION_STRINGENCY=LENIENT; \
     done
     conda deactivate
    
     # -- ! First, leave samples-assembly.txt empty so that the run focuses only on depletion!
     docker run -it -v /mnt/md1/DATA_D/Data_Pietschmann_229ECoronavirus_Mutations_2026/viralngs:/work -v /home/jhuang/Tools/viral-ngs_docker:/home/jhuang/Tools/viral-ngs_docker -v /home/jhuang/REFs:/home/jhuang/REFs -v /home/jhuang/Tools/GenomeAnalysisTK-3.6:/home/jhuang/Tools/GenomeAnalysisTK-3.6 -v /home/jhuang/Tools/novocraft_v3:/home/jhuang/Tools/novocraft_v3 -v /usr/local/bin/gatk:/usr/local/bin/gatk   own_viral_ngs bash
     cd /work
     snakemake --directory /work --printshellcmds --cores 80
    
     # -- ! Second, run the assembly steps manually
     # --> Iteratively add the unfinished assemblies to the list, replacing one each time, and run "cd /work; snakemake --directory /work --printshellcmds --cores 80" after exiting and re-entering the docker-env, since some tools were automatically deleted during the run.

🔧 Complete Fix Commands (Run Inside Docker Container)

#!/bin/bash
# ============================================================
# FIX SCRIPT FOR viral-ngs Docker Environment
# Run these commands INSIDE your running Docker container
# ============================================================

echo "=== Step 1: Activate base environment for config changes ==="
conda activate base

echo "=== Step 2: Disable conda safety checks (fixes 'unsafe path' errors) ==="
conda config --set safety_checks disabled
conda config --set allow_softlinks true

echo "=== Step 3: Verify config was applied ==="
conda config --show safety_checks
# Expected output: safety_checks: disabled

echo "=== Step 4: (Optional) Update conda for compatibility ==="
conda update -n base -c defaults conda -y

echo "=== Step 5: Activate viral-ngs environment ==="
conda activate viral-ngs-env
echo "Current env: $CONDA_DEFAULT_ENV" && which python
#Current env: viral-ngs-env
#/opt/miniconda/bin/python --> ⚠️ Problem identified: python still resolves to the base installation, not to viral-ngs-env!

#conda deactivate; conda deactivate;
## 1. Install mamba in your base environment (one-time setup)
#conda install -n base -c conda-forge mamba -y
## 2. Activate your existing environment
#conda activate viral-ngs-env
## 3. Use mamba to install missing packages (instead of conda)
#mamba install -c bioconda biopython mafft -y
## 4. Verify existing tools are still there
#which python
#conda list  # Shows all packages, both old and new

conda install python=3.6.7 -y
which python

echo "Current env: $CONDA_DEFAULT_ENV" && which python
#Current env: viral-ngs-env
#/opt/miniconda/envs/viral-ngs-env/bin/python (Python 3.6.7)

echo "=== Step 6: Install missing Python packages ==="
conda install -y -c conda-forge biopython

echo "=== Step 7: Install missing binary tools (with specific versions if needed) ==="
conda install -y -c bioconda perl=5.32.1 prinseq-lite samtools

echo "=== Step 8: Verify all installations ==="
echo "--- Checking samtools ---"
which samtools && samtools --version
#/opt/miniconda/envs/viral-ngs-env/bin/samtools samtools 1.9 using htslib 1.9

echo "--- Checking perl ---"
which perl && perl --version

echo "--- Checking prinseq-lite ---"
which prinseq-lite.pl && prinseq-lite.pl -version

echo "--- Checking Biopython ---"
python -c "import Bio; print('Biopython OK:', Bio.__version__)"

echo "=== Step 9: Refresh environment PATH ==="
hash -r

echo "=== ✅ All fixes applied! You can now re-run your pipeline ==="
echo "Tip: Run 'snakemake --unlock' first if pipeline is locked, then:"
echo "     snakemake -j <threads> --rerun-incomplete"

In Dockerfile

#ENV CONDA_ALLOW_UNSAFE_PATHS=1
#RUN conda update -n base -c defaults conda -y


🐳 To Make Fixes Permanent: Commit the Container

After running the fixes above, save your working container:

# 1. Exit the container (but don't delete it)
exit
# 2. Look up the container ID and commit the container as a new image
docker ps -a
docker commit c51d44624f1b viral-ngs-fixed:2026-03-19

# 3. Next time, run the fixed image
#docker run -it -v /mnt/md1/... [your other volumes] viral-ngs-fixed:2026-03-19 bash
docker run -it -v /mnt/md1/DATA_D/Data_Pietschmann_229ECoronavirus_Mutations_2026/viralngs:/work -v /home/jhuang/Tools/viral-ngs_docker:/home/jhuang/Tools/viral-ngs_docker -v /home/jhuang/REFs:/home/jhuang/REFs -v /home/jhuang/Tools/GenomeAnalysisTK-3.6:/home/jhuang/Tools/GenomeAnalysisTK-3.6 -v /home/jhuang/Tools/novocraft_v3:/home/jhuang/Tools/novocraft_v3 -v /usr/local/bin/gatk:/usr/local/bin/gatk   viral-ngs-fixed:2026-03-19 bash

conda activate viral-ngs-env
which samtools && samtools --version
#/opt/miniconda/envs/viral-ngs-env/bin/samtools samtools 1.9 using htslib 1.9
which perl && perl --version
#/opt/miniconda/envs/viral-ngs-env/bin/perl (v5.26.2)
which prinseq-lite.pl && prinseq-lite.pl -version
#/opt/miniconda/envs/viral-ngs-env/bin/prinseq-lite.pl (PRINSEQ-lite 0.20.4)
python -c "import Bio; print('Biopython OK:', Bio.__version__)"
#Biopython OK: 1.72

conda install -c bioconda trimmomatic -y
which trimmomatic
#/opt/miniconda/envs/viral-ngs-env/bin/trimmomatic (trimmomatic-0.39)

exit
docker ps -a
docker commit e70395e5625c viral-ngs-fixed:l

docker run -it -v /mnt/md1/DATA_D/Data_Pietschmann_229ECoronavirus_Mutations_2026/viralngs:/work -v /home/jhuang/Tools/viral-ngs_docker:/home/jhuang/Tools/viral-ngs_docker -v /home/jhuang/REFs:/home/jhuang/REFs -v /home/jhuang/Tools/GenomeAnalysisTK-3.6:/home/jhuang/Tools/GenomeAnalysisTK-3.6 -v /home/jhuang/Tools/novocraft_v3:/home/jhuang/Tools/novocraft_v3 -v /usr/local/bin/gatk:/usr/local/bin/gatk   viral-ngs-fixed:l bash
cd /work
#!!!! IMPORTANT !!!!
conda activate viral-ngs-env
snakemake --directory /work --printshellcmds --cores 80

#DEBUG a specific command as follows, for example install Gap2Seq in the docker-env.
bin/assembly.py gapfill_gap2seq tmp/02_assembly/hCoV229E_Rluc.assembly2-scaffolded.fasta data/01_per_sample/hCoV229E_Rluc.cleaned.bam tmp/02_assembly/hCoV229E_Rluc.assembly2-gapfilled.fasta --memLimitGb 12 --maskErrors --randomSeed 0
#--> go to the script tools/gap2seq.py, install the required tool (for example gap2seq) and adapt the version to the installed one.
conda install -y -c bioconda gap2seq
root@f47daf7c44ee:/work# Gap2Seq -h
#>Gap2Seq 3.1
vim /home/jhuang/Tools/viral-ngs_docker/bin/tools/gap2seq.py
#Adapt the TOOL_NAME and TOOL_VERSION in the script

#Save the docker-env with the newly installed Gap2Seq
exit
docker ps -a
docker commit f47daf7c44ee viral-ngs-fixed:la

#Start the new docker-env with Gap2Seq installed
docker run -it -v /mnt/md1/DATA_D/Data_Pietschmann_229ECoronavirus_Mutations_2026/viralngs:/work -v /home/jhuang/Tools/viral-ngs_docker:/home/jhuang/Tools/viral-ngs_docker -v /home/jhuang/REFs:/home/jhuang/REFs -v /home/jhuang/Tools/GenomeAnalysisTK-3.6:/home/jhuang/Tools/GenomeAnalysisTK-3.6 -v /home/jhuang/Tools/novocraft_v3:/home/jhuang/Tools/novocraft_v3 -v /usr/local/bin/gatk:/usr/local/bin/gatk viral-ngs-fixed:la bash
cd /work
conda activate viral-ngs-env
snakemake --directory /work --printshellcmds --cores 80

#MANUALLY running the following commands!
#(The key=value form below appears to be the argument echo from the viral-ngs log; on the command line, the flag form from the DEBUG example above is probably what is needed.)
bin/assembly.py gapfill_gap2seq in_scaffold=tmp/02_assembly/hCoV229E_Rluc.assembly2-scaffolded.fasta in_bam=data/01_per_sample/hCoV229E_Rluc.cleaned.bam out_scaffold=tmp/02_assembly/hCoV229E_Rluc.assembly2-gapfilled.fasta mem_limit_gb=12 time_soft_limit_minutes=60.0 mask_errors=True gap2seq_opts= random_seed=0 threads=None loglevel=INFO tmp_dir=/tmp tmp_dirKeep=False

#MANUALLY running the following commands!
bin/assembly.py impute_from_reference tmp/02_assembly/hCoV229E_Rluc.assembly2-gapfilled.fasta tmp/02_assembly/hCoV229E_Rluc.assembly2-scaffold_ref.fasta tmp/02_assembly/hCoV229E_Rluc.assembly3-modify.fasta --newName hCoV229E_Rluc --replaceLength 55 --minLengthFraction 0.05 --minUnambig 0.05 --index

#!!!! TODO_NEXT_WEEK !!!!: run docker several times so that all ${sample}.assembly2-scaffolded.fasta files are generated! Then run the following commands; after that, run docker for all isolates.
for sample in DMSO_p10 K22_p10 X7523_p10 DMSO_p16 K22_p16 X7523_p16 DMSO_p26 K22_p26 X7523_p26; do
    #MANUALLY running the following commands!
    bin/assembly.py gapfill_gap2seq in_scaffold=tmp/02_assembly/${sample}.assembly2-scaffolded.fasta in_bam=data/01_per_sample/${sample}.cleaned.bam out_scaffold=tmp/02_assembly/${sample}.assembly2-gapfilled.fasta mem_limit_gb=12 time_soft_limit_minutes=60.0 mask_errors=True gap2seq_opts= random_seed=0 threads=None loglevel=INFO tmp_dir=/tmp tmp_dirKeep=False
    #MANUALLY running the following commands!
    bin/assembly.py impute_from_reference tmp/02_assembly/${sample}.assembly2-gapfilled.fasta tmp/02_assembly/${sample}.assembly2-scaffold_ref.fasta tmp/02_assembly/${sample}.assembly3-modify.fasta --newName ${sample} --replaceLength 55 --minLengthFraction 0.05 --minUnambig 0.05 --index
done


🔍 Troubleshooting Checklist

If issues persist after running the fix:

# Check conda config
conda config --show | grep -E "safety|softlink"

# List installed packages in viral-ngs-env
conda activate viral-ngs-env
conda list | grep -E "samtools|prinseq|perl|biopython"

# Test each tool manually
samtools --version
prinseq-lite.pl -version
perl -e 'use Bio::Seq; print "BioPerl OK\n"'

# Check PATH includes conda bin directories
echo $PATH | tr ':' '\n' | grep conda
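
The per-tool checks above can be condensed into one pass. The following is a minimal sketch; `check_tools` is a hypothetical helper name, and it only tests whether each command is found on PATH, not whether the version matches what viral-ngs expects.

```shell
# Hypothetical helper: report for each given command whether it is on PATH.
# Returns 0 if all commands were found, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "OK $tool -> $(command -v "$tool")"
        else
            echo "MISSING $tool"
            missing=1
        fi
    done
    return $missing
}

# Usage inside the container, after "conda activate viral-ngs-env":
#   check_tools samtools perl prinseq-lite.pl python trimmomatic
```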

⚠️ Important Notes

Issue -> Solution
- safety_checks disabled not working -> Must run conda config in base env, not viral-ngs-env
- Packages still fail to install -> Try conda clean --all -y first, then reinstall
- samtools: command not found after install -> Run hash -r or restart shell to refresh PATH
- Pipeline still fails after fixes -> Run snakemake --unlock, then snakemake --rerun-incomplete to resume
    conda config --set safety_checks disabled
    conda activate viral-ngs-env
    conda install -y -c conda-forge biopython

    docker ps -a
    # Look for the container you are working in, e.g., "viral-ngs-container"
    #bb117a6ca70a

    docker commit 
viral-ngs-fixed:latest docker run -it viral-ngs-fixed:latest bash # # —- NOTE that the following steps need rerun –> DOES NOT WORK, USE STRATEGY ABOVE —- # #for sample in p10_K22 p10_K7523; do # for sample in hCoV229E_Rluc p10_DMSO p10_K22 p10_K7523 p16_DMSO p16_K22 p16_X7523; do # bin/read_utils.py merge_bams data/01_cleaned/${sample}.cleaned.bam tmp/01_cleaned/${sample}.cleaned.bam –picardOptions SORT_ORDER=queryname # bin/read_utils.py rmdup_mvicuna_bam tmp/01_cleaned/${sample}.cleaned.bam data/01_per_sample/${sample}.cleaned.bam –JVMmemory 30g # done # # #Note that the error generated by nextflow is from the step gapfill_gap2seq! # for sample in hCoV229E_Rluc p10_DMSO p10_K22 p10_K7523 p16_DMSO p16_K22 p16_X7523; do # bin/assembly.py assemble_spades data/01_per_sample/${sample}.taxfilt.bam /home/jhuang/REFs/viral_ngs_dbs/trim_clip/contaminants.fasta tmp/02_assembly/${sample}.assembly1-spades.fasta –nReads 10000000 –threads 15 –memLimitGb 12 # done # for sample in hCoV229E_Rluc p10_DMSO p10_K22 p10_K7523 p16_DMSO p16_K22 p16_X7523; do # for sample in p10_K22 p10_K7523; do # bin/assembly.py order_and_orient tmp/02_assembly/${sample}.assembly1-spades.fasta refsel_db/refsel.fasta tmp/02_assembly/${sample}.assembly2-scaffolded.fasta –min_pct_contig_aligned 0.05 –outAlternateContigs tmp/02_assembly/${sample}.assembly2-alternate_sequences.fasta –nGenomeSegments 1 –outReference tmp/02_assembly/${sample}.assembly2-scaffold_ref.fasta –threads 15 # done # # for sample in hCoV229E_Rluc p10_DMSO p10_K22 p10_K7523 p16_DMSO p16_K22 p16_X7523; do # bin/assembly.py gapfill_gap2seq tmp/02_assembly/${sample}.assembly2-scaffolded.fasta data/01_per_sample/${sample}.cleaned.bam tmp/02_assembly/${sample}.assembly2-gapfilled.fasta –memLimitGb 12 –maskErrors –randomSeed 0 –loglevel DEBUG # done #IMPORTANT: Reun the following commands! 
for sample in hCoV229E_Rluc DMSO_p10 K22_p10 X7523_p10 DMSO_p16 K22_p16 X7523_p16 DMSO_p26 K22_p26 X7523_p26; do bin/assembly.py impute_from_reference tmp/02_assembly/${sample}.assembly2-gapfilled.fasta tmp/02_assembly/${sample}.assembly2-scaffold_ref.fasta tmp/02_assembly/${sample}.assembly3-modify.fasta –newName ${sample} –replaceLength 55 –minLengthFraction 0.05 –minUnambig 0.05 –index –loglevel DEBUG done # for sample in hCoV229E_Rluc p10_DMSO p10_K22 p10_K7523 p16_DMSO p16_K22 p16_X7523; do # bin/assembly.py refine_assembly tmp/02_assembly/${sample}.assembly3-modify.fasta data/01_per_sample/${sample}.cleaned.bam tmp/02_assembly/${sample}.assembly4-refined.fasta –outVcf tmp/02_assembly/${sample}.assembly3.vcf.gz –min_coverage 2 –novo_params ‘-r Random -l 20 -g 40 -x 20 -t 502’ –threads 15 –loglevel DEBUG # bin/assembly.py refine_assembly tmp/02_assembly/${sample}.assembly4-refined.fasta data/01_per_sample/${sample}.cleaned.bam data/02_assembly/${sample}.fasta –outVcf tmp/02_assembly/${sample}.assembly4.vcf.gz –min_coverage 3 –novo_params ‘-r Random -l 20 -g 40 -x 20 -t 100’ –threads 15 –loglevel DEBUG # done # — ! 
Thirdly set the samples-assembly.txt completely and run “snakemake –directory /work –printshellcmds –cores 40” # —————————- BUG list of the docker pipeline, mostly are due to the version incompability —————————- #BUG_1: FileNotFoundError: [Errno 2] No such file or directory: ‘/home/jhuang/Tools/samtools-1.9/samtools’: ‘/home/jhuang/Tools/samtools-1.9/samtools’ #DEBUG_1 (DEPRECATED): # – In docker install independent samtools conda create -n samtools-1.9-env samtools=1.9 -c bioconda -c conda-forge # – persistence the modified docker, next time run own docker image docker ps #CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES #881a1ad6a990 quay.io/broadinstitute/viral-ngs “bash” 8 minutes ago Up 8 minutes intelligent_yalow docker commit 881a1ad6a990 own_viral_ngs docker image ls docker run -it own_viral_ngs bash #Change the path as “/opt/miniconda/envs/samtools-1.9-env/bin/samtools” in /work/bin/tools/samtools.py # If another tool expect for samtools could not be installed, also use the same method above to install it on own_viral_ngs! 
#DEBUG_1_BETTER_SIMPLE: TOOL_VERSION = ‘1.6’ –> ‘1.9’ in ~/Tools/viral-ngs_docker/bin/tools/samtools.py #BUG_2: bin/taxon_filter.py deplete data/00_raw/2040_04.bam tmp/01_cleaned/2040_04.raw.bam tmp/01_cleaned/2040_04.bmtagger_depleted.bam tmp/01_cleaned/2040_04.rmdup.bam data/01_cleaned/2040_04.cleaned.bam –bmtaggerDbs /home/jhuang/REFs/viral_ngs_dbs/bmtagger_dbs_remove/hg19 /home/jhuang/REFs/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3 /home/jhuang/REFs/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA –blastDbs /home/jhuang/REFs/viral_ngs_dbs/blast_dbs_remove/hybsel_probe_adapters /home/jhuang/REFs/viral_ngs_dbs/blast_dbs_remove/metag_v3.ncRNA.mRNA.mitRNA.consensus –threads 15 –srprismMemory 14250 –JVMmemory 50g –loglevel DEBUG #2025-05-23 09:58:45,326 – __init__:445:_attempt_install – DEBUG – Currently installed version of blast: 2.7.1-h4422958_6 #2025-05-23 09:58:45,327 – __init__:448:_attempt_install – DEBUG – Expected version of blast: 2.6.0 #2025-05-23 09:58:45,327 – __init__:449:_attempt_install – DEBUG – Incorrect version of blast installed. 
Removing it… #DEBUG_2: TOOL_VERSION = “2.6.0” –> “2.7.1” in ~/Tools/viral-ngs_docker/bin/tools/blast.py #BUG_3: bin/read_utils.py bwamem_idxstats data/01_cleaned/1762_04.cleaned.bam /home/jhuang/REFs/viral_ngs_dbs/spikeins/ercc_spike-ins.fasta –outStats reports/spike_count/1762_04.spike_count.txt –minScoreToFilter 60 –loglevel DEBUG #DEBUG_3: TOOL_VERSION = “0.7.15” –> “0.7.17” in ~/Tools/viral-ngs_docker/bin/tools/bwa.py #BUG_4: FileNotFoundError: [Errno 2] No such file or directory: ‘/usr/local/bin/trimmomatic’: ‘/usr/local/bin/trimmomatic’ #DEBUG_4: TOOL_VERSION = “0.36” –> “0.38” in ~/Tools/viral-ngs_docker/bin/tools/trimmomatic.py #BUG_5: FileNotFoundError: [Errno 2] No such file or directory: ‘/usr/bin/spades.py’: ‘/usr/bin/spades.py’ #DEBUG_5: TOOL_VERSION = “0.36” –> “0.38” in ~/Tools/viral-ngs_docker/bin/tools/trimmomatic.py # def install_and_get_path(self): # # the conda version wraps the jar file with a shell script # return ‘trimmomatic’ #BUG_6: bin/assembly.py order_and_orient tmp/02_assembly/2039_04.assembly1-spades.fasta refsel_db/refsel.fasta tmp/02_assembly/2039_04.assembly2-scaffolded.fasta –min_pct_contig_aligned 0.05 –outAlternateContigs tmp/02_assembly/2039_04.assembly2-alternate_sequences.fasta –nGenomeSegments 1 –outReference tmp/02_assembly/2039_04.assembly2-scaffold_ref.fasta –threads 15 –loglevel DEBUG 2025-05-23 17:40:19,526 – __init__:445:_attempt_install – DEBUG – Currently installed version of mummer4: 4.0.0beta2-pl526hf484d3e_4 2025-05-23 17:40:19,527 – __init__:448:_attempt_install – DEBUG – Expected version of mummer4: 4.0.0rc1 2025-05-23 17:40:19,527 – __init__:449:_attempt_install – DEBUG – Incorrect version of mummer4 installed. Removing it.. 
DEBUG_6: TOOL_VERSION = “4.0.0rc1” –> “4.0.0beta2” in ~/Tools/viral-ngs_docker/bin/tools/mummer.py #BUG_7: bin/assembly.py order_and_orient tmp/02_assembly/2039_04.assembly1-spades.fasta refsel_db/refsel.fasta tmp/02_assembly/2039_04.assembly2-scaffolded.fasta –min_pct_contig_aligned 0.05 –outAlternateContigs tmp/02_assembly/2039_04.assembly2-alternate_sequences.fasta –nGenomeSegments 1 –outReference tmp/02_assembly/2039_04.assembly2-scaffold_ref.fasta –threads 15 –loglevel DEBUG File “bin/assembly.py”, line 549, in base_counts = [sum([len(seg.seq.replace(“N”, “”)) for seg in scaffold]) \ AttributeError: ‘Seq’ object has no attribute ‘replace’ DEBUG_7: base_counts = [sum([len(seg.seq.replace(“N”, “”)) for seg in scaffold]) –> base_counts = [sum([len(seg.seq.ungap(‘N’)) for seg in scaffold]) in ~/Tools/viral-ngs_docker/bin/assembly.py BUG_8: bin/assembly.py refine_assembly tmp/02_assembly/1243_2.assembly3-modify.fasta data/01_per_sample/1243_2.cleaned.bam tmp/02_assembly/1243_2.assembly4-refined.fasta –outVcf tmp/02_assembly/1243_2.assembly3.vcf.gz –min_coverage 2 –novo_params ‘-r Random -l 20 -g 40 -x 20 -t 502’ –threads 15 –loglevel DEBUG File “/work/bin/tools/gatk.py”, line 75, in execute FileNotFoundError: [Errno 2] No such file or directory: ‘/usr/local/bin/gatk’: ‘/usr/local/bin/gatk’ #DEBUG_8: -v /usr/local/bin/gatk:/usr/local/bin/gatk in ‘docker run’ and change default python in the script via a shebang; TOOL_VERSION = “3.8” –> “3.6” in ~/Tools/viral-ngs_docker/bin/tools/gatk.py BUG_9: pyyaml is missing! #DEBUG_9: NO_ERROR if rerun! 
     bin/assembly.py impute_from_reference tmp/02_assembly/2039_04.assembly2-gapfilled.fasta tmp/02_assembly/2039_04.assembly2-scaffold_ref.fasta tmp/02_assembly/2039_04.assembly3-modify.fasta --newName 2039_04 --replaceLength 55 --minLengthFraction 0.05 --minUnambig 0.05 --index --loglevel DEBUG
     for sample in 2039_04 2040_04; do
     #for sample in 1762_04 1243_2 875_04; do    # alternative loop header for the second batch
         bin/assembly.py impute_from_reference tmp/02_assembly/${sample}.assembly2-gapfilled.fasta tmp/02_assembly/${sample}.assembly2-scaffold_ref.fasta tmp/02_assembly/${sample}.assembly3-modify.fasta --newName ${sample} --replaceLength 55 --minLengthFraction 0.05 --minUnambig 0.05 --index --loglevel DEBUG
     done

     #BUG_10:
     bin/reports.py consolidate_fastqc reports/fastqc/2039_04/align_to_self reports/fastqc/2040_04/align_to_self reports/fastqc/1762_04/align_to_self reports/fastqc/1243_2/align_to_self reports/fastqc/875_04/align_to_self reports/summary.fastqc.align_to_self.txt
     #DEBUG_10: File "bin/intrahost.py", line 527 and line 579 in merge_to_vcf
     #    #MODIFIED_BACK
     samp_to_seqIndex[sampleName] = seq.seq.ungap('-')
     #samp_to_seqIndex[sampleName] = seq.seq.replace("-", "")

     #BUG_11:
     bin/interhost.py multichr_mafft ref_genome/reference.fasta data/02_assembly/2039_04.fasta data/02_assembly/2040_04.fasta data/02_assembly/1762_04.fasta data/02_assembly/1243_2.fasta data/02_assembly/875_04.fasta data/03_multialign_to_ref --ep 0.123 --maxiters 1000 --preservecase --localpair --outFilePrefix aligned --sampleNameListFile data/03_multialign_to_ref/sampleNameList.txt --threads 15 --loglevel DEBUG
     2025-05-26 15:04:19,014 - cmd:195:main_argparse - INFO - command: bin/interhost.py multichr_mafft inFastas=['ref_genome/reference.fasta', 'data/02_assembly/2039_04.fasta', 'data/02_assembly/2040_04.fasta', 'data/02_assembly/1762_04.fasta', 'data/02_assembly/1243_2.fasta', 'data/02_assembly/875_04.fasta'] localpair=True globalpair=None preservecase=True reorder=None gapOpeningPenalty=1.53 ep=0.123 verbose=False outputAsClustal=None maxiters=1000 outDirectory=data/03_multialign_to_ref outFilePrefix=aligned sampleRelationFile=None sampleNameListFile=data/03_multialign_to_ref/sampleNameList.txt threads=15 loglevel=DEBUG tmp_dir=/tmp tmp_dirKeep=False
     2025-05-26 15:04:19,014 - cmd:209:main_argparse - DEBUG - using tempDir: /tmp/tmp-interhost-multichr_mafft-nuws9mhp
     2025-05-26 15:04:21,085 - __init__:445:_attempt_install - DEBUG - Currently installed version of mafft: 7.402-0
     2025-05-26 15:04:21,085 - __init__:448:_attempt_install - DEBUG - Expected version of mafft: 7.221
     2025-05-26 15:04:21,085 - __init__:449:_attempt_install - DEBUG - Incorrect version of mafft installed. Removing it…
     #DEBUG_11: TOOL_VERSION = "7.221" --> "7.402" in ~/Tools/viral-ngs_docker/bin/tools/mafft.py

     #BUG_12:
     bin/interhost.py snpEff data/04_intrahost/isnvs.vcf.gz PP810610.1 data/04_intrahost/isnvs.annot.vcf.gz j.huang@uke.de --loglevel DEBUG
     2025-06-10 13:14:07,526 - __init__:445:_attempt_install - DEBUG - Currently installed version of snpeff: 4.3.1t-3
     2025-06-10 13:14:07,527 - __init__:448:_attempt_install - DEBUG - Expected version of snpeff: 4.1l
     #DEBUG_12: -v /usr/local/bin/gatk:/usr/local/bin/gatk in 'docker run' and change default python in the script via a shebang; TOOL_VERSION = "4.1l" --> "4.3.1t" in ~/Tools/viral-ngs_docker/bin/tools/snpeff.py

  7. Compare intra- and inter-host variants, and check the variants against the alignments of the assemblies to confirm their correctness.

     From step 5, only three inter-host variants were confirmed: positions 10871, 19289 and 23435.
     PP810610 10871 hCoV229E_Rluc hCoV229E_Rluc C,T 0.0057070386810399495 0.011348936781066188 1.0 missense_variant 10654C>T Leu3552Phe 3552 6758 Gene_217_20492 XBA84229.1
     PP810610 10871 p10_DMSO p10_DMSO C,T 0.0577716643741403 0.10886819833916395 1.0 missense_variant 10654C>T Leu3552Phe 3552 6758 Gene_217_20492 XBA84229.1
     PP810610 10871 p10_K22 p10_K22 C,T 1.0 0.0 1.0 missense_variant 10654C>T Leu3552Phe 3552 6758 Gene_217_20492 XBA84229.1
     PP810610 10871 p10_K7523 p10_K7523 C,T 0.8228321896444167 0.2915587546587828 1.0 missense_variant 10654C>T Leu3552Phe 3552 6758 Gene_217_20492 XBA84229.1
     PP810610 10871 p16_DMSO p16_DMSO C,T 0.02927088877062267 0.05682820768240093 1.0 missense_variant 10654C>T Leu3552Phe 3552 6758 Gene_217_20492 XBA84229.1
     PP810610 10871 p16_K22 p16_K22 C,T 0.9911209766925638 0.017600372505084394 1.0 missense_variant 10654C>T Leu3552Phe 3552 6758 Gene_217_20492 XBA84229.1
     PP810610 10871 p16_X7523 p16_X7523 C,T 0.8776699029126214 0.21473088886794223 1.0 missense_variant 10654C>T Leu3552Phe 3552 6758 Gene_217_20492 XBA84229.1
     PP810610 19289 hCoV229E_Rluc hCoV229E_Rluc G,T 0.0 0.0 1.0 missense_variant 19073G>T Gly6358Val 6358 6758 Gene_217_20492 XBA84229.1
     PP810610 19289 p10_DMSO p10_DMSO G,T 0.0 0.0 1.0 missense_variant 19073G>T Gly6358Val 6358 6758 Gene_217_20492 XBA84229.1
     PP810610 19289 p10_K22 p10_K22 G,T 1.0 0.0 1.0 missense_variant 19073G>T Gly6358Val 6358 6758 Gene_217_20492 XBA84229.1
     PP810610 19289 p10_K7523 p10_K7523 G,T 0.0 0.0 1.0 missense_variant 19073G>T Gly6358Val 6358 6758 Gene_217_20492 XBA84229.1
     PP810610 19289 p16_DMSO p16_DMSO G,T 0.0 0.0 1.0 missense_variant 19073G>T Gly6358Val 6358 6758 Gene_217_20492 XBA84229.1
     PP810610 19289 p16_K22 p16_K22 G,T 0.9884823848238482 0.02276991943361173 1.0 missense_variant 19073G>T Gly6358Val 6358 6758 Gene_217_20492 XBA84229.1
     PP810610 19289 p16_X7523 p16_X7523 G,T 0.0 0.0 1.0 missense_variant 19073G>T Gly6358Val 6358 6758 Gene_217_20492 XBA84229.1
     PP810610 23435 hCoV229E_Rluc hCoV229E_Rluc C,T 0.0 0.0 1.0 missense_variant 2941C>T Leu981Phe 981 1173 Gene_20494_24015 XBA84230.1
     PP810610 23435 p10_DMSO p10_DMSO C,T 0.031912415560214305 0.061788026586653055 1.0 missense_variant 2941C>T Leu981Phe 981 1173 Gene_20494_24015 XBA84230.1
     PP810610 23435 p10_K22 p10_K22 C,T 1.0 0.0 1.0 missense_variant 2941C>T Leu981Phe 981 1173 Gene_20494_24015 XBA84230.1
     PP810610 23435 p10_K7523 p10_K7523 C,T 0.8352090032154341 0.27526984832663026 1.0 missense_variant 2941C>T Leu981Phe 981 1173 Gene_20494_24015 XBA84230.1
     PP810610 23435 p16_DMSO p16_DMSO C,T 0.0 0.0 1.0 missense_variant 2941C>T Leu981Phe 981 1173 Gene_20494_24015 XBA84230.1
     PP810610 23435 p16_K22 p16_K22 C,T 0.958498023715415 0.07955912449811753 1.0 missense_variant 2941C>T Leu981Phe 981 1173 Gene_20494_24015 XBA84230.1
     PP810610 23435 p16_X7523 p16_X7523 C,T 0.13175164058556285 0.22878629157715102 1.0 missense_variant 2941C>T Leu981Phe 981 1173 Gene_20494_24015 XBA84230.1

  8. Generate variant_annot.xls and coverages.xls

     sudo chown -R jhuang:jhuang data

     # -- generate isnvs_annot_complete__.txt, isnvs_annot_0.05.txt from ~/DATA/Data_Pietschmann_RSV_Probe3/data/04_intrahost
     cp isnvs.annot.txt isnvs.annot_complete.txt
     ~/Tools/csv2xls-0.4/csv_to_xls.py isnvs.annot_complete.txt -d$'\t' -o isnvs.annot_complete.xls
     #delete the columns patient, time, Hw and Hs and the header in the xls and save as txt file.
     awk '{printf "%.3f\n", $5}' isnvs.annot_complete.csv > f5
     cut -f1-4 isnvs.annot_complete.csv > f1_4
     cut -f6- isnvs.annot_complete.csv > f6_
     paste f1_4 f5 > f1_5
     paste f1_5 f6_ > isnvs_annot_complete_.txt
     #correct f5 in header of isnvs_annot_complete_.txt to iSNV_freq
     #header: chr pos sample alleles iSNV_freq eff_type eff_codon_dna eff_aa eff_aa_pos eff_prot_len eff_gene eff_protein
     ~/Tools/csv2xls-0.4/csv_to_xls.py isnvs_annot_complete_.txt -d$'\t' -o variant_annot.xls

     #MANUALLY generate variant_annot_0.01.csv variant_annot_0.05.csv
     awk ' $5 >= 0.05 ' isnvs_annot_complete_.txt > 0.05.csv
     cut -f2 0.05.csv
     awk ' $5 >= 0.01 ' isnvs_annot_complete_.txt > 0.01.csv
     cut -f2 0.05.csv | uniq > ids_0.05
     cut -f2 0.01.csv | uniq > ids_0.01
     #Replace '\n' with '\\t" isnvs_annot_complete_.txt >> isnvs_annot_0.05.txt\ngrep -P "PP810610\\t' in ids_0.05 and then deleting the 'pos' line
     #Replace '\n' with '\\t" isnvs_annot_complete_.txt >> isnvs_annot_0.01.txt\ngrep -P "PP810610\\t' in ids_0.01 and then deleting the 'pos' line
     #Run ids_0.05 and ids_0.01

     cp ../../Outputs/Master_vcf/All_SNPs_indels_annotated.txt hCoV229E_Rluc_variants
     # Delete the three records already reported in the intra-host results from hCoV229E_Rluc_variants: they are 10871, 19289, 23435.
     PP810610 10871 C T SNP C C/T T C/T C/T T C/T missense_variant MODERATE MISSENSE Ctt/Ttt p.Leu3552Phe/c.10654C>T 6757 CDS_1 protein_coding
     PP810610 19289 G T SNP G G T G G G/T G missense_variant MODERATE MISSENSE Gtt/Ttt p.Val6358Phe/c.19072G>T 6757 CDS_1 protein_coding
     PP810610 23435 C T SNP C C T C/T C C/T C/T missense_variant MODERATE MISSENSE Ctt/Ttt p.Leu981Phe/c.2941C>T 1173 CDS_2 protein_coding

     ~/Tools/csv2xls-0.4/csv_to_xls.py isnvs_annot_0.05.txt isnvs_annot_0.01.txt hCoV229E_Rluc_variants -d$'\t' -o variant_annot.xls
     #Modify the sheet names to variant_annot_0.05 and variant_annot_0.01 and add the header in the Excel file.
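The threshold selection in this step (filter rows with awk, collect the positions, then pull every sample's row for those positions with grep) can also be sketched in Python. This is only a minimal sketch, not the procedure actually used here: it assumes a tab-separated table with the header listed in this step (chr, pos, sample, alleles, iSNV_freq, ...), and the function names `positions_above` and `rows_for_positions` are hypothetical. The inline excerpt holds rounded frequencies for positions 10871 and 26885 from this analysis.

```python
import csv
import io


def positions_above(rows, threshold):
    """Positions where at least one sample has iSNV_freq >= threshold."""
    return sorted({int(r["pos"]) for r in rows if float(r["iSNV_freq"]) >= threshold})


def rows_for_positions(rows, positions):
    """All rows (every sample) at the selected positions."""
    keep = set(positions)
    return [r for r in rows if int(r["pos"]) in keep]


# Small excerpt in the layout of isnvs_annot_complete_.txt (first five columns only).
EXCERPT = (
    "chr\tpos\tsample\talleles\tiSNV_freq\n"
    "PP810610\t10871\thCoV229E_Rluc\tC,T\t0.006\n"
    "PP810610\t10871\tp10_DMSO\tC,T\t0.058\n"
    "PP810610\t26885\tp10_K22\tT,A\t0.009\n"
    "PP810610\t26885\tp10_K7523\tT,A\t0.011\n"
)

rows = list(csv.DictReader(io.StringIO(EXCERPT), delimiter="\t"))
ids_005 = positions_above(rows, 0.05)   # corresponds to ids_0.05
ids_001 = positions_above(rows, 0.01)   # corresponds to ids_0.01
table_005 = rows_for_positions(rows, ids_005)

print(ids_005)         # [10871]
print(ids_001)         # [10871, 26885]
print(len(table_005))  # 2
```

Note that a position is kept as soon as one sample reaches the threshold, and then all samples at that position are reported, including those below the threshold (here hCoV229E_Rluc at 0.006).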
     #Note in the complete list, Set 2024 is NOT a subset of Set 2025 because the element 26283 is in set 2024 but missing from set 2025.

     # -- calculate the coverage
     samtools depth ./data/02_align_to_self/hCoV229E_Rluc.mapped.bam > hCoV229E_Rluc_cov.txt
     samtools depth ./data/02_align_to_self/p10_DMSO.mapped.bam > p10_DMSO_cov.txt
     samtools depth ./data/02_align_to_self/p10_K22.mapped.bam > p10_K22_cov.txt
     samtools depth ./data/02_align_to_self/p10_K7523.mapped.bam > p10_K7523_cov.txt
     ~/Tools/csv2xls-0.4/csv_to_xls.py hCoV229E_Rluc_cov.txt p10_DMSO_cov.txt p10_K22_cov.txt p10_K7523_cov.txt -d$'\t' -o coverages.xls

     #draw the coverage and check whether it is continuous
     samtools depth ./data/02_align_to_self/p16_DMSO.mapped.bam > p16_DMSO_cov.txt
     samtools depth ./data/02_align_to_self/p16_K22.mapped.bam > p16_K22_cov.txt
     samtools depth ./data/02_align_to_self/p16_X7523.mapped.bam > p16_K7523_cov.txt
     ~/Tools/csv2xls-0.4/csv_to_xls.py p16_DMSO_cov.txt p16_K22_cov.txt p16_K7523_cov.txt -d$'\t' -o coverages_p16.xls

     # Load required packages
     library(ggplot2)
     library(dplyr)
     # Read the coverage data
     cov_data <- read.table("p16_K7523_cov.txt", header = FALSE, sep = "\t", col.names = c("Chromosome", "Position", "Coverage"))
     # Create full position range for the given chromosome
     full_range <- data.frame(Position = seq(min(cov_data$Position), max(cov_data$Position)))
     # Merge with actual coverage data and fill missing positions with 0
     cov_full <- full_range %>%
       left_join(cov_data[, c("Position", "Coverage")], by = "Position") %>%
       mutate(Coverage = ifelse(is.na(Coverage), 0, Coverage))
     # Save the plot to PNG
     png("p16_K7523_coverage_filled.png", width = 1200, height = 600)
     ggplot(cov_full, aes(x = Position, y = Coverage)) +
       geom_line(color = "steelblue", size = 0.3) +
       labs(title = "Coverage Plot for p16_K7523 (Missing = 0)", x = "Genomic Position", y = "Coverage Depth") +
       theme_minimal() +
       theme(
         plot.title = element_text(hjust = 0.5),
         axis.text = element_text(size = 10),
         axis.title = element_text(size = 12)
       )
     dev.off()

  9. (Optional) Consensus sequences of each and of all isolates

     cat PP810610.1.fa OZ035258.1.fa MZ712010.1.fa OK662398.1.fa OK625404.1.fa KF293664.1.fa NC_002645.1.fa > all.fa
     cp data/02_assembly/*.fasta ./
     for sample in hCoV229E_Rluc p10_DMSO p10_K22 p10_K7523; do
         mv ${sample}.fasta ${sample}.fa
         # append only the sample; all.fa cannot be both input and output of cat
         cat ${sample}.fa >> all.fa
     done
     cat RSV_dedup.fa all.fa > RSV_all.fa
     mafft --clustalout --adjustdirection RSV_all.fa > RSV_all.aln
     snp-sites RSV_all.aln -o RSV_all_.aln

  10. Report

     Please find attached the variant analysis results for Thomas. Variant frequencies in the new samples are highlighted in yellow.

     Although PP810610 is used as the reference, only differences observed in the samples p10_DMSO, p10_K22, p10_K7523, p16_DMSO, p16_K22, and p16_X7523 compared to hCoV229E_Rluc are reported in the sheets variant_annot_0.05 and variant_annot_0.01 (see variant_annot.xls). Variants already present in hCoV229E_Rluc are excluded from these sheets.

     In total, 17 mutations were found in hCoV229E_Rluc relative to PP810610, detailed in the sheet "hCoV229E_Rluc_variants" (see variant_annot.xls).

     ------ Explanation of iSNV_freq in the sheets variant_annot_0.05 and variant_annot_0.01 ------

     The iSNV_freq column shows the frequency of the second allele at each position. For example, at position 23435 on chr PP810610:

     chr       Position  Sample         Alleles  iSNV_freq
     PP810610  23435     hCoV229E_Rluc  C,T      0
     PP810610  23435     p10_DMSO       C,T      0.032
     PP810610  23435     p10_K22        C,T      0.995
     PP810610  23435     p10_K7523      C,T      0.835
     PP810610  23435     p16_DMSO       C,T      0
     PP810610  23435     p16_K22        C,T      0.958
     PP810610  23435     p16_X7523      C,T      0.132

     The second allele (T) frequencies are:
     0 (only C)
     0.032 (3.2% T)
     0.995 (99.5% T)
     0.835 (83.5% T)
     0 (only C)
     0.958 (95.8% T)
     0.132 (13.2% T)

     # ---- Explanation of Mutation at Position 19289 ----

     Regarding the mutation at position 19289: you're absolutely right, and I had also noticed the discrepancy.
     In the 2024 analysis, I performed intra-host variant calling, which detects only those variants with frequencies strictly between 0% and 100% within a single sample. Since position 19289 showed 100% G in p10_DMSO, 100% T in p10_K22, and 100% G in p10_K7523, it was not identified as an intra-host variant at that time. Rather, it is a clear example of an inter-host variant, that is, a fixed difference between samples.

     In the 2025 analysis, I again used intra-host variant calling. This time, the mutation at position 19289 in p16_K22 was detected at 98.8% T, which lies strictly between 0% and 100% and therefore appears in the intra-host variant table.

     After noticing this, I also ran a dedicated inter-host variant calling analysis, which specifically highlights differences between samples rather than within them. The results can be found in the third table ("hCoV229E_Rluc_variants") of the variant_annot.xls file I sent you previously. As you'll see, all 17 positions are identical across the 7 samples, indicating that no additional inter-host variants were detected beyond what we had already observed.

     Lastly, please find the coverage data in the attached files.

     # -- Just following up on the mutation at position 19289.

     By tweaking some settings in the inter-host variant calling, we can also detect variants at positions like 19289. However, in these results, a "/" indicates intra-host variants that require further validation through intra-host variant calling. The intra-host variant calling uses a more precise mapping strategy, enabling a more accurate estimation of allele frequencies.
     Here's an example from the inter-host variant table showing the mutation at 19289 with the adjusted settings:

     CHROM     POS    REF  ALT  TYPE  hCoV229E_Rluc  p10_DMSO  p10_K22  p10_K7523  p16_DMSO  p16_K22  p16_X7523
     PP810610  19289  G    T    SNP   G              G         T        G          G         G/T      G

     # ------------------------------------------ END ------------------------------------------

     #Check whether the 0.05 and 0.01 sets are supersets of the 0.05 and 0.01 sets of the 2024 version: comparing 'cut -f2 0.05.csv | uniq > ids_0.05_' and 'cut -f2 0.01.csv | uniq > ids_0.01_' between 2024 and 2025
     869, 1492, 4809, 5797, 8289, 8294, 8331, 8376, 9146, 9174, 9933, 9954, 9993, 10145, 10239, 10310, 10871, 10898, 10970, 11577, 12634, 17941, 18640, 18646, 18701, 18815, 19028, 19294, 19388, 21027, 21633, 21671, 21928, 22215, 23435, 23633, 24738, 25025, 25592, 26885
     869, 1492, 3422, 4074, 4809, 5345, 5373, 5543, 5797, 6470, 8289, 8294, 8331, 8376, 9146, 9174, 9261, 9933, 9954, 9993, 10145, 10239, 10310, 10871, 10898, 10970, 11194, 11568, 11577, 11706, 12634, 13113, 13912, 15615, 17941, 18640, 18646, 18701, 18815, 18919, 19028, 19165, 19289, 19294, 19388, 21027, 21633, 21671, 21747, 21928, 22215, 22318, 22630, 22788, 22820, 22906, 22918, 23435, 23586, 23633, 24738, 24903, 25025, 25432, 25592, 26104, 26281, 26307, 26411, 26500, 26746, 26885
     Yes, Set 1 is a subset of Set 2. All elements of Set 1 are also contained in Set 2.

     Set 1: {1492,8289,8294,9174,10239,10310,10871,10898,11577,18640,21027,21633,22215,23435,24738,25025,25592}
     Set 2: {1492,8289,8294,9174,10145,10239,10310,10871,10898,11577,18640,19289,21027,21633,22215,23435,24738,25025,25592}
     Since every element of Set 1 is in Set 2, we have: Set 1 ⊆ Set 2.
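The subset check on the two position sets can be reproduced with Python's built-in set operations. The following is a minimal sketch using Set 1 and Set 2 exactly as listed above; the helper name `compare_position_sets` is hypothetical, and reading Set 1 as the earlier run and Set 2 as the later run is an assumption.

```python
# Position sets copied from the comparison above.
set_1 = {1492, 8289, 8294, 9174, 10239, 10310, 10871, 10898, 11577, 18640,
         21027, 21633, 22215, 23435, 24738, 25025, 25592}
set_2 = {1492, 8289, 8294, 9174, 10145, 10239, 10310, 10871, 10898, 11577,
         18640, 19289, 21027, 21633, 22215, 23435, 24738, 25025, 25592}


def compare_position_sets(old, new):
    """Report whether old is a subset of new and which positions differ."""
    return {
        "is_subset": old <= new,            # True if every old position is in new
        "only_in_old": sorted(old - new),   # positions that disappeared
        "only_in_new": sorted(new - old),   # positions that were added
    }


result = compare_position_sets(set_1, set_2)
print(result["is_subset"])    # True
print(result["only_in_old"])  # []
print(result["only_in_new"])  # [10145, 19289]
```

Besides confirming Set 1 ⊆ Set 2, the difference `only_in_new` names the positions that appear only in Set 2, which is where 19289 shows up.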
diff 0.05_test_uniq.txt 0.05_test.csv diff 0.01_test_uniq.txt 0.01_test.csv > chr pos sample alleles iSNV_freq eff_type eff_codon_dna eff_aa eff_aa_pos eff_prot_len eff_gene eff_protein 8a10,17 > PP810610 3422 hCoV229E_Rluc C,T 0 missense_variant 3205C>T Leu1069Phe 1069 6758 Gene_217_20492 XBA84229.1 > PP810610 3422 p10_DMSO C,T 0 missense_variant 3205C>T Leu1069Phe 1069 6758 Gene_217_20492 XBA84229.1 > PP810610 3422 p10_K22 C,T 0 missense_variant 3205C>T Leu1069Phe 1069 6758 Gene_217_20492 XBA84229.1 > PP810610 3422 p10_K7523 C,T 0 missense_variant 3205C>T Leu1069Phe 1069 6758 Gene_217_20492 XBA84229.1 > PP810610 4074 hCoV229E_Rluc G,T 0 missense_variant 3857G>T Gly1286Val 1286 6758 Gene_217_20492 XBA84229.1 > PP810610 4074 p10_DMSO G,T 0 missense_variant 3857G>T Gly1286Val 1286 6758 Gene_217_20492 XBA84229.1 > PP810610 4074 p10_K22 G,T 0 missense_variant 3857G>T Gly1286Val 1286 6758 Gene_217_20492 XBA84229.1 > PP810610 4074 p10_K7523 G,T 0 missense_variant 3857G>T Gly1286Val 1286 6758 Gene_217_20492 XBA84229.1 12a22,33 > PP810610 5345 hCoV229E_Rluc C,T 0 synonymous_variant 5128C>T Leu1710Leu 1710 6758 Gene_217_20492 XBA84229.1 > PP810610 5345 p10_DMSO C,T 0 synonymous_variant 5128C>T Leu1710Leu 1710 6758 Gene_217_20492 XBA84229.1 > PP810610 5345 p10_K22 C,T 0 synonymous_variant 5128C>T Leu1710Leu 1710 6758 Gene_217_20492 XBA84229.1 > PP810610 5345 p10_K7523 C,T 0 synonymous_variant 5128C>T Leu1710Leu 1710 6758 Gene_217_20492 XBA84229.1 > PP810610 5373 hCoV229E_Rluc C,A 0 stop_gained 5156C>A Ser1719* 1719 6758 Gene_217_20492 XBA84229.1 > PP810610 5373 p10_DMSO C,A 0 stop_gained 5156C>A Ser1719* 1719 6758 Gene_217_20492 XBA84229.1 > PP810610 5373 p10_K22 C,A 0 stop_gained 5156C>A Ser1719* 1719 6758 Gene_217_20492 XBA84229.1 > PP810610 5373 p10_K7523 C,A 0 stop_gained 5156C>A Ser1719* 1719 6758 Gene_217_20492 XBA84229.1 > PP810610 5543 hCoV229E_Rluc C,T 0 missense_variant 5326C>T His1776Tyr 1776 6758 Gene_217_20492 XBA84229.1 > PP810610 5543 p10_DMSO C,T 0 
missense_variant 5326C>T His1776Tyr 1776 6758 Gene_217_20492 XBA84229.1 > PP810610 5543 p10_K22 C,T 0 missense_variant 5326C>T His1776Tyr 1776 6758 Gene_217_20492 XBA84229.1 > PP810610 5543 p10_K7523 C,T 0 missense_variant 5326C>T His1776Tyr 1776 6758 Gene_217_20492 XBA84229.1 16a38,41 > PP810610 6470 hCoV229E_Rluc C,T 0 synonymous_variant 6253C>T Leu2085Leu 2085 6758 Gene_217_20492 XBA84229.1 > PP810610 6470 p10_DMSO C,T 0 synonymous_variant 6253C>T Leu2085Leu 2085 6758 Gene_217_20492 XBA84229.1 > PP810610 6470 p10_K22 C,T 0 synonymous_variant 6253C>T Leu2085Leu 2085 6758 Gene_217_20492 XBA84229.1 > PP810610 6470 p10_K7523 C,T 0 synonymous_variant 6253C>T Leu2085Leu 2085 6758 Gene_217_20492 XBA84229.1 40a66,69 > PP810610 9261 hCoV229E_Rluc C,T 0 missense_variant 9044C>T Ala3015Val 3015 6758 Gene_217_20492 XBA84229.1 > PP810610 9261 p10_DMSO C,T 0 missense_variant 9044C>T Ala3015Val 3015 6758 Gene_217_20492 XBA84229.1 > PP810610 9261 p10_K22 C,T 0 missense_variant 9044C>T Ala3015Val 3015 6758 Gene_217_20492 XBA84229.1 > PP810610 9261 p10_K7523 C,T 0 missense_variant 9044C>T Ala3015Val 3015 6758 Gene_217_20492 XBA84229.1 *1* 72c101 A Gly3561Ser 3561 6758 Gene_217_20492 XBA84229.1 (OLD calculation,take this to integrate) — > PP810610 10898 p10_K7523 G,A 0.062 missense_variant 10681G>A Gly3561Ser 3561 6758 Gene_217_20492 XBA84229.1 (NEW calculation) 76a106,113 > PP810610 11194 hCoV229E_Rluc C,A 0 synonymous_variant 10977C>A Ser3659Ser 3659 6758 Gene_217_20492 XBA84229.1 > PP810610 11194 p10_DMSO C,A 0 synonymous_variant 10977C>A Ser3659Ser 3659 6758 Gene_217_20492 XBA84229.1 > PP810610 11194 p10_K22 C,A 0 synonymous_variant 10977C>A Ser3659Ser 3659 6758 Gene_217_20492 XBA84229.1 > PP810610 11194 p10_K7523 C,A 0 synonymous_variant 10977C>A Ser3659Ser 3659 6758 Gene_217_20492 XBA84229.1 > PP810610 11568 hCoV229E_Rluc C,T 0 missense_variant 11351C>T Thr3784Ile 3784 6758 Gene_217_20492 XBA84229.1 > PP810610 11568 p10_DMSO C,T 0 missense_variant 11351C>T Thr3784Ile 3784 
6758 Gene_217_20492 XBA84229.1 > PP810610 11568 p10_K22 C,T 0 missense_variant 11351C>T Thr3784Ile 3784 6758 Gene_217_20492 XBA84229.1 > PP810610 11568 p10_K7523 C,T 0 missense_variant 11351C>T Thr3784Ile 3784 6758 Gene_217_20492 XBA84229.1 80a118,121 > PP810610 11706 hCoV229E_Rluc C,A 0 missense_variant 11489C>A Pro3830Gln 3830 6758 Gene_217_20492 XBA84229.1 > PP810610 11706 p10_DMSO C,A 0 missense_variant 11489C>A Pro3830Gln 3830 6758 Gene_217_20492 XBA84229.1 > PP810610 11706 p10_K22 C,A 0 missense_variant 11489C>A Pro3830Gln 3830 6758 Gene_217_20492 XBA84229.1 > PP810610 11706 p10_K7523 C,A 0 missense_variant 11489C>A Pro3830Gln 3830 6758 Gene_217_20492 XBA84229.1 84a126,137 > PP810610 13113 hCoV229E_Rluc C,T 0 synonymous_variant 12897C>T Tyr4299Tyr 4299 6758 Gene_217_20492 XBA84229.1 > PP810610 13113 p10_DMSO C,T 0 synonymous_variant 12897C>T Tyr4299Tyr 4299 6758 Gene_217_20492 XBA84229.1 > PP810610 13113 p10_K22 C,T 0 synonymous_variant 12897C>T Tyr4299Tyr 4299 6758 Gene_217_20492 XBA84229.1 > PP810610 13113 p10_K7523 C,T 0 synonymous_variant 12897C>T Tyr4299Tyr 4299 6758 Gene_217_20492 XBA84229.1 > PP810610 13912 hCoV229E_Rluc G,A 0 missense_variant 13696G>A Gly4566Ser 4566 6758 Gene_217_20492 XBA84229.1 > PP810610 13912 p10_DMSO G,A 0 missense_variant 13696G>A Gly4566Ser 4566 6758 Gene_217_20492 XBA84229.1 > PP810610 13912 p10_K22 G,A 0 missense_variant 13696G>A Gly4566Ser 4566 6758 Gene_217_20492 XBA84229.1 > PP810610 13912 p10_K7523 G,A 0 missense_variant 13696G>A Gly4566Ser 4566 6758 Gene_217_20492 XBA84229.1 > PP810610 15615 hCoV229E_Rluc C,A 0 synonymous_variant 15399C>A Val5133Val 5133 6758 Gene_217_20492 XBA84229.1 > PP810610 15615 p10_DMSO C,A 0 synonymous_variant 15399C>A Val5133Val 5133 6758 Gene_217_20492 XBA84229.1 > PP810610 15615 p10_K22 C,A 0 synonymous_variant 15399C>A Val5133Val 5133 6758 Gene_217_20492 XBA84229.1 > PP810610 15615 p10_K7523 C,A 0 synonymous_variant 15399C>A Val5133Val 5133 6758 Gene_217_20492 XBA84229.1 104a158,161 > 
PP810610 18919 hCoV229E_Rluc C,T 0 missense_variant 18703C>T Arg6235Cys 6235 6758 Gene_217_20492 XBA84229.1 > PP810610 18919 p10_DMSO C,T 0 missense_variant 18703C>T Arg6235Cys 6235 6758 Gene_217_20492 XBA84229.1 > PP810610 18919 p10_K22 C,T 0 missense_variant 18703C>T Arg6235Cys 6235 6758 Gene_217_20492 XBA84229.1 > PP810610 18919 p10_K7523 C,T 0 missense_variant 18703C>T Arg6235Cys 6235 6758 Gene_217_20492 XBA84229.1 108a166,173 > PP810610 19165 hCoV229E_Rluc C,A 0 missense_variant 18949C>A Arg6317Ser 6317 6758 Gene_217_20492 XBA84229.1 > PP810610 19165 p10_DMSO C,A 0 missense_variant 18949C>A Arg6317Ser 6317 6758 Gene_217_20492 XBA84229.1 > PP810610 19165 p10_K22 C,A 0 missense_variant 18949C>A Arg6317Ser 6317 6758 Gene_217_20492 XBA84229.1 > PP810610 19165 p10_K7523 C,A 0 missense_variant 18949C>A Arg6317Ser 6317 6758 Gene_217_20492 XBA84229.1 > PP810610 19289 hCoV229E_Rluc G,T 0 missense_variant 19073G>T Gly6358Val 6358 6758 Gene_217_20492 XBA84229.1 > PP810610 19289 p10_DMSO G,T 0 missense_variant 19073G>T Gly6358Val 6358 6758 Gene_217_20492 XBA84229.1 > PP810610 19289 p10_K22 G,T 1 missense_variant 19073G>T Gly6358Val 6358 6758 Gene_217_20492 XBA84229.1 > PP810610 19289 p10_K7523 G,T 0 missense_variant 19073G>T Gly6358Val 6358 6758 Gene_217_20492 XBA84229.1 128a194,197 > PP810610 21747 hCoV229E_Rluc C,A 0 missense_variant 1253C>A Ser418Tyr 418 1173 Gene_20494_24015 XBA84230.1 > PP810610 21747 p10_DMSO C,A 0 missense_variant 1253C>A Ser418Tyr 418 1173 Gene_20494_24015 XBA84230.1 > PP810610 21747 p10_K22 C,A 0 missense_variant 1253C>A Ser418Tyr 418 1173 Gene_20494_24015 XBA84230.1 > PP810610 21747 p10_K7523 C,A 0 missense_variant 1253C>A Ser418Tyr 418 1173 Gene_20494_24015 XBA84230.1 *2* 131c200 C Gly478Gly 478 1173 Gene_20494_24015 XBA84230.1 — > PP810610 21928 p10_K22 T,C 0.029 synonymous_variant 1434T>C Gly478Gly 478 1173 Gene_20494_24015 XBA84230.1 140a210,229 > PP810610 22630 hCoV229E_Rluc C,T 0 synonymous_variant 2136C>T Tyr712Tyr 712 1173 
Gene_20494_24015 XBA84230.1 > PP810610 22630 p10_DMSO C,T 0 synonymous_variant 2136C>T Tyr712Tyr 712 1173 Gene_20494_24015 XBA84230.1 > PP810610 22630 p10_K22 C,T 0 synonymous_variant 2136C>T Tyr712Tyr 712 1173 Gene_20494_24015 XBA84230.1 > PP810610 22630 p10_K7523 C,T 0 synonymous_variant 2136C>T Tyr712Tyr 712 1173 Gene_20494_24015 XBA84230.1 > PP810610 22788 hCoV229E_Rluc T,C 0 missense_variant 2294T>C Val765Ala 765 1173 Gene_20494_24015 XBA84230.1 > PP810610 22788 p10_DMSO T,C 0 missense_variant 2294T>C Val765Ala 765 1173 Gene_20494_24015 XBA84230.1 > PP810610 22788 p10_K22 T,C 0 missense_variant 2294T>C Val765Ala 765 1173 Gene_20494_24015 XBA84230.1 > PP810610 22788 p10_K7523 T,C 0 missense_variant 2294T>C Val765Ala 765 1173 Gene_20494_24015 XBA84230.1 > PP810610 22820 hCoV229E_Rluc C,T 0 missense_variant 2326C>T Arg776Cys 776 1173 Gene_20494_24015 XBA84230.1 > PP810610 22820 p10_DMSO C,T 0 missense_variant 2326C>T Arg776Cys 776 1173 Gene_20494_24015 XBA84230.1 > PP810610 22820 p10_K22 C,T 0 missense_variant 2326C>T Arg776Cys 776 1173 Gene_20494_24015 XBA84230.1 > PP810610 22820 p10_K7523 C,T 0 missense_variant 2326C>T Arg776Cys 776 1173 Gene_20494_24015 XBA84230.1 > PP810610 22906 hCoV229E_Rluc C,T 0 synonymous_variant 2412C>T Asn804Asn 804 1173 Gene_20494_24015 XBA84230.1 > PP810610 22906 p10_DMSO C,T 0 synonymous_variant 2412C>T Asn804Asn 804 1173 Gene_20494_24015 XBA84230.1 > PP810610 22906 p10_K22 C,T 0 synonymous_variant 2412C>T Asn804Asn 804 1173 Gene_20494_24015 XBA84230.1 > PP810610 22906 p10_K7523 C,T 0 synonymous_variant 2412C>T Asn804Asn 804 1173 Gene_20494_24015 XBA84230.1 > PP810610 22918 hCoV229E_Rluc C,A 0 synonymous_variant 2424C>A Ala808Ala 808 1173 Gene_20494_24015 XBA84230.1 > PP810610 22918 p10_DMSO C,A 0 synonymous_variant 2424C>A Ala808Ala 808 1173 Gene_20494_24015 XBA84230.1 > PP810610 22918 p10_K22 C,A 0 synonymous_variant 2424C>A Ala808Ala 808 1173 Gene_20494_24015 XBA84230.1 > PP810610 22918 p10_K7523 C,A 0 synonymous_variant 2424C>A 
Ala808Ala 808 1173 Gene_20494_24015 XBA84230.1 *3* 143c232 T Leu981Phe 981 1173 Gene_20494_24015 XBA84230.1 — > PP810610 23435 p10_K22 C,T 1 missense_variant 2941C>T Leu981Phe 981 1173 Gene_20494_24015 XBA84230.1 144a234,237 > PP810610 23586 hCoV229E_Rluc C,A 0 missense_variant 3092C>A Pro1031Gln 1031 1173 Gene_20494_24015 XBA84230.1 > PP810610 23586 p10_DMSO C,A 0 missense_variant 3092C>A Pro1031Gln 1031 1173 Gene_20494_24015 XBA84230.1 > PP810610 23586 p10_K22 C,A 0 missense_variant 3092C>A Pro1031Gln 1031 1173 Gene_20494_24015 XBA84230.1 > PP810610 23586 p10_K7523 C,A 0 missense_variant 3092C>A Pro1031Gln 1031 1173 Gene_20494_24015 XBA84230.1 ‘-minus this record-‘ 149,152c242,249 T,64C>A Leu22Phe,Leu22Ile 22 77 Gene_24674_24907 XBA84233.1 T,64C>A Leu22Phe,Leu22Ile 22 77 Gene_24674_24907 XBA84233.1 T,64C>A Leu22Phe,Leu22Ile 22 77 Gene_24674_24907 XBA84233.1 T,64C>A Leu22Phe,Leu22Ile 22 77 Gene_24674_24907 XBA84233.1 — > PP810610 24738 hCoV229E_Rluc C,A,T 0 missense_variant 64C>A,64C>T Leu22Ile,Leu22Phe 22 77 Gene_24674_24907 XBA84233.1 > PP810610 24738 p10_DMSO C,A,T 0.011 missense_variant 64C>A,64C>T Leu22Ile,Leu22Phe 22 77 Gene_24674_24907 XBA84233.1 > PP810610 24738 p10_K22 C,A,T 1 missense_variant 64C>A,64C>T Leu22Ile,Leu22Phe 22 77 Gene_24674_24907 XBA84233.1 > PP810610 24738 p10_K7523 C,A,T 0.106 missense_variant 64C>A,64C>T Leu22Ile,Leu22Phe 22 77 Gene_24674_24907 XBA84233.1 > PP810610 24903 hCoV229E_Rluc T,C 0 missense_variant 229T>C Phe77Leu 77 77 Gene_24674_24907 XBA84233.1 > PP810610 24903 p10_DMSO T,C 0 missense_variant 229T>C Phe77Leu 77 77 Gene_24674_24907 XBA84233.1 > PP810610 24903 p10_K22 T,C 0 missense_variant 229T>C Phe77Leu 77 77 Gene_24674_24907 XBA84233.1 > PP810610 24903 p10_K7523 T,C 0 missense_variant 229T>C Phe77Leu 77 77 Gene_24674_24907 XBA84233.1 *4* 154c251 T His36Tyr 36 225 Gene_24919_25596 XBA84234.1 — > PP810610 25025 p10_DMSO C,T 0.049 missense_variant 106C>T His36Tyr 36 225 Gene_24919_25596 XBA84234.1 156a254,257 > PP810610 
25432 hCoV229E_Rluc C,A 0 synonymous_variant 513C>A Ala171Ala 171 225 Gene_24919_25596 XBA84234.1 > PP810610 25432 p10_DMSO C,A 0 synonymous_variant 513C>A Ala171Ala 171 225 Gene_24919_25596 XBA84234.1 > PP810610 25432 p10_K22 C,A 0 synonymous_variant 513C>A Ala171Ala 171 225 Gene_24919_25596 XBA84234.1 > PP810610 25432 p10_K7523 C,A 0 synonymous_variant 513C>A Ala171Ala 171 225 Gene_24919_25596 XBA84234.1 161,164c262,289 < PP810610 26885 hCoV229E_Rluc T,A 0 intergenic_region < PP810610 26885 p10_DMSO T,A 0 intergenic_region < PP810610 26885 p10_K22 T,A 0.009 intergenic_region PP810610 26104 hCoV229E_Rluc C,T,A 0 missense_variant 494C>A,494C>T Pro165His,Pro165Leu 165 389 Gene_25610_26779 XBA84235.1 > PP810610 26104 p10_DMSO C,T,A 0 missense_variant 494C>A,494C>T Pro165His,Pro165Leu 165 389 Gene_25610_26779 XBA84235.1 > PP810610 26104 p10_K22 C,T,A 0 missense_variant 494C>A,494C>T Pro165His,Pro165Leu 165 389 Gene_25610_26779 XBA84235.1 > PP810610 26104 p10_K7523 C,T,A 0 missense_variant 494C>A,494C>T Pro165His,Pro165Leu 165 389 Gene_25610_26779 XBA84235.1 > PP810610 26281 hCoV229E_Rluc C,T 0 missense_variant 671C>T Thr224Ile 224 389 Gene_25610_26779 XBA84235.1 > PP810610 26281 p10_DMSO C,T 0 missense_variant 671C>T Thr224Ile 224 389 Gene_25610_26779 XBA84235.1 > PP810610 26281 p10_K22 C,T 0 missense_variant 671C>T Thr224Ile 224 389 Gene_25610_26779 XBA84235.1 > PP810610 26281 p10_K7523 C,T 0 missense_variant 671C>T Thr224Ile 224 389 Gene_25610_26779 XBA84235.1 > PP810610 26307 hCoV229E_Rluc C,A 0 missense_variant 697C>A Gln233Lys 233 389 Gene_25610_26779 XBA84235.1 > PP810610 26307 p10_DMSO C,A 0 missense_variant 697C>A Gln233Lys 233 389 Gene_25610_26779 XBA84235.1 > PP810610 26307 p10_K22 C,A 0 missense_variant 697C>A Gln233Lys 233 389 Gene_25610_26779 XBA84235.1 > PP810610 26307 p10_K7523 C,A 0 missense_variant 697C>A Gln233Lys 233 389 Gene_25610_26779 XBA84235.1 > PP810610 26411 hCoV229E_Rluc C,A 0 synonymous_variant 801C>A Pro267Pro 267 389 Gene_25610_26779 
XBA84235.1
     > PP810610 26411 p10_DMSO C,A 0 synonymous_variant 801C>A Pro267Pro 267 389 Gene_25610_26779 XBA84235.1
     > PP810610 26411 p10_K22 C,A 0 synonymous_variant 801C>A Pro267Pro 267 389 Gene_25610_26779 XBA84235.1
     > PP810610 26411 p10_K7523 C,A 0 synonymous_variant 801C>A Pro267Pro 267 389 Gene_25610_26779 XBA84235.1
     > PP810610 26500 hCoV229E_Rluc C,A 0 missense_variant 890C>A Pro297Gln 297 389 Gene_25610_26779 XBA84235.1
     > PP810610 26500 p10_DMSO C,A 0 missense_variant 890C>A Pro297Gln 297 389 Gene_25610_26779 XBA84235.1
     > PP810610 26500 p10_K22 C,A 0 missense_variant 890C>A Pro297Gln 297 389 Gene_25610_26779 XBA84235.1
     > PP810610 26500 p10_K7523 C,A 0 missense_variant 890C>A Pro297Gln 297 389 Gene_25610_26779 XBA84235.1
     > PP810610 26746 hCoV229E_Rluc C,A 0 missense_variant 1136C>A Ser379Tyr 379 389 Gene_25610_26779 XBA84235.1
     > PP810610 26746 p10_DMSO C,A 0 missense_variant 1136C>A Ser379Tyr 379 389 Gene_25610_26779 XBA84235.1
     > PP810610 26746 p10_K22 C,A 0 missense_variant 1136C>A Ser379Tyr 379 389 Gene_25610_26779 XBA84235.1
     > PP810610 26746 p10_K7523 C,A 0 missense_variant 1136C>A Ser379Tyr 379 389 Gene_25610_26779 XBA84235.1
     > PP810610 26885 hCoV229E_Rluc T,A 0 intergenic_region n.26885T>A Gene_25610_26779-CHR_END Gene_25610_26779-CHR_END
     > PP810610 26885 p10_DMSO T,A 0 intergenic_region n.26885T>A Gene_25610_26779-CHR_END Gene_25610_26779-CHR_END
     > PP810610 26885 p10_K22 T,A 0.009 intergenic_region n.26885T>A Gene_25610_26779-CHR_END Gene_25610_26779-CHR_END
     > PP810610 26885 p10_K7523 T,A 0.011 intergenic_region n.26885T>A Gene_25610_26779-CHR_END Gene_25610_26779-CHR_END

     TODOs:
     Crazy idea: we can organize the results with an additional column for 2025, so that in the end:
     #chr pos n.a. alleles iSNV_freq eff_type eff_codon_dna eff_aa eff_aa_pos eff_prot_len eff_gene eff_protein
     #chr pos sample2024 sample2025 alleles iSNV_freq eff_type eff_codon_dna eff_aa eff_aa_pos eff_prot_len eff_gene eff_protein
     hCoV229E_Rluc
     That means we need to delete the 2025 SNP results for hCoV229E_Rluc, p10_DMSO, p10_K22 and p10_K7523; we have only three new samples of data. If the SNP is completely new in 2025, the 2024 data should all be '0'.
     Please generate the report according to the SNP comparison between 2024 and 2025:

     !!!!TODO_TOMORROW!!!!:
     1. Use the following report, but copy the 2024 results to a new table so that we can unify the results. Mark all newly added results in yellow.
     2. If the SNP is completely new in 2025, the 2024 data should all be '0', and all 7 should be marked yellow.
     3. One for 0.01 and one for 0.05; in this way, we can also present the results 2024_0.01.
     4. Copy the pipeline process to xgenes.com!

     2024:
     chr pos sample alleles iSNV_freq eff_type eff_codon_dna eff_aa eff_aa_pos eff_prot_len eff_gene eff_protein
     PP810610 1492 hCoV229E_Rluc T,A 0.207 synonymous_variant 1275T>A Thr425Thr 425 6758 Gene_217_20492 XBA84229.1
     PP810610 1492 p10_DMSO T,A 0.081 synonymous_variant 1275T>A Thr425Thr 425 6758 Gene_217_20492 XBA84229.1
     PP810610 1492 p10_K22 T,A 0.854 synonymous_variant 1275T>A Thr425Thr 425 6758 Gene_217_20492 XBA84229.1
     PP810610 1492 p10_K7523 T,A 0.229 synonymous_variant 1275T>A Thr425Thr 425 6758 Gene_217_20492 XBA84229.1
     PP810610 8289 hCoV229E_Rluc C,A 0.325 missense_variant 8072C>A Ala2691Asp 2691 6758 Gene_217_20492 XBA84229.1
     PP810610 8289 p10_DMSO C,A 0.028 missense_variant 8072C>A Ala2691Asp 2691 6758 Gene_217_20492 XBA84229.1
     PP810610 8289 p10_K22 C,A 0 missense_variant 8072C>A Ala2691Asp 2691 6758 Gene_217_20492 XBA84229.1
     PP810610 8289 p10_K7523 C,A 0.831 missense_variant 8072C>A Ala2691Asp 2691 6758 Gene_217_20492 XBA84229.1
     PP810610 8294 hCoV229E_Rluc A,G 0.179 missense_variant 8077A>G Lys2693Glu 2693 6758 Gene_217_20492
XBA84229.1
PP810610 8294 p10_DMSO A,G 0.024 missense_variant 8077A>G Lys2693Glu 2693 6758 Gene_217_20492 XBA84229.1
PP810610 8294 p10_K22 A,G 0.074 missense_variant 8077A>G Lys2693Glu 2693 6758 Gene_217_20492 XBA84229.1
PP810610 8294 p10_K7523 A,G 0 missense_variant 8077A>G Lys2693Glu 2693 6758 Gene_217_20492 XBA84229.1
PP810610 9174 hCoV229E_Rluc G,A 0 missense_variant 8957G>A Cys2986Tyr 2986 6758 Gene_217_20492 XBA84229.1
PP810610 9174 p10_DMSO G,A 0 missense_variant 8957G>A Cys2986Tyr 2986 6758 Gene_217_20492 XBA84229.1
PP810610 9174 p10_K22 G,A 0 missense_variant 8957G>A Cys2986Tyr 2986 6758 Gene_217_20492 XBA84229.1
PP810610 9174 p10_K7523 G,A 0.066 missense_variant 8957G>A Cys2986Tyr 2986 6758 Gene_217_20492 XBA84229.1
PP810610 10239 hCoV229E_Rluc T,G 0 missense_variant 10022T>G Val3341Gly 3341 6758 Gene_217_20492 XBA84229.1
PP810610 10239 p10_DMSO T,G 0 missense_variant 10022T>G Val3341Gly 3341 6758 Gene_217_20492 XBA84229.1
PP810610 10239 p10_K22 T,G 0.055 missense_variant 10022T>G Val3341Gly 3341 6758 Gene_217_20492 XBA84229.1
PP810610 10239 p10_K7523 T,G 0 missense_variant 10022T>G Val3341Gly 3341 6758 Gene_217_20492 XBA84229.1
PP810610 10310 hCoV229E_Rluc G,A 0 missense_variant 10093G>A Val3365Ile 3365 6758 Gene_217_20492 XBA84229.1
PP810610 10310 p10_DMSO G,A 0 missense_variant 10093G>A Val3365Ile 3365 6758 Gene_217_20492 XBA84229.1
PP810610 10310 p10_K22 G,A 0 missense_variant 10093G>A Val3365Ile 3365 6758 Gene_217_20492 XBA84229.1
PP810610 10310 p10_K7523 G,A 0.156 missense_variant 10093G>A Val3365Ile 3365 6758 Gene_217_20492 XBA84229.1
PP810610 10871 hCoV229E_Rluc C,T 0.006 missense_variant 10654C>T Leu3552Phe 3552 6758 Gene_217_20492 XBA84229.1
PP810610 10871 p10_DMSO C,T 0.058 missense_variant 10654C>T Leu3552Phe 3552 6758 Gene_217_20492 XBA84229.1
PP810610 10871 p10_K22 C,T 1 missense_variant 10654C>T Leu3552Phe 3552 6758 Gene_217_20492 XBA84229.1
PP810610 10871 p10_K7523 C,T 0.823 missense_variant 10654C>T Leu3552Phe 3552 6758 Gene_217_20492 XBA84229.1
PP810610 10898 hCoV229E_Rluc G,A 0.012 missense_variant 10681G>A Gly3561Ser 3561 6758 Gene_217_20492 XBA84229.1
PP810610 10898 p10_DMSO G,A 0.036 missense_variant 10681G>A Gly3561Ser 3561 6758 Gene_217_20492 XBA84229.1
PP810610 10898 p10_K22 G,A 0 missense_variant 10681G>A Gly3561Ser 3561 6758 Gene_217_20492 XBA84229.1
PP810610 10898 p10_K7523 G,A 0.064 missense_variant 10681G>A Gly3561Ser 3561 6758 Gene_217_20492 XBA84229.1
PP810610 11577 hCoV229E_Rluc A,C 0 missense_variant 11360A>C Glu3787Ala 3787 6758 Gene_217_20492 XBA84229.1
PP810610 11577 p10_DMSO A,C 0.184 missense_variant 11360A>C Glu3787Ala 3787 6758 Gene_217_20492 XBA84229.1
PP810610 11577 p10_K22 A,C 0 missense_variant 11360A>C Glu3787Ala 3787 6758 Gene_217_20492 XBA84229.1
PP810610 11577 p10_K7523 A,C 0 missense_variant 11360A>C Glu3787Ala 3787 6758 Gene_217_20492 XBA84229.1
PP810610 18640 hCoV229E_Rluc T,G 0 missense_variant 18424T>G Phe6142Val 6142 6758 Gene_217_20492 XBA84229.1
PP810610 18640 p10_DMSO T,G 0 missense_variant 18424T>G Phe6142Val 6142 6758 Gene_217_20492 XBA84229.1
PP810610 18640 p10_K22 T,G 0 missense_variant 18424T>G Phe6142Val 6142 6758 Gene_217_20492 XBA84229.1
PP810610 18640 p10_K7523 T,G 0.055 missense_variant 18424T>G Phe6142Val 6142 6758 Gene_217_20492 XBA84229.1
PP810610 21027 hCoV229E_Rluc C,T 0 missense_variant 533C>T Thr178Ile 178 1173 Gene_20494_24015 XBA84230.1
PP810610 21027 p10_DMSO C,T 0.186 missense_variant 533C>T Thr178Ile 178 1173 Gene_20494_24015 XBA84230.1
PP810610 21027 p10_K22 C,T 0 missense_variant 533C>T Thr178Ile 178 1173 Gene_20494_24015 XBA84230.1
PP810610 21027 p10_K7523 C,T 0.032 missense_variant 533C>T Thr178Ile 178 1173 Gene_20494_24015 XBA84230.1
PP810610 21633 hCoV229E_Rluc T,C 0 missense_variant 1139T>C Val380Ala 380 1173 Gene_20494_24015 XBA84230.1
PP810610 21633 p10_DMSO T,C 0.08 missense_variant 1139T>C Val380Ala 380 1173 Gene_20494_24015 XBA84230.1
PP810610 21633 p10_K22 T,C 0 missense_variant 1139T>C Val380Ala 380 1173
Gene_20494_24015 XBA84230.1
PP810610 21633 p10_K7523 T,C 0 missense_variant 1139T>C Val380Ala 380 1173 Gene_20494_24015 XBA84230.1
PP810610 22215 hCoV229E_Rluc T,G 0 missense_variant 1721T>G Val574Gly 574 1173 Gene_20494_24015 XBA84230.1
PP810610 22215 p10_DMSO T,G 0 missense_variant 1721T>G Val574Gly 574 1173 Gene_20494_24015 XBA84230.1
PP810610 22215 p10_K22 T,G 0 missense_variant 1721T>G Val574Gly 574 1173 Gene_20494_24015 XBA84230.1
PP810610 22215 p10_K7523 T,G 0.078 missense_variant 1721T>G Val574Gly 574 1173 Gene_20494_24015 XBA84230.1
PP810610 23435 hCoV229E_Rluc C,T 0 missense_variant 2941C>T Leu981Phe 981 1173 Gene_20494_24015 XBA84230.1
PP810610 23435 p10_DMSO C,T 0.032 missense_variant 2941C>T Leu981Phe 981 1173 Gene_20494_24015 XBA84230.1
PP810610 23435 p10_K22 C,T 0.995 missense_variant 2941C>T Leu981Phe 981 1173 Gene_20494_24015 XBA84230.1
PP810610 23435 p10_K7523 C,T 0.835 missense_variant 2941C>T Leu981Phe 981 1173 Gene_20494_24015 XBA84230.1
PP810610 25592 hCoV229E_Rluc T,C 0.012 missense_variant 673T>C Phe225Leu 225 225 Gene_24919_25596 XBA84234.1
PP810610 25592 p10_DMSO T,C 0.925 missense_variant 673T>C Phe225Leu 225 225 Gene_24919_25596 XBA84234.1
PP810610 25592 p10_K22 T,C 0 missense_variant 673T>C Phe225Leu 225 225 Gene_24919_25596 XBA84234.1
PP810610 25592 p10_K7523 T,C 0 missense_variant 673T>C Phe225Leu 225 225 Gene_24919_25596 XBA84234.1

2025:
chr pos sample alleles iSNV_freq eff_type eff_codon_dna eff_aa eff_aa_pos eff_prot_len eff_gene eff_protein
PP810610 1492 hCoV229E_Rluc T,A 0.207 synonymous_variant 1275T>A Thr425Thr 425 6758 Gene_217_20492 XBA84229.1
PP810610 1492 p10_DMSO T,A 0.081 synonymous_variant 1275T>A Thr425Thr 425 6758 Gene_217_20492 XBA84229.1
PP810610 1492 p10_K22 T,A 0.854 synonymous_variant 1275T>A Thr425Thr 425 6758 Gene_217_20492 XBA84229.1
PP810610 1492 p10_K7523 T,A 0.229 synonymous_variant 1275T>A Thr425Thr 425 6758 Gene_217_20492 XBA84229.1
PP810610 1492 p16_DMSO T,A 0.043 synonymous_variant 1275T>A Thr425Thr 425 6758 Gene_217_20492 XBA84229.1
PP810610 1492 p16_K22 T,A 0.893 synonymous_variant 1275T>A Thr425Thr 425 6758 Gene_217_20492 XBA84229.1
PP810610 1492 p16_X7523 T,A 0.179 synonymous_variant 1275T>A Thr425Thr 425 6758 Gene_217_20492 XBA84229.1
PP810610 8289 hCoV229E_Rluc C,A 0.325 missense_variant 8072C>A Ala2691Asp 2691 6758 Gene_217_20492 XBA84229.1
PP810610 8289 p10_DMSO C,A 0.028 missense_variant 8072C>A Ala2691Asp 2691 6758 Gene_217_20492 XBA84229.1
PP810610 8289 p10_K22 C,A 0 missense_variant 8072C>A Ala2691Asp 2691 6758 Gene_217_20492 XBA84229.1
PP810610 8289 p10_K7523 C,A 0.831 missense_variant 8072C>A Ala2691Asp 2691 6758 Gene_217_20492 XBA84229.1
PP810610 8289 p16_DMSO C,A 0 missense_variant 8072C>A Ala2691Asp 2691 6758 Gene_217_20492 XBA84229.1
PP810610 8289 p16_K22 C,A 0 missense_variant 8072C>A Ala2691Asp 2691 6758 Gene_217_20492 XBA84229.1
PP810610 8289 p16_X7523 C,A 0.226 missense_variant 8072C>A Ala2691Asp 2691 6758 Gene_217_20492 XBA84229.1
PP810610 8294 hCoV229E_Rluc A,G 0.179 missense_variant 8077A>G Lys2693Glu 2693 6758 Gene_217_20492 XBA84229.1
PP810610 8294 p10_DMSO A,G 0.024 missense_variant 8077A>G Lys2693Glu 2693 6758 Gene_217_20492 XBA84229.1
PP810610 8294 p10_K22 A,G 0.074 missense_variant 8077A>G Lys2693Glu 2693 6758 Gene_217_20492 XBA84229.1
PP810610 8294 p10_K7523 A,G 0 missense_variant 8077A>G Lys2693Glu 2693 6758 Gene_217_20492 XBA84229.1
PP810610 8294 p16_DMSO A,G 0 missense_variant 8077A>G Lys2693Glu 2693 6758 Gene_217_20492 XBA84229.1
PP810610 8294 p16_K22 A,G 0.145 missense_variant 8077A>G Lys2693Glu 2693 6758 Gene_217_20492 XBA84229.1
PP810610 8294 p16_X7523 A,G 0 missense_variant 8077A>G Lys2693Glu 2693 6758 Gene_217_20492 XBA84229.1
PP810610 9174 hCoV229E_Rluc G,A 0 missense_variant 8957G>A Cys2986Tyr 2986 6758 Gene_217_20492 XBA84229.1
PP810610 9174 p10_DMSO G,A 0 missense_variant 8957G>A Cys2986Tyr 2986 6758 Gene_217_20492 XBA84229.1
PP810610 9174 p10_K22 G,A 0 missense_variant 8957G>A Cys2986Tyr 2986 6758 Gene_217_20492 XBA84229.1
PP810610 9174 p10_K7523 G,A 0.066 missense_variant 8957G>A Cys2986Tyr 2986 6758 Gene_217_20492 XBA84229.1
PP810610 9174 p16_DMSO G,A 0 missense_variant 8957G>A Cys2986Tyr 2986 6758 Gene_217_20492 XBA84229.1
PP810610 9174 p16_K22 G,A 0 missense_variant 8957G>A Cys2986Tyr 2986 6758 Gene_217_20492 XBA84229.1
PP810610 9174 p16_X7523 G,A 0.025 missense_variant 8957G>A Cys2986Tyr 2986 6758 Gene_217_20492 XBA84229.1
PP810610 10145 hCoV229E_Rluc A,G 0 missense_variant 9928A>G Met3310Val 3310 6758 Gene_217_20492 XBA84229.1
PP810610 10145 p10_DMSO A,G 0 missense_variant 9928A>G Met3310Val 3310 6758 Gene_217_20492 XBA84229.1
PP810610 10145 p10_K22 A,G 0 missense_variant 9928A>G Met3310Val 3310 6758 Gene_217_20492 XBA84229.1
PP810610 10145 p10_K7523 A,G 0.045 missense_variant 9928A>G Met3310Val 3310 6758 Gene_217_20492 XBA84229.1
PP810610 10145 p16_DMSO A,G 0 missense_variant 9928A>G Met3310Val 3310 6758 Gene_217_20492 XBA84229.1
PP810610 10145 p16_K22 A,G 0 missense_variant 9928A>G Met3310Val 3310 6758 Gene_217_20492 XBA84229.1
PP810610 10145 p16_X7523 A,G 0.064 missense_variant 9928A>G Met3310Val 3310 6758 Gene_217_20492 XBA84229.1
PP810610 10239 hCoV229E_Rluc T,G 0 missense_variant 10022T>G Val3341Gly 3341 6758 Gene_217_20492 XBA84229.1
PP810610 10239 p10_DMSO T,G 0 missense_variant 10022T>G Val3341Gly 3341 6758 Gene_217_20492 XBA84229.1
PP810610 10239 p10_K22 T,G 0.055 missense_variant 10022T>G Val3341Gly 3341 6758 Gene_217_20492 XBA84229.1
PP810610 10239 p10_K7523 T,G 0 missense_variant 10022T>G Val3341Gly 3341 6758 Gene_217_20492 XBA84229.1
PP810610 10239 p16_DMSO T,G 0 missense_variant 10022T>G Val3341Gly 3341 6758 Gene_217_20492 XBA84229.1
PP810610 10239 p16_K22 T,G 0.08 missense_variant 10022T>G Val3341Gly 3341 6758 Gene_217_20492 XBA84229.1
PP810610 10239 p16_X7523 T,G 0 missense_variant 10022T>G Val3341Gly 3341 6758 Gene_217_20492 XBA84229.1
PP810610 10310 hCoV229E_Rluc G,A 0 missense_variant 10093G>A Val3365Ile 3365 6758 Gene_217_20492 XBA84229.1
PP810610 10310 p10_DMSO G,A 0 missense_variant 10093G>A Val3365Ile 3365 6758 Gene_217_20492 XBA84229.1
PP810610 10310 p10_K22 G,A 0 missense_variant 10093G>A Val3365Ile 3365 6758 Gene_217_20492 XBA84229.1
PP810610 10310 p10_K7523 G,A 0.156 missense_variant 10093G>A Val3365Ile 3365 6758 Gene_217_20492 XBA84229.1
PP810610 10310 p16_DMSO G,A 0 missense_variant 10093G>A Val3365Ile 3365 6758 Gene_217_20492 XBA84229.1
PP810610 10310 p16_K22 G,A 0 missense_variant 10093G>A Val3365Ile 3365 6758 Gene_217_20492 XBA84229.1
PP810610 10310 p16_X7523 G,A 0.091 missense_variant 10093G>A Val3365Ile 3365 6758 Gene_217_20492 XBA84229.1
PP810610 10871 hCoV229E_Rluc C,T 0.006 missense_variant 10654C>T Leu3552Phe 3552 6758 Gene_217_20492 XBA84229.1
PP810610 10871 p10_DMSO C,T 0.058 missense_variant 10654C>T Leu3552Phe 3552 6758 Gene_217_20492 XBA84229.1
PP810610 10871 p10_K22 C,T 1 missense_variant 10654C>T Leu3552Phe 3552 6758 Gene_217_20492 XBA84229.1
PP810610 10871 p10_K7523 C,T 0.823 missense_variant 10654C>T Leu3552Phe 3552 6758 Gene_217_20492 XBA84229.1
PP810610 10871 p16_DMSO C,T 0.029 missense_variant 10654C>T Leu3552Phe 3552 6758 Gene_217_20492 XBA84229.1
PP810610 10871 p16_K22 C,T 0.991 missense_variant 10654C>T Leu3552Phe 3552 6758 Gene_217_20492 XBA84229.1
PP810610 10871 p16_X7523 C,T 0.878 missense_variant 10654C>T Leu3552Phe 3552 6758 Gene_217_20492 XBA84229.1
PP810610 10898 hCoV229E_Rluc G,A 0.012 missense_variant 10681G>A Gly3561Ser 3561 6758 Gene_217_20492 XBA84229.1
PP810610 10898 p10_DMSO G,A 0.036 missense_variant 10681G>A Gly3561Ser 3561 6758 Gene_217_20492 XBA84229.1
PP810610 10898 p10_K22 G,A 0 missense_variant 10681G>A Gly3561Ser 3561 6758 Gene_217_20492 XBA84229.1
PP810610 10898 p10_K7523 G,A 0.062 missense_variant 10681G>A Gly3561Ser 3561 6758 Gene_217_20492 XBA84229.1
PP810610 10898 p16_DMSO G,A 0.018 missense_variant 10681G>A Gly3561Ser 3561 6758 Gene_217_20492 XBA84229.1
PP810610 10898 p16_K22 G,A 0 missense_variant 10681G>A Gly3561Ser 3561 6758 Gene_217_20492 XBA84229.1
PP810610 10898 p16_X7523 G,A 0.044 missense_variant 10681G>A Gly3561Ser 3561 6758 Gene_217_20492 XBA84229.1
PP810610 11577 hCoV229E_Rluc A,C 0 missense_variant 11360A>C Glu3787Ala 3787 6758 Gene_217_20492 XBA84229.1
PP810610 11577 p10_DMSO A,C 0.184 missense_variant 11360A>C Glu3787Ala 3787 6758 Gene_217_20492 XBA84229.1
PP810610 11577 p10_K22 A,C 0 missense_variant 11360A>C Glu3787Ala 3787 6758 Gene_217_20492 XBA84229.1
PP810610 11577 p10_K7523 A,C 0 missense_variant 11360A>C Glu3787Ala 3787 6758 Gene_217_20492 XBA84229.1
PP810610 11577 p16_DMSO A,C 0.946 missense_variant 11360A>C Glu3787Ala 3787 6758 Gene_217_20492 XBA84229.1
PP810610 11577 p16_K22 A,C 0 missense_variant 11360A>C Glu3787Ala 3787 6758 Gene_217_20492 XBA84229.1
PP810610 11577 p16_X7523 A,C 0 missense_variant 11360A>C Glu3787Ala 3787 6758 Gene_217_20492 XBA84229.1
PP810610 18640 hCoV229E_Rluc T,G 0 missense_variant 18424T>G Phe6142Val 6142 6758 Gene_217_20492 XBA84229.1
PP810610 18640 p10_DMSO T,G 0 missense_variant 18424T>G Phe6142Val 6142 6758 Gene_217_20492 XBA84229.1
PP810610 18640 p10_K22 T,G 0 missense_variant 18424T>G Phe6142Val 6142 6758 Gene_217_20492 XBA84229.1
PP810610 18640 p10_K7523 T,G 0.055 missense_variant 18424T>G Phe6142Val 6142 6758 Gene_217_20492 XBA84229.1
PP810610 18640 p16_DMSO T,G 0 missense_variant 18424T>G Phe6142Val 6142 6758 Gene_217_20492 XBA84229.1
PP810610 18640 p16_K22 T,G 0 missense_variant 18424T>G Phe6142Val 6142 6758 Gene_217_20492 XBA84229.1
PP810610 18640 p16_X7523 T,G 0.183 missense_variant 18424T>G Phe6142Val 6142 6758 Gene_217_20492 XBA84229.1
PP810610 19289 hCoV229E_Rluc G,T 0 missense_variant 19073G>T Gly6358Val 6358 6758 Gene_217_20492 XBA84229.1
PP810610 19289 p10_DMSO G,T 0 missense_variant 19073G>T Gly6358Val 6358 6758 Gene_217_20492 XBA84229.1
PP810610 19289 p10_K22 G,T 1 missense_variant 19073G>T Gly6358Val 6358 6758 Gene_217_20492 XBA84229.1
PP810610 19289 p10_K7523 G,T 0 missense_variant 19073G>T Gly6358Val 6358 6758 Gene_217_20492 XBA84229.1
PP810610 19289 p16_DMSO G,T 0 missense_variant 19073G>T Gly6358Val 6358 6758 Gene_217_20492 XBA84229.1
PP810610 19289 p16_K22 G,T 0.988 missense_variant 19073G>T Gly6358Val 6358 6758 Gene_217_20492 XBA84229.1
PP810610 19289 p16_X7523 G,T 0 missense_variant 19073G>T Gly6358Val 6358 6758 Gene_217_20492 XBA84229.1
PP810610 21027 hCoV229E_Rluc C,T 0 missense_variant 533C>T Thr178Ile 178 1173 Gene_20494_24015 XBA84230.1
PP810610 21027 p10_DMSO C,T 0.186 missense_variant 533C>T Thr178Ile 178 1173 Gene_20494_24015 XBA84230.1
PP810610 21027 p10_K22 C,T 0 missense_variant 533C>T Thr178Ile 178 1173 Gene_20494_24015 XBA84230.1
PP810610 21027 p10_K7523 C,T 0.032 missense_variant 533C>T Thr178Ile 178 1173 Gene_20494_24015 XBA84230.1
PP810610 21027 p16_DMSO C,T 0.954 missense_variant 533C>T Thr178Ile 178 1173 Gene_20494_24015 XBA84230.1
PP810610 21027 p16_K22 C,T 0.009 missense_variant 533C>T Thr178Ile 178 1173 Gene_20494_24015 XBA84230.1
PP810610 21027 p16_X7523 C,T 0.158 missense_variant 533C>T Thr178Ile 178 1173 Gene_20494_24015 XBA84230.1
PP810610 21633 hCoV229E_Rluc T,C 0 missense_variant 1139T>C Val380Ala 380 1173 Gene_20494_24015 XBA84230.1
PP810610 21633 p10_DMSO T,C 0.08 missense_variant 1139T>C Val380Ala 380 1173 Gene_20494_24015 XBA84230.1
PP810610 21633 p10_K22 T,C 0 missense_variant 1139T>C Val380Ala 380 1173 Gene_20494_24015 XBA84230.1
PP810610 21633 p10_K7523 T,C 0 missense_variant 1139T>C Val380Ala 380 1173 Gene_20494_24015 XBA84230.1
PP810610 21633 p16_DMSO T,C 0.015 missense_variant 1139T>C Val380Ala 380 1173 Gene_20494_24015 XBA84230.1
PP810610 21633 p16_K22 T,C 0 missense_variant 1139T>C Val380Ala 380 1173 Gene_20494_24015 XBA84230.1
PP810610 21633 p16_X7523 T,C 0 missense_variant 1139T>C Val380Ala 380 1173 Gene_20494_24015 XBA84230.1
PP810610 22215 hCoV229E_Rluc T,G 0 missense_variant 1721T>G Val574Gly 574 1173 Gene_20494_24015 XBA84230.1
PP810610 22215 p10_DMSO T,G 0 missense_variant 1721T>G Val574Gly 574 1173 Gene_20494_24015 XBA84230.1
PP810610 22215 p10_K22 T,G 0 missense_variant 1721T>G Val574Gly 574 1173 Gene_20494_24015 XBA84230.1
PP810610 22215 p10_K7523 T,G 0.078 missense_variant 1721T>G Val574Gly 574 1173 Gene_20494_24015 XBA84230.1
PP810610 22215 p16_DMSO T,G 0 missense_variant 1721T>G Val574Gly 574 1173 Gene_20494_24015 XBA84230.1
PP810610 22215 p16_K22 T,G 0 missense_variant 1721T>G Val574Gly 574 1173 Gene_20494_24015 XBA84230.1
PP810610 22215 p16_X7523 T,G 0.033 missense_variant 1721T>G Val574Gly 574 1173 Gene_20494_24015 XBA84230.1
PP810610 23435 hCoV229E_Rluc C,T 0 missense_variant 2941C>T Leu981Phe 981 1173 Gene_20494_24015 XBA84230.1
PP810610 23435 p10_DMSO C,T 0.032 missense_variant 2941C>T Leu981Phe 981 1173 Gene_20494_24015 XBA84230.1
PP810610 23435 p10_K22 C,T 1 missense_variant 2941C>T Leu981Phe 981 1173 Gene_20494_24015 XBA84230.1
PP810610 23435 p10_K7523 C,T 0.835 missense_variant 2941C>T Leu981Phe 981 1173 Gene_20494_24015 XBA84230.1
PP810610 23435 p16_DMSO C,T 0 missense_variant 2941C>T Leu981Phe 981 1173 Gene_20494_24015 XBA84230.1
PP810610 23435 p16_K22 C,T 0.958 missense_variant 2941C>T Leu981Phe 981 1173 Gene_20494_24015 XBA84230.1
PP810610 23435 p16_X7523 C,T 0.132 missense_variant 2941C>T Leu981Phe 981 1173 Gene_20494_24015 XBA84230.1
PP810610 24738 hCoV229E_Rluc C,A,T 0 missense_variant 64C>A,64C>T Leu22Ile,Leu22Phe 22 77 Gene_24674_24907 XBA84233.1
PP810610 24738 p10_DMSO C,A,T 0.011 missense_variant 64C>A,64C>T Leu22Ile,Leu22Phe 22 77 Gene_24674_24907 XBA84233.1
PP810610 24738 p10_K22 C,A,T 1 missense_variant 64C>A,64C>T Leu22Ile,Leu22Phe 22 77 Gene_24674_24907 XBA84233.1
PP810610 24738 p10_K7523 C,A,T 0.106 missense_variant 64C>A,64C>T Leu22Ile,Leu22Phe 22 77 Gene_24674_24907 XBA84233.1
PP810610 24738 p16_DMSO C,A,T 0 missense_variant 64C>A,64C>T Leu22Ile,Leu22Phe 22 77 Gene_24674_24907 XBA84233.1
PP810610 24738 p16_K22 C,A,T 1 missense_variant 64C>A,64C>T Leu22Ile,Leu22Phe 22 77 Gene_24674_24907 XBA84233.1
PP810610 24738 p16_X7523 C,A,T 0.958
missense_variant 64C>A,64C>T Leu22Ile,Leu22Phe 22 77 Gene_24674_24907 XBA84233.1
PP810610 25025 hCoV229E_Rluc C,T 0 missense_variant 106C>T His36Tyr 36 225 Gene_24919_25596 XBA84234.1
PP810610 25025 p10_DMSO C,T 0.049 missense_variant 106C>T His36Tyr 36 225 Gene_24919_25596 XBA84234.1
PP810610 25025 p10_K22 C,T 0 missense_variant 106C>T His36Tyr 36 225 Gene_24919_25596 XBA84234.1
PP810610 25025 p10_K7523 C,T 0 missense_variant 106C>T His36Tyr 36 225 Gene_24919_25596 XBA84234.1
PP810610 25025 p16_DMSO C,T 0.057 missense_variant 106C>T His36Tyr 36 225 Gene_24919_25596 XBA84234.1
PP810610 25025 p16_K22 C,T 0 missense_variant 106C>T His36Tyr 36 225 Gene_24919_25596 XBA84234.1
PP810610 25025 p16_X7523 C,T 0.016 missense_variant 106C>T His36Tyr 36 225 Gene_24919_25596 XBA84234.1
PP810610 25592 hCoV229E_Rluc T,C 0.012 missense_variant 673T>C Phe225Leu 225 225 Gene_24919_25596 XBA84234.1
PP810610 25592 p10_DMSO T,C 0.925 missense_variant 673T>C Phe225Leu 225 225 Gene_24919_25596 XBA84234.1
PP810610 25592 p10_K22 T,C 0 missense_variant 673T>C Phe225Leu 225 225 Gene_24919_25596 XBA84234.1
PP810610 25592 p10_K7523 T,C 0 missense_variant 673T>C Phe225Leu 225 225 Gene_24919_25596 XBA84234.1
PP810610 25592 p16_DMSO T,C 0.935 missense_variant 673T>C Phe225Leu 225 225 Gene_24919_25596 XBA84234.1
PP810610 25592 p16_K22 T,C 0 missense_variant 673T>C Phe225Leu 225 225 Gene_24919_25596 XBA84234.1
PP810610 25592 p16_X7523 T,C 0.013 missense_variant 673T>C Phe225Leu 225 225 Gene_24919_25596 XBA84234.1

The email that this analysis answers:

Dear Jiabin,
in summer 2024 we had a joint project with Thomas Pietschmann on the 229E coronavirus, in which we sequenced the virus and determined its adaptation by variant analysis. I have attached the old analyses.
The viruses have now been passaged further without selection pressure, and the question is whether the mutations found at that time are retained or lost.
Could you analyze these three samples (see Patrick's link https://public.leibniz-liv.de/sharing/tuBWPq3ca) in comparison to the adapted virus from 2024?
Best regards
Nicole

  7. Merge intra- and inter-host variants, comparing the variants to the alignments of the assemblies to confirm their correctness.

     cat NC_001348.fasta viralngs/data/02_assembly/VZV_20S.fasta viralngs/data/02_assembly/VZV_60S.fasta > aligned_1.fasta
     mafft --clustalout aligned_1.fasta > aligned_1.aln
     #~/Scripts/convert_fasta_to_clustal.py aligned_1.fasta_orig aligned_1.aln
     ~/Scripts/convert_clustal_to_clustal.py aligned_1.aln aligned_1_.aln
     #manually delete the positions with all or '-' in aligned_1_.aln
     ~/Scripts/check_sequence_differences.py aligned_1_.aln
     ~/Scripts/check_sequence_differences.py aligned_1_.aln > aligned_1.res
     grep -v " = n" aligned_1.res > aligned_1_.res

     cat NC_001348.fasta viralngs/tmp/02_assembly/VZV_20S.assembly4-refined.fasta viralngs/tmp/02_assembly/VZV_60S.assembly4-refined.fasta > aligned_1.fasta
     mafft --clustalout aligned_1.fasta > aligned_1.aln
     ~/Scripts/convert_clustal_to_clustal.py aligned_1.aln aligned_1_.aln
     ~/Scripts/check_sequence_differences.py aligned_1_.aln > aligned_1.res
     grep -v " = n" aligned_1.res > aligned_1_.res

     #Differences found at the following positions (150):
     Position 8956: OP297860.1 = A, HSV1_S1-1 = A, HSV-Klinik_S2-1 = G
     Position 8991: OP297860.1 = A, HSV1_S1-1 = A, HSV-Klinik_S2-1 = C
     Position 8992: OP297860.1 = T, HSV1_S1-1 = C, HSV-Klinik_S2-1 = C
     Position 8995: OP297860.1 = T, HSV1_S1-1 = T, HSV-Klinik_S2-1 = C
     Position 9190: OP297860.1 = T, HSV1_S1-1 = A, HSV-Klinik_S2-1 = T
     * Position 13659: OP297860.1 = G, HSV1_S1-1 = T, HSV-Klinik_S2-1 = G
     * Position 47969: OP297860.1 = C, HSV1_S1-1 = T, HSV-Klinik_S2-1 = C
     * Position 53691: OP297860.1 = G, HSV1_S1-1 = T, HSV-Klinik_S2-1 = G
     * Position 55501: OP297860.1 = T, HSV1_S1-1 = C, HSV-Klinik_S2-1 = C
     * Position 63248: OP297860.1 = G, HSV1_S1-1 = T, HSV-Klinik_S2-1 = G
     Position 63799: OP297860.1
= T, HSV1_S1-1 = C, HSV-Klinik_S2-1 = T
     * Position 64328: OP297860.1 = C, HSV1_S1-1 = A, HSV-Klinik_S2-1 = C
     Position 65179: OP297860.1 = T, HSV1_S1-1 = T, HSV-Klinik_S2-1 = C
     * Position 65225: OP297860.1 = G, HSV1_S1-1 = G, HSV-Klinik_S2-1 = A
     * Position 95302: OP297860.1 = C, HSV1_S1-1 = A, HSV-Klinik_S2-1 = C

     gunzip isnvs.annot.txt.gz
     ~/Scripts/filter_isnv.py isnvs.annot.txt 0.05
     cut -d$'\t' -f1-7 filtered_isnvs.annot.txt

     chr pos sample patient time alleles iSNV_freq
     OP297860 13203 HSV1_S1 HSV1_S1 T,C,A 1.0
     OP297860 13203 HSV-Klinik_S2 HSV-Klinik_S2 T,C,A 1.0
     OP297860 13522 HSV1_S1 HSV1_S1 G,T 1.0
     OP297860 13522 HSV-Klinik_S2 HSV-Klinik_S2 G,T 0.008905554253573941
     OP297860 13659 HSV1_S1 HSV1_S1 G,T 1.0
     OP297860 13659 HSV-Klinik_S2 HSV-Klinik_S2 G,T 0.008383233532934131

     ~/Scripts/convert_clustal_to_fasta.py aligned_1_.aln aligned_1.fasta
     samtools faidx aligned_1.fasta
     samtools faidx aligned_1.fasta OP297860.1 > OP297860.1.fasta
     samtools faidx aligned_1.fasta HSV1_S1-1 > HSV1_S1-1.fasta
     samtools faidx aligned_1.fasta HSV-Klinik_S2-1 > HSV-Klinik_S2-1.fasta
     seqkit seq OP297860.1.fasta -w 70 > OP297860.1_w70.fasta
     diff OP297860.1_w70.fasta ../../refsel_db/refsel.fasta

  8. Consensus sequences of each and of all isolates

     cp data/02_assembly/*.fasta ./
     # alternative sample list:
     #for sample in 838_S1 840_S2 820_S3 828_S4 815_S5 834_S6 808_S7 811_S8 837_S9 768_S10 773_S11 767_S12 810_S13 814_S14 10121-16_S15 7510-15_S16 828-17_S17 8806-15_S18 9881-16_S19 8981-14_S20; do
     for sample in p953-84660-tsek p938-16972-nra p942-88507-nra p943-98523-nra p944-103323-nra p947-105565-nra p948-112830-nra; do
         mv ${sample}.fasta ${sample}.fa
         # append each sample to all.fa (all.fa must not also be read as input here)
         cat ${sample}.fa >> all.fa
     done
     cat RSV_dedup.fa all.fa > RSV_all.fa
     mafft --adjustdirection RSV_all.fa > RSV_all.aln
     snp-sites RSV_all.aln -o RSV_all_.aln

  9. Download all Human alphaherpesvirus 3 (Varicella-zoster virus) genomes

     Human alphaherpesvirus 3
     acronym: HHV-3, VZV
     equivalent: Human herpes virus 3
     Human alphaherpesvirus 3 (Varicella-zoster virus)
       * Human herpesvirus 3 strain Dumas
       * Human herpesvirus 3 strain Oka vaccine
       * Human herpesvirus 3 VZV-32

     #Taxonomy ID: 10335
     esearch -db nucleotide -query "txid10335[Organism:exp]" | efetch -format fasta -email j.huang@uke.de > genome_10335_ncbi.fasta
     python ~/Scripts/filter_fasta.py genome_10335_ncbi.fasta complete_genome_10335_ncbi.fasta  #2041-->165

     # ---- Download related genomes from ENA ----
     https://www.ebi.ac.uk/ena/browser/view/10335
     #Click "Sequence" and download "Counts" (2003) and, if there is enough time, "Taxon descendants count" (2005). Download date: 11.03.2025.
     python ~/Scripts/filter_fasta.py ena_10335_sequence.fasta complete_genome_10335_ena_taxon_descendants_count.fasta  #2005-->153
     #python ~/Scripts/filter_fasta.py ena_10335_sequence_Counts.fasta complete_genome_10335_ena_Counts.fasta  #xxx, 5.8G

     https://www.ebi.ac.uk/ena/browser/view/10239
     https://www.ebi.ac.uk/ena/browser/view/2497569
     https://www.ebi.ac.uk/ena/browser/view/Taxon:2497569
     ena_10239_sequence.fasta
     esearch -db nucleotide -query "txid10239[Organism:exp]" | efetch -format fasta -email j.huang@uke.de > genome_10239_ncbi.fasta

  10. Using Multi-CAR for scaffolding the contigs (if it is not useful, choose another scaffolding tool, e.g. https://github.com/malonge/RagTag)

     All contigs over 500 bp were successfully scaffolded to the graft genome using Multi-CAR (13), resulting in a chromosomal assembly of 4,506,689 bp.
     https://genome.cs.nthu.edu.tw/Multi-CAR/
     https://github.com/ablab-nthu/Multi-CSAR

  11.
Using the bowtie of vrap to map the reads on ref_genome/reference.fasta (the reference is the most closely related genome from the list generated by vrap)

     (vrap) vrap/vrap.py -1 trimmed/VZV_20S_trimmed_P_1.fastq -2 trimmed/VZV_20S_trimmed_P_2.fastq -o VZV_20S_on_X04370 --host /home/jhuang/DATA/Data_Huang_Human_herpesvirus_3/X04370.fasta -t 100 -l 200 -g
     cd bowtie
     mv mapped mapped.sam
     samtools view -S -b mapped.sam > mapped.bam
     samtools sort mapped.bam -o mapped_sorted.bam
     samtools index mapped_sorted.bam
     samtools view -H mapped_sorted.bam
     samtools flagstat mapped_sorted.bam

  12. Show the bw on IGV

  13. Reports

     diff data/02_assembly/2040_04.fasta tmp/02_assembly/2040_04.assembly4-refined.fasta
     diff data/02_assembly/2040_04.fasta tmp/02_assembly/2040_04.assembly1-spades.fasta
     diff data/02_assembly/2040_04.fasta tmp/02_assembly/2040_04.assembly2-scaffolded.fasta
     diff data/02_assembly/2040_04.fasta tmp/02_assembly/2040_04.assembly2-gapfilled.fasta
     diff data/02_assembly/2040_04.fasta tmp/02_assembly/2040_04.assembly3-modify.fasta
     diff data/02_assembly/2040_04.fasta tmp/02_assembly/2040_04.assembly4-refined.fasta
     ./2040_04.assembly2-alternate_sequences.fasta
     ./2040_04.assembly2-scaffold_ref.fasta

# ----------------------------------------- END -----------------------------------------

From: Blümke, Patrick
Sent: Friday, 9 May 2025 16:26
To: 'nfischer@uke.de'
Subject: [EXT] RE: Re: Re: [EXTERN] NGS of adapted hCoV-229E virus

Dear Nicole,
we have now sequenced the 3 RNA samples from Thomas Pietschmann, and you can download the data here: https://public.leibniz-liv.de/sharing/tuBWPq3ca
Best regards and have a nice weekend,
Patrick

From: nfischer@uke.de
Sent: Wednesday, 23 April 2025 12:42
To: Blümke, Patrick
Subject: FW: [EXT] Re: Re: [EXTERN] NGS of adapted hCoV-229E virus

Dear Patrick,
Thomas Pietschmann (Twincore Hannover) has sent three further samples: RNA from the supernatant of passaged CoV229E. He would like to have them sequenced by you again, and Jiabin would do the analysis in the same way as the previous one. I do not know how deeply you sequenced back then; I think 5 million reads? Can I bring you the RNA, and do you have the capacity to prepare and sequence these libraries?
Best regards
Nicole

From: Köhler, Natalie
Sent: Friday, 4 April 2025 11:21
To: Nicole Fischer; Pietschmann, Thomas; Sibylle Haid
Subject: [EXT] NGS of adapted hCoV-229E virus

Dear Nicole,
thank you again for sequencing our adapted hCoV-229E virus. The results helped us a lot.
We have now passaged our adapted hCoV-229E virus populations further without selection pressure to see whether our mutations and the resistance are lost again or retained. We have already been able to show a first phenotype for this, which we want to validate again.
Thomas and I were therefore wondering whether, after the validation, you could sequence our new virus stocks again (it would be three samples)?
Thank you in advance for the help, and I wish you a nice sunny weekend!
Kind regards from Hannover
Natalie
--------
Natalie Köhler, M. Sc.
PhD Student
AG Prof. Thomas Pietschmann
TWINCORE – Centre for Experimental and Clinical Infection Research
Institute for Experimental Virology
Feodor-Lynen-Str. 7 | D-30625 Hannover
Tel.: +49 (0)511 22002-7138
E-Mail: natalie.koehler@twincore.de

From: "Pietschmann, Thomas"
Date: Wednesday, 26 June 2024 at 10:36
To: Nicole Fischer, "Haid, Sibylle (Twincore)", Köhler, Natalie
Subject: Re: [EXTERN] AW: [EXT] AW: NGS of adapted CoV-229E virus

Dear Nicole, dear all,
many thanks to you and the team members involved for analyzing the viruses. These are very interesting results.
We do not find the previously described mutations that arose after selection with K22 (see Lundin et al in the attachment V..)
We find two identical mutations that were selected with both K22 and K7523 (Leu3552Phe, Leu981Phe)
We find one additional mutation for K7523 (Ala2691Asp)
None of the mutations is directly in Nsp6 (Leu2552Phe should be the N-terminus of Nsp7); I could not quickly find an annotated 229E genome with our sequence.. used AGT21366.1 instead, although there 3552 is a serine (https://www.ncbi.nlm.nih.gov/protein/AGT21366.1)
The mutation that was selected only by K7523 (not by K22) lies in nsp4 (Ala2691Asp; https://www.ncbi.nlm.nih.gov/protein/AGT21366.1)
The second shared mutation, Leu981Phe, lies in nsp3, papain-like protease PLpro (https://www.ncbi.nlm.nih.gov/protein/AGT21366.1)
PLpro cleaves only nsp1-4; the cleavage between nsp6-7 is done by CLpro.
Natalie, can you please double-check all of this? Ideally with an annotated hCoV229E genome that matches our sequence.
We should now immediately clone the variants in cooperation and test which variants (or combinations of variants) confer resistance. The selection on SARS-CoV-2 will also be exciting. It could also be interesting to repeat the selection on hCoV229E (the same mutation again)… perhaps also a selection in another cell line… are there further solutions, or always the same one?
Best regards
Thomas
PS: Can you find further papers that describe K22 resistance? If so, where are the mutations located?

From: "nfischer@uke.de"
Date: Tuesday, 25 June 2024 at 18:58
To: Sibylle Haid
Cc: "Köhler, Natalie", Thomas Pietschmann
Subject: [EXTERN] AW: [EXT] AW: NGS of adapted CoV-229E virus

Dear Sibylle, dear Natalie, dear Thomas,
attached are the results of the variant calling of the samples.
There were some problems with the assembly of the reference sequence, so we had to switch to a different accession number. Jiabin used PP810610 as the reference because the de novo assembly of the reference you sent was incomplete. PP810610 differs only slightly from the regions of the reporter virus that could be assembled.
We are happy to answer questions about the analyses. Jiabin is taking a closer look at one position; as you will see, the DMSO control shows a deviation at position xxx. We are currently looking at the IGV files to find out why. You can ignore that for now.
Kind regards
Nicole

# -------------------------------------------------------------------
# ---- !! DEPRECATED mamba env configuration due to too many conflicts, use docker instead (see above) !!
----
#mamba activate /home/jhuang/miniconda3/envs/viral-ngs4
mkdir viralngs
ln -s ~/Tools/viral-ngs/Snakefile Snakefile
ln -s ~/Tools/viral-ngs/bin bin
cp ~/Tools/viral-ngs/refsel.acids refsel.acids
cp ~/Tools/viral-ngs/lastal.acids lastal.acids
cp ~/Tools/viral-ngs/config.yaml config.yaml
cp ~/Tools/viral-ngs/samples-runs.txt samples-runs.txt
cp ~/Tools/viral-ngs/samples-depletion.txt samples-depletion.txt
cp ~/Tools/viral-ngs/samples-metagenomics.txt samples-metagenomics.txt
cp ~/Tools/viral-ngs/samples-assembly.txt samples-assembly.txt
cp ~/Tools/viral-ngs/samples-assembly-failures.txt samples-assembly-failures.txt

# --- DEBUG: If the env disappeared, reinstall the env viral-ngs4 ---
# --- Running time hints ---
#Note that novoalign is not installed. The used Novoalign path: /home/jhuang/Tools/novocraft_v3/novoalign; the used gatk: /usr/local/bin/gatk using /home/jhuang/Tools/GenomeAnalysisTK-3.6/GenomeAnalysisTK.jar.
#Samtools path: #Why, the samtools in the env is v1.6?
#Novoalign path: /home/jhuang/Tools/novocraft_v3/novoalign
#GATK path: /usr/local/bin/gatk
# jar_file in the file: jar_file = '/home/jhuang/Tools/GenomeAnalysisTK-3.6/GenomeAnalysisTK.jar'
# --- in config.yaml ---
#GATK_PATH: "/home/jhuang/Tools/GenomeAnalysisTK-3.6"
#NOVOALIGN_PATH: "/home/jhuang/Tools/novocraft_v3"

mamba list   # or: mamba list blast
mamba create -n viral-ngs4 python=3.6
mamba activate viral-ngs4
mamba install blast=2.6.0 bmtagger biopython pysam pyyaml picard mvicuna pybedtools fastqc matplotlib spades last=876 -c conda-forge -c bioconda
#mafft=7.221 --> mafft, since "mafft 7.221** is not installable because it conflicts with any installable versions previously reported."
mamba install cd-hit cd-hit-auxtools diamond gap2seq=2.1 mafft mummer4 muscle=3.8 parallel pigz prinseq samtools=1.6 tbl2asn trimmomatic trinity unzip vphaser2 bedtools -c r -c defaults -c conda-forge -c bioconda
mamba install bwa
mamba install vphaser2=2.0

# Solve the conflict between bowtie, bowtie2 and snpeff
mamba remove bowtie
mamba install bowtie2
mamba remove snpeff
mamba install snpeff=4.1l
#which snpEff
mamba install gatk=3.6

#DEBUG if FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/bin/gatk': '/usr/local/bin/gatk'
#IMPORTANT_UPDATE jar_file in the file /home/jhuang/mambaforge/envs/viral-ngs4/bin/gatk3 with "/home/jhuang/Tools/GenomeAnalysisTK-3.6/GenomeAnalysisTK.jar"
#IMPORTANT_REPLACE "sudo cp /home/jhuang/mambaforge/envs/viral-ngs4/bin/gatk3 /usr/local/bin/gatk"
#IMPORTANT_SET /home/jhuang/Tools/GenomeAnalysisTK-3.6 as GATK_PATH in config.yaml
#IMPORTANT_CHECK if it works
#    java -jar /home/jhuang/Tools/GenomeAnalysisTK-3.6/GenomeAnalysisTK.jar -T RealignerTargetCreator --help
#    /usr/local/bin/gatk -T RealignerTargetCreator --help
#IMPORTANT_NOTE that the env viral-ngs4 cannot be activated from the base env due to the python3 conflict!
# ---- BUG_2025_1 ----
bin/taxon_filter.py deplete data/00_raw/1762_04.bam tmp/01_cleaned/1762_04.raw.bam tmp/01_cleaned/1762_04.bmtagger_depleted.bam tmp/01_cleaned/1762_04.rmdup.bam data/01_cleaned/1762_04.cleaned.bam --bmtaggerDbs /home/jhuang/REFs/viral_ngs_dbs/bmtagger_dbs_remove/hg19 /home/jhuang/REFs/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA /home/jhuang/REFs/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3 --blastDbs /home/jhuang/REFs/viral_ngs_dbs/blast_dbs_remove/metag_v3.ncRNA.mRNA.mitRNA.consensus /home/jhuang/REFs/viral_ngs_dbs/blast_dbs_remove/hybsel_probe_adapters --threads 15 --srprismMemory 14250 --JVMmemory 50g --loglevel DEBUG

2025-05-22 12:10:18,313 - __init__:445:_attempt_install - DEBUG - Currently installed version of blast: 2.16.0-hc155240_3
2025-05-22 12:10:18,314 - __init__:448:_attempt_install - DEBUG - Expected version of blast: 2.6.0
2025-05-22 12:10:18,314 - __init__:449:_attempt_install - DEBUG - Incorrect version of blast installed. Removing it...
2025-05-23 09:58:43,151 - __init__:445:_attempt_install - DEBUG - Currently installed version of bmtagger: 3.101-h470a237_4
2025-05-23 09:58:45,326 - __init__:445:_attempt_install - DEBUG - Currently installed version of blast: 2.7.1-h4422958_6
2025-05-23 09:58:45,327 - __init__:448:_attempt_install - DEBUG - Expected version of blast: 2.6.0
2025-05-23 09:58:45,327 - __init__:449:_attempt_install - DEBUG - Incorrect version of blast installed. Removing it...
# ----
#
# Some intermediate commands that ran without errors
# fastqc -f bam data/02_align_to_self/p10_K7523.bam -o reports/fastqc/p10_K7523
# bin/intrahost.py vphaser_one_sample data/02_align_to_self/p10_K7523.mapped.bam data/02_assembly/p10_K7523.fasta data/04_intrahost/vphaser2.p10_K7523.txt.gz --vphaserNumThreads 15 --removeDoublyMappedReads --minReadsEach 5 --maxBias 10
# bin/read_utils.py align_and_fix data/01_per_sample/p16_DMSO.cleaned.bam data/02_assembly/p16_DMSO.fasta --outBamAll data/02_align_to_self/p16_DMSO.bam --outBamFiltered data/02_align_to_self/p16_DMSO.mapped.bam --aligner novoalign --aligner_options '-r Random -l 20 -g 40 -x 20 -t 100 -k' --threads 15
# bin/intrahost.py merge_to_vcf ref_genome/reference.fasta data/04_intrahost/isnvs.vcf.gz --samples 2039_04 2040_04 1762_04 1243_2 875_04 --isnvs data/04_intrahost/vphaser2.2039_04.txt.gz data/04_intrahost/vphaser2.2040_04.txt.gz data/04_intrahost/vphaser2.1762_04.txt.gz data/04_intrahost/vphaser2.1243_2.txt.gz data/04_intrahost/vphaser2.875_04.txt.gz --alignments data/03_multialign_to_ref/aligned_1.fasta --strip_chr_version --parse_accession --loglevel DEBUG
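The BUG_2025_1 log above shows the pipeline repeatedly removing blast because the installed version differs from the expected 2.6.0. A minimal sketch of how such mismatches could be pulled out of a saved log follows; the function name `version_mismatches` and the sample text are hypothetical, and only the two message formats visible in the log are assumed.

```python
import re

# Patterns for the two DEBUG messages that appear in the log above.
INSTALLED = re.compile(r"Currently installed version of (\S+): (\S+)")
EXPECTED = re.compile(r"Expected version of (\S+): (\S+)")


def version_mismatches(log_text):
    """Return {tool: (installed, expected)} for tools whose versions differ."""
    # If a tool is reported several times, the last report wins.
    installed = {m.group(1): m.group(2) for m in INSTALLED.finditer(log_text)}
    expected = {m.group(1): m.group(2) for m in EXPECTED.finditer(log_text)}
    mismatches = {}
    for tool, want in expected.items():
        have = installed.get(tool)
        # conda reports "version-build"; compare the version part only.
        if have is not None and have.split("-")[0] != want:
            mismatches[tool] = (have, want)
    return mismatches


sample_log = """\
DEBUG - Currently installed version of blast: 2.16.0-hc155240_3
DEBUG - Expected version of blast: 2.6.0
DEBUG - Currently installed version of bmtagger: 3.101-h470a237_4
"""
print(version_mismatches(sample_log))
```

Tools without an "Expected version" line (bmtagger in the sample) are not reported, because the log gives nothing to compare them against.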

Habilitationsschrift

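Guidelines for Conducting Habilitation Procedures at the Faculty of Medicine of the Universität Hamburg

(Version of 13 October 2004, last amended in March 2017.) Effective 1 April 2018, pursuant to the Faculty Council resolution of 22 March 2017.

1. Preamble

Under Section 2 of the Habilitation Regulations of the Faculty of Medicine of the Universität Hamburg, issued on 18 February 2015 (hereinafter the "Habilitation Regulations"), the habilitation serves to demonstrate that the applicant has a special aptitude for independent research in a field of the Faculty of Medicine, and that this research has produced a significant gain in knowledge for the discipline¹. The criteria in these guidelines are intended to help the habilitation committee established under Section 7 of the Habilitation Regulations decide whether the habilitation thesis and the applicant's other scholarly achievements show that the applicant has engaged with the field in scientific depth and thus meets the requirements of Section 2.

These guidelines are intended to promote academic competition within the faculty and to continuously raise the quality and output of research at the Faculty of Medicine.

They are also intended to make the procedure more transparent and objective, and to give applicants a basis for self-assessment when deciding whether to submit an application. Note that these guidelines set out minimum requirements only; higher standards may be set for the specific field in which habilitation is sought. Meeting the criteria listed here therefore does not guarantee that an application will succeed.

¹ Examples of a significant gain in knowledge include:

  • refuting an existing scientific hypothesis with reliable (i.e. reproducible) new data and proposing a new hypothesis;
  • developing and validating a groundbreaking new experimental method;
  • revising a traditional hypothesis while opening up a new field of research;
  • prompting a new understanding of a scientific problem that gives rise to new research.

2. Assessment of scholarly achievements

Note that the habilitation procedure assesses only the habilitation thesis and the publications on which it is based; evidence of other publications serves only to support the applicant's scholarly standing and eligibility for habilitation. Even very high-level publications do not in themselves constitute an independent habilitation achievement.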

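A) List of publications

a) The list of publications required under Section 3(2) No. 5 of the Habilitation Regulations should be structured as follows:

  1. Original research articles
  2. Review articles
  3. Case reports
  4. Published conference contributions (not abstracts)
  5. Book chapters
  6. Talks and posters (abstracts)
  7. Continuing-education lectures

b) The applicant should have at least 2 original research articles in high-ranking journals of the field, one of them as first or last author. Shared first authorship counts as sole first authorship, and shared last authorship counts as sole last authorship. A journal is high-ranking in its field if it is in the top 33% (by impact factor) of the field-specific journal list of the Journal Citation Reports, Science Citation Index (JCR-SCI) or the Journal Citation Reports, Social Science Citation Index (JCR-SSCI).

c) In addition, the applicant should have at least 4 original research articles as first or last author in very good journals of the field. Shared first authorship counts as sole first authorship, and shared last authorship counts as sole last authorship. A journal is very good in its field if it is in the top 60% (by impact factor) of the field-specific JCR-SCI/JCR-SSCI journal list.

d) Of the first- or last-author articles required under b) and c), at least 60% must have been published as sole first or sole last author. Each missing sole first- or last-author article may be replaced by two original articles with shared first or shared last authorship. First or last authorship may be shared by no more than two authors.

e) Beyond this, the applicant should have at least 5 further original research articles in internationally recognised journals, to which the applicant made a decisive contribution². A journal is generally regarded as internationally recognised if it is included in the JCR field-specific journal lists.

f) The formal review of the habilitation application is carried out by the faculty office; the result is submitted to the habilitation committee for its further deliberations.

g) The habilitation committee assesses the scholarly content of the habilitation thesis. The evaluation of published work and of special unpublished scholarly achievements may be used as supporting evidence.

h) The habilitation thesis should be clearly attributable to the applicant's field of research.

i) The applicant should submit at least 5 talks or posters documented by abstracts, which the applicant presented at a conference as first or last author.

j) A patent may be counted as equivalent to one original research article.

k) In special cases, and taking the characteristics of the discipline into account, particular achievements in certain fields may be recognised as a valid part of the list of publications.

² The applicant must clearly state the specific contribution.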


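B) Habilitation thesis

The habilitation thesis should normally not exceed 120 pages (including figures, legends, tables and summary, but excluding references). It should be structured as follows: table of contents (with page numbers), introduction, materials and methods, results, discussion, summary, references.

A cumulative habilitation should be based on at least 3 original articles on a common topic, published as first or last author (shared authorship included) and normally completed after the doctorate. The applicant must submit a written synthesis of about 20 pages that summarises these articles systematically. It should show that the applicant can place the findings of the individual articles in a broader scholarly context, and it should set out the significant gain in knowledge they bring to the field.


C) Reviews

Under Section 8(1) of the Habilitation Regulations, the habilitation committee draws on external expertise by commissioning reviews from experts outside the university to assess the applicant's eligibility for habilitation. The following points should be observed³:

  • The purpose of a review is a professional assessment of the habilitation achievement that can be matched accurately against the assessment objectives. The reviewers invited must therefore have the necessary professional qualifications and must be selected with care;
  • Reviewers should judge on the basis of complete and accurate facts, i.e. they are to be given all the information needed to assess the application. However, reviewers should concentrate on the habilitation thesis itself, because only the thesis can demonstrate the scholarly value of the habilitation achievement. Weaknesses in the thesis cannot be offset by other publications (co-)authored by the applicant; a thesis worthy of habilitation must be logically clear and complete in itself, forming an independent and self-contained piece of scholarship;
  • In assessing the habilitation achievement, reviewers should rely only on the requirements of Section 71 of the Hamburg Higher Education Act and Section 2 of the Habilitation Regulations. This is intended to ensure uniform review standards. These requirements are set out in the preamble of these guidelines.

³ Based on the reasoning of the Hamburg Administrative Court judgment of 21 April 2004 (case no. 15 K 3849/03).


Faculty of Medicine
Dean's Office
Habilitation Guidelines
Annex 02, No. 7.1.6, version dated 24 April 2017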

Abbreviation: GENES-BASEL ISSN: N/A eISSN: 2073-4425 Category: GENETICS & HEREDITY – SCIE WoS Core Citation Indexes: Scientific literature search SCIE – Science Citation Index Expanded Journal Impact Factor (JIF): 2.8 5-year Impact Factor: 3.2 Best ranking: GENETICS & HEREDITY ║ Percentage rank: 58.6% Open Access Support: Fully Open Access ― It may take a publication fee. For more info, check it on DOAJ.ORG Country: SWITZERLAND Status in WoS core: Active Publisher: MDPI AG


https://wos-journal.info/journalid/13138#google_vignette

Abbreviation: BIOLOGY-BASEL ISSN: N/A eISSN: 2079-7737 Category: BIOLOGY – SCIE WoS Core Citation Indexes: SCIE – Science Citation Index Expanded Journal Impact Factor (JIF): 3.5 5-year Impact Factor: 4 Best ranking: BIOLOGY ║ Percentage rank: 81.3% Open Access Support: Fully Open Access ― It may take a publication fee. For more info, check it on DOAJ.ORG Country: SWITZERLAND Status in WoS core: Active Publisher: MDPI AG


Abbreviation: MSYSTEMS ISSN: 2379-5077 eISSN: 2379-5077 Category: MICROBIOLOGY – SCIE WoS Core Citation Indexes: Scientific literature search SCIE – Science Citation Index Expanded Journal Impact Factor (JIF): 4.6 5-year Impact Factor: 5.7 Best ranking: MICROBIOLOGY (Q1) ║ Percentage rank: 79.1% Open Access Support: Fully Open Access ― It may take a publication fee. For more info, check it on DOAJ.ORG Country: UNITED STATES Status in WoS core: Active Publisher: American Society for Microbiology


Abbreviation: MICROBIOL RESOUR ANN ISSN: N/A eISSN: 2576-098X Category: MICROBIOLOGY – ESCI WoS Core Citation Indexes: ESCI – Emerging Sources Citation Index Journal Impact Factor (JIF): 0.6 5-year Impact Factor: 0.6 Best ranking: MICROBIOLOGY (Q4) ║ Percentage rank: 4.3% Open Access Support: Fully Open Access ― It may take a publication fee. For more info, check it on DOAJ.ORG Country: UNITED STATES Status in WoS core: Active Publisher: American Society for Microbiology
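
The journal records above can be checked against the 33% and 60% thresholds from rules b) and c) of the guidelines. The sketch below is only an illustration. It assumes that "Percentage rank" is a percentile where higher is better (suggested by the Q1 journal at 79.1% and the Q4 journal at 4.3%), so that a rank of p places a journal in the top (100 - p)%. The function name `classify_journal` is hypothetical, and whether a given category list is the one the faculty would apply is not checked here.

```python
def classify_journal(percentile_rank):
    """Classify a journal by percentile rank (0-100, higher = better).

    Assumption: a percentile rank p means the journal is in the top (100 - p)%.
    """
    top_share = 100.0 - percentile_rank
    if top_share <= 33.0:
        return "high-ranking (top 33%)"
    if top_share <= 60.0:
        return "very good (top 60%)"
    return "below both thresholds"


# Percentage ranks copied from the journal records above.
journals = {
    "GENES-BASEL": 58.6,
    "BIOLOGY-BASEL": 81.3,
    "MSYSTEMS": 79.1,
    "MICROBIOL RESOUR ANN": 4.3,
}
for name, rank in journals.items():
    print(f"{name}: {classify_journal(rank)}")
```

Under this reading, BIOLOGY-BASEL and MSYSTEMS fall in the top 33%, GENES-BASEL falls in the top 60% only, and MICROBIOL RESOUR ANN meets neither threshold.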



Yes, the document sets an explicit minimum requirement for talks and posters (Vortrag / Poster):

You need at least 5 conference talks or posters documented by an abstract, and you must be the first or last author of that abstract.

The original text reads:

“Es sollen mindestens 5 durch Abstracts belegte Vorträge oder Poster vorgelegt werden, die die Antragstellerin als Erst- oder Letztautorin/der Antragsteller als Erst- oder Letztautor auf einem Kongress vertreten hat.”

Clause-by-clause translation:

  • Es sollen mindestens 5 durch Abstracts belegte Vorträge oder Poster vorgelegt werden: at least 5 talks or posters documented by abstracts should be submitted.

  • die die Antragstellerin als Erst- oder Letztautorin / der Antragsteller als Erst- oder Letztautor auf einem Kongress vertreten hat: and the applicant must have presented them at a conference as first or last author.

Notes:

  • Either an oral presentation or a poster is acceptable;
  • It is enough that the conference abstract was formally accepted and that a conference booklet or abstract book can be provided;
  • Shared first / shared last authorship will usually be accepted, but the document does not state this explicitly as it does for papers, so keep the conference abstracts and the author order on file.

b) “Es sollen mindestens 2 Originalarbeiten in fachspezifisch hochrangigen Journalen vorliegen, von denen eine als Erst- oder Letztautorin/Erst- oder Letztautor publiziert sein soll.” There should be at least 2 original articles in high-ranking journals of the field, at least one of them with the applicant as first or last author.

“Dabei wird eine gleichberechtigte Erstautorschaft wie eine alleinige Erstautorschaft gewertet, eine gleichberechtigte Letztautorenschaft wird wie eine alleinige Letztautorschaft gewertet.” Shared first authorship counts as sole first authorship; shared last authorship counts as sole last authorship.

“Fachspezifisch hochrangig ist ein Journal dann, wenn es sich unter den ersten 33% (nach Impact-Faktor) der den jeweiligen Fachgebieten der/des Antragstellers zugeordneten fachspezifischen Journallisten des Journal Citation Reports – Science Citation Index (JCR-SCI) bzw. dem Journal Citation Reports – Social Science Citation Index (JCR-SSCI) befindet.” A journal is "high-ranking in its field" if, ranked by impact factor, it is in the top 33% of the JCR-SCI or JCR-SSCI journal list for the applicant's field.

c) “Des Weiteren sollen mindestens 4 Originalarbeiten in fachspezifisch sehr guten Journalen vorliegen, bei denen die Antragstellerin Erst- oder Letztautorin/der Antragsteller Erst- oder Letztautor ist.” In addition, there should be at least 4 original articles in "very good" journals of the field, with the applicant as first or last author.

“Dabei wird eine gleichberechtigte Erstautorschaft wie eine alleinige Erstautorschaft gewertet, eine gleichberechtigte Letztautorenschaft wird wie eine alleinige Letztautorschaft gewertet.” Shared first authorship counts as sole first authorship; shared last authorship counts as sole last authorship.

“Fachspezifisch sehr gut ist ein Journal dann, wenn es sich unter den vorderen 60% (nach Impact-Faktor) der dem jeweiligen Fachgebiet der/des Antragstellers zugeordneten fachspezifischen Journallisten des JCR-SCI/JCR-SSCI befindet.” A journal is "very good in its field" if, ranked by impact factor, it is in the top 60% of the JCR-SCI/JCR-SSCI journal list for the applicant's field.

d) “Von den unter b) und c) geforderten Publikationen in Erst- oder Letztautorenschaft müssen mindestens 60% in alleiniger Erst- oder Letztautorschaft erschienen sein.” Of the first- or last-author articles required under b) and c), at least 60% must be sole first-author or sole last-author articles.

“Dabei können fehlende alleinige Erst- oder Letztautorenschaften durch jeweils zwei Originalpublikationen in geteilter Erst- oder Letztautorenschaft ersetzt werden.” Each missing sole first- or last-author article can be replaced by 2 articles with shared first or shared last authorship.

“Die Erst- oder Letztautorenschaften dürfen sich hierbei höchstens zwei Autoren oder Autorinnen teilen.” Shared first or last authorship may be split between at most two authors.

Elbinsel-Pokal 2026

https://www.hsjb.de/kalender/

  1. HSK Youth-Cup 2025: pupils in grades 5-13 and primary-school pupils with a DWZ or ELO rating. Thinking time is 20 minutes per player per game.

https://hsk1830.de/termin/40-hsk-youth-cup (12. Oktober 2025)

https://hsk1830.de/termin/44-hsk-kids-cup (19. Juli 2025)

https://hsk1830.de/termin/45-hsk-kids-cup (11. Oktober 2025)

https://www.schachbund.de/turnierdatenbank-hamburg.html

https://www.schachbund.de/turnierdetails/1-hamburger-talente-cup-u12.html

Seven rounds are played in the Swiss system. There are two separately run tournaments:

  • A) Grades 5-8 and primary-school pupils with DWZ >1000
  • B) Grades 5-6 and primary-school pupils with DWZ <1001

Format: 7 rounds, Swiss system; round 1 starts at about 10:00. The tournament is open to:

  • Pupils in grades 5-13 and primary-school pupils with a DWZ or ELO rating
  • Thinking time is 20 minutes per player per game.

HSK Kids-Cup

Takes place several times a year on a weekend day. It is for children without a DWZ rating, is suitable for beginners, and also counts as an additional school ranking.

Seven rounds are played in the Swiss system. There are two separately run tournaments:

A) open up to grade 4, DWZ-rated; B) kindergarten up to grade 2

Thinking time is 20 minutes per player per game. During the first 15 minutes the moves are written down so that the trainers can analyse the game with the players between rounds.

Entry fee: €10 for HSK members; guests pay €15.

There are trophies for the first three places, for the best girl and for the best player in each grade. Everyone else receives medals and certificates.

Invitation to the school chess rapid tournament for the Elbinsel-Pokal 2026

  • For: pupils of compulsory school age

  • Date: Saturday, 25 April 2026; check in in person by 11:00, round 1 follows; prize-giving at about 16:30

  • Format: 9 rounds, Swiss system, 15 minutes thinking time per player per game

  • Venue: Bildungszentrum Tor zur Welt, Krieterstraße 2d, 21109 Hamburg

  • Getting there: S3/S5 to Wilhelmsburg and/or bus 154 to Thielenstraße (Ost); car parking is available but limited

  • Prizes: trophies for places one to three and medals for the best pupil in each year group. There are further trophies for the best secondary school and the best primary school; for these, the points of a school's four best pupils are added up.

  • Entry fee: €7 if payment is received by 30 March, €10 after that; preferably by bank transfer, if necessary in cash on site

  • Catering: plenty of snacks and drinks at reasonable prices.

  • Registration: by 23 April via the website https://www.skw.one/elbinsel-pokal-2026, afterwards on site if places are still available. Whether places remain will be published on the website.

Registration is only complete once the entry fee has been paid!

Income limits for family insurance (GKV)

The following table shows the annual and monthly income limits for non-contributory family insurance (Familienversicherung) in Germany's statutory health insurance (GKV). The relevant figure is the regular total monthly income.

| Year | Monthly limit (general) | Annual (general) | Minijob limit (monthly) | Annual (minijob) |
| 2023 | 485 € | 5,820 € | 520 € | 6,240 € |
| 2024 | 505 € | 6,060 € | 538 € | 6,456 € |
| 2025 | 535 € | 6,420 € | 556 € | 6,672 € |
| 2026 | 565 € | 6,780 € | 603 € | 7,236 € |

📌 Notes

  • The limit applies to spouses and children covered by family insurance
  • What counts is regular income, not one-off exceptions
  • If the limit is exceeded, the person needs their own health insurance
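
The table above can be turned into a small lookup. This is a minimal sketch that only encodes the monthly figures listed in the table; the function name `within_family_insurance_limit` is hypothetical, and the real eligibility rules (what counts as income, exceptions) are not modelled.

```python
# Monthly limits in euros, copied from the table above: (general, minijob).
LIMITS = {
    2023: (485, 520),
    2024: (505, 538),
    2025: (535, 556),
    2026: (565, 603),
}


def within_family_insurance_limit(year, monthly_income, minijob=False):
    """Return True if the regular monthly income does not exceed the limit."""
    general, mini = LIMITS[year]
    return monthly_income <= (mini if minijob else general)


# The annual columns of the table are twelve times the monthly limits.
for year, (general, mini) in LIMITS.items():
    print(year, general * 12, mini * 12)

print(within_family_insurance_limit(2026, 580))                # False
print(within_family_insurance_limit(2026, 580, minijob=True))  # True
```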

High Conservation of Functional Motifs in AdeB and AdeJ Efflux Pump Proteins Across Acinetobacter baumannii Homologs (Data_Tam_DNAseq_2025_E.hormaechei_and_Non-antibiotic_transport_on_ATCC19606)

📧 Email to Co-author

Subject: Updated AdeB/AdeJ Motif Conservation Analysis – Improved Pipeline Results


Dear [Co-author’s Name],

I hope this email finds you well. I’m writing to share the updated conservation analysis for the AdeB and AdeJ candidate motifs, incorporating the improvements we discussed based on your valuable feedback.

Key Improvements to the Analysis Pipeline

Following your observation about potential misannotations, we implemented a more rigorous filtering strategy:

  1. Identity-based filtering: We removed sequences with <80% identity or <90% coverage against reference AdeB/AdeJ sequences from ATCC 19606, eliminating likely misannotated paralogs

  2. Gap filtering: We excluded all sequences containing gap characters (−, X, N, *) in the raw FASTA files to remove fragmented or low-quality sequences

  3. Improved conservation calculation: We refined the Shannon entropy calculation to:

    • Exclude gap characters when computing conservation scores
    • Map motifs to alignment coordinates properly (accounting for gaps)
    • Display motifs as continuous blocks rather than individual residues

Results Summary

The updated analysis shows excellent conservation across both proteins:

AdeJ (250 sequences, 1058 columns)

  • Mean conservation: 99.9%
  • All 4 motifs found and highly conserved:
    • GNGQAS (positions ~83-88)
    • DIKDY (positions ~153-157)
    • DNYQFDSK (positions ~273-280)
    • AIKIA (positions ~290-294)

AdeB (214 sequences, 1036 columns)

  • Mean conservation: 99.1%
  • All 4 motifs found and highly conserved:
    • TSGTAE (positions ~84-89)
    • DLSDY (positions ~153-157)
    • QAYNFAIL (positions ~273-280)
    • AIQLS (positions ~290-294)

Interpretation

The conservation profiles (attached) demonstrate that:

  • After removing likely misannotations, all eight candidate motifs are highly conserved across AdeB and AdeJ homologs
  • The conservation scores are consistently near 1.0 (fully conserved) across most alignment positions
  • The small dips in conservation at specific positions likely represent genuine sequence variation rather than alignment artifacts

These results strongly support the functional importance of these motifs in the efflux pump mechanism.

Next Steps

Could you please review the attached figures and let me know if:

  1. The conservation patterns align with your expectations?
  2. The motif positions match what you observe in your structural analyses?
  3. You have any suggestions for additional validation steps?

I’m happy to discuss these results in more detail or run additional analyses if needed.

Best regards,
[Your Name]

Attachments:

  • adej_conservation_profile.png
  • adeb_conservation_profile.png

📝 Manuscript Text

Materials and Methods

Sequence Retrieval and Quality Filtering

To assess the conservation of candidate motifs in AdeB and AdeJ efflux pump proteins, we retrieved all available protein sequences from Acinetobacter baumannii from the NCBI protein database using Biopython Entrez. Initial length filtering was applied during retrieval (AdeJ: 1000–1070 amino acids; AdeB: 1000–1050 amino acids) to enrich for full-length proteins.

To eliminate potential misannotations and ensure sequence quality, we implemented a multi-step filtering pipeline:

  1. Identity and coverage filtering: Sequences were aligned against reference AdeB/AdeJ sequences from strain ATCC 19606 using BLASTp. Sequences with <80% identity or <90% coverage were excluded to remove distant paralogs and misannotated entries.

  2. Gap character filtering: Sequences containing gap characters (−, X, N, *) or ambiguous amino acids in the raw FASTA files were removed to eliminate fragmented or low-quality sequences.

  3. Multiple sequence alignment: Filtered sequences were aligned independently for AdeB and AdeJ using MAFFT (v7.x) with the L-INS-i algorithm (--localpair --maxiterate 1000 --adjustdirection) to ensure accurate homologous position mapping.

  4. Outlier removal: Sequences contributing disproportionately to alignment entropy (|z-score| > 2.0) or with <80% non-gap columns were excluded to improve alignment quality.
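
Steps 1 and 2 of the pipeline above can be illustrated with a short filter. This is a sketch under stated assumptions, not the code used for the analysis: the function name `passes_filters` and the example records are hypothetical, and identity and coverage are assumed to be available as percentages (for example from a BLASTp table). Note that N is also the one-letter code for asparagine, so this sketch flags only `-`, `X` and `*` by default and leaves the set of rejected characters as a parameter.

```python
def passes_filters(identity, coverage, sequence,
                   min_identity=80.0, min_coverage=90.0, invalid_chars="-X*"):
    """Apply the identity/coverage thresholds, then reject flagged characters."""
    # Step 1: exclude hits with <80% identity or <90% coverage.
    if identity < min_identity or coverage < min_coverage:
        return False
    # Step 2: exclude sequences containing any flagged character.
    return not any(ch in invalid_chars for ch in sequence.upper())


# Hypothetical records: (name, % identity, % coverage, sequence).
hits = [
    ("seq1", 99.2, 100.0, "MSQFFIRRPVFAWVIA"),
    ("seq2", 62.5, 98.0, "MAKFFIDRPIFAWVLA"),  # identity too low
    ("seq3", 97.0, 71.0, "MSQFFIRRPV"),        # coverage too low
    ("seq4", 98.8, 99.0, "MSQFFIRXPVFAWVIA"),  # contains X
]
kept = [name for name, ident, cov, seq in hits if passes_filters(ident, cov, seq)]
print(kept)  # ['seq1']
```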

Conservation Score Calculation

Position-wise conservation was quantified using Shannon entropy. For each alignment column i, the conservation score Cᵢ was calculated as:

Cᵢ = 1 − (Hᵢ / Hmax)

where Hᵢ = −Σ(pⱼ × log₂pⱼ) is the Shannon entropy of column i, pⱼ is the frequency of amino acid j in the column, and Hmax = log₂(n) is the maximum possible entropy for n observed amino acids. Gap characters were excluded from entropy calculations to avoid artifactual conservation estimates.

Conservation scores range from 0 (completely variable) to 1 (fully conserved). Mean conservation across the full alignment was calculated to assess overall sequence conservation.
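
The score defined above can be sketched in a few lines of Python. The function names are hypothetical, and one point is an assumption: the text leaves the scope of n open, so this sketch normalises by log₂(20), the standard amino acid alphabet, and exposes it as `alphabet_size`. Columns that contain only gaps get no score.

```python
import math
from collections import Counter


def column_conservation(column, alphabet_size=20, gap_chars="-"):
    """Return C = 1 - H / Hmax for one alignment column, ignoring gaps."""
    residues = [c for c in column if c not in gap_chars]
    if not residues:
        return None  # all-gap column: no score defined
    total = len(residues)
    counts = Counter(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return 1.0 - entropy / math.log2(alphabet_size)


def alignment_conservation(sequences, **kwargs):
    """Score every column of an alignment given as equal-length strings."""
    return [column_conservation(col, **kwargs) for col in zip(*sequences)]


def mean_conservation(scores):
    """Mean over the columns that have a score."""
    valid = [s for s in scores if s is not None]
    return sum(valid) / len(valid)


# Toy alignment for illustration only.
aln = ["MKT-A", "MKTLA", "MRTLG", "MKT-G"]
scores = alignment_conservation(aln)
print([round(s, 3) for s in scores], round(mean_conservation(scores), 3))
```

In the toy alignment, the fourth column contains two gaps and two L residues; because gaps are excluded, it scores as fully conserved, which is the behaviour the text describes.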

Motif Mapping and Visualization

To map candidate motifs to alignment coordinates, we generated a gap-free consensus sequence by extracting the most frequent residue at each alignment position. Motifs were localized in the gap-free consensus and mapped back to alignment coordinates, accounting for gap positions. Conservation scores within motif regions were extracted to quantify motif-specific conservation.
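
A minimal sketch of this mapping follows. The function names are hypothetical, and one detail is an assumption: columns where a gap is the most frequent character are skipped when building the gap-free consensus. Positions are returned as 0-based alignment columns.

```python
from collections import Counter


def consensus_with_columns(sequences, gap_chars="-"):
    """Return the gap-free consensus and the alignment column of each residue."""
    consensus, columns = [], []
    for idx, col in enumerate(zip(*sequences)):
        top = Counter(col).most_common(1)[0][0]
        if top in gap_chars:
            continue  # assumption: gap-majority columns are left out
        consensus.append(top)
        columns.append(idx)
    return "".join(consensus), columns


def find_motif(sequences, motif):
    """Return (first, last) 0-based alignment columns of the motif, or None."""
    consensus, columns = consensus_with_columns(sequences)
    start = consensus.find(motif)
    if start == -1:
        return None
    return columns[start], columns[start + len(motif) - 1]


# Toy alignment for illustration only; column 2 is mostly gaps.
aln = ["MA-GNGQASK", "MA-GNGQASK", "MATGNGQASR"]
print(find_motif(aln, "GNGQAS"))  # (3, 8)
```

The motif starts at index 2 of the consensus but at column 3 of the alignment, which is the offset that the coordinate mapping accounts for.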

Results

High Conservation of AdeB and AdeJ Candidate Motifs

After quality filtering, we retained 250 AdeJ sequences (1058 alignment columns) and 214 AdeB sequences (1036 alignment columns) for conservation analysis. Filtering removed 7 AdeJ and 9 AdeB sequences because of low identity/coverage or gap content, which is consistent with the hypothesis that the initial dataset contained misannotated entries.

Overall Conservation Profiles

Both AdeJ and AdeB exhibited exceptionally high conservation across their full lengths. The mean conservation score was 0.999 (99.9%) for AdeJ and 0.991 (99.1%) for AdeB, indicating strong evolutionary constraint on these efflux pump proteins. The profiles were predominantly flat at or near 1.0, with only sporadic positions showing reduced conservation (Figure X).

Candidate Motif Conservation

All eight candidate motifs were successfully identified in the consensus sequences and showed uniformly high conservation:

AdeJ motifs:

  • GNGQAS (positions 83–88): Mean conservation = 1.000
  • DIKDY (positions 153–157): Mean conservation = 1.000
  • DNYQFDSK (positions 273–280): Mean conservation = 0.998
  • AIKIA (positions 290–294): Mean conservation = 1.000

AdeB motifs:

  • TSGTAE (positions 84–89): Mean conservation = 0.998
  • DLSDY (positions 153–157): Mean conservation = 0.995
  • QAYNFAIL (positions 273–280): Mean conservation = 0.992
  • AIQLS (positions 290–294): Mean conservation = 1.000

The conservation patterns were consistent between AdeB and AdeJ, with corresponding motifs showing similar conservation levels, supporting their functional importance in the efflux pump mechanism.

Interpretation

The near-perfect conservation of all eight candidate motifs after removal of misannotated sequences is consistent with a critical role in AdeB/AdeJ function. The lowest motif scores were found in AdeB, for QAYNFAIL (0.992) and DLSDY (0.995); all other motifs scored 0.998 or higher. These small differences may reflect position-specific tolerance for conservative substitutions.

The isolated positions showing reduced conservation in the overall profiles likely correspond to surface-exposed or loop regions not involved in core pump function, whereas the motif regions represent functionally critical residues under strong purifying selection.


Figure Legend

Figure X | Conservation profiles of AdeB and AdeJ efflux pump proteins. Position-wise conservation scores (0–1 scale) calculated using Shannon entropy across multiple sequence alignments of (A) AdeJ (250 sequences, 1058 columns) and (B) AdeB (214 sequences, 1036 columns). Blue line and shading indicate conservation scores; horizontal dashed lines denote thresholds for high (>0.8, green) and moderate (0.5–0.8, orange) conservation. Colored vertical blocks indicate the positions of candidate functional motifs, with labels showing motif sequences. All four motifs in both proteins show mean conservation >0.99, indicating strong evolutionary constraint. Mean conservation across the full alignment was 0.999 for AdeJ and 0.991 for AdeB.



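The Ultimate 2026 Guide to Global AI Tools: 15 Mainstream Products Tested Across Platforms, with a Full Breakdown of Free and Paid Plans

IMPORTANT_TODO_NEXT_MONTH: use the 4 top Chinese AI websites for the specific tasks of my everyday work, then write a review comparing the four AI tools.

  • | Tongyi Qianwen (Qwen) 3.5-Max | 262K context, lightweight offline deployment, strongest for enterprise use | International version still incomplete | https://chat.qwen.ai |
  • | DeepSeek-R1 | USD 6 million training cost, maths and coding on par with o1, completely free | Brand awareness still to be built | https://chat.deepseek.com |
  • | MiniMax M2.7 | xxxx | xxxx | https://agent.minimax.io/ |
  • | Doubao 2.0 | Best-in-class Chinese, 98% accuracy on colloquial language, excellent value | Slightly weaker at the most demanding scientific reasoning | https://www.doubao.com |
  • | Kimi K2.5 | 262K long context, top-tier Agent capability, 71.3% on a coding benchmark | Late start in multimodality | https://www.kimi.com |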


根据最新的行业分析,MiniMax 目前处于中国大模型公司的第一梯队

梯队划分(2025-2026年)

梯队 公司 特征
第一梯队 DeepSeek、阿里通义(Qwen)、字节豆包、MiniMax、智谱AI 技术领先、商业化清晰、已上市或筹备上市
第二梯队 月之暗面(Kimi)、阶跃星辰、GLM(智谱) 资金充裕、模型指标优秀,商业化起步较晚
第三梯队 百度、腾讯等 传统互联网巨头,转型中

MiniMax 的第一梯队地位依据

  1. 技术实力:M2.7 模型在编程能力(SWE-bench Pro 56.8%)上追平 OpenAI GPT-5.3-Codex,在多模态理解、长上下文处理、逻辑推理等核心能力上进入国内第一梯队

  2. 全球市场份额:在 OpenRouter 全球大模型调用量榜单中,MiniMax M2.5 多次位居全球前三,甚至在某些周次超越谷歌成为全球第一

  3. 商业化成果:2025年前三季度营收5343.7万美元,海外收入占比超70%,C端产品Talkie/星野是全球第二大AI原生交互平台

  4. 资本认可:2026年1月已在香港联交所上市(股票代码:0100.HK),成为”大模型第一股”

  5. 行业评价:被业界称为”国内LLM御三家”之一(与DeepSeek、阿里通义并列),”全球唯四全模态进入第一梯队”的大模型公司

值得注意的是,随着M2.7的发布,MiniMax正从”应用落地最强”向”技术+应用双强”转型,进一步巩固其第一梯队地位。



📊 2026年主流AI产品用户量与访问量统计

数据来源:SimilarWeb、QuestMobile、Statista、First Page Sage 等(截至2026年3月)


🌍 全球市场(按市场份额排名)

排名 产品 市场份额 周活跃用户 月访问量 季度增长
🥇 ChatGPT (OpenAI) 60.4% ~8-9亿 [[13]][[16]] 57.2亿/月 [[12]] +4% ▲
🥈 Google Gemini 15.2% ~18亿/月 [[30]] +12% ▲
🥉 Microsoft Copilot 12.9% ~7.4亿/月 +3% ▲
4️⃣ Perplexity 5.8% ~3.3亿/月 +4% ▲
5️⃣ Claude AI (Anthropic) 4.5% ~1,890万/月 [[20]] ~1.76亿/月 [[27]] +14% ▲ 🔥
6️⃣ Grok (xAI) 0.6% +4% ▲
7️⃣ DeepSeek 0.2% ~2,200万/日(峰值) [[60]] +7% ▲

💡 关键趋势:ChatGPT仍占主导,但份额从2024年的76%+逐步下滑;Claude增长最快(+14%),主打专业用户市场 [[47]]。


🇨🇳 中国市场(按移动端月活排名)

排名 产品 月活跃用户 (MAU) 核心亮点 数据来源
🥇 豆包 (字节) 2.27亿 🔥 背靠抖音生态,移动端统治力强 QuestMobile [[62]][[67]]
🥈 DeepSeek ~1.3亿 技术口碑好,网页端增速+1250% QuestMobile [[60]][[62]]
🥉 腾讯元宝 ~3,286万 微信生态整合 QuestMobile [[55]]
4️⃣ 通义千问 (阿里) 1亿+ C端MAU破亿,打通淘宝/高德/饿了么 阿里官方 [[50]][[56]]
5️⃣ 文心一言 (百度) ~2亿(累计) 搜索场景深度整合 百度官方 [[74]]
6️⃣ Kimi (月之暗面) 长文本处理强,付费用户4倍增长 月之暗面 [[76]][[82]]

📌 注意:中国数据多来自QuestMobile等第三方监测,部分平台未公开精确DAU/MAU。


🔑 关键指标速览

ChatGPT 核心数据

  • 周活跃用户:8-9亿(2026年2月)[[16]]
  • 月访问量:57.2亿(2026年1月)[[12]]
  • 日查询量:25亿+ [[12]]
  • 付费用户:ChatGPT Plus 超1,000万 [[12]]
  • 年收入:~$100亿 ARR(含API+企业版)[[12]]

Claude AI 增长亮点

  • 月活用户:1,890万(网站+App)[[20]]
  • 日活用户:1,100万+(2026年3月峰值)[[21]]
  • 增速最快:季度+14%,专业用户青睐 [[47]]

中国平台特色

平台 差异化优势
豆包 抖音内容生态+短视频生成,用户粘性高
通义千问 “办事型AI”:直接调用淘宝/高德/支付宝完成下单 [[50]]
DeepSeek 开源模型+高性价比API,开发者社区活跃
Kimi 超长上下文(200万token),适合长文档分析 [[80]]

📈 市场趋势总结

  1. 竞争格局多元化:全球市场从”一家独大”转向”多强并存”,专业场景(如编程、法律、学术)催生垂直型AI [[47]]。
  2. 中国速度惊人:豆包、千问等国产应用依靠超级APP生态,用户增长远超全球平均 [[62]][[50]]。
  3. 从”聊天”到”办事”:2026年被视为 Agent爆发元年,能调用外部工具、完成真实任务的AI更受用户青睐 [[50]]。
  4. 移动端为主战场:中国用户90%+通过手机使用AI,全球移动端访问占比也持续上升 [[62]][[12]]。

⚠️ 数据说明

  • “访问量”≠”用户数”(同一用户可能多次访问)
  • 不同机构统计口径略有差异(如是否包含API调用、企业版)
  • 中国部分数据为估算值,官方披露有限



全球主流 AI 产品完整对照表

产品 所属公司 网页端 App端 Session 同步 免费额度 付费方案 实力梯队 App Store 搜索名称 网页访问地址
ChatGPT OpenAI (美国) ✅ iOS/Android ✅ 完全同步 基础功能免费 Plus: $20/月;Pro: $200/月;Go: $8/月 🥇 全球第一梯队 “ChatGPT” chatgpt.com
Claude Anthropic (美国) ✅ iOS/Android ✅ 完全同步 基础免费 Pro: $20/月;Team: $30/人/月;Max: $100-200/月 🥇 全球第一梯队 – 代码之王 “Claude by Anthropic” claude.ai
Gemini Google (美国) ✅ iOS/Android ✅ 完全同步 基础免费 Advanced: $19.99/月;Ultra: $249.99/月 🥇 全球第一梯队 – 多模态霸主 “Google Gemini” gemini.google.com
Perplexity Perplexity (美国) ✅ iOS/Android ✅ 完全同步 基础搜索免费 Pro: $20/月;Enterprise: 定制 🥈 全球第二梯队 – 搜索研究专家 “Perplexity” perplexity.ai
Copilot Microsoft (美国) ✅ iOS/Android ✅ 完全同步 基础免费 Microsoft 365 Premium: $19.99/月;Enterprise: $30/人/月 🥈 全球第二梯队 – 办公集成 “Microsoft Copilot” copilot.microsoft.com
Grok xAI (美国) ✅ iOS/Android ✅ 完全同步 X Premium 用户免费 X Premium: $8/月;SuperGrok: 额外付费 🥈 全球第二梯队 – 实时信息 “Grok” grok.com
豆包/Cici 字节跳动 (中国) ✅ iOS/Android ✅ 完全同步 完全免费 免费版已覆盖全功能 🥇 国内第一梯队 – 中文天花板 “Cici” (国际版) / “Doubao” doubao.com
通义千问/Qwen 阿里巴巴 (中国) ✅ iOS/Android ✅ 完全同步 完全免费 公测期全免费 🥇 国内第一梯队 “Qwen” / “Tongyi Qianwen” tongyi.aliyun.com
Kimi 月之暗面 (中国) ✅ iOS/Android ✅ 完全同步 基础功能免费 打赏制: ¥5.2-¥399 解锁优先 🥇 国内第一梯队 – 长文本之王 “Kimi – Now with K2.5” kimi.com
DeepSeek DeepSeek (中国) ✅ iOS/Android ⚠️ 部分同步 完全免费 目前全免费 🥇 国内第一梯队 – 性价比之王 “DeepSeek – AI Assistant” chat.deepseek.com
文心一言 百度 (中国) ✅ iOS/Android ✅ 完全同步 2025年4月起完全免费 原¥59.9/月,现已取消 🥈 国内第二梯队 “Ernie Bot” yiyan.baidu.com
智谱清言/GLM 智谱AI (中国) ✅ iOS/Android ✅ 完全同步 基础免费 GLM-4 Plus: ¥50/百万token 🥈 国内第二梯队 – 代码维护强 “ChatGLM” / “Zhipu Qingyan” chatglm.cn
腾讯元宝 腾讯 (中国) ✅ iOS/Android ✅ 完全同步 基础免费 高级功能付费 🥈 国内第二梯队 “Tencent Yuanbao” yuanbao.tencent.com
Llama Meta (美国) ❌ 无官方App 开源免费 自托管免费;云服务按量计费 🥇 开源第一梯队 无官方App,第三方客户端 llama.meta.com
Mistral Mistral (法国) ❌ 无官方App 基础免费 Large: 按量计费;企业定制 🥈 欧洲第一梯队 无官方App chat.mistral.ai

🌍 全球实力梯队解析

🥇 第一梯队(闭源旗舰)

标准:综合能力全球前5,参数规模万亿级,多模态原生支持

模型 核心优势 短板
GPT-5 Ultra 全能无短板,10万亿参数,生态最丰富 价格昂贵,中文弱
Claude 4 Opus 代码之王,长文本200万token,安全合规顶级 多模态弱,中文支持不足
Gemini 3 Ultra 多模态霸主,1000万token上下文,视频理解最强 交互生硬,中文体验一般
豆包 2.0 中文天花板,口语化98%准确率,性价比极高 极致科研推理略逊
通义千问 3.5-Max 26.2万上下文,轻量化离线部署,企业级最强 国际版功能待完善
Kimi K2.5 262K长文本,Agent能力顶尖,编程测试71.3% 多模态起步较晚
DeepSeek-R1 600万美元训练成本,数学代码媲美o1,完全免费 品牌知名度待提升

🥈 第二梯队(场景专家)

标准:单项能力突出或特定场景最优,综合略逊于第一梯队

模型 定位 优势场景
Perplexity 搜索研究专家 实时信息检索、学术溯源
Copilot 办公生产力 Microsoft 365生态集成、企业工作流
Grok 实时信息+社交 X平台联动、Z世代交互、新闻时效
文心一言 5.0 中文搜索增强 百度知识图谱、工业视觉诊断
智谱 GLM-5 企业Agent 代码维护第二梯队最强,开源生态成熟
腾讯元宝 社交娱乐 微信生态、游戏NPC、内容创作

🥉 第三梯队(垂直/区域)

模型 定位 特点
Mistral 欧洲合规首选 GDPR合规,轻量高效,API价格仅为GPT-1/3
Llama 4 开源标杆 7000亿参数,但受训练作弊丑闻影响
讯飞星火 语音专家 中文语音识别最强,教育场景深耕

💰 定价策略对比

价格带 代表产品 适合人群
完全免费 DeepSeek、豆包、通义千问、文心一言 学生、个人用户、初创团队
$8-20/月 ChatGPT Go、Claude Pro、Gemini Advanced、Perplexity Pro 专业用户、知识工作者
$30-60/月 Copilot Business、ChatGPT Plus 企业团队、开发者
$200+/月 ChatGPT Pro、Gemini Ultra 重度用户、企业高管

📱 德国 iPhone 用户推荐组合

使用场景 推荐App 备选方案
日常全能 Cici (豆包国际版) ChatGPT
深度工作/Agent Kimi Claude
编程开发 DeepSeek GitHub Copilot Pro ($10/月)
学术研究 Perplexity Claude
办公套件 Copilot (Microsoft 365用户) Gemini Advanced (Google用户)



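Availability of Chinese AI products in Germany

Product | Web | App | Session sync | Free quota | Rating | Search name in the German App Store | Web address | Notes
Doubao (ByteDance) | | | ✅ Full sync | Completely free | ⭐⭐⭐⭐⭐ Tier 1 | “Cici”, “Doubao International” | doubao.com | The international version is called Cici and supports 18 languages, including German
Tongyi Qianwen (Alibaba) | | | ✅ Full sync | Completely free | ⭐⭐⭐⭐⭐ Tier 1 | “Qwen” (international version coming soon) | tongyi.aliyun.com | Upgraded to the “Qianwen” app in November 2025; international version to be released soon
Kimi (Moonshot AI) | | | ✅ Full sync | Free basic tier | ⭐⭐⭐⭐⭐ Tier 1 | “Kimi – Now with K2.5” | kimi.com | Can be downloaded directly from the German App Store; strong Agent features
DeepSeek | | | ⚠️ Partial sync | Completely free | ⭐⭐⭐⭐⭐ Tier 1 | “DeepSeek – AI Assistant” | chat.deepseek.com | Available in Germany, rated 4.03/5, completely free
ERNIE Bot (Wenxin Yiyan, Baidu) | | | ✅ Full sync | Completely free since April 2025 | ⭐⭐⭐⭐☆ Strong Tier 2 | “Ernie Bot” | yiyan.baidu.com | International version available, but features may be limited
Zhipu Qingyan (GLM) | | | ✅ Full sync | Free basic tier | ⭐⭐⭐⭐☆ Tier 2 | “ChatGLM”, “Zhipu Qingyan” | chatglm.cn | International version available
Tencent Yuanbao | | | ✅ Full sync | Free basic tier | ⭐⭐⭐⭐☆ Tier 2 | “Tencent Yuanbao” | yuanbao.tencent.com | International version has limited availability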

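🔍 Guide for iPhone users in Germany

Available now (recommended downloads)

App name | Search in the App Store | Language support | Characteristics
Cici | “Cici” or “Doubao” | 18 languages, including German and English | ByteDance's international version; completely free, features largely identical to the Chinese version
Kimi | “Kimi – Now with K2.5” | Chinese, English | Strongest Agent capabilities, handles Office files, can be downloaded directly in Germany
DeepSeek | “DeepSeek – AI Assistant” | Chinese, English | Completely free, strong reasoning, available in Germany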

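International version coming soon

Product | Current status | Expected
Tongyi Qianwen (Qwen) | Upgraded to the “Qianwen” app in China; international version coming soon | Early 2026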

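Usage recommendations

  1. Preferred combination

    • Everyday all-rounder: Cici (international version of Doubao). Completely free, good multilingual support
    • Deep work: Kimi. Strongest Agent and document-processing capabilities
    • Coding and reasoning: DeepSeek. Free, with reasoning on par with GPT-4
  2. Web fallback

    • The web versions of all products can be accessed directly, without a VPN
    • Bookmark the web versions as well, so you can switch when an app's features are restricted
  3. Account sync

    • Register with an email address (Gmail or Outlook recommended)
    • Avoid registering with a +86 phone number; the verification code may not arrive
  4. Language settings

    • Cici offers a German interface; the others are mainly bilingual Chinese/English
    • All products support conversations in English and Chinese

Note: the overseas versions of some apps may be slightly reduced in features (e.g. payments, integration with local services), but the core AI capabilities are the same.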



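After a deep dive into AI subscription plans, I found a combination with excellent value for money:

Main 1: Qwen (Tongyi Qianwen, free). From Alibaba and also free; its coding performance is solid and reliable, more than adequate as an extra safety net.

Main 2: DeepSeek (free). The pride of Chinese AI: impressively strong at coding and completely free. Switching to it once the Claude quota runs out is seamless.

Backup: Claude Pro ($20/month). For complex code and long-context projects. Its only drawback is the usage limit: intensive use triggers rate limiting.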


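🦊 Why does DeepSeek sometimes not work properly in Firefox?

1. Firefox's security warnings are stricter. Firefox shows a “Potential Security Risk” warning for https://deepseek.com (while https://www.deepseek.com is unaffected), which can block access or confuse users. Chrome handles this kind of SSL edge case more leniently.

2. Firefox renders heavy pages differently. Some users report that DeepSeek crashes and uses excessive memory in Firefox during long conversations, while this hardly ever happens in Chromium-based browsers. In addition, the component that displays DeepSeek R1's “thinking process” may not render at all in Firefox, because DeepSeek's frontend is optimized mainly for the Chromium engine.

3. The DeepSeek web app is developed primarily for Chrome. Like many modern Chinese web apps, DeepSeek uses some JavaScript and CSS features (for example streaming Markdown rendering) that run more reliably in Chrome/Edge (Blink engine) and are prone to compatibility problems in Firefox (Gecko engine).

4. Firefox extensions often break after DeepSeek updates. Whenever DeepSeek changes the HTML structure it serves, the related Firefox extensions tend to stop working, whereas Chrome extensions are updated more promptly and are less affected.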

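✅ Workaround for Firefox users: Firefox can integrate DeepSeek directly into the browser sidebar via about:config (by setting the browser.ml.chat.provider preference). This is more stable than using a normal tab and is worth trying.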
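
The about:config setting mentioned above can also be kept in a user.js file in the Firefox profile folder, so it survives a profile reset. The following is only a minimal sketch in Python that prints the two lines to paste into user.js. It rests on assumptions: the provider value is taken to be the DeepSeek chat URL listed earlier in these notes, and browser.ml.chat.enabled is assumed to be the switch that turns the chatbot sidebar on. The helper name user_pref_line is made up for illustration.

```python
import json


def user_pref_line(name, value):
    """Format one preference as a Firefox user.js line (hypothetical helper)."""
    # json.dumps produces JavaScript-compatible literals for str, bool and int.
    return f"user_pref({json.dumps(name)}, {json.dumps(value)});"


# Assumption: the sidebar provider is set to the DeepSeek chat URL.
DEEPSEEK_CHAT_URL = "https://chat.deepseek.com"

lines = [
    # Assumption: this preference enables the AI chatbot sidebar.
    user_pref_line("browser.ml.chat.enabled", True),
    # Preference named in the text above.
    user_pref_line("browser.ml.chat.provider", DEEPSEEK_CHAT_URL),
]

print("\n".join(lines))
```

Printing the lines instead of writing them to disk keeps the sketch independent of the profile path, which differs per machine. Check both preference names in about:config on your own Firefox version before relying on them.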

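Summary: Chrome and Edge use the Chromium engine, which DeepSeek fully supports; Firefox uses the Gecko engine and suffers from occasional crashes, display glitches, or security warnings. For the best experience, use Chrome for all three AI tools.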