Author Archives: gene_x

Workflow using QIIME2 for Data_Karoline_16S_2025

Install and test qiime2-docker

#Cannot run under QIIME1, switch to QIIME2: pick_open_reference_otus.py -r/home/jhuang/REFs/SILVA_132_QIIME_release/rep_set/rep_set_16S_only/99/silva_132_99_16S.fna -i test.fna -o clustering_test/ -p clustering_params.txt --parallel --verbose

docker pull quay.io/qiime2/core:2023.9

docker run -it --rm \
-v /mnt/md1/DATA/Data_Marius_16S_2025:/data \
-v /home/jhuang/REFs:/home/jhuang/REFs \
quay.io/qiime2/core:2023.9 bash
cd /data

Import the fastq-files to paired-end-demux.qza

#https://docs.qiime2.org/2018.8/tutorials/importing/

qiime tools import --type 'SampleData[PairedEndSequencesWithQuality]' --input-path pe-33-manifest --output-path paired-end-demux.qza --input-format PairedEndFastqManifestPhred33
#--> 1095204304 Mai 27 15:11 paired-end-demux.qza

qiime demux summarize \
--i-data paired-end-demux.qza \
--o-visualization demux_pe.qzv

#https://view.qiime2.org
#qiime tools view demux_pe.qzv

Optimizing the parameters trunc-len-f and trunc-len-r and denoising with DADA2: optimized parameters is f240_r240

#Your amplicon (V3–V4 region) is ~464 bp, so you need ≥20–30 bp overlap
#464-38=426; 440 is the longst +12 nt for overlapping=we need 452 nt!

#Optimize the parameters --p-trunc-len-f and --p-trunc-len-r
./dada2_batch_test.sh

        #!/bin/bash

        # Set your base inputs
        INPUT=paired-end-demux.qza
        TRIM_LEFT_F=17
        TRIM_LEFT_R=21

        # Output base
        OUTPUT_DIR=dada2_tests
        mkdir -p $OUTPUT_DIR

        # Loop over trunc-len-f and trunc-len-r combinations
        # Forward: from 220 to 240
        # Reverse: from 210 to 230
        i=1
        for f in 240 235 230 225; do
        for r in 225 220 215; do
            OUT=test_${i}_f${f}_r${r}
            echo "Running: $OUT"
            mkdir -p $OUTPUT_DIR/$OUT

            qiime dada2 denoise-paired \
            --i-demultiplexed-seqs $INPUT \
            --p-trim-left-f $TRIM_LEFT_F \
            --p-trim-left-r $TRIM_LEFT_R \
            --p-max-ee-f 3 --p-max-ee-r 5 \
            --p-trunc-len-f $f \
            --p-trunc-len-r $r \
            --p-n-threads 32 \
            --o-table $OUTPUT_DIR/$OUT/table.qza \
            --o-representative-sequences $OUTPUT_DIR/$OUT/rep-seqs.qza \
            --o-denoising-stats $OUTPUT_DIR/$OUT/denoising-stats.qza \
            --verbose > $OUTPUT_DIR/$OUT/log.txt 2>&1

            ((i++))
        done
        done

for f in dada2_tests2/test_*/denoising-stats.qza; do
qiime metadata tabulate \
    --m-input-file $f \
    --o-visualization ${f%.qza}.qzv
done

#pandaseq.out: grep ">" A1_R1.fastq.gz_merged.fasta | wc -l #8229;  grep ">" A10_R1.fastq.gz_merged.fasta | wc -l #9165

# The best choice is f251_r251, since the first 17 and 21 bases with bad quality are anyway removed!
#f251_r251: sample-A1   18299   10989   60.05   10609   7129    38.96   6535    35.71;  sample-A10  18736   12249   65.38   11778   7444    39.73   6978    37.24
#f251_r250: sample-A1   18299   11855   64.78   11435   7431    40.61   6823    37.29;  sample-A10  18736   13092   69.88   12590   7714    41.17   7206    38.46
#f251_r245: sample-A1   18299   12642   69.09   12180   7860    42.95   7193    39.31;  sample-A10  18736   13981   74.62   13457   8218    43.86   7649    40.83
#f251_r240: sample-A1   18299   12678   69.28   12214   8060    44.05   7387    40.37;  sample-A10  18736   14018   74.82   13498   8412    44.9    7758    41.41

#f250_r240: sample-A1   18299   13705   74.89   13191   8796    48.07   7984    43.63
#f250_r235: sample-A1   18299   13716   74.95   13198   8793    48.05   7969    43.55
#f250_r230: sample-A1   18299   13739   75.08   13217   9023    49.31   8113    44.34;  sample-A10  18736   14838   79.2    14159   8895    47.48   8101    43.24
#f245_r240: sample-A1   18299   14513   79.31   14019   9472    51.76   8739    47.76;  sample-A10  18736   15609   83.31   15102   9605    51.26   8880    47.4
#f245_r235: sample-A1   18299   14524   79.37   14026   9485    51.83   8746    47.79;  sample-A10  18736   15634   83.44   15127   9685    51.69   8869    47.34
#f245_r230: sample-A1   18299   14547   79.5    14045   8845    48.34   8058    44.04;  sample-A10  18736   15664   83.6    15156   8812    47.03   7999    42.69
#f240_r240: sample-A1   18299   14647   80.04   14164   9869    53.93   8932    48.81;  sample-A10  18736   15728   83.95   15213   10488   55.98   9081    48.47 *
#f240_r235: sample-A1   18299   14661   80.12   14172   9125    49.87   8194    44.78;  sample-A10  18736   15755   84.09   15242   9579    51.13   8105    43.26
#f240_r230: sample-A1   18299   14686   80.26   14191   4952    27.06   4666    25.5;   sample-A10  18736   15785   84.25   15267   3489    18.62   3341    17.83

#f240_r225: sample-A1   18299   14701   80.34   14206   4575    25  4297    23.48
#f240_r220: sample-A1   18299   14720   80.44   14223   4588    25.07   4310    23.55
#f240_r225: sample-A1   18299   14747   80.59   14250   3976    21.73   3758    20.54
#f230_r220: sample-A1   18299   14972   81.82   14514   3   0.02    3   0.02

#The output of the optimized denoising: (*) dada2_tests2/test_7_f240_r240/table.qza and dada2_tests2/test_7_f240_r240/rep-seqs.qza.

Visualize outputs

qiime feature-table summarize \
--i-table dada2_tests2/test_7_f240_r240/table.qza \
--o-visualization table.qzv \
--m-sample-metadata-file qiime2_metadata.tsv

#Table summary
#Metric Sample
#Number of samples  137
#Number of features 3,039
#Total frequency    1,641,484
#
#Frequency per sample
#Minimum frequency  413.0
#1st quartile   10,319.0
#Median frequency   11,530.0
#3rd quartile   13,146.0
#Maximum frequency  40,022.0
#Mean frequency 11,981.635036496351
#
#Frequency per feature
#Minimum frequency  1.0
#1st quartile   3.0
#Median frequency   8.0
#3rd quartile   95.5
#Maximum frequency  56,472.0
#Mean frequency 540.1395195788089

#qiime tools peek table.qza
#qiime tools peek qiime2_metadata.tsv

qiime feature-table tabulate-seqs \
--i-data dada2_tests2/test_7_f240_r240/rep-seqs.qza \
--o-visualization rep-seqs.qzv

qiime metadata tabulate \
--m-input-file dada2_tests2/test_7_f240_r240/denoising-stats.qza \
--o-visualization denoising-stats.qzv

Import reference sequences and taxonomy (SILVA 132)

qiime tools import \
--type 'FeatureData[Sequence]' \
--input-path /home/jhuang/REFs/SILVA_132_QIIME_release/rep_set/rep_set_16S_only/99/silva_132_99_16S.fna \
--output-path silva_132_99_otus.qza \
--input-format DNAFASTAFormat

qiime tools import \
--type 'FeatureData[Taxonomy]' \
--input-format HeaderlessTSVTaxonomyFormat \
--input-path /home/jhuang/REFs/SILVA_132_QIIME_release/taxonomy/16S_only/99/consensus_taxonomy_7_levels.txt \
--output-path silva_132_99_taxonomy.qza

Assign taxonomy

qiime feature-classifier classify-consensus-vsearch \
--i-query dada2_tests2/test_7_f240_r240/rep-seqs.qza \
--i-reference-reads silva_132_99_otus.qza \
--i-reference-taxonomy silva_132_99_taxonomy.qza \
--p-perc-identity 0.97 \
--p-threads 64 \
--o-classification taxonomy.qza \
--o-search-results search-results.qza

Visualize taxonomy

qiime taxa barplot \
--i-table dada2_tests2/test_7_f240_r240/table.qza \
--i-taxonomy taxonomy.qza \
--m-metadata-file qiime2_metadata.tsv \
--o-visualization taxa-bar-plots.qzv

Build phylogenetic tree

qiime alignment mafft \
--i-sequences dada2_tests2/test_7_f240_r240/rep-seqs.qza \
--o-alignment aligned-rep-seqs.qza

qiime alignment mask \
--i-alignment aligned-rep-seqs.qza \
--o-masked-alignment masked-aligned-rep-seqs.qza

qiime phylogeny fasttree \
--i-alignment masked-aligned-rep-seqs.qza \
--o-tree unrooted-tree.qza

(*) qiime phylogeny midpoint-root \
--i-tree unrooted-tree.qza \
--o-rooted-tree rooted-tree.qza

Core diversity analysis

#The -e 6389 flag sets the even sampling depth (rarefaction depth) to 6,389 reads for diversity analyses.
#All samples will be rarefied to 4,753 reads.
#Samples with fewer reads are excluded.
qiime diversity core-metrics-phylogenetic \
--i-phylogeny rooted-tree.qza \
--i-table dada2_tests2/test_7_f240_r240/table.qza \
--p-sampling-depth 6389 \
--m-metadata-file qiime2_metadata.tsv \
--output-dir core_metrics_results

qiime diversity alpha \
--i-table table.qza \
--p-metric chao1 \
--o-alpha-diversity core_metrics_results/chao1_vector.qza

qiime tools export --input-path core_metrics_results/shannon_vector.qza --output-path exported_alpha/shannon
qiime tools export --input-path core_metrics_results/faith_pd_vector.qza --output-path exported_alpha/faith_pd
qiime tools export --input-path core_metrics_results/observed_features_vector.qza --output-path exported_alpha/observed_features
qiime tools export --input-path core_metrics_results/chao1_vector.qza --output-path exported_alpha/chao1

qiime tools export \
--input-path core_metrics_results/unweighted_unifrac_distance_matrix.qza \
--output-path exported_unweighted_unifrac
qiime tools export \
--input-path core_metrics_results/weighted_unifrac_distance_matrix.qza \
--output-path exported_weighted_unifrac

qiime diversity beta-group-significance \
--i-distance-matrix core_metrics_results/weighted_unifrac_distance_matrix.qza \
--m-metadata-file qiime2_metadata.tsv \
--m-metadata-column Group \
--p-pairwise \
--p-method permanova \
--o-visualization beta_group_significance.qzv

qiime tools export \
--input-path beta_group_significance.qzv \
--output-path exported_beta_group

#✅ Group 1 / Group 2 — The pairwise comparisons.
#✅ Sample size — Number of samples used in the test.
#✅ Permutations — Number of permutations in the PERMANOVA test.
#✅ pseudo-F — The test statistic from PERMANOVA.
#✅ p-value — The unadjusted p-value.
#✅ q-value — The adjusted p-value (Bonferroni in QIIME2).
#The q-value column (also sometimes called p-adj) is the multiple-testing corrected p-value.
#👉 q < 0.05 means the difference is statistically significant between those two groups, even after correction.
#“There is a significant difference in community composition between Group 1 and Group 2 (p=0.002).”

#The --p-sampling-depth 6389 parameter is directly equivalent to QIIME 1’s -e 6389!
#The QIIME 2 command will compute:
#✅ Alpha diversity metrics (Observed OTUs, Shannon, Faith PD, Evenness)
#✅ Beta diversity distance matrices (UniFrac, Bray-Curtis)
#✅ PCoA plots
#✅ Excludes samples with fewer than 6,389 reads.

#📦 Output Folders and Files
#Output Description
#table.qzv  Visual of feature table (sample counts)
#rep-seqs.qzv   Sequences per ASV
#denoising-stats.qzv    DADA2 read tracking
#taxonomy.qza/.qzv  Taxonomic classification of ASVs
#taxa-bar-plots.qzv Interactive bar plots
#core_metrics_results/  Alpha/beta diversity metrics + PCoA plots

Prepare three files feeding to Phyloseq.Rmd: table.qza (see above with ), rooted-tree.qza (see above with ), qiime2_metadata_for_qza_to_phyloseq.tsv edited from qiime2_metadata.tsv.

# Rarefying can be performed here, or in Phyloseq.Rmd (default), therefore, we don't need this step any more.
qiime feature-table summarize \
--i-table core_metrics_results/rarefied_table.qza \
--o-visualization rarefied_table.qzv \
--m-sample-metadata-file qiime2_metadata.tsv

#Table summary
#Metric Sample
#Number of samples  136
#Number of features 2,781
#Total frequency    868,904

# In QIIME2, we need table.qza, not biom-file, therefore, we don't need this step any more.
qiime tools export \
--input-path core_metrics_results/rarefied_table.qza \
--output-path exported_rarefied_table
#-->
exported_rarefied_table/feature-table.biom

biom convert \
-i exported_rarefied_table/feature-table.biom \
-o exported_rarefied_table/feature-table.tsv \
--to-tsv

#✅ Old QIIME 1 table with GenBank IDs (like EF603722.1.1487) as feature labels.
#✅ QIIME 2 table where feature IDs are hashes (like 0b438323a296b5f2ce2c8bbe3949ee8d).

# Visulaize the taxonomy.qza
qiime tools export \
--input-path taxonomy.qza \
--output-path exported-taxonomy

#Feature ID    Taxon                               Confidence
#0b4383...     k__Bacteria; p__Proteobacteria...   0.98
#dfa833...     k__Bacteria; p__Firmicutes...       0.87
#...

# ---- I used the following to generate two file for feeding in the Phyloseq.Rmd ----

#-1- exported_table/feature-table.biom corresesponds to table_even6389.biom in QIIME1, but in QIIME2, we don't need biom-file, instead of table.qza.

qiime tools export \
--input-path dada2_tests2/test_7_f240_r240/table.qza \
--output-path exported_table
#--> exported_table/feature-table.biom

#-2- exported-tree/tree.nwk corresesponds to rep_set.tre in QIIME1

qiime tools export \
--input-path rooted-tree.qza \
--output-path exported-tree
#--> exported-tree/tree.nwk

# ---- The code in Phyloseq.Rmd ----

#install.packages("remotes")
#remotes::install_github("jbisanz/qiime2R")
#"core_metrics_results/rarefied_table.qza", rarefying performed in the code, therefore import the raw table.
library(qiime2R)
ps.ng.tax <- qza_to_phyloseq(
    features =  "dada2_tests2/test_7_f240_r240/table.qza",
    tree = "rooted-tree.qza",
    metadata = "qiime2_metadata_for_qza_to_phyloseq.tsv"
)
# or
#biom convert \
#      -i ./exported_table/feature-table.biom \
#      -o ./exported_table/feature-table-v1.biom \
#      --to-json
#ps.ng.tax <- import_biom("./exported_table/feature-table-v1.biom", treefilename="./exported-tree/tree.nwk")

#Note that the alpha- and beta-diversity-files needed in Phyloseq.Rmd has been prepared in the step 9.

Figures generated by Phyloseq.Rmd and MicrobiotaProcess_*.R

The following files can be found under server.

./Phyloseq.Rmd (Result Phyloseq.html)
./MicrobiotaProcess_cluster1_Group9-11_vs_cluster2_Group12-14_orig.R
./MicrobiotaProcess_Group1_vs_Group2.R
./MicrobiotaProcess_Group3_vs_Group4.R
./MicrobiotaProcess_PCA_Group1-4.R
./MicrobiotaProcess_PCA_Group9-14.R

Workflow using PICRUSt2 for Data_Karoline_16S_2025

Leave a reply

Environment Setup: It sets up a Conda environment named picrust2, using the conda create command and then activates this environment using conda activate picrust2.

#https://github.com/picrust/picrust2/wiki/PICRUSt2-Tutorial-(v2.2.0-beta)#minimum-requirements-to-run-full-tutorial
mamba create -n picrust2 -c bioconda -c conda-forge picrust2    #2.5.3  #=2.2.0_b
mamba activate /home/jhuang/miniconda3/envs/picrust2

Under env (qiime2-amplicon-2023.9)

Export QIIME2 feature table and representative sequences

#docker pull quay.io/qiime2/core:2023.9
#docker run -it --rm \
#-v /mnt/md1/DATA/Data_Karoline_16S_2025:/data \
#-v /home/jhuang/REFs:/home/jhuang/REFs \
#quay.io/qiime2/core:2023.9 bash
#cd /data
# === SETTINGS ===
FEATURE_TABLE_QZA="dada2_tests2/test_7_f240_r240/table.qza"
REP_SEQS_QZA="dada2_tests2/test_7_f240_r240/rep-seqs.qza"

# === STEP 1: EXPORT QIIME2 ARTIFACTS ===
mkdir -p qiime2_export
qiime tools export --input-path $FEATURE_TABLE_QZA --output-path qiime2_export
qiime tools export --input-path $REP_SEQS_QZA --output-path qiime2_export

Convert BIOM to TSV for Picrust2 input

biom convert \
-i qiime2_export/feature-table.biom \
-o qiime2_export/feature-table.tsv \
--to-tsv

Under env (picrust2): mamba activate /home/jhuang/miniconda3/envs/picrust2

Run PICRUSt2 pipeline

tail -n +2 qiime2_export/feature-table.tsv > qiime2_export/feature-table-fixed.tsv
picrust2_pipeline.py \
-s qiime2_export/dna-sequences.fasta \
-i qiime2_export/feature-table-fixed.tsv \
-o picrust2_out \
-p 100

#This will:
#* Place sequences in the reference tree (using EPA-NG),
#* Predict gene family abundances (e.g., EC, KO, PFAM, TIGRFAM),
#* Predict pathway abundances.

#In current PICRUSt2 (with picrust2_pipeline.py), you do not run hsp.py separately.
#Instead, picrust2_pipeline.py internally runs the HSP step for all functional categories automatically. It outputs all the prediction files (16S_predicted_and_nsti.tsv.gz, COG_predicted.tsv.gz, PFAM_predicted.tsv.gz, KO_predicted.tsv.gz, EC_predicted.tsv.gz, TIGRFAM_predicted.tsv.gz, PHENO_predicted.tsv.gz) in the output directory.

mkdir picrust2_out_advanced; cd picrust2_out_advanced
#If you still want to run hsp.py manually (advanced use / debugging), the commands correspond directly:
hsp.py -i 16S -t ../picrust2_out/out.tre -o 16S_predicted_and_nsti.tsv.gz -p 100 -n
hsp.py -i COG -t ../picrust2_out/out.tre -o COG_predicted.tsv.gz -p 100
hsp.py -i PFAM -t ../picrust2_out/out.tre -o PFAM_predicted.tsv.gz -p 100
hsp.py -i KO -t ../picrust2_out/out.tre -o KO_predicted.tsv.gz -p 100
hsp.py -i EC -t ../picrust2_out/out.tre -o EC_predicted.tsv.gz -p 100
hsp.py -i TIGRFAM -t ../picrust2_out/out.tre -o TIGRFAM_predicted.tsv.gz -p 100
hsp.py -i PHENO -t ../picrust2_out/out.tre -o PHENO_predicted.tsv.gz -p 100

Metagenome prediction per functional category (if needed separately)

#cd picrust2_out_advanced
metagenome_pipeline.py -i ../qiime2_export/feature-table.biom -m 16S_predicted_and_nsti.tsv.gz -f COG_predicted.tsv.gz -o COG_metagenome_out --strat_out
metagenome_pipeline.py -i ../qiime2_export/feature-table.biom -m 16S_predicted_and_nsti.tsv.gz -f EC_predicted.tsv.gz -o EC_metagenome_out --strat_out
metagenome_pipeline.py -i ../qiime2_export/feature-table.biom -m 16S_predicted_and_nsti.tsv.gz -f KO_predicted.tsv.gz -o KO_metagenome_out --strat_out
metagenome_pipeline.py -i ../qiime2_export/feature-table.biom -m 16S_predicted_and_nsti.tsv.gz -f PFAM_predicted.tsv.gz -o PFAM_metagenome_out --strat_out
metagenome_pipeline.py -i ../qiime2_export/feature-table.biom -m 16S_predicted_and_nsti.tsv.gz -f TIGRFAM_predicted.tsv.gz -o TIGRFAM_metagenome_out --strat_out

# Add descriptions in gene family tables
add_descriptions.py -i COG_metagenome_out/pred_metagenome_unstrat.tsv.gz -m COG -o COG_metagenome_out/pred_metagenome_unstrat_descrip.tsv.gz
add_descriptions.py -i EC_metagenome_out/pred_metagenome_unstrat.tsv.gz -m EC -o EC_metagenome_out/pred_metagenome_unstrat_descrip.tsv.gz
add_descriptions.py -i KO_metagenome_out/pred_metagenome_unstrat.tsv.gz -m KO -o KO_metagenome_out/pred_metagenome_unstrat_descrip.tsv.gz   # EC and METACYC is a pair, EC for gene_annotation and METACYC for pathway_annotation
add_descriptions.py -i PFAM_metagenome_out/pred_metagenome_unstrat.tsv.gz -m PFAM -o PFAM_metagenome_out/pred_metagenome_unstrat_descrip.tsv.gz
add_descriptions.py -i TIGRFAM_metagenome_out/pred_metagenome_unstrat.tsv.gz -m TIGRFAM -o TIGRFAM_metagenome_out/pred_metagenome_unstrat_descrip.tsv.gz

Pathway inference (MetaCyc pathways from EC numbers)

#cd picrust2_out_advanced
pathway_pipeline.py -i EC_metagenome_out/pred_metagenome_contrib.tsv.gz -o EC_pathways_out -p 100
pathway_pipeline.py -i EC_metagenome_out/pred_metagenome_unstrat.tsv.gz -o EC_pathways_out_per_seq -p 100 --per_sequence_contrib --per_sequence_abun EC_metagenome_out/seqtab_norm.tsv.gz --per_sequence_function EC_predicted.tsv.gz
#ERROR due to missing .../pathway_mapfiles/KEGG_pathways_to_KO.tsv
pathway_pipeline.py -i COG_metagenome_out/pred_metagenome_contrib.tsv.gz -o KEGG_pathways_out -p 100 --no_regroup --map /home/jhuang/anaconda3/envs/picrust2/lib/python3.6/site-packages/picrust2/default_files/pathway_mapfiles/KEGG_pathways_to_KO.tsv
pathway_pipeline.py -i KO_metagenome_out/pred_metagenome_strat.tsv.gz -o KEGG_pathways_out -p 100 --no_regroup --map /home/jhuang/anaconda3/envs/picrust2/lib/python3.6/site-packages/picrust2/default_files/pathway_mapfiles/KEGG_pathways_to_KO.tsv

add_descriptions.py -i EC_pathways_out/path_abun_unstrat.tsv.gz -m METACYC -o EC_pathways_out/path_abun_unstrat_descrip.tsv.gz
gunzip EC_pathways_out/path_abun_unstrat_descrip.tsv.gz

#Error - no rows remain after regrouping input table. The default pathway and regroup mapfiles are meant for EC numbers. Note that KEGG pathways are not supported since KEGG is a closed-source database, but you can input custom pathway mapfiles if you have access. If you are using a custom function database did you mean to set the --no-regroup flag and/or change the default pathways mapfile used?
#If ERROR --> USE the METACYC for downstream analyses!!!

#ERROR due to missing .../description_mapfiles/KEGG_pathways_info.tsv.gz
#add_descriptions.py -i KO_pathways_out/path_abun_unstrat.tsv.gz -o KEGG_pathways_out/path_abun_unstrat_descrip.tsv.gz --custom_map_table /home/jhuang/anaconda3/envs/picrust2/lib/python3.6/site-packages/picrust2/default_files/description_mapfiles/KEGG_pathways_info.tsv.gz

#NOTE: Target-analysis for the pathway "mixed acid fermentation"

Visualization

#7.1 STAMP
#https://github.com/picrust/picrust2/wiki/STAMP-example
#Note that STAMP can only be opened under Windows

# It needs two files: path_abun_unstrat_descrip.tsv.gz as "Profile file" and metadata.tsv as "Group metadata file".
cp ~/DATA/Data_Karoline_16S_2025/picrust2_out_advanced/EC_pathways_out/path_abun_unstrat_descrip.tsv ~/DATA/Access_to_Win7/

cut -d$'\t' -f1 qiime2_metadata.tsv > 1
cut -d$'\t' -f3 qiime2_metadata.tsv > 3
cut -d$'\t' -f5-6 qiime2_metadata.tsv > 5_6
paste -d$'\t' 1 3 > 1_3
paste -d$'\t' 1_3 5_6 > metadata.tsv
#SampleID --> SampleID
SampleID        Group   pre_post        Sex_age
sample-A1       Group1  3d.post.stroke  male.aged
sample-A2       Group1  3d.post.stroke  male.aged
sample-A3       Group1  3d.post.stroke  male.aged
cp ~/DATA/Data_Karoline_16S_2025/metadata.tsv ~/DATA/Access_to_Win7/

#7.2. ALDEx2
https://bioconductor.org/packages/release/bioc/html/ALDEx2.html

Under env (qiime2-amplicon-2023.9)

(NOT_NEEDED) Convert pathway output to BIOM and re-import to QIIME2 gunzip picrust2_out/pathways_out/path_abun_unstrat.tsv.gz biom convert \ -i picrust2_out/pathways_out/path_abun_unstrat.tsv \ -o picrust2_out/path_abun_unstrat.biom \ –table-type=”Pathway table” \ –to-hdf5

qiime tools import \
--input-path picrust2_out/path_abun_unstrat.biom \
--type 'FeatureTable[Frequency]' \
--input-format BIOMV210Format \
--output-path path_abun.qza

#qiime tools export --input-path path_abun.qza --output-path exported_path_abun
#qiime tools peek path_abun.qza
echo "✅ PICRUSt2 pipeline complete. Output in: picrust2_out"

For QIIME1

Environment Setup: It sets up a Conda environment named picrust2, using the conda create command and then activates this environment using conda activate picrust2.

#https://github.com/picrust/picrust2/wiki/PICRUSt2-Tutorial-(v2.2.0-beta)#minimum-requirements-to-run-full-tutorial
mamba create -n picrust2 -c bioconda -c conda-forge picrust2    #2.5.3  #=2.2.0_b
mamba activate /home/jhuang/miniconda3/envs/picrust2

Data Preparation: The script creates a new directory called picrust2_out, then enters it using mkdir and cd commands. It then identifies input files that are needed for the analysis: metadata.tsv, seqs.fna, table.biom. The biom commands are used to inspect and convert the BIOM format files.

mkdir picrust2_out_2024_2
cd picrust2_out_2024_2

# Identifying input data
# Note: Replace the paths and filenames with your actual data if different
# metadata.tsv == ../map_corrected.txt
# seqs.fna     == ../clustering/seqs.fna
# table.biom   == ../core_diversity_e42369/table_even42369.biom

# Inspect and convert the BIOM format files
biom head -i ../core_diversity_e42369/table_even42369.biom
biom summarize-table -i ../core_diversity_e42369/table_even42369.biom
biom convert -i ../core_diversity_e42369/table_even42369.biom -o table_even42369.tsv --to-tsv
#For QIIME2: exported_rarefied_table/feature-table.tsv

Running PiCRUST2: The place_seqs.py command aligns the input sequences to a reference tree. The hsp.py commands generate hidden state prediction for multiple functional categories.

#insert reads into reference tree using EPA-NG
cp ../clustering/rep_set.fna ./
grep ">" rep_set.fna | wc -l  #40990
vim table_even42369.tsv       #40596-2

samtools faidx rep_set.fna
cut -f1-1 table_even42369.tsv > table_even42369.id
#manually modify table_even42369.id by replacing "\n" with " >> seqs.fna\nsamtools faidx rep_set.fna "
run table_even42369.id

rm -rf intermediate/
place_seqs.py -s seqs.fna -o out.tre -p 4 --intermediate intermediate/place_seqs

#castor: Efficient Phylogenetics on Large Trees
#https://github.com/picrust/picrust2/wiki/Hidden-state-prediction

hsp.py -i 16S -t out.tre -o 16S_predicted_and_nsti.tsv.gz -p 100 -n
hsp.py -i COG -t out.tre -o COG_predicted.tsv.gz -p 100
hsp.py -i PFAM -t out.tre -o PFAM_predicted.tsv.gz -p 15
hsp.py -i KO -t out.tre -o KO_predicted.tsv.gz -p 15
hsp.py -i EC -t out.tre -o EC_predicted.tsv.gz -p 15
hsp.py -i TIGRFAM -t out.tre -o TIGRFAM_predicted.tsv.gz -p 15
hsp.py -i PHENO -t out.tre -o PHENO_predicted.tsv.gz -p 15

#>In this table the predicted copy number of all Enzyme Classification (EC) numbers is shown for each ASV. The NSTI values per ASV are not in this table since we did not specify the -n option. EC numbers are a type of gene family defined based on the chemical reactions they catalyze. For instance, EC:1.1.1.1 corresponds to alcohol dehydrogenase. In this tutorial we are focusing on EC numbers since they can be used to infer MetaCyc pathway levels (see below).

zless -S EC_predicted.tsv.gz
sequence        EC:1.1.1.1      EC:1.1.1.10     EC:1.1.1.100    ...
20e568023c10eaac834f1c110aacea18        2       0       3    ...
23fe12a325dfefcdb23447f43b6b896e        0       0       1    ...
288c8176059111c4c7fdfb0cd5afce64        1       0       1    ...
...

##Why import the tsv file to MyData?
#MyData <- read.csv(file="./COG_predicted.tsv", header=TRUE, sep="\t", row.names=1)   #6806 4598  e.g. COG5665
#MyData <- read.csv(file="./PFAM_predicted.tsv", header=TRUE, sep="\t", row.names=1)  #6806 11089 e.g. PF17225
#MyData <- read.csv(file="./KO_predicted.tsv", header=TRUE, sep="\t", row.names=1)    #6806 10543 e.g. K19791
#MyData <- read.csv(file="./EC_predicted.tsv", header=TRUE, sep="\t", row.names=1)    #6806 2913  e.g. EC.6.6.1.2
#MyData <- read.csv(file="./16S_predicted.tsv", header=TRUE, sep="\t", row.names=1)   #6806    1     e.g. X16S_rRNA_Count
#MyData <- read.csv(file="./TIGRFAM_predicted.tsv", header=TRUE, sep="\t", row.names=1)  #6806 4287  e.g. TIGR04571
#MyData <- read.csv(file="./PHENO_predicted.tsv", header=TRUE, sep="\t", row.names=1)    #6806   41  e.g. Use_of_nitrate_as_electron_acceptor, Xylose_utilizing

The metagenome_pipeline.py commands perform metagenomic prediction for several functional categories. Predicted gene families weighted by the relative abundance of ASVs in their community. In other words, we are interested in inferring the metagenomes of the communities.

#Generate metagenome predictions using EC numbers https://en.wikipedia.org/wiki/List_of_enzymes#Category:EC_1.1_(act_on_the_CH-OH_group_of_donors)
metagenome_pipeline.py -i ../core_diversity_e42369/table_even42369.biom -m 16S_predicted_and_nsti.tsv.gz -f COG_predicted.tsv.gz -o COG_metagenome_out --strat_out
metagenome_pipeline.py -i ../core_diversity_e42369/table_even42369.biom -m 16S_predicted_and_nsti.tsv.gz -f EC_predicted.tsv.gz -o EC_metagenome_out --strat_out
metagenome_pipeline.py -i ../core_diversity_e42369/table_even42369.biom -m 16S_predicted_and_nsti.tsv.gz -f KO_predicted.tsv.gz -o KO_metagenome_out --strat_out
metagenome_pipeline.py -i ../core_diversity_e42369/table_even42369.biom -m 16S_predicted_and_nsti.tsv.gz -f PFAM_predicted.tsv.gz -o PFAM_metagenome_out --strat_out
metagenome_pipeline.py -i ../core_diversity_e42369/table_even42369.biom -m 16S_predicted_and_nsti.tsv.gz -f TIGRFAM_predicted.tsv.gz -o TIGRFAM_metagenome_out --strat_out

Pathway-level inference: By default this script infers MetaCyc pathway abundances based on EC number abundances, although different gene families and pathways can also be optionally specified. This script performs a number of steps by default, which are based on the approach implemented in HUMAnN2:

#- Regroups EC numbers to MetaCyc reactions.
#- Infers which MetaCyc pathways are present based on these reactions with MinPath.
#- Calculates and returns the abundance of pathways identified as present.

pathway_pipeline.py -i EC_metagenome_out/pred_metagenome_contrib.tsv.gz -o pathways_out -p 15

#Note that the path of map files is under /home/jhuang/anaconda3/envs/picrust2/lib/python3.6/site-packages/picrust2/default_files/pathway_mapfiles
pathway_pipeline.py -i COG_metagenome_out/pred_metagenome_contrib.tsv.gz -o KEGG_pathways_out -p 15 --no_regroup --map /home/jhuang/anaconda3/envs/picrust2/lib/python3.6/site-packages/picrust2/default_files/pathway_mapfiles/KEGG_pathways_to_KO.tsv

#Mapping predicted KO abundances to legacy KEGG pathways (with stratified output that represents contributions to community-wide abundances):
pathway_pipeline.py -i KO_metagenome_out/pred_metagenome_strat.tsv.gz -o KEGG_pathways_out --no_regroup --map /home/jhuang/anaconda3/envs/picrust2/lib/python3.6/site-packages/picrust2/default_files/pathway_mapfiles/KEGG_pathways_to_KO.tsv

#Map EC numbers to MetaCyc pathways and get stratified output corresponding to contribution of predicted gene family abundances within each predicted genome:
pathway_pipeline.py -i EC_metagenome_out/pred_metagenome_unstrat.tsv.gz -o pathways_out_per_seq --per_sequence_contrib --per_sequence_abun EC_metagenome_out/seqtab_norm.tsv.gz --per_sequence_function EC_predicted.tsv.gz

Add functional descriptions: Finally, it can be useful to have a description of each functional id in the output abundance tables. The below commands will add these descriptions as new column in gene family and pathway abundance tables

#--6.1. Add descriptions in gene family tables
add_descriptions.py -i COG_metagenome_out/pred_metagenome_unstrat.tsv.gz -m COG -o COG_metagenome_out/pred_metagenome_unstrat_descrip.tsv.gz
add_descriptions.py -i EC_metagenome_out/pred_metagenome_unstrat.tsv.gz -m EC -o EC_metagenome_out/pred_metagenome_unstrat_descrip.tsv.gz
add_descriptions.py -i KO_metagenome_out/pred_metagenome_unstrat.tsv.gz -m KO -o KO_metagenome_out/pred_metagenome_unstrat_descrip.tsv.gz   # EC and METACYC is a pair, EC for gene_annotation and METACYC for pathway_annotation
add_descriptions.py -i PFAM_metagenome_out/pred_metagenome_unstrat.tsv.gz -m PFAM -o PFAM_metagenome_out/pred_metagenome_unstrat_descrip.tsv.gz
add_descriptions.py -i TIGRFAM_metagenome_out/pred_metagenome_unstrat.tsv.gz -m TIGRFAM -o TIGRFAM_metagenome_out/pred_metagenome_unstrat_descrip.tsv.gz

#--6.2. Add descriptions in pathway abundance tables
add_descriptions.py -i pathways_out/path_abun_unstrat.tsv.gz -m METACYC -o pathways_out/path_abun_unstrat_descrip.tsv.gz
gunzip path_abun_unstrat_descrip.tsv.gz

#Error - no rows remain after regrouping input table. The default pathway and regroup mapfiles are meant for EC numbers. Note that KEGG pathways are not supported since KEGG is a closed-source database, but you can input custom pathway mapfiles if you have access. If you are using a custom function database did you mean to set the --no-regroup flag and/or change the default pathways mapfile used?
#If ERROR --> USE the METACYC for downstream analyses!!!

add_descriptions.py -i pathways_out/path_abun_unstrat.tsv.gz -o KEGG_pathways_out/path_abun_unstrat_descrip.tsv.gz --custom_map_table /home/jhuang/anaconda3/envs/picrust2/lib/python3.6/site-packages/picrust2/default_files/description_mapfiles/KEGG_pathways_info.tsv.gz

Visualization

#7.1 STAMP
#https://github.com/picrust/picrust2/wiki/STAMP-example
conda deactivate
conda install -c bioconda stamp

#conda install -c bioconda stamp
#sudo pip install pyqi
#sudo apt-get install libblas-dev liblapack-dev gfortran
#sudo apt-get install freetype* python-pip python-dev python-numpy python-scipy python-matplotlib
#sudo pip install STAMP
#conda install -c bioconda stamp

conda create -n stamp -c bioconda/label/cf201901 stamp
brew install pyqt

#DEBUG the environment
conda install pyqt=4
#conda install icu=56

e.g. path_abun_unstrat_descrip.tsv.gz and metadata.tsv from the tutorial)
cut -d$'\t' -f1 map_corrected.txt > 1
cut -d$'\t' -f5 map_corrected.txt > 5
cut -d$'\t' -f6 map_corrected.txt > 6
paste -d$'\t' 1 5 > 1_5
paste -d$'\t' 1_5 6 > metadata.tsv
#SampleID --> SampleID
SampleID    Facility    Genotype
100CHE6KO   PaloAlto    KO
101CHE6WT   PaloAlto    WT

#7.2. ALDEx2
https://bioconductor.org/packages/release/bioc/html/ALDEx2.html

Viral genome assembly and recombination analysis for Data_Sophie_HDV_Sequences

Leave a reply

Prepare input raw data

/mnt/md1/DATA/Data_Sophie_HDV_Sequences/raw_data
for f in *_R[12]_001.fastq.gz; do newname="$(echo "$f" | awk -F_ '{print $1 "_" $4 ".fastq.gz"}')"; echo mv "$f" "$newname"; done
for f in *_R[12]_001.fastq.gz; do newname="$(echo "$f" | awk -F_ '{print $1 "_" $4 ".fastq.gz"}')"; mv "$f" "$newname"; done

Call variant calling using snippy

ln -s ~/Tools/bacto/db/ .;
ln -s ~/Tools/bacto/envs/ .;
ln -s ~/Tools/bacto/local/ .;
cp ~/Tools/bacto/Snakefile .;
cp ~/Tools/bacto/bacto-0.1.json .;
cp ~/Tools/bacto/cluster.json .;
#download CU459141.gb from GenBank
mv ~/Downloads/sequence\(2\).gb db/NC_001653.gb
#setting the following in bacto-0.1.json
    "fastqc": false,
    "taxonomic_classifier": false,
    "assembly": true,
    "typing_ariba": false,
    "typing_mlst": true,
    "pangenome": true,
    "variants_calling": true,
    "phylogeny_fasttree": true,
    "phylogeny_raxml": true,
    "recombination": false, (due to gubbins-error set false)
    "genus": "Alphacoronavirus",
    "kingdom": "Viruses",
    "species": "Human coronavirus 229E",
    "mykrobe": {
        "species": "corona"
    },
    "reference": "db/PP810610.gb"
mamba activate /home/jhuang/miniconda3/envs/bengal3_ac3
(bengal3_ac3) /home/jhuang/miniconda3/envs/snakemake_4_3_1/bin/snakemake --printshellcmds

Prepare virus database

# ---- Date is 16.06.2025. ----
#Taxonomy ID: 12475
esearch -db nucleotide -query "txid12475[Organism:exp]" | efetch -format fasta -email j.huang@uke.de > genome_12475_ncbi.fasta
python ~/Scripts/filter_fasta.py genome_12475_ncbi.fasta complete_genome_12475_ncbi.fasta  #4208-->760

#https://de.wikipedia.org/wiki/Hepatitis-D-Virus
Hepatitis delta virus, complete genome
NCBI Reference Sequence: NC_001653.2

(Deprecated) Calling intra-host variants using viral-ngs

#How to run and debug the viral-ngs docker?

mkdir viralngs; cd viralngs
ln -s ~/Tools/viral-ngs_docker/Snakefile Snakefile
ln -s  ~/Tools/viral-ngs_docker/bin bin
cp  ~/DATA_D/Data_Pietschmann_229ECoronavirus_Mutations_2024/refsel.acids refsel.acids
cp  ~/DATA_D/Data_Pietschmann_229ECoronavirus_Mutations_2024/lastal.acids lastal.acids
cp  ~/DATA_D/Data_Pietschmann_229ECoronavirus_Mutations_2024/config.yaml config.yaml
cp  ~/DATA_D/Data_Pietschmann_229ECoronavirus_Mutations_2024/samples-runs.txt samples-runs.txt
cp  ~/DATA_D/Data_Pietschmann_229ECoronavirus_Mutations_2024/samples-depletion.txt samples-depletion.txt
cp  ~/DATA_D/Data_Pietschmann_229ECoronavirus_Mutations_2024/samples-metagenomics.txt samples-metagenomics.txt
cp  ~/DATA_D/Data_Pietschmann_229ECoronavirus_Mutations_2024/samples-assembly.txt samples-assembly.txt
cp  ~/DATA_D/Data_Pietschmann_229ECoronavirus_Mutations_2024/samples-assembly-failures.txt samples-assembly-failures.txt

# Adapt the sample-*.txt
mkdir viralngs/data
mkdir viralngs/data/00_raw

mkdir bams
ref_fa="NC_001653.fasta";

for sample in 010  048  073  083  093  1021  104  108  129  253  282  301  357  383  405  444  446  450  494  503  69   738  81  879  94   995    HE290  HE554  HE695  HSVM 020  068  079  090  097  103   107  109  141  279  293  341  370  394  442  445  449  478  497  550  691  771  82  88   973  HE284  HE511  HE566  HE748; do
    bwa index ${ref_fa}; \
    bwa mem -M -t 16 ${ref_fa} trimmed/${sample}_trimmed_P_1.fastq trimmed/${sample}_trimmed_P_2.fastq | samtools view -bS - > bams/${sample}_genome_alignment.bam; \
done
conda activate viral-ngs4

for sample in 010  048  073  083  093  1021  104  108  129  253  282  301  357  383  405  444  446  450  494  503  69   738  81  879  94   995    HE290  HE554  HE695  HSVM 020  068  079  090  097  103   107  109  141  279  293  341  370  394  442  445  449  478  497  550  691  771  82  88   973  HE284  HE511  HE566  HE748; do
    picard AddOrReplaceReadGroups I=bams/${sample}_genome_alignment.bam O=~/DATA/Data_Sophie_HDV_Sequences/viralngs/data/00_raw/${sample}.bam SORT_ORDER=coordinate CREATE_INDEX=true RGPL=illumina RGID=$sample RGSM=$sample RGLB=standard RGPU=$sample VALIDATION_STRINGENCY=LENIENT; \
done
conda deactivate

# Activate the docker viralngs environment
docker run -it --rm -v /mnt/md1/DATA/Data_Sophie_HDV_Sequences/viralngs:/work -v /home/jhuang/Tools/viral-ngs_docker:/home/jhuang/Tools/viral-ngs_docker -v /home/jhuang/REFs:/home/jhuang/REFs -v /home/jhuang/Tools/GenomeAnalysisTK-3.6:/home/jhuang/Tools/GenomeAnalysisTK-3.6 -v /home/jhuang/Tools/novocraft_v3:/home/jhuang/Tools/novocraft_v3 -v /usr/local/bin/gatk:/usr/local/bin/gatk   own_viral_ngs_gap2seq bash
cd /work

# -- ! Firstly manully run for generating all files ${sample}.cleaned.bam and ${sample}.taxfilt.bam in 01_cleaned and 01_per_sample
for sample in 010  048  073  083  093  1021  104  108  129  253  282  301  357  383  405  444  446  450  494  503  69   738  81  879  94   995    HE290  HE554  HE695  HSVM 020  068  079  090  097  103   107  109  141  279  293  341  370  394  442  445  449  478  497  550  691  771  82  88   973  HE284  HE511  HE566  HE748; do

    # -- generating data/01_cleaned/${sample}.cleaned.bam --
    bin/taxon_filter.py deplete data/00_raw/${sample}.bam tmp/01_cleaned/${sample}.raw.bam tmp/01_cleaned/${sample}.bmtagger_depleted.bam tmp/01_cleaned/${sample}.rmdup.bam data/01_cleaned/${sample}.cleaned.bam --bmtaggerDbs /home/jhuang/REFs/viral_ngs_dbs/bmtagger_dbs_remove/metagenomics_contaminants_v3 /home/jhuang/REFs/viral_ngs_dbs/bmtagger_dbs_remove/GRCh37.68_ncRNA-GRCh37.68_transcripts-HS_rRNA_mitRNA /home/jhuang/REFs/viral_ngs_dbs/bmtagger_dbs_remove/hg19 --blastDbs /home/jhuang/REFs/viral_ngs_dbs/blast_dbs_remove/hybsel_probe_adapters /home/jhuang/REFs/viral_ngs_dbs/blast_dbs_remove/metag_v3.ncRNA.mRNA.mitRNA.consensus --threads 60 --srprismMemory 14250 --JVMmemory 50g

    # -- data/01_cleaned/073.cleaned.bam --> data/01_cleaned/073.taxfilt.bam --
    bin/taxon_filter.py filter_lastal_bam data/01_cleaned/${sample}.cleaned.bam lastal_db/lastal.fasta data/01_cleaned/${sample}.taxfilt.bam
    bin/read_utils.py bwamem_idxstats data/01_cleaned/${sample}.cleaned.bam /home/jhuang/REFs/viral_ngs_dbs/spikeins/ercc_spike-ins.fasta --outStats reports/spike_count/${sample}.spike_count.txt --minScoreToFilter 60

    fastqc -f bam data/01_cleaned/${sample}.cleaned.bam -o reports/fastqc/${sample}
    unzip reports/fastqc/${sample}/${sample}.cleaned_fastqc.zip -d reports/fastqc/${sample}
    fastqc -f bam data/01_cleaned/${sample}.taxfilt.bam -o reports/fastqc/${sample}
    unzip reports/fastqc/${sample}/${sample}.taxfilt_fastqc.zip -d reports/fastqc/${sample}

    # -- data/01_cleaned/${sample}.cleaned.bam --> data/01_per_sample/${sample}.cleaned.bam --
    bin/read_utils.py merge_bams data/01_cleaned/${sample}.cleaned.bam tmp/01_cleaned/${sample}.cleaned.bam --picardOptions SORT_ORDER=queryname
    bin/read_utils.py rmdup_mvicuna_bam tmp/01_cleaned/${sample}.cleaned.bam data/01_per_sample/${sample}.cleaned.bam --JVMmemory 30g

    # -- data/01_cleaned/${sample}.taxfilt.bam --> data/01_per_sample/${sample}.taxfilt.bam --
    bin/read_utils.py merge_bams data/01_cleaned/${sample}.taxfilt.bam tmp/01_cleaned/${sample}.taxfilt.bam --picardOptions SORT_ORDER=queryname
    bin/read_utils.py rmdup_mvicuna_bam tmp/01_cleaned/${sample}.taxfilt.bam data/01_per_sample/${sample}.taxfilt.bam --JVMmemory 30g
done

# -- ! Secondly --
#If direct use         snakemake --directory /work --printshellcmds --cores 40, has the following error, using bash commands instead.
#Error in rule orient_and_impute:
#jobid: 0
#output: tmp/02_assembly/HE290.assembly3-modify.fasta
#DEBUG: --memLimitGb 12 --> --memLimitGb 960, if threads=60: 256M / 58G, how big the memory needed when threads=120: 492M / 468G.
##ASSEMBLY_1_SPADES: data/01_per_sample/010.taxfilt.bam ----> 010.assembly1-spades.fasta in tmp/02_assembly/

for sample in 010  048  073  083  093  1021  104  108  129  253  282  301  357  383  405  444  446  450  494  503  69   738  81  879  94   995    HE290  HE554  HE695  HSVM 020  068  079  090  097  103   107  109  141  279  293  341  370  394  442  445  449  478  497  550  691  771  82  88   973  HE284  HE511  HE566  HE748; do
    bin/assembly.py assemble_spades data/01_per_sample/${sample}.taxfilt.bam /home/jhuang/REFs/viral_ngs_dbs/trim_clip/contaminants.fasta tmp/02_assembly/${sample}.assembly1-spades.fasta --nReads 10000000 --threads 120 --memLimitGb 960
done

#ASSEMBLY_2_SCAFFOLDED: 010.assembly1-spades.fasta ----> 010.assembly2-scaffolded[_ref].fasta + 010.assembly2-alternate_sequences.fasta
for sample in 010  048  073  083  093  1021  104  108  129  253  282  301  357  383  405  444  446  450  494  503  69   738  81  879  94   995    HE290  HE554  HE695  HSVM 020  068  079  090  097  103   107  109  141  279  293  341  370  394  442  445  449  478  497  550  691  771  82  88   973  HE284  HE511  HE566  HE748; do
    bin/assembly.py order_and_orient tmp/02_assembly/${sample}.assembly1-spades.fasta refsel_db/refsel.fasta tmp/02_assembly/${sample}.assembly2-scaffolded.fasta --min_pct_contig_aligned 0.05 --outAlternateContigs tmp/02_assembly/${sample}.assembly2-alternate_sequences.fasta --nGenomeSegments 1 --outReference tmp/02_assembly/${sample}.assembly2-scaffold_ref.fasta --threads 60
done

#DEBUG: the tool gap2seq is missing, installing package gap2seq to root@bcd2c36b083c:/opt/miniconda/envs/viral-ngs-env/bin
#
#        #https://www.cs.helsinki.fi/u/lmsalmel/Gap2Seq/
#        apt-get update
#        apt-get install -y cmake
#        mkdir build;  cd build;  cmake ..;  make
#
#        #cp /work/Gap2Seq-2.1/build/Gap2Seq /opt/miniconda/envs/viral-ngs-env/bin/gap2seq
#        cp /work/Gap2Seq-2.1/build/Gap2Seq.sh /opt/miniconda/envs/viral-ngs-env/bin/
#        cp /work/Gap2Seq-2.1/build/Gap2Seq /opt/miniconda/envs/viral-ngs-env/bin/
#        cp /work/Gap2Seq-2.1/build/GapCutter /opt/miniconda/envs/viral-ngs-env/bin/
#        cp /work/Gap2Seq-2.1/build/GapMerger /opt/miniconda/envs/viral-ngs-env/bin/
#        cp -r /work/Gap2Seq-2.1/build/ext /opt/miniconda/envs/viral-ngs-env/bin/
#        Gap2Seq.sh --help
#
#        #MOFIFIED1 in bin/tools/gap2seq.py
#        #TOOL_VERSION = '2.1'
#        TOOL_VERSION = '3.1.1a2'
#
#        root@544789adb8b6:/work# for sample in 010; do             bin/assembly.py gapfill_gap2seq tmp/02_assembly/${sample}.assembly2-scaffolded.fasta data/01_per_sample/${sample}.cleaned.bam tmp/02_assembly/${sample}.assembly2-gapfilled.fasta --memLimitGb 960 --maskErrors --randomSeed 0 --loglevel DEBUG;         done
#        2025-06-20 11:12:41,165 - gap2seq:44:execute - DEBUG - running gap2seq: /opt/miniconda/envs/viral-ngs-env/bin/Gap2Seq.sh -scaffolds /work/tmp/02_assembly/010.assembly2-scaffolded.fasta -filled /tmp/tmp-assembly-gapfill_gap2seq-vz_n3tkp/tmpkt8508du_gap2seq_dir/gap2seq-filled.s3.k90.fasta -reads /tmp/tmp-assembly-gapfill_gap2seq-vz_n3tkp/tmpfj62s6n5.1.fq,/tmp/tmp-assembly-gapfill_gap2seq-vz_n3tkp/tmpu5eu1n1r.2.fq -all-upper -verbose -solid 3 -k 90 -nb-cores 0 -max-mem 960 -randseed 0
#        /opt/miniconda/envs/viral-ngs-env/bin/Gap2Seq.sh: Unrecognized option -randseed
#
#        #MODIFIED2 in bin/tools/gap2seq.py: delete 'randseed=random_seed' in solid=solid_kmer_threshold, k=kmer_size, nb_cores=threads, max_mem=mem_limit_gb, randseed=random_seed) so that solid=solid_kmer_threshold, k=kmer_size, nb_cores=threads, max_mem=mem_limit_gb)
#
#        docker commit 3f9f9507ab31 viral_ngs_with_gap2seq
#        docker image ls or docker images
#
#        #NOTE that the image cannot be deleted, since linke to other images!
#        docker rmi own_viral_ngs_with_gap2seq
#        #Error response from daemon: conflict: unable to remove repository reference "own_viral_ngs_gap2seq" (must force) - container 3f9f9507ab31 is using its referenced image 7ffc275c57cc

#NOTE: --memLimitGb 12 --> --memLimitGb 960
#ASSEMBLY_2_GAPFILLED: 010.assembly2-scaffolded.fasta + data/01_per_sample/010.cleaned.bam ----> 010.assembly2-gapfilled.fasta
for sample in 010  048  073  083  093  1021  104  108  129  253  282  301  357  383  405  444  446  450  494  503  69   738  81  879  94   995    HE290  HE554  HE695  HSVM 020  068  079  090  097  103   107  109  141  279  293  341  370  394  442  445  449  478  497  550  691  771  82  88   973  HE284  HE511  HE566  HE748; do
    bin/assembly.py gapfill_gap2seq tmp/02_assembly/${sample}.assembly2-scaffolded.fasta data/01_per_sample/${sample}.cleaned.bam tmp/02_assembly/${sample}.assembly2-gapfilled.fasta --memLimitGb 960 --maskErrors --randomSeed 0 --loglevel DEBUG
done

#ASSEMBLY_3_MOFIFY: 010.assembly2-gapfilled.fasta + 010.assembly2-scaffold_ref.fasta ----> 010.assembly3-modify.[fasta|fasta.fai|dict|nix]
for sample in 010  048  073  083  093  1021  104  108  129  253  282  301  357  383  405  444  446  450  494  503  69   738  81  879  94   995    HE290  HE554  HE695  HSVM 020  068  079  090  097  103   107  109  141  279  293  341  370  394  442  445  449  478  497  550  691  771  82  88   973  HE284  HE511  HE566  HE748; do
    bin/assembly.py impute_from_reference tmp/02_assembly/${sample}.assembly2-gapfilled.fasta tmp/02_assembly/${sample}.assembly2-scaffold_ref.fasta tmp/02_assembly/${sample}.assembly3-modify.fasta --newName ${sample} --replaceLength 55 --minLengthFraction 0.05 --minUnambig 0.05 --index
done

#ASSEMBLY_4_REFINED: 010.assembly3-modify.fasta + data/01_per_sample/010.cleaned.bam ----> 010.assembly4-refined.[fasta|fasta.fai|dict|nix] + 010.assembly3.[vcf.gz|vcf.gz.tbi]
for sample in 010  048  073  083  093  1021  104  108  129  253  282  301  357  383  405  444  446  450  494  503  69   738  81  879  94   995    HE290  HE554  HE695  HSVM 020  068  079  090  097  103   107  109  141  279  293  341  370  394  442  445  449  478  497  550  691  771  82  88   973  HE284  HE511  HE566  HE748; do
    bin/assembly.py refine_assembly tmp/02_assembly/${sample}.assembly3-modify.fasta data/01_per_sample/${sample}.cleaned.bam tmp/02_assembly/${sample}.assembly4-refined.fasta --outVcf tmp/02_assembly/${sample}.assembly3.vcf.gz --min_coverage 2 --novo_params '-r Random -l 20 -g 40 -x 20 -t 502' --threads 60
done

#ASSEMBLY_5_REFINED2_GENERATE_ASSEMBLY_IN_DATA_DIR: tmp/02_assembly/010.assembly4-refined.fasta + data/01_per_sample/010.cleaned.bam ----> data/02_assembly/010.[fasta|fasta.fai|dict|nix] + tmp/02_assembly/010.assembly4.[vcf.gz|vcf.gz.tbi]
for sample in 010  048  073  083  093  1021  104  108  129  253  282  301  357  383  405  444  446  450  494  503  69   738  81  879  94   995    HE290  HE554  HE695  HSVM 020  068  079  090  097  103   107  109  141  279  293  341  370  394  442  445  449  478  497  550  691  771  82  88   973  HE284  HE511  HE566  HE748; do
    bin/assembly.py refine_assembly tmp/02_assembly/${sample}.assembly4-refined.fasta data/01_per_sample/${sample}.cleaned.bam data/02_assembly/${sample}.fasta --outVcf tmp/02_assembly/${sample}.assembly4.vcf.gz --min_coverage 3 --novo_params '-r Random -l 20 -g 40 -x 20 -t 100' --threads 60
done

#ALIGN_CLEANED_BAM_GENERATE_MAPPED_BAM: data/02_assembly/010.fasta + data/01_per_sample/010.cleaned.bam ----> data/02_align_to_self/010.bam + data/02_align_to_self/010.mapped.bam
for sample in 010  048  073  083  093  1021  104  108  129  253  282  301  357  383  405  444  446  450  494  503  69   738  81  879  94   995    HE290  HE554  HE695  HSVM 020  068  079  090  097  103   107  109  141  279  293  341  370  394  442  445  449  478  497  550  691  771  82  88   973  HE284  HE511  HE566  HE748; do
    bin/read_utils.py align_and_fix data/01_per_sample/${sample}.cleaned.bam data/02_assembly/${sample}.fasta --outBamAll data/02_align_to_self/${sample}.bam --outBamFiltered data/02_align_to_self/${sample}.mapped.bam --aligner novoalign --aligner_options '-r Random -l 20 -g 40 -x 20 -t 100 -k' --threads 60
done

# -- ! Thirdly set the samples-assembly.txt full
snakemake --directory /work --printshellcmds --cores 40
#Error in rule orient_and_impute:
#jobid: 0
#output: tmp/02_assembly/HE290.assembly3-modify.fast

                    # # ---- The snakemake pipeline contains the following remaining steps ----
                    #
                    # fastqc -f bam data/02_align_to_self/093.bam -o reports/fastqc/093
                    # unzip reports/fastqc/093/093_fastqc.zip -d reports/fastqc/093
                    # fastqc -f bam data/01_cleaned/093.cleaned.bam -o reports/fastqc/093
                    # unzip reports/fastqc/093/093.cleaned_fastqc.zip -d reports/fastqc/093
                    # fastqc -f bam data/01_cleaned/093.taxfilt.bam -o reports/fastqc/093
                    # unzip reports/fastqc/093/093.taxfilt_fastqc.zip -d reports/fastqc/093
                    #
                    # bin/intrahost.py vphaser_one_sample data/02_align_to_self/093.mapped.bam data/02_assembly/093.fasta data/04_intrahost/vphaser2.093.txt.gz --vphaserNumThreads 15 --removeDoublyMappedReads --minReadsEach 5 --maxBias 10
                    # bin/reports.py consolidate_fastqc reports/fastqc/093/taxfilt reports/summary.fastqc.taxfilt.txt
                    # bin/reports.py consolidate_fastqc reports/fastqc/093/align_to_self reports/summary.fastqc.align_to_self.txt
                    # bin/reports.py consolidate_fastqc reports/fastqc/093/cleaned reports/summary.fastqc.cleaned.txt
                    # bin/interhost.py multichr_mafft ref_genome/reference.fasta data/02_assembly/093.fasta data/03_multialign_to_ref --ep 0.123 --maxiters 1000 --preservecase --localpair --outFilePrefix aligned --sampleNameListFile data/03_multialign_to_ref/sampleNameList.txt --threads 60
                    # bin/intrahost.py merge_to_vcf ref_genome/reference.fasta data/04_intrahost/isnvs.vcf.gz --samples 093 --isnvs data/04_intrahost/vphaser2.093.txt.gz --alignments data/03_multialign_to_ref/aligned_1.fasta --strip_chr_version --parse_accession
                    # bin/interhost.py snpEff data/04_intrahost/isnvs.vcf.gz NC_001653.2 data/04_intrahost/isnvs.annot.vcf.gz j.huang@uke.de
                    # bin/intrahost.py iSNV_table data/04_intrahost/isnvs.annot.vcf.gz data/04_intrahost/isnvs.annot.txt.gz
                    # bin/reports.py consolidate_spike_count reports/spike_count reports/summary.spike_count.txt

vrap-calling

ln -s /home/jhuang/Tools/vrap/ .
mamba activate /home/jhuang/miniconda3/envs/vrap

for sample in 010  048  073  083  093  1021  104  108  129  253  282  301  357  383  405  444  446  450  494  503  69   738  81  879  94   995    HE290  HE554  HE695  HSVM 020  068  079  090  097  103   107  109  141  279  293  341  370  394  442  445  449  478  497  550  691  771  82  88   973  HE284  HE511  HE566  HE748; do
        vrap/vrap.py  -1 trimmed/${sample}_trimmed_P_1.fastq -2 trimmed/${sample}_trimmed_P_2.fastq -o vrap_${sample}  --bt2idx=/home/jhuang/REFs/genome  --host=/home/jhuang/REFs/genome.fa --virus=/mnt/md1/DATA/Data_Sophie_HDV_Sequences/complete_genome_12475_ncbi.fasta --nt=/mnt/nvme0n1p1/blast/nt --nr=/mnt/nvme0n1p1/blast/nr  -t 100 -l 200  -g
done

(Deprecated) Using docker viral-ngs scripts processing the vrap-results. Be carefual since it doesn’t release good results.

mv vrap_010 viralngs/
mkdir tmp/02_assembly data/02_assembly

#refsel_db/refsel.fasta

The longest contig genome-calling is
010: HQ005371

# Using viral-ngs to improve the assembly, the results are not so good due to the too diverser reference --> not used!
for sample in 010  048  073  083  093  1021  104  108  129  253  282  301  357  383  405  444  446  450  494  503  69   738  81  879  94   995    HE290  HE554  HE695  HSVM 020  068  079  090  097  103   107  109  141  279  293  341  370  394  442  445  449  478  497  550  691  771  82  88   973  HE284  HE511  HE566  HE748; do
for sample in 010; do
    bin/assembly.py order_and_orient vrap_${sample}/virus_user_db_contigs.fasta HQ005371.fasta  tmp/02_assembly/${sample}.assembly2-scaffolded.fasta --min_pct_contig_aligned 0.05 --outAlternateContigs tmp/02_assembly/${sample}.assembly2-alternate_sequences.fasta --nGenomeSegments 1 --outReference tmp/02_assembly/${sample}.assembly2-scaffold_ref.fasta --threads 60
    bin/assembly.py gapfill_gap2seq tmp/02_assembly/${sample}.assembly2-scaffolded.fasta data/01_per_sample/${sample}.cleaned.bam tmp/02_assembly/${sample}.assembly2-gapfilled.fasta --memLimitGb 960 --maskErrors --randomSeed 0 --loglevel DEBUG
    bin/assembly.py impute_from_reference tmp/02_assembly/${sample}.assembly2-gapfilled.fasta tmp/02_assembly/${sample}.assembly2-scaffold_ref.fasta tmp/02_assembly/${sample}.assembly3-modify.fasta --newName ${sample} --replaceLength 55 --minLengthFraction 0.05 --minUnambig 0.05 --index
    bin/assembly.py refine_assembly tmp/02_assembly/${sample}.assembly3-modify.fasta data/01_per_sample/${sample}.cleaned.bam tmp/02_assembly/${sample}.assembly4-refined.fasta --outVcf tmp/02_assembly/${sample}.assembly3.vcf.gz --min_coverage 2 --novo_params '-r Random -l 20 -g 40 -x 20 -t 502' --threads 60
    bin/assembly.py refine_assembly tmp/02_assembly/${sample}.assembly4-refined.fasta data/01_per_sample/${sample}.cleaned.bam data/02_assembly/${sample}.fasta --outVcf tmp/02_assembly/${sample}.assembly4.vcf.gz --min_coverage 3 --novo_params '-r Random -l 20 -g 40 -x 20 -t 100' --threads 60

    bin/read_utils.py align_and_fix data/01_per_sample/${sample}.cleaned.bam data/02_assembly/${sample}.fasta --outBamAll data/02_align_to_self/${sample}.bam --outBamFiltered data/02_align_to_self/${sample}.mapped.bam --aligner novoalign --aligner_options '-r Random -l 20 -g 40 -x 20 -t 100 -k' --threads 60
done

Filtering the contig from the vrap results

for sample in 010  048  073  083  093  1021  104  108  129  253  282  301  357  383  405  444  446  450  494  503  69   738  81  879  94   995    HE290  HE554  HE695  HSVM 020  068  079  090  097  103   107  109  141  279  293  341  370  394  442  445  449  478  497  550  691  771  82  88   973  HE284  HE511  HE566  HE748; do
    python ~/Scripts/extract_virus_user_db_contigs.py vrap_${sample}
done

for sample in 010  048  073  083  093  1021  104  108  129  253  282  301  357  383  405  444  446  450  494  503  69   738  81  879  94   995    HE290  HE554  HE695  HSVM 020  068  079  090  097  103   107  109  141  279  293  341  370  394  442  445  449  478  497  550  691  771  82  88   973  HE284  HE511  HE566  HE748; do
    python ~/Scripts/extract_longest_contig.py vrap_${sample}/virus_user_db_contigs.fasta ${sample}_raw.fasta
done

Circularity checking of the contigs: method_1 using ccfind

#Install ccfind
        #Detects circular genomes via terminal redundancy using BLAST or FASTA Smith–Waterman.
        git clone https://github.com/yosuken/ccfind
        cd ccfind
        # Ensure ssearch36, blastn, prodigal are installed
        ./ccfind <input.fasta> <output_dir>

        #https://github.com/wrpearson/fasta3
        wget http://faculty.virginia.edu/wrpearson/fasta/fasta36/fasta-36.3.8h.tar.gz
        tar -xzf fasta-36.3.8h.tar.gz
        cd fasta-36.3.8h
        make -f Makefile.linux64_sse2
        After compiling, add it to your path: export PATH=$PWD:$PATH

# Using ccfind check circularity of the contigs
#docker pull sangerpathogens/circlator
#docker run -it --rm -v /home/jhuang/DATA/Data_Sophie_HDV_Sequences:/data sangerpathogens/circlator bash
#cd /data
for sample in 010 048  073  083  093  1021  104  108  129  253  282  301  357  383  405  444  446  450  494  503  69   738  81  879  94   995    HE290  HE554  HE695  HSVM 020  068  079  090  097  103   107  109  141  279  293  341  370  394  442  445  449  478  497  550  691  771  82  88   973  HE284  HE511  HE566  HE748; do
    ~/Tools/ccfind/ccfind ${sample}_raw.fasta ccfind_${sample}
    #seqtk mergepe trimmed/${sample}_trimmed_P_1.fastq trimmed/${sample}_trimmed_P_2.fastq > ${sample}_interleaved.fq
    #circlator all --threads 16 vrap_010/virus_user_db_contigs.fasta 010_interleaved.fq 010_circlator_out
done

#print file size of all files circ.noTR.fasta under "ccfind_*/result/"
find ccfind_*/result/ -name "circ.noTR.fasta" -exec ls -lh {} \; | awk '{print $5, $9}'
find ccfind_*/result/ -name "circ.noTR.fasta" -exec stat -c "%s %n" {} \;

Circularity checking of the contigs: method_2 using blastn

# Manual inspection of the mate-pair read mapping at the start and end confirmed that three of the contigs were circular plasmids.

# -- Circularity_varification: Confirm the contigs are circular using blastn --
makeblastdb -in virus_user_db_contigs.fasta -dbtype nucl -out contigs_db
blastn -query virus_user_db_contigs.fasta -db contigs_db -outfmt 6 -evalue 1e-10 > blast_results.txt

    CAP_1_length_1851       CAP_1_length_1851       100.000 166     0       0       1686    1851    1       166     4.14e-86        307
    CAP_1_length_1851       CAP_1_length_1851       100.000 166     0       0       1       166     1686    1851    4.14e-86        307

samtools faidx virus_user_db_contigs.fasta CAP_1_length_1851 > CAP_1.fasta
python3 ~/Scripts/process_circular.py CAP_1.fasta CAP_1_circular.fasta --overlap_len 166
#Modify the sequence header to 010

Calculate coverage

bwa index CAP_1_circular.fasta
bwa mem CAP_1_circular.fasta ../trimmed/010_trimmed_P_1.fastq ../trimmed/010_trimmed_P_2.fastq > aligned_reads.sam
samtools view -bS aligned_reads.sam | samtools sort -o aligned_reads.sorted.bam
samtools index aligned_reads.sorted.bam
samtools depth aligned_reads.sorted.bam > coverage.txt
awk '{sum+=$3} END {print "Average coverage:", sum/NR}' coverage.txt
#Average coverage: 7754.49

bwa index 010.fasta
bwa mem 010.fasta trimmed/010_trimmed_P_1.fastq trimmed/010_trimmed_P_2.fastq > 010_aligned_reads.sam
samtools view -bS 010_aligned_reads.sam | samtools sort -o 010_aligned_reads.sorted.bam
samtools index 010_aligned_reads.sorted.bam

bwa index 068.fasta
bwa mem 068.fasta trimmed/068_trimmed_P_1.fastq trimmed/068_trimmed_P_2.fastq > 068_aligned_reads.sam
samtools view -bS 068_aligned_reads.sam | samtools sort -o 068_aligned_reads.sorted.bam
samtools index 068_aligned_reads.sorted.bam

Copy and update the fasta-headers

#!/bin/bash

# List of IDs to process
ids=(
010 020 048 073 079 083 093 104 253 279 282 341 383 394 405 446 449 503 550 69 738 879 88
HE290 HE511 HE554 HE695 HE748
)

for id in "${ids[@]}"; do
src="ccfind_${id}/result/circ.noTR.fasta"
dst="${id}.fasta"

if [[ -f "$src" ]]; then
    cp "$src" "$dst"

    # Update header in the copied fasta
    ruby -e "
    filename = '${id}'
    seq = ''
    File.foreach('${dst}') do |line|
        next if line.start_with?('>')
        seq += line.strip
    end
    File.open('${dst}', 'w') do |f|
        f.puts '>' + filename
        f.puts seq
    end
    "

    echo "Processed $id"
else
    echo "Warning: source file $src not found!"
fi
done

Generate coverage plot filtering some sequences, then align qualified sequences as input of RDP4

#69 88 HE290 HE511 HE554 HE695 HE748 --> 069 088 290 511 554 695 748
mv 69_my.fasta 069_my.fasta
mv 88_my.fasta 088_my.fasta
mv HE290_my.fasta 290_my.fasta
mv HE511_my.fasta 511_my.fasta
mv HE554_my.fasta 554_my.fasta
mv HE695_my.fasta 695_my.fasta
mv HE748_my.fasta 748_my.fasta
for sample in  010 020 048 073 079 083 093 104 253 279 282 341 383 394 405 446 449 503 550 738 879    069 088 290 511 554 695 748; do
mv ~/Downloads/${sample}.fasta .
done

cd trimmed_
ln -s 69_trimmed_P_1.fastq 069_trimmed_P_1.fastq
ln -s 69_trimmed_P_2.fastq 069_trimmed_P_2.fastq
ln -s 88_trimmed_P_1.fastq 088_trimmed_P_1.fastq
ln -s 88_trimmed_P_2.fastq 088_trimmed_P_2.fastq
ln -s HE290_trimmed_P_1.fastq 290_trimmed_P_1.fastq
ln -s HE290_trimmed_P_2.fastq 290_trimmed_P_2.fastq
ln -s HE511_trimmed_P_1.fastq 511_trimmed_P_1.fastq
ln -s HE511_trimmed_P_2.fastq 511_trimmed_P_2.fastq
ln -s HE554_trimmed_P_1.fastq 554_trimmed_P_1.fastq
ln -s HE554_trimmed_P_2.fastq 554_trimmed_P_2.fastq
ln -s HE695_trimmed_P_1.fastq 695_trimmed_P_1.fastq
ln -s HE695_trimmed_P_2.fastq 695_trimmed_P_2.fastq
ln -s HE748_trimmed_P_1.fastq 748_trimmed_P_1.fastq
ln -s HE748_trimmed_P_2.fastq 748_trimmed_P_2.fastq

ln -s 81_trimmed_P_1.fastq 081_trimmed_P_1.fastq
ln -s 81_trimmed_P_2.fastq 081_trimmed_P_2.fastq
ln -s 82_trimmed_P_1.fastq 082_trimmed_P_1.fastq
ln -s 82_trimmed_P_2.fastq 082_trimmed_P_2.fastq
ln -s 94_trimmed_P_1.fastq 094_trimmed_P_1.fastq
ln -s 94_trimmed_P_2.fastq 094_trimmed_P_2.fastq
ln -s HE284_trimmed_P_1.fastq 284_trimmed_P_1.fastq
ln -s HE284_trimmed_P_2.fastq 284_trimmed_P_2.fastq
ln -s HE566_trimmed_P_1.fastq 566_trimmed_P_1.fastq
ln -s HE566_trimmed_P_2.fastq 566_trimmed_P_2.fastq

#update_file_header.sh
update_fasta.py  # generate the HDV_genomes_
conda activate plot-numpy1
for sample in  010 020 048 073 079 083 093 104 253 279 282 341 383 394 405 446 449 503 550 738 879  069 088 290 511 554 695 748    068 081 082 090 094 097 103 107 108 109 129 141 284 357 370 442 444 445 497 566 691 771 973 995 1021    293 450 478 494; do
        bwa index HDV_genomes_/${sample}.fasta
        bwa mem HDV_genomes_/${sample}.fasta trimmed_/${sample}_trimmed_P_1.fastq trimmed_/${sample}_trimmed_P_2.fastq > ${sample}_aligned_reads.sam
        samtools view -bS ${sample}_aligned_reads.sam | samtools sort -o ${sample}_aligned_reads.sorted.bam
        samtools index ${sample}_aligned_reads.sorted.bam
        samtools depth -m 0 -a ${sample}_aligned_reads.sorted.bam > ${sample}_coverage.txt
        python ~/Scripts/plot_coverage.py ${sample}_coverage.txt ${sample}_cov.png
done

for sample in  010 020 048 073 079 083 093 104 253 279 282 341 383 394 405 446 449 503 550 738 879  069 088 290 511 554 695 748; do
        bwa index HDV_genomes_/${sample}_my.fasta
        bwa mem HDV_genomes_/${sample}_my.fasta trimmed_/${sample}_trimmed_P_1.fastq trimmed_/${sample}_trimmed_P_2.fastq > ${sample}_my_aligned_reads.sam
        samtools view -bS ${sample}_my_aligned_reads.sam | samtools sort -o ${sample}_my_aligned_reads.sorted.bam
        samtools index ${sample}_my_aligned_reads.sorted.bam
        samtools depth -m 0 -a ${sample}_my_aligned_reads.sorted.bam > ${sample}_my_coverage.txt
        python ~/Scripts/plot_coverage.py ${sample}_my_coverage.txt ${sample}_my_cov.png
done

#-- Note that The following two assembly were not sent to me due to the bad coverage --
#301_coverage.txt
#HSVM_coverage.txt

for file in *_cov.png; do
    convert "$file" "${file%.png}.pdf"
done
pdftk 010_cov.pdf 010_my_cov.pdf 020_cov.pdf 020_my_cov.pdf 048_cov.pdf 048_my_cov.pdf 073_cov.pdf 073_my_cov.pdf 079_cov.pdf 079_my_cov.pdf 083_cov.pdf 083_my_cov.pdf 093_cov.pdf 093_my_cov.pdf 104_cov.pdf 104_my_cov.pdf 253_cov.pdf 253_my_cov.pdf 279_cov.pdf 279_my_cov.pdf 282_cov.pdf 282_my_cov.pdf 341_cov.pdf 341_my_cov.pdf 383_cov.pdf 383_my_cov.pdf 394_cov.pdf 394_my_cov.pdf 405_cov.pdf 405_my_cov.pdf 446_cov.pdf 446_my_cov.pdf 449_cov.pdf 449_my_cov.pdf 503_cov.pdf 503_my_cov.pdf 550_cov.pdf 550_my_cov.pdf 738_cov.pdf 738_my_cov.pdf 879_cov.pdf 879_my_cov.pdf    069_cov.pdf 069_my_cov.pdf 088_cov.pdf 088_my_cov.pdf 290_cov.pdf 290_my_cov.pdf 511_cov.pdf 511_my_cov.pdf 554_cov.pdf 554_my_cov.pdf 695_cov.pdf 695_my_cov.pdf 748_cov.pdf 748_my_cov.pdf      068_cov.pdf 081_cov.pdf 082_cov.pdf 090_cov.pdf 094_cov.pdf 097_cov.pdf 103_cov.pdf 107_cov.pdf 108_cov.pdf 109_cov.pdf 129_cov.pdf 141_cov.pdf 284_cov.pdf 357_cov.pdf 370_cov.pdf 442_cov.pdf 444_cov.pdf 445_cov.pdf 497_cov.pdf 566_cov.pdf 691_cov.pdf 771_cov.pdf 973_cov.pdf 995_cov.pdf 1021_cov.pdf    293_cov.pdf 450_cov.pdf 478_cov.pdf 494_cov.pdf cat output coverges_all.pdf

pdftk 010_cov.pdf 020_cov.pdf 048_cov.pdf 073_cov.pdf 079_cov.pdf 083_cov.pdf 093_cov.pdf 104_cov.pdf 253_cov.pdf 279_cov.pdf 282_cov.pdf 341_cov.pdf 383_cov.pdf 394_cov.pdf 405_cov.pdf 446_cov.pdf 449_cov.pdf 503_cov.pdf 550_cov.pdf 738_cov.pdf 879_cov.pdf    069_cov.pdf 088_cov.pdf 290_cov.pdf 511_cov.pdf 554_cov.pdf 695_cov.pdf 748_cov.pdf      068_cov.pdf 081_cov.pdf 082_cov.pdf 090_cov.pdf 094_cov.pdf 097_cov.pdf 103_cov.pdf 107_cov.pdf 108_cov.pdf 109_cov.pdf 129_cov.pdf 141_cov.pdf 284_cov.pdf 357_cov.pdf 370_cov.pdf 442_cov.pdf 444_cov.pdf 445_cov.pdf 497_cov.pdf 566_cov.pdf 691_cov.pdf 771_cov.pdf 973_cov.pdf 995_cov.pdf 1021_cov.pdf    293_cov.pdf 450_cov.pdf 478_cov.pdf 494_cov.pdf cat output coverages.pdf

#Not good quality: 082,094,129,284,566,691,293,450,478,494

cat 010.fasta 020.fasta 048.fasta 073.fasta 079.fasta 083.fasta 093.fasta 104.fasta 253.fasta 279.fasta 282.fasta 341.fasta 383.fasta 394.fasta 405.fasta 446.fasta 449.fasta 503.fasta 550.fasta 738.fasta 879.fasta    069.fasta 088.fasta 290.fasta 511.fasta 554.fasta 695.fasta 748.fasta      068.fasta 081.fasta   090.fasta   097.fasta 103.fasta 107.fasta 108.fasta 109.fasta   141.fasta   357.fasta 370.fasta 442.fasta 444.fasta 445.fasta 497.fasta      771.fasta 973.fasta 995.fasta 1021.fasta               > all.fasta
awk '/^>/ {print $1} !/^>/ {print}' all.fasta > all_.fasta

mafft --adjustdirection --clustalout all_.fasta > all.aln
mafft --auto all_.fasta > aligned.fasta
#iqtree -s aligned.fasta -m GTR+G -bb 1000 -nt AUTO
FastTree -gtr -gamma aligned.fasta > tree.nwk

(NOT_NEEDED) rotate (fixstart) the genomes (not needed, since the genome provided has the same starting point)
```
#python rotate_circular_genome.py ${sample}.fasta ${sample}_rotated.fasta ATGAGC
```

(TODO) Draw plotTreeHeatmap

#http://xgenes.com/article/article-content/383/presence-absence-table-and-graphics-for-selected-genes-in-data-patricia-sepi-7samples/#

Report

Several assemblies (082, 094, 129, 284, 566, 691, 293, 450, 478, 494) show poor quality. Please refer to the attached coverage.pdf for an overview of read mapping. I excluded these from the recombination analysis, which was carried out using RDP4, applying nine detection methods (RDP, GENECONV, Bootscan, MaxChi, Chimaera, SiScan, PhylPro, LARD, and 3Seq).
Please note that the analysis assumes accurate assemblies—misassemblies can lead to false positives.
The results, summarized in the attached Excel files, identify three sequences (503, 109, 394) as potential recombinants. However, I recommend interpreting these findings with caution.
    * Events 2 and 3: Breakpoints occur near the genome ends (1602–12 and 1451–134), where alignment artifacts are common.
    * Event 1: The recombinant region is less than 100 nucleotides, which may be below biological relevance.
    * Method flags: RDP flags Events 1 and 2 (~) as possibly caused by other evolutionary processes. All events are flagged (^) to indicate that the recombinant sequence may have been misidentified (one of the identified parents might be the recombinant).

Comprehensive smallRNA-7 profiling using exceRpt pipeline with full reference databases (v3)

Leave a reply

Input data

# name                         condition
# ----------------------------------------------
# 0403_WaGa_wt                 parental_cells_1.fastq.gz
# #0505_WaGa_wt_EV_RNA         untreated_1.fastq.gz
# #0505_WaGa_sT_DMSO_EV_RNA    DMSO_control_1.fastq.gz
# #0505_WaGa_sT_Dox_EV_RNA     sT_knockdown_1.fastq.gz
# #0505_WaGa_scr_DMSO_EV_RNA   scr_DMSO_control_1.fastq.gz
# #0505_WaGa_scr_Dox_EV_RNA    scr_control_1.fastq.gz
# #1905_WaGa_wt_EV_RNA         untreated_2.fastq.gz
# #1905_WaGa_sT_DMSO_EV_RNA    DMSO_control_2.fastq.gz
# #1905_WaGa_sT_Dox_EV_RNA     sT_knockdown_2.fastq.gz
# #1905_WaGa_scr_DMSO_EV_RNA   scr_DMSO_control_2.fastq.gz
# #1905_WaGa_scr_Dox_EV_RNA    scr_control_2.fastq.gz
#
# WaGa_wt_cells_1              parental_cells_2.fastq.gz
# WaGa_wt_cells_2              parental_cells_3.fastq.gz
# #2001_WaGa_sT_DMSO           DMSO_control_3.fastq.gz
# #2001_WaGa_sT_Dox            sT_knockdown_3.fastq.gz
# #2001_WaGa_scr_DMSO          scr_DMSO_control_3.fastq.gz
# #2001_WaGa_scr_Dox           scr_control_3.fastq.gz
#
# WaGa_wt_cells_1              parental_cells_2_R2.fastq.gz
# WaGa_wt_cells_2              parental_cells_3_R2.fastq.gz
# #2001_WaGa_sT_DMSO           DMSO_control_3_R2.fastq.gz
# #2001_WaGa_sT_Dox            sT_knockdown_3_R2.fastq.gz
# #2001_WaGa_scr_DMSO          scr_DMSO_control_3_R2.fastq.gz
# #2001_WaGa_scr_Dox           scr_control_3_R2.fastq.gz

mkdir ~/DATA/Data_Ute/Data_Ute_smallRNA_7/raw_data
cd raw_data
ln -s ~/DATA/Data_Ute/Data_Ute_smallRNA_3/220617_NB501882_0371_AH7572BGXM/nf774/0403_WaGa_wt_S20_R1_001.fastq.gz parental_cells_1.fastq.gz
ln -s ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf930/01_0505_WaGa_wt_EV_RNA_S1_R1_001.fastq.gz untreated_1.fastq.gz
ln -s ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf931/02_0505_WaGa_sT_DMSO_EV_RNA_S2_R1_001.fastq.gz DMSO_control_1.fastq.gz
ln -s ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf932/03_0505_WaGa_sT_Dox_EV_RNA_S3_R1_001.fastq.gz sT_knockdown_1.fastq.gz
ln -s ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf933/04_0505_WaGa_scr_DMSO_EV_RNA_S4_R1_001.fastq.gz scr_DMSO_control_1.fastq.gz
ln -s ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf934/05_0505_WaGa_scr_Dox_EV_RNA_S5_R1_001.fastq.gz scr_control_1.fastq.gz
ln -s ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf935/06_1905_WaGa_wt_EV_RNA_S6_R1_001.fastq.gz untreated_2.fastq.gz
ln -s ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf936/07_1905_WaGa_sT_DMSO_EV_RNA_S7_R1_001.fastq.gz DMSO_control_2.fastq.gz
ln -s ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf937/08_1905_WaGa_sT_Dox_EV_RNA_S8_R1_001.fastq.gz sT_knockdown_2.fastq.gz
ln -s ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf938/09_1905_WaGa_scr_DMSO_EV_RNA_S9_R1_001.fastq.gz scr_DMSO_control_2.fastq.gz
ln -s ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf939/10_1905_WaGa_scr_Dox_EV_RNA_S10_R1_001.fastq.gz scr_control_2.fastq.gz

ln -s ~/DATA/Data_Ute/Data_Ute_smallRNA_7/250411_VH00358_135_AAGKGLHM5/nf961/WaGaWTcells_1_S1_R1_001.fastq.gz parental_cells_2.fastq.gz
ln -s ~/DATA/Data_Ute/Data_Ute_smallRNA_7/250411_VH00358_135_AAGKGLHM5/nf962/WaGaWTcells_2_S2_R1_001.fastq.gz parental_cells_3.fastq.gz
ln -s ~/DATA/Data_Ute/Data_Ute_smallRNA_7/250411_VH00358_135_AAGKGLHM5/nf971/2001_WaGa_sT_DMSO_S3_R1_001.fastq.gz DMSO_control_3.fastq.gz
ln -s ~/DATA/Data_Ute/Data_Ute_smallRNA_7/250411_VH00358_135_AAGKGLHM5/nf972/2001_WaGa_sT_Dox_S4_R1_001.fastq.gz sT_knockdown_3.fastq.gz
ln -s ~/DATA/Data_Ute/Data_Ute_smallRNA_7/250411_VH00358_135_AAGKGLHM5/nf973/2001_WaGa_scr_DMSO_S5_R1_001.fastq.gz scr_DMSO_control_3.fastq.gz
ln -s ~/DATA/Data_Ute/Data_Ute_smallRNA_7/250411_VH00358_135_AAGKGLHM5/nf974/2001_WaGa_scr_Dox_S6_R1_001.fastq.gz scr_control_3.fastq.gz

ln -s ~/DATA/Data_Ute/Data_Ute_smallRNA_7/250411_VH00358_135_AAGKGLHM5/nf961/WaGaWTcells_1_S1_R2_001.fastq.gz parental_cells_2_R2.fastq.gz
ln -s ~/DATA/Data_Ute/Data_Ute_smallRNA_7/250411_VH00358_135_AAGKGLHM5/nf962/WaGaWTcells_2_S2_R2_001.fastq.gz parental_cells_3_R2.fastq.gz
ln -s ~/DATA/Data_Ute/Data_Ute_smallRNA_7/250411_VH00358_135_AAGKGLHM5/nf971/2001_WaGa_sT_DMSO_S3_R2_001.fastq.gz DMSO_control_3_R2.fastq.gz
ln -s ~/DATA/Data_Ute/Data_Ute_smallRNA_7/250411_VH00358_135_AAGKGLHM5/nf972/2001_WaGa_sT_Dox_S4_R2_001.fastq.gz sT_knockdown_3_R2.fastq.gz
ln -s ~/DATA/Data_Ute/Data_Ute_smallRNA_7/250411_VH00358_135_AAGKGLHM5/nf973/2001_WaGa_scr_DMSO_S5_R2_001.fastq.gz scr_DMSO_control_3_R2.fastq.gz
ln -s ~/DATA/Data_Ute/Data_Ute_smallRNA_7/250411_VH00358_135_AAGKGLHM5/nf974/2001_WaGa_scr_Dox_S6_R2_001.fastq.gz scr_control_3_R2.fastq.gz

#awk '{print $2}' temp3

Adapter trimming

#some common adapter sequences from different kits for reference:
#    - TruSeq Small RNA (Illumina): TGGAATTCTCGGGTGCCAAGG
#    - Small RNA Kits V1 (Illumina): TCGTATGCCGTCTTCTGCTTGT
#    - Small RNA Kits V1.5 (Illumina): ATCTCGTATGCCGTCTTCTGCTTG
#    - NEXTflex Small RNA Sequencing Kit v3 for Illumina Platforms (Bioo Scientific): TGGAATTCTCGGGTGCCAAGG
#    - LEXOGEN Small RNA-Seq Library Prep Kit (Illumina): TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC *
mkdir trimmed; cd trimmed
for sample in parental_cells_1 untreated_1 DMSO_control_1 sT_knockdown_1 scr_DMSO_control_1 scr_control_1 untreated_2 DMSO_control_2 sT_knockdown_2 scr_DMSO_control_2 scr_control_2 parental_cells_2 parental_cells_3 DMSO_control_3 sT_knockdown_3 scr_DMSO_control_3 scr_control_3 parental_cells_2_R2 parental_cells_3_R2 DMSO_control_3_R2 sT_knockdown_3_R2 scr_DMSO_control_3_R2 scr_control_3_R2; do
  echo "------------------------------------ cutadapting the ${sample} -----------------------------------" >> LOG
  cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o ${sample}.fastq.gz ../raw_data/${sample}.fastq.gz >> LOG
done

# In LOG file to look the differences of the R1 and R2 reads based on the statistics of trimming.

#Reads with adapters:                10,114,799 (79.9%)
#Reads with adapters:                   240,366 (1.9%)
#Reads with adapters:                   233,380 (1.6%)
#Reads with adapters:                   230,664 (1.3%)
#Reads with adapters:                   207,717 (1.3%)
#Reads with adapters:                   186,080 (1.2%)
#Reads with adapters:                   577,429 (1.5%)
#Reads with adapters:                   268,867 (1.7%)
#Reads with adapters:                   325,300 (1.4%)
#Reads with adapters:                   314,540 (1.5%)
#Reads with adapters:                   264,349 (1.5%)

#Reads with adapters:                   299,677 (0.7%)
#Reads with adapters:                   108,801 (0.6%)
#Reads with adapters:                     5,095 (0.0%)
#Reads with adapters:                     6,989 (0.0%)
#Reads with adapters:                     3,868 (0.0%)
#Reads with adapters:                     2,173 (0.0%)

#Reads with adapters:                   615,334 (1.4%)
#Reads with adapters:                   258,388 (1.5%)
#Reads with adapters:                   294,325 (1.4%)
#Reads with adapters:                   336,932 (1.8%)
#Reads with adapters:                   239,288 (2.0%)
#Reads with adapters:                   117,544 (1.5%)

#Alternatively, we can also cut adapter in the exceRpt built-in functions since 'grep "TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC" /mnt/nvme0n1p1/MyexceRptDatabase/adapters/adapters.fa | wc -l' results in 48 records. However, explicitly cut adapter before is more ensured.

#TODO: check if the R1 and R2 has the similar data distribution? Then decide if only R1 or both used for the downstream analysis?
cat parental_cells_2.fastq.gz parental_cells_2_R2.fastq.gz > parental_cells_2_merged.fastq.gz
cat parental_cells_3.fastq.gz parental_cells_3_R2.fastq.gz > parental_cells_3_merged.fastq.gz
cat DMSO_control_3.fastq.gz DMSO_control_3_R2.fastq.gz > DMSO_control_3_merged.fastq.gz
cat sT_knockdown_3.fastq.gz sT_knockdown_3_R2.fastq.gz > sT_knockdown_3_merged.fastq.gz
cat scr_DMSO_control_3.fastq.gz scr_DMSO_control_3_R2.fastq.gz > scr_DMSO_control_3_merged.fastq.gz
cat scr_control_3.fastq.gz scr_control_3_R2.fastq.gz > scr_control_3_merged.fastq.gz

#Scenario   Option to use
#-----------------------------
#Trimming Read 1 only   -a
#Trimming Read 2 only   -a
#Trimming paired-end together   -a and -A
#cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o ${sample}_R2_trimmed.fastq.gz ../raw_data/${sample}_R2.fastq.gz
cutadapt \
-a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC \
-A TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC \
-q 20 --minimum-length 5 --trim-n \
-o ${sample}_R1_trimmed.fastq.gz -p ${sample}_R2_trimmed.fastq.gz \
../raw_data/${sample}_R1.fastq.gz ../raw_data/${sample}_R2.fastq.gz

# -- check if it is necessary to remove adapter from 5'-end --
#(Option_1) cutadapt -g TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -o /dev/null --report=minimal 0505_WaGa_wt_cutadapted.fastq.gz --> The trimming statistics in the output will show how often 5'-end adapters were removed.
#(Option 2) zcat your_sample.fastq.gz | grep 'TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC' | head -n 20
#(Option 3) fastqc your_sample.fastq.gz
#Open the generated HTML report and check:
#    The "Overrepresented sequences" section for adapter sequences.
#    The "Per base sequence content" plot to see if there are unexpected sequences at the start of reads.
#(If check results shows both ends contain adapter) cutadapt -g TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 10 -o ${sample}_trimmed.fastq.gz ${sample}.fastq.gz >> LOG2
#    -g → Trims 5'-end adapters
#    -a → Trims 3'-end adapters; -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC → Specifies the adapter sequence to be removed from the 3' end of the reads. The sequence provided is common in RNA-seq libraries (e.g., Illumina small RNA sequencing).
#    -q 20 → Performs quality trimming at both read ends, removing bases with a Phred quality score below 20.

Install exceRpt (https://github.gersteinlab.org/exceRpt/)

docker pull rkitchen/excerpt
mkdir MyexceRptDatabase
cd /mnt/nvme0n1p1/MyexceRptDatabase
wget http://org.gersteinlab.excerpt.s3-website-us-east-1.amazonaws.com/exceRptDB_v4_hg38_lowmem.tgz
tar -xvf exceRptDB_v4_hg38_lowmem.tgz
#http://org.gersteinlab.excerpt.s3-website-us-east-1.amazonaws.com/exceRptDB_v4_hg19_lowmem.tgz
#http://org.gersteinlab.excerpt.s3-website-us-east-1.amazonaws.com/exceRptDB_v4_hg38_lowmem.tgz
#http://org.gersteinlab.excerpt.s3-website-us-east-1.amazonaws.com/exceRptDB_v4_mm10_lowmem.tgz
wget http://org.gersteinlab.excerpt.s3-website-us-east-1.amazonaws.com/exceRptDB_v4_EXOmiRNArRNA.tgz
tar -xvf exceRptDB_v4_EXOmiRNArRNA.tgz
wget http://org.gersteinlab.excerpt.s3-website-us-east-1.amazonaws.com/exceRptDB_v4_EXOGenomes.tgz
tar -xvf exceRptDB_v4_EXOGenomes.tgz

Run exceRpt

#[---- REAL_RUNNING_COMPLETE_DB ---->]
#NOTE that if not renamed in the input files, then have to RENAME all files recursively by removing "_cutadapted.fastq" in all names in _CORE_RESULTS_v4.6.3.tgz (first unzip, removing, then zip, mv to ../results_g).
cd trimmed
#for file in *_cutadapted.fastq.gz; do
#    echo "mv \"$file\" \"${file/_cutadapted.fastq/}\""
#done
for file in *.fastq.gz; do
    echo "mv \"$file\" \"${file/.fastq/}\""
done

mkdir results_exo6
for sample in parental_cells_2 parental_cells_3 DMSO_control_3 sT_knockdown_3 scr_DMSO_control_3 scr_control_3    parental_cells_2_R2 parental_cells_3_R2 DMSO_control_3_R2 sT_knockdown_3_R2 scr_DMSO_control_3_R2 scr_control_3_R2    parental_cells_2_merged parental_cells_3_merged DMSO_control_3_merged sT_knockdown_3_merged scr_DMSO_control_3_merged scr_control_3_merged    parental_cells_1 untreated_1 DMSO_control_1 sT_knockdown_1 scr_DMSO_control_1 scr_control_1 untreated_2 DMSO_control_2 sT_knockdown_2 scr_DMSO_control_2 scr_control_2; do
    docker run -v ~/DATA/Data_Ute/Data_Ute_smallRNA_7/trimmed:/exceRptInput \
               -v ~/DATA/Data_Ute/Data_Ute_smallRNA_7/results_exo6:/exceRptOutput \
              -v /mnt/nvme0n1p1/MyexceRptDatabase:/exceRpt_DB \
              -t rkitchen/excerpt \
              INPUT_FILE_PATH=/exceRptInput/${sample}.gz MAIN_ORGANISM_GENOME_ID=hg38 N_THREADS=50 JAVA_RAM='200G' MAP_EXOGENOUS=on
done

#TODO: DEBUG running exceRpt within docker container
#docker run -it --rm \
#  -v ~/DATA/Data_Ute/Data_Ute_smallRNA_7/trimmed:/exceRptInput \
#  -v ~/DATA/Data_Ute/Data_Ute_smallRNA_7/results_exo6:/exceRptOutput \
#  -v /mnt/nvme0n1p1/MyexceRptDatabase:/exceRpt_DB \
#  --entrypoint bash \
#  rkitchen/excerpt
#bash /exceRpt_bin/exceRpt_smallRNA   INPUT_FILE_PATH=/exceRptInput/sample1.fastq.gz   MAIN_ORGANISM_GENOME_ID=hg38   N_THREADS=8   JAVA_RAM='16G'   MAP_EXOGENOUS=on

#DEBUG the excerpt env
docker inspect rkitchen/excerpt:latest
# Without /bin/bash → May run and exit immediately
#docker run -it rkitchen/excerpt
# With /bin/bash → Stays open for interaction
docker run -it --entrypoint /bin/bash rkitchen/excerpt

#TODO: In the read2 exists the following adapter2, to test if the adapter can be identified and removed with the pipeline!

Processing exceRpt output from multiple samples

mkdir summaries_exo6
cd ~/DATA/Data_Ute/Data_Ute_smallRNA_7/exceRpt-master
(r_env) jhuang@WS-2290C:~/DATA/Data_Ute/Data_Ute_smallRNA_7/exceRpt-master$ R
#WARNING: need to reload the R-script after each change of the script.
source("mergePipelineRuns_functions.R")

getwd()
#[1] "/media/jhuang/Elements/Data_Ute/Data_Ute_smallRNA_7/exceRpt-master"
processSamplesInDir("../results_exo6/", "../summaries_exo6")

#~/Tools/csv2xls-0.4/csv_to_xls.py exceRpt_miRNA_ReadsPerMillion.txt exceRpt_tRNA_ReadsPerMillion.txt exceRpt_piRNA_ReadsPerMillion.txt -d$'\t' -o exceRpt_results_detailed.xls

mv results_exo6 results_exo7; mkdir results_exo6; sudo mv _R2 ../results_exo6; sudo mv _merged ../results_exo6
```
mkdir summaries_exo7
processSamplesInDir("../results_exo7/", "../summaries_exo7")
```

Re-draw the heatmap plots

# -- R-code --

    # Load required library
    library(dplyr)

    # Original vectors
    samples_orig <- c("untreated_2", "parental_cells_1", "parental_cells_2", "parental_cells_3", "scr_control_3",
                    "DMSO_control_3", "scr_DMSO_control_3", "sT_knockdown_3", "untreated_1", "DMSO_control_1",
                    "scr_control_1", "scr_DMSO_control_1", "DMSO_control_2", "sT_knockdown_2", "scr_control_2",
                    "scr_DMSO_control_2", "sT_knockdown_1")

    categories_orig <- c("reads_used_for_alignment", "genome", "miRNA_sense", "miRNA_antisense",
                        "miRNAprecursor_sense", "miRNAprecursor_antisense", "tRNA_sense", "tRNA_antisense",
                        "piRNA_sense", "piRNA_antisense", "gencode_sense", "gencode_antisense",
                        "circularRNA_sense", "circularRNA_antisense", "not_mapped_to_genome_or_libs",
                        "repetitiveElements", "endogenous_gapped", "exogenous_miRNA", "exogenous_rRNA",
                        "exogenous_genomes")

    # Provided samples and categories (desired order and format)
    samples <- c("parental_cells_1","parental_cells_2","parental_cells_3",
                "untreated_1","untreated_2",
                "scr_control_1","scr_control_2","scr_control_3",
                "DMSO_control_1","DMSO_control_2","DMSO_control_3",
                "scr_DMSO_control_1","scr_DMSO_control_2","scr_DMSO_control_3",
                "sT_knockdown_1","sT_knockdown_2","sT_knockdown_3")

    categories <- c("reads_used_for_alignment", "genome", "miRNA", "miRNAprecursor", "tRNA", "piRNA",
                    "gencode", "circularRNA", "not_mapped_to_genome_or_libs", "repetitiveElements",
                    "endogenous_gapped", "exogenous_miRNA", "exogenous_rRNA", "exogenous_genomes")

    # Original data matrix
    data_orig <- matrix(c(
                    100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0,
                    21.3, 97.4, 99.0, 99.0, 89.2, 91.9, 90.6, 91.0, 44.9, 65.6, 69.2, 73.3, 71.9, 81.4, 78.3, 79.3, 78.5,
                    3.5, 3.7, 88.7, 86.6, 70.9, 81.1, 77.9, 79.3, 7.1, 12.9, 7.0, 7.5, 14.6, 16.2, 14.7, 15.3, 15.8,
                    0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
                    0.0, 0.2, 0.1, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1, 0.1, 0.1, 0.0,
                    0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
                    8.4, 0.5, 2.9, 3.0, 1.7, 1.3, 1.2, 1.4, 25.3, 41.2, 49.0, 52.1, 33.9, 45.3, 41.4, 47.3, 48.8,
                    0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
                    0.1, 0.0, 0.4, 0.5, 0.9, 1.6, 1.1, 1.4, 0.4, 0.4, 0.5, 0.4, 0.6, 0.3, 0.4, 0.4, 0.5,
                    0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
                    6.7, 86.0, 5.3, 6.9, 7.9, 4.6, 5.5, 4.9, 8.6, 8.5, 10.8, 11.2, 18.3, 15.7, 16.6, 12.9, 10.8,
                    0.7, 0.1, 0.2, 0.2, 0.5, 0.2, 0.3, 0.3, 0.3, 0.2, 0.2, 0.2, 0.3, 0.2, 0.3, 0.2, 0.2,
                    0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
                    0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
                    78.7, 2.6, 1.0, 1.0, 10.8, 8.1, 9.4, 9.0, 55.1, 34.4, 30.8, 26.7, 28.1, 18.6, 21.7, 20.7, 21.5,
                    0.1, 0.0, 0.0, 0.0, 0.2, 0.1, 0.1, 0.2, 0.3, 0.3, 0.2, 0.2, 0.2, 0.1, 0.1, 0.1, 0.1,
                    0.3, 0.0, 0.1, 0.1, 0.7, 0.5, 0.6, 0.5, 1.3, 0.9, 0.8, 0.7, 0.6, 0.3, 0.3, 0.3, 0.5,
                    0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
                    0.2, 0.0, 0.0, 0.0, 0.3, 0.2, 0.2, 0.2, 1.5, 0.8, 0.8, 0.8, 0.7, 0.3, 0.3, 0.2, 0.5,
                    3.5, 0.0, 0.0, 0.0, 2.7, 1.6, 3.2, 2.2, 17.7, 9.3, 9.4, 6.9, 5.6, 2.4, 3.4, 3.3, 4.4), nrow = 20, byrow = TRUE)

    rownames(data_orig) <- categories_orig
    colnames(data_orig) <- samples_orig

    # Collapse sense/antisense
    merge_rows <- function(prefix) {
        row1 <- paste0(prefix, "_sense")
        row2 <- paste0(prefix, "_antisense")
        if (row1 %in% rownames(data_orig) && row2 %in% rownames(data_orig)) {
            return(data_orig[row1, ] + data_orig[row2, ])
        } else if (row1 %in% rownames(data_orig)) {
            return(data_orig[row1, ])
        } else {
            return(rep(0, ncol(data_orig)))
        }
    }

    # Construct merged data
    data_merged <- rbind(
    reads_used_for_alignment = data_orig["reads_used_for_alignment", ],
    genome = data_orig["genome", ],
    miRNA = merge_rows("miRNA"),
    miRNAprecursor = merge_rows("miRNAprecursor"),
    tRNA = merge_rows("tRNA"),
    piRNA = merge_rows("piRNA"),
    gencode = merge_rows("gencode"),
    circularRNA = merge_rows("circularRNA"),
    not_mapped_to_genome_or_libs = data_orig["not_mapped_to_genome_or_libs", ],
    repetitiveElements = data_orig["repetitiveElements", ],
    endogenous_gapped = data_orig["endogenous_gapped", ],
    exogenous_miRNA = data_orig["exogenous_miRNA", ],
    exogenous_rRNA = data_orig["exogenous_rRNA", ],
    exogenous_genomes = data_orig["exogenous_genomes", ]
    )

    # Reorder columns to match desired sample order
    data_final <- data_merged[, samples[samples %in% colnames(data_merged)]]

    #genome --> human_genome, not_mapped_to_genome_or_libs --> not_mapped_to_human_genome
    rownames(data_final)[rownames(data_final) == "genome"] <- "human_genome"
    rownames(data_final)[rownames(data_final) == "not_mapped_to_genome_or_libs"] <- "not_mapped_to_human_genome"

    # Save to Excel
    write.xlsx(data_final, file = "distribution_heatmap.xlsx", rowNames = TRUE)

# -- Python-code --

    python ~/Scripts/plot_distribution_heatmap.py distribution_heatmap.xlsx distribution_heatmap.png

            import pandas as pd
            import numpy as np
            import seaborn as sns
            import matplotlib.pyplot as plt

            ## Load data from Excel file
            #file_path = "distribution_heatmap.xlsx"
            #
            ## Read Excel file, assuming first column is index (row labels)
            #df = pd.read_excel(file_path, index_col=0)

            # Convert percentages to decimals
            data = data / 100.0

            # Create DataFrame
            df = pd.DataFrame(data, index=categories, columns=samples)

            # Plot heatmap
            plt.figure(figsize=(14, 6))
            sns.heatmap(df, annot=True, cmap="coolwarm", fmt=".3f", linewidths=0.5, cbar_kws={'label': 'Fraction Aligned Reads'})

            # Improve layout
            plt.title("Heatmap of Read Alignments by Category and Sample", fontsize=14)
            plt.xlabel("Sample", fontsize=12)
            plt.ylabel("Read Category", fontsize=12)
            plt.xticks(rotation=15, ha="right", fontsize=10)
            plt.yticks(rotation=0, fontsize=10)
            plt.tight_layout()

            # Save as PNG
            plt.savefig("distribution_heatmap.png", dpi=300, bbox_inches="tight")

            # Show plot
            plt.show()

Key steps of log: This log details the execution of a small RNA sequencing data analysis pipeline using the exceRpt tool (version 4.6.3) in a Docker container. The pipeline processes a human small RNA-seq dataset (testData_human.fastq.gz) with the following key steps:
- Initial Setup
  - Docker container launched with mounted volumes for input/output and reference databases.
  - Parameters: hg38 genome, 50 threads, 200GB Java memory, exogenous mapping enabled.
  - Docker container launched with input/output volume mounts
  - 50 threads allocated with 200GB Java memory
  - hg38 reference genome specified
- Preprocessing
  - Adapter detection and trimming using known adapter sequences.
  - Quality filtering (Phred score ≥20, length ≥18nt).
  - Removal of homopolymer-rich reads and low-quality sequences.
  - Input FASTQ file decompressed (testData_human.fastq.gz)
  - Adapter sequences identified using adapters.fa
  - Quality encoding determined (Phred+33/64)
  - Adapter clipping performed (TCGTATGCCGTCTTCTGCTTG)
  - Quality filtering (Q20, p<80%)
  - Homopolymer repeats filtered (max 66% single nt)
- Contaminant Filtering
  - Alignment against UniVec contaminants and ribosomal RNA (rRNA) databases.
  - 322 reads processed, with statistics tracked at each step.
- Endogenous RNA Analysis
  - Alignment to human genome (hg38) and transcriptome.
  - Quantification of small RNA types:
    - miRNA (mature/precursor): Sense strands detected (antisense absent).
    - tRNA, piRNA, gencode transcripts: Only sense strands reported.
    - circRNA: Not detected in this dataset.
  - Coverage and complexity metrics calculated.
- Exogenous RNA Analysis
  - Screened for microbial/viral RNAs:
    - miRNA databases (miRBase).
    - Ribosomal RNA databases.
    - Comprehensive genomic databases (bacteria, plants, metazoa, fungi, viruses).
  - Taxonomic classification of exogenous hits performed.
- QC & Results
  - QC Result: PASS (based on transcriptome/genome ratio >0.5 and >100k transcriptome reads).
  - Key Metrics:
    - Input Reads: ~1.5 million (exact count not shown in log).
    - Genome Mapped: Majority of reads.
    - Transcriptome Complexity: Calculated ratio.
  - Core results compressed into testData_human.fastq_CORE_RESULTS_v4.6.3.tgz.
- Notable Observations:
  - Antisense Reads: Absent for miRNA, tRNA, and piRNA (common in small RNA-seq).
  - Potential Issues: Some files (e.g., antisense counts) were missing but did not disrupt pipeline.
  - Resource Usage: High RAM (200GB) and multi-threading (50 cores) employed for efficiency.
- Output Files:
  - Quantified counts for endogenous RNAs (miRNA, tRNA, etc.).
  - Exogenous RNA alignments with taxonomic annotations.
  - QC report, adapter sequences, and alignment statistics.

Downstream analyis using R for miRNAs

# see http://xgenes.com/article/article-content/288/draw-plots-for-mirnas-generated-by-compsra/
# see http://xgenes.com/article/article-content/289/draw-plots-for-pirna-generated-by-compsra/
# see http://xgenes.com/article/article-content/290/draw-plots-for-snrna-generated-by-compsra/

#Input file
#exceRpt_miRNA_ReadCounts.txt
#exceRpt_piRNA_ReadCounts.txt

cd ~/DATA/Data_Ute/Data_Ute_smallRNA_7/summaries_exo7
mamba activate r_env
R
#> .libPaths()
#[1] "/home/jhuang/mambaforge/envs/r_env/lib/R/library"

#BiocManager::install("AnnotationDbi")
#BiocManager::install("clusterProfiler")
#BiocManager::install(c("ReactomePA","org.Hs.eg.db"))
#BiocManager::install("limma")
#BiocManager::install("sva")
#install.packages("writexl")
#install.packages("openxlsx")
library("AnnotationDbi")
library("clusterProfiler")
library("ReactomePA")
library("org.Hs.eg.db")
library(DESeq2)
library(gplots)
library(limma)
library(sva)
#library(writexl)  #d.raw_with_rownames <- cbind(RowNames = rownames(d.raw), d.raw); write_xlsx(d.raw, path = "d_raw.xlsx");
library(openxlsx)

setwd("../summaries_exo7/")
d.raw<- read.delim2("exceRpt_miRNA_ReadCounts.txt",sep="\t", header=TRUE, row.names=1)

# Desired column order
desired_order <- c(
    "parental_cells_1", "parental_cells_2", "parental_cells_3",
    "untreated_1", "untreated_2",
    "scr_control_1", "scr_control_2", "scr_control_3",
    "DMSO_control_1", "DMSO_control_2", "DMSO_control_3",
    "scr_DMSO_control_1", "scr_DMSO_control_2", "scr_DMSO_control_3",
    "sT_knockdown_1", "sT_knockdown_2", "sT_knockdown_3"
)
# Reorder columns
d.raw <- d.raw[, desired_order]
setdiff(desired_order, colnames(d.raw))  # Shows missing or misnamed columns
#sapply(d.raw, is.numeric)
d.raw[] <- lapply(d.raw, as.numeric)
#d.raw[] <- lapply(d.raw, function(x) as.numeric(as.character(x)))
d.raw <- round(d.raw)
write.csv(d.raw, file ="d_raw.csv")
write.xlsx(d.raw, file = "d_raw.xlsx", rowNames = TRUE)

# ------ Code sent to Ute ------
#d.raw <- read.delim2("d_raw.csv",sep=",", header=TRUE, row.names=1)
parental_or_EV = as.factor(c("parental","parental","parental", "EV","EV","EV","EV","EV","EV","EV","EV","EV","EV","EV","EV","EV","EV"))
#donor = as.factor(c("0505","1905", "0505","1905", "0505","1905", "0505","1905", "0505","1905", "0505","1905"))
batch = as.factor(c("Aug22","March25","March25", "Sep23","Sep23", "Sep23","Sep23","March25", "Sep23","Sep23","March25", "Sep23","Sep23","March25", "Sep23","Sep23","March25"))

replicates = as.factor(c("parental_cells","parental_cells","parental_cells",  "untreated","untreated",   "scr_control","scr_control","scr_control",  "DMSO_control","DMSO_control","DMSO_control",  "scr_DMSO_control", "scr_DMSO_control","scr_DMSO_control",  "sT_knockdown", "sT_knockdown", "sT_knockdown"))
ids = as.factor(c("parental_cells_1", "parental_cells_2", "parental_cells_3",
    "untreated_1", "untreated_2",
    "scr_control_1", "scr_control_2", "scr_control_3",
    "DMSO_control_1", "DMSO_control_2", "DMSO_control_3",
    "scr_DMSO_control_1", "scr_DMSO_control_2", "scr_DMSO_control_3",
    "sT_knockdown_1", "sT_knockdown_2", "sT_knockdown_3"))
cData = data.frame(row.names=colnames(d.raw), replicates=replicates, ids=ids, batch=batch, parental_or_EV=parental_or_EV)
dds<-DESeqDataSetFromMatrix(countData=d.raw, colData=cData, design=~replicates+batch)

# Filter low-count miRNAs
dds <- dds[ rowSums(counts(dds)) > 10, ]  #1322-->903
rld <- rlogTransformation(dds)

# -- before pca --
png("pca.png", 1200, 800)
plotPCA(rld, intgroup=c("replicates"))
#plotPCA(rld, intgroup = c("replicates", "batch"))
#plotPCA(rld, intgroup = c("replicates", "ids"))
#plotPCA(rld, "batch")
dev.off()
png("pca2.png", 1200, 800)
#plotPCA(rld, intgroup=c("replicates"))
#plotPCA(rld, intgroup = c("replicates", "batch"))
#plotPCA(rld, intgroup = c("replicates", "ids"))
plotPCA(rld, "batch")
dev.off()

# Batch Effect Removal Methods:

#Applying batch effect correction techniques such as ComBat or SVA (Surrogate Variable Analysis).

#- Using ComBat (from the sva package):

    # Assume `rld` is the rlog-transformed counts from DESeq2
    rld_corrected <- ComBat(dat = assay(rld), batch = cData$batch, mod = model.matrix(~ replicates, data = cData))

    # Visualize corrected PCA
    pca_corrected <- prcomp(t(rld_corrected))
    png("pca_after_batch_correction.png", 1200, 800)
    plot(pca_corrected$x[, 1:2], col = cData$replicates)
    dev.off()

#- Using SVA (Surrogate Variable Analysis):

    #If batch effects are strong and you want to remove hidden batch effects, SVA can help identify latent factors. After identifying these latent factors, you can add them to the DESeq2 design.

    # Assume that rld contains the rlog-transformed data
    mod <- model.matrix(~ replicates, data = cData)  # This should include your main experimental variables
    sva_results <- sva(assay(rld), mod)

    #You would then adjust the design formula to include these latent variables.

#- Using removeBatchEffect (CHOSEN!)

    #http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#how-do-i-use-vst-or-rlog-data-for-differential-testing
    mat <- assay(rld)
    mm <- model.matrix(~replicates, colData(rld))
    mat <- limma::removeBatchEffect(mat, batch=rld$batch, design=mm)
    assay(rld) <- mat

#- After batch effect removal, you should see a shift in the PCA plot — ideally, the samples should now cluster based on replicates or biological conditions rather than the batch.
#If the batch effect has been successfully removed:
#    * Before correction: You will likely see samples grouped by batch.
#    * After correction: You should see the samples grouped by biological condition (e.g., parental, EV, scr_control, etc.).

    # -- after pca --
    png("pca_after_batch_correction.png", 1200, 800)
    #plotPCA(rld, intgroup = c("replicates", "batch"))
    #plotPCA(rld, intgroup = c("replicates", "ids"))
    plotPCA(rld, intgroup=c("replicates"))
    dev.off()
    png("pca_after_batch_correction2.png", 1200, 800)
    plotPCA(rld, "batch")
    dev.off()

    # -- after heatmap --
    ## generate the pairwise comparison between samples
    png("heatmap_after_batch_correction.png", 1200, 800)
    distsRL <- dist(t(assay(rld)))
    mat <- as.matrix(distsRL)
    rownames(mat) <- colnames(mat) <- with(colData(dds),paste(replicates,batch, sep=":"))
    #rownames(mat) <- colnames(mat) <- with(colData(dds),paste(replicates,ids, sep=":"))
    hc <- hclust(distsRL)
    hmcol <- colorRampPalette(brewer.pal(9,"GnBu"))(100)
    heatmap.2(mat, Rowv=as.dendrogram(hc),symm=TRUE, trace="none",col = rev(hmcol), margin=c(13, 13))
    dev.off()

#### STEP2: DEGs ####
#- Heatmap untreated/wt vs parental; 1x for WaGa cell line
#- Volcano plot untreated/wt vs parental; 1x for WaGa cell line
#- Manhattan plot miRNAs; 1x for WaGa cell line
#- Distribution of different small RNA species untreated/wt and parental; 1x for WaGa cell line
#- Motif analysis: identify RNA-binding proteins that may regulate small RNA loading; 1x for WaGa cell line

#convert bam to bigwig using deepTools by feeding inverse of DESeq’s size Factor
sizeFactors(dds)
#NULL
dds <- estimateSizeFactors(dds)
sizeFactors(dds)
normalized_counts <- counts(dds, normalized=TRUE)
write.table(normalized_counts, file="normalized_counts.txt", sep="\t", quote=F, col.names=NA)
write.xlsx(normalized_counts, file = "normalized_counts.xlsx", rowNames = TRUE)

#---- untreated, scr_control, DMSO_control, scr_DMSO_control, sT_knockdown to parental_cells ----
dds<-DESeqDataSetFromMatrix(countData=d.raw, colData=cData, design=~replicates+batch)

dds$replicates <- relevel(dds$replicates, "parental_cells")
dds = DESeq(dds, betaPrior=FALSE)  #default betaPrior is FALSE
resultsNames(dds)
clist <- c("untreated_vs_parental_cells")

dds$replicates <- relevel(dds$replicates, "untreated")
dds = DESeq(dds, betaPrior=FALSE)
resultsNames(dds)
clist <- c("DMSO_control_vs_untreated", "scr_control_vs_untreated", "scr_DMSO_control_vs_untreated", "sT_knockdown_vs_untreated")

dds$replicates <- relevel(dds$replicates, "DMSO_control")
dds = DESeq(dds, betaPrior=FALSE)
resultsNames(dds)
clist <- c("sT_knockdown_vs_DMSO_control")

dds$replicates <- relevel(dds$replicates, "scr_control")
dds = DESeq(dds, betaPrior=FALSE)
resultsNames(dds)
clist <- c("sT_knockdown_vs_scr_control")

dds$replicates <- relevel(dds$replicates, "scr_DMSO_control")
dds = DESeq(dds, betaPrior=FALSE)
resultsNames(dds)
clist <- c("sT_knockdown_vs_scr_DMSO_control")

#NOTE that the results sent to Ute is |padj|<=0.1.
for (i in clist) {
    contrast = paste("replicates", i, sep="_")
    res = results(dds, name=contrast)
    res <- res[!is.na(res$log2FoldChange),]
    #https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#why-are-some-p-values-set-to-na
    res$padj <- ifelse(is.na(res$padj), 1, res$padj)
    res_df <- as.data.frame(res)
    write.csv(as.data.frame(res_df[order(res_df$pvalue),]), file = paste(i, "all.txt", sep="-"))
    up <- subset(res_df, padj<=0.05 & log2FoldChange>=2)
    down <- subset(res_df, padj<=0.05 & log2FoldChange<=-2)
    write.csv(as.data.frame(up[order(up$log2FoldChange,decreasing=TRUE),]), file = paste(i, "up.txt", sep="-"))
    write.csv(as.data.frame(down[order(abs(down$log2FoldChange),decreasing=TRUE),]), file = paste(i, "down.txt", sep="-"))
}

~/Tools/csv2xls-0.4/csv_to_xls.py \
untreated_vs_parental_cells-all.txt \
untreated_vs_parental_cells-up.txt \
untreated_vs_parental_cells-down.txt \
-d$',' -o untreated_vs_parental_cells.xls;

~/Tools/csv2xls-0.4/csv_to_xls.py \
DMSO_control_vs_untreated-all.txt \
DMSO_control_vs_untreated-up.txt \
DMSO_control_vs_untreated-down.txt \
-d$',' -o DMSO_control_vs_untreated.xls;

~/Tools/csv2xls-0.4/csv_to_xls.py \
scr_control_vs_untreated-all.txt \
scr_control_vs_untreated-up.txt \
scr_control_vs_untreated-down.txt \
-d$',' -o scr_control_vs_untreated.xls;

~/Tools/csv2xls-0.4/csv_to_xls.py \
scr_DMSO_control_vs_untreated-all.txt \
scr_DMSO_control_vs_untreated-up.txt \
scr_DMSO_control_vs_untreated-down.txt \
-d$',' -o scr_DMSO_control_vs_untreated.xls;

~/Tools/csv2xls-0.4/csv_to_xls.py \
sT_knockdown_vs_untreated-all.txt \
sT_knockdown_vs_untreated-up.txt \
sT_knockdown_vs_untreated-down.txt \
-d$',' -o sT_knockdown_vs_untreated.xls;

~/Tools/csv2xls-0.4/csv_to_xls.py \
sT_knockdown_vs_DMSO_control-all.txt \
sT_knockdown_vs_DMSO_control-up.txt \
sT_knockdown_vs_DMSO_control-down.txt \
-d$',' -o sT_knockdown_vs_DMSO_control.xls;

~/Tools/csv2xls-0.4/csv_to_xls.py \
sT_knockdown_vs_scr_control-all.txt \
sT_knockdown_vs_scr_control-up.txt \
sT_knockdown_vs_scr_control-down.txt \
-d$',' -o sT_knockdown_vs_scr_control.xls;

~/Tools/csv2xls-0.4/csv_to_xls.py \
sT_knockdown_vs_scr_DMSO_control-all.txt \
sT_knockdown_vs_scr_DMSO_control-up.txt \
sT_knockdown_vs_scr_DMSO_control-down.txt \
-d$',' -o sT_knockdown_vs_scr_DMSO_control.xls;

# ------------------- volcano_plot -------------------
library(ggplot2)
library(ggrepel)

geness_res <- read.csv(file = paste("untreated_vs_parental_cells", "all.txt", sep="-"), row.names=1)

external_gene_name <- rownames(geness_res)
geness_res <- cbind(geness_res, external_gene_name)
#top_g are from ids
top_g <- c("hsa-let-7b-5p","hsa-let-7g-5p","hsa-let-7i-5p","hsa-miR-103a-3p","hsa-miR-107","hsa-miR-1224-5p","hsa-miR-122-5p","hsa-miR-1226-5p","hsa-miR-1246","hsa-miR-127-3p","hsa-miR-1290","hsa-miR-130a-3p","hsa-miR-139-3p","hsa-miR-141-3p","hsa-miR-143-3p","hsa-miR-148b-3p","hsa-miR-155-5p","hsa-miR-15a-5p","hsa-miR-17-5p","hsa-miR-184","hsa-miR-18a-3p","hsa-miR-18a-5p","hsa-miR-190a-5p","hsa-miR-191-5p","hsa-miR-193b-5p","hsa-miR-197-5p","hsa-miR-200a-3p","hsa-miR-200b-5p","hsa-miR-206","hsa-miR-20a-5p","hsa-miR-210-3p","hsa-miR-2110","hsa-miR-21-5p","hsa-miR-218-5p","hsa-miR-219a-1-3p","hsa-miR-221-3p","hsa-miR-23b-3p","hsa-miR-27a-3p","hsa-miR-27b-3p","hsa-miR-27b-5p","hsa-miR-28-3p","hsa-miR-30a-5p","hsa-miR-30c-5p","hsa-miR-30e-5p","hsa-miR-3127-5p","hsa-miR-3131","hsa-miR-3180|hsa-miR-3180-3p","hsa-miR-320a","hsa-miR-320b","hsa-miR-320c","hsa-miR-320d","hsa-miR-330-3p","hsa-miR-335-3p","hsa-miR-33b-5p","hsa-miR-340-5p","hsa-miR-342-5p","hsa-miR-3605-5p","hsa-miR-361-3p","hsa-miR-365a-5p","hsa-miR-374b-5p","hsa-miR-378i","hsa-miR-379-5p","hsa-miR-3940-5p","hsa-miR-409-3p","hsa-miR-411-5p","hsa-miR-423-3p","hsa-miR-423-5p","hsa-miR-4286","hsa-miR-429","hsa-miR-432-5p","hsa-miR-4326","hsa-miR-451a","hsa-miR-4520-3p","hsa-miR-454-3p","hsa-miR-4646-5p","hsa-miR-4667-5p","hsa-miR-4748","hsa-miR-483-5p","hsa-miR-486-5p","hsa-miR-5010-5p","hsa-miR-504-3p","hsa-miR-5187-5p","hsa-miR-590-3p","hsa-miR-6128","hsa-miR-625-5p","hsa-miR-6726-5p","hsa-miR-6730-5p","hsa-miR-676-3p","hsa-miR-6767-5p","hsa-miR-6777-5p","hsa-miR-6780a-5p","hsa-miR-6794-5p","hsa-miR-6817-3p","hsa-miR-708-5p","hsa-miR-7-5p","hsa-miR-766-5p","hsa-miR-7854-3p","hsa-miR-873-3p","hsa-miR-885-3p","hsa-miR-92b-5p","hsa-miR-93-5p","hsa-miR-937-3p","hsa-miR-9-5p","hsa-miR-98-5p")
subset(geness_res, external_gene_name %in% top_g & pvalue < 0.05 & (abs(geness_res$log2FoldChange) >= 2.0))
geness_res$Color <- "NS or log2FC < 2.0"
geness_res$Color[geness_res$pvalue < 0.05] <- "P < 0.05"
geness_res$Color[geness_res$padj < 0.05] <- "P-adj < 0.05"
geness_res$Color[abs(geness_res$log2FoldChange) < 2.0] <- "NS or log2FC < 2.0"

write.csv(geness_res, "untreated_vs_parental_cells_with_Category.csv")
geness_res$invert_P <- (-log10(geness_res$pvalue)) * sign(geness_res$log2FoldChange)

geness_res <- geness_res[, -1*ncol(geness_res)]
png("volcano_plot_untreated_vs_parental_cells.png",width=1200, height=1400)
#svg("untreated_vs_parental_cells.svg",width=12, height=14)
ggplot(geness_res,       aes(x = log2FoldChange, y = -log10(pvalue),           color = Color, label = external_gene_name)) +       geom_vline(xintercept = c(2.0, -2.0), lty = "dashed") +       geom_hline(yintercept = -log10(0.05), lty = "dashed") +       geom_point() +       labs(x = "log2(FC)", y = "Significance, -log10(P)", color = "Significance") +       scale_color_manual(values = c("P < 0.05"="orange","P-adj < 0.05"="red","NS or log2FC < 2.0"="darkgray"),guide = guide_legend(override.aes = list(size = 4))) + scale_y_continuous(expand = expansion(mult = c(0,0.05))) +       geom_text_repel(data = subset(geness_res, external_gene_name %in% top_g & pvalue < 0.05 & (abs(geness_res$log2FoldChange) >= 2.0)), size = 4, point.padding = 0.15, color = "black", min.segment.length = .1, box.padding = .2, lwd = 2) +       theme_bw(base_size = 16) +       theme(legend.position = "bottom")
dev.off()

# ------------------ differentially_expressed_miRNAs_heatmap -----------------
# prepare all_genes
rld <- rlogTransformation(dds)
mat <- assay(rld)
mm <- model.matrix(~replicates, colData(rld))
mat <- limma::removeBatchEffect(mat, batch=rld$batch, design=mm)
assay(rld) <- mat
RNASeq.NoCellLine <- assay(rld)

# reorder the columns
#colnames(RNASeq.NoCellLine) = c("0505 WaGa sT DMSO","1905 WaGa sT DMSO","0505 WaGa sT Dox","1905 WaGa sT Dox","0505 WaGa scr DMSO","1905 WaGa scr DMSO","0505 WaGa scr Dox","1905 WaGa scr Dox","0505 WaGa wt","1905 WaGa wt","control MKL1","control WaGa")
#col.order <-c("control MKL1",  "control WaGa","0505 WaGa wt","1905 WaGa wt","0505 WaGa sT DMSO","1905 WaGa sT DMSO","0505 WaGa sT Dox","1905 WaGa sT Dox","0505 WaGa scr DMSO","1905 WaGa scr DMSO","0505 WaGa scr Dox","1905 WaGa scr Dox")
#RNASeq.NoCellLine <- RNASeq.NoCellLine[,col.order]

#Option4: manully defining
#for i in untreated_vs_parental_cells    sT_knockdown_vs_untreated DMSO_control_vs_untreated scr_control_vs_untreated scr_DMSO_control_vs_untreated    sT_knockdown_vs_DMSO_control sT_knockdown_vs_scr_control sT_knockdown_vs_scr_DMSO_control; do
#  echo "cut -d',' -f1-1 ${i}-up.txt > ${i}-up.id";
#  echo "cut -d',' -f1-1 ${i}-down.txt > ${i}-down.id";
#done
#cat *.id | sort -u > ids
##add Gene_Id in the first line, delete the ""
GOI <- read.csv("ids")$Gene_Id
datamat = RNASeq.NoCellLine[GOI, ]

# clustering the genes and draw heatmap
#datamat <- datamat[,-1]  #delete the sample "control MKL1"
#datamat <- datamat[, 1:5]

#parental_cells_1 parental_cells_2 parental_cells_3    untreated_1 untreated_2    scr_control_1 scr_control_2 scr_control_3     DMSO_control_1 DMSO_control_2 DMSO_control_3    scr_DMSO_control_1 scr_DMSO_control_2 scr_DMSO_control_3    sT_knockdown_1 sT_knockdown_2 sT_knockdown_3 -->
#parental cells 1 parental cells 2 parental cells 3    untreated 1 untreated 2    scr control 1 scr control 2 scr control 3    DMSO control 1 DMSO control 2 DMSO control 3    scr DMSO control 1 scr DMSO control 2 scr DMSO control 3    sT knockdown 1 sT knockdown 2 sT knockdown 3
colnames(datamat)[1] <- "parental cells 1"
colnames(datamat)[2] <- "parental cells 2"
colnames(datamat)[3] <- "parental cells 3"
colnames(datamat)[4] <- "untreated 1"
colnames(datamat)[5] <- "untreated 2"
colnames(datamat)[6] <- "scr control 1"
colnames(datamat)[7] <- "scr control 2"
colnames(datamat)[8] <- "scr control 3"
colnames(datamat)[9] <- "DMSO control 1"
colnames(datamat)[10] <- "DMSO control 2"
colnames(datamat)[11] <- "DMSO control 3"
colnames(datamat)[12] <- "scr DMSO control 1"
colnames(datamat)[13] <- "scr DMSO control 2"
colnames(datamat)[14] <- "scr DMSO control 3"
colnames(datamat)[15] <- "sT knockdown 1"
colnames(datamat)[16] <- "sT knockdown 2"
colnames(datamat)[17] <- "sT knockdown 3"

write.csv(datamat, file ="gene_expression_keeping_replicates.txt")
write.xlsx(datamat, file = "gene_expression_keeping_replicates.xlsx", rowNames = TRUE)
#"ward.D"’, ‘"ward.D2"’,‘"single"’, ‘"complete"’, ‘"average"’ (= UPGMA), ‘"mcquitty"’(= WPGMA), ‘"median"’ (= WPGMC) or ‘"centroid"’ (= UPGMC)
hr <- hclust(as.dist(1-cor(t(datamat), method="pearson")), method="complete")
hc <- hclust(as.dist(1-cor(datamat, method="spearman")), method="complete")
mycl = cutree(hr, h=max(hr$height)/1.1)
mycol = c("YELLOW", "BLUE", "ORANGE", "CYAN", "GREEN", "MAGENTA", "GREY", "LIGHTCYAN", "RED",     "PINK", "DARKORANGE", "MAROON",  "LIGHTGREEN", "DARKBLUE",  "DARKRED",   "LIGHTBLUE", "DARKCYAN",  "DARKGREEN", "DARKMAGENTA");
mycol = mycol[as.vector(mycl)]

rownames(datamat) <- sub("\\|.*", "", rownames(datamat))

png("DEGs_heatmap_keeping_replicates.png", width=1000, height=1400)
#svg("DEGs_heatmap_keeping_replicates.svg", width=6, height=8)
heatmap.2(as.matrix(datamat),
    Rowv=as.dendrogram(hr),
    Colv=NA,
    dendrogram='row',
    labRow=row.names(datamat),
    scale='row',
    trace='none',
    col=bluered(75),
    RowSideColors=mycol,
    srtCol=30,
    lhei=c(1,8),
    cexRow=1.4,   # Increase row label font size
    cexCol=1.7,    # Increase column label font size
    margin=c(8, 12)
    )
dev.off()

# ----------- manhattan_plot -------------

# TODO_TOMORROW: the top miRNA should different, since we want to see the differentially expressed miRNA, therefore we should show the top DEG miRNA, find the top-5 and mark the 5 as the red points and give the label!
# TODO_piRNA
# TODO: Both motiv calling!
# TODO: send the results to Ute!

# Load the required libraries
library(ggplot2)
library(dplyr)
library(tidyr)
library(ggrepel)  # For better label positioning

# Step 1: Compute RPM from raw counts (d.raw has miRNAs in rows, samples in columns)
d.raw_5 <- d.raw[, 1:5]  # assuming 5 samples
total_counts <- colSums(d.raw_5)
RPM <- sweep(d.raw_5, 2, total_counts, FUN = "/") * 1e6

# Step 2: Prepare long-format dataframe
RPM$miRNA <- rownames(RPM)
df <- pivot_longer(RPM, cols = -miRNA, names_to = "sample", values_to = "RPM")

# Step 3: Log-transform RPM
df <- df %>%
mutate(logRPM = log10(RPM + 1))

# Step 4: Add miRNA index for x-axis positioning
df <- df %>%
arrange(miRNA) %>%
group_by(sample) %>%
mutate(Position = row_number())

# Step 5: Identify top miRNAs based on mean RPM
top_mirnas <- df %>%
group_by(miRNA) %>%
summarise(mean_RPM = mean(RPM)) %>%
arrange(desc(mean_RPM)) %>%
head(5) %>%
pull(miRNA)  # Get the names of top 5 miRNAs

# Step 6: Assign color based on whether the miRNA is top or not
df$color <- ifelse(df$miRNA %in% top_mirnas, "red", "darkblue")

# Rename the sample labels for display
sample_labels <- c(
"parental_cells_1" = "Parental cell 1",
"parental_cells_2" = "Parental cell 2",
"parental_cells_3" = "Parental cell 3",
"untreated_1"      = "Untreated 1",
"untreated_2"      = "Untreated 2"
)

# Step 7: Plot
png("manhattan_plot_top_miRNAs_based_on_mean_RPM.png", width = 1200, height = 1200)
ggplot(df, aes(x = Position, y = logRPM, color = color)) +
scale_color_manual(values = c("red" = "red", "darkblue" = "darkblue")) +
geom_jitter(width = 0.4) +
geom_text_repel(
    data = df %>% filter(miRNA %in% top_mirnas),
    aes(label = miRNA),
    box.padding = 0.5,
    point.padding = 0.5,
    segment.color = 'gray50',
    size = 5,
    max.overlaps = 8,
    color = "black"
) +
labs(x = "", y = "log10(Read Per Million) (RPM)") +
facet_wrap(~sample, scales = "free_x", ncol = 5,
            labeller = labeller(sample = sample_labels)) +
theme_minimal() +
theme(
    axis.text.x = element_blank(),
    axis.ticks.x = element_blank(),
    legend.position = "none",
    text = element_text(size = 16),
    axis.title = element_text(size = 18),
    strip.text = element_text(size = 16, face = "bold"),
    panel.spacing = unit(1.5, "lines")  # <-- More space between plots
)
dev.off()

top_mirnas = c("hsa-miR-20a-5p","hsa-miR-93-5p","hsa-let-7g-5p","hsa-miR-30a-5p","hsa-miR-423-5p","hsa-let-7i-5p")
#,"hsa-miR-17-5p","hsa-miR-107","hsa-miR-483-5p","hsa-miR-9-5p","hsa-miR-103a-3p","hsa-miR-30e-5p","hsa-miR-21-5p","hsa-miR-30d-5p")

# Step 6: Assign color based on whether the miRNA is top or not
df$color <- ifelse(df$miRNA %in% top_mirnas, "red", "darkblue")

# Rename the sample labels for display
sample_labels <- c(
"parental_cells_1" = "Parental cell 1",
"parental_cells_2" = "Parental cell 2",
"parental_cells_3" = "Parental cell 3",
"untreated_1"      = "Untreated 1",
"untreated_2"      = "Untreated 2"
)

# Step 7: Plot
png("manhattan_plot_most_differentially_expressed_miRNAs.png", width = 1200, height = 1200)
ggplot(df, aes(x = Position, y = logRPM, color = color)) +
scale_color_manual(values = c("red" = "red", "darkblue" = "darkblue")) +
geom_jitter(width = 0.4) +
geom_text_repel(
    data = df %>% filter(miRNA %in% top_mirnas),
    aes(label = miRNA),
    box.padding = 0.5,
    point.padding = 0.5,
    segment.color = 'gray50',
    size = 5,
    max.overlaps = 8,
    color = "black"
) +
labs(x = "", y = "log10(Read Per Million) (RPM)") +
facet_wrap(~sample, scales = "free_x", ncol = 5,
            labeller = labeller(sample = sample_labels)) +
theme_minimal() +
theme(
    axis.text.x = element_blank(),
    axis.ticks.x = element_blank(),
    legend.position = "none",
    text = element_text(size = 16),
    axis.title = element_text(size = 18),
    strip.text = element_text(size = 16, face = "bold"),
    panel.spacing = unit(1.5, "lines")  # <-- More space between plots
)
dev.off()

mkdir miRNAs
mv *.png miRNAs
mv *.svg miRNAs
mv *.csv miRNAs
mv *.xls* miRNAs
mv *.id miRNAs
mv ids miRNAs
mv normalized_counts.txt miRNAs
mv *-all.txt miRNAs
mv *-up.txt miRNAs
mv *-down.txt miRNAs
mv gene_expression_keeping_replicates.txt miRNAs
cd miRNAs
mv DEGs_heatmap_keeping_replicates.png differentially_expressed_miRNAs_heatmap.png
mv volcano_plot_untreated_vs_parental_cells.png volcano_plot_miRNAs_untreated_vs_parental_cells.png
mv untreated_vs_parental_cells.xls miRNA_untreated_vs_parental_cells.xls

Do separate shRNA and treatment analysis

# cut [1-5], the remaining are
d.raw_12 <- d.raw[, 6:17]
#> colnames(d.raw_12)
#[1] "scr_control_1"      "scr_control_2"      "scr_control_3"
#[4] "DMSO_control_1"     "DMSO_control_2"     "DMSO_control_3"
#[7] "scr_DMSO_control_1" "scr_DMSO_control_2" "scr_DMSO_control_3"
#[10] "sT_knockdown_1"     "sT_knockdown_2"     "sT_knockdown_3"
#    "scr Dox" → "scr control"
#    "sT DMSO" → "DMSO control"
#    "scr DMSO" → "scr DMSO control"
#    "sT Dox" → "sT knockdown"

shRNA = as.factor(c("scr","scr","scr","sT","sT","sT","scr","scr","scr","sT","sT","sT"))
treatment = as.factor(c("Dox","Dox","Dox","DMSO","DMSO","DMSO","DMSO","DMSO","DMSO","Dox","Dox","Dox"))
cData = data.frame(row.names=colnames(d.raw_12), shRNA=shRNA, treatment=treatment)
dds_shRNA_treatment<-DESeqDataSetFromMatrix(countData=d.raw_12, colData=cData, design=~shRNA+treatment+shRNA:treatment)

dds_shRNA_treatment = DESeq(dds_shRNA_treatment, betaPrior=FALSE)
resultsNames(dds_shRNA_treatment)
contrasts <- c("shRNA_sT_vs_scr", "treatment_Dox_vs_DMSO", "shRNAsT.treatmentDox")

for (contrast in contrasts) {
        res = results(dds_shRNA_treatment, name=contrast)
        res <- res[!is.na(res$log2FoldChange),]
        #https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#why-are-some-p-values-set-to-na
        res$padj <- ifelse(is.na(res$padj), 1, res$padj)
        res_df <- as.data.frame(res)
        write.csv(as.data.frame(res_df[order(res_df$pvalue),]), file = paste(contrast, "all.txt", sep="-"))
        up <- subset(res_df, padj<=0.05 & log2FoldChange>=2)
        down <- subset(res_df, padj<=0.05 & log2FoldChange<=-2)
        write.csv(as.data.frame(up[order(up$log2FoldChange,decreasing=TRUE),]), file = paste(contrast, "up.txt", sep="-"))
        write.csv(as.data.frame(down[order(abs(down$log2FoldChange),decreasing=TRUE),]), file = paste(contrast, "down.txt", sep="-"))
    }

#~/Tools/csv2xls-0.4/csv_to_xls.py shRNA_sT_vs_scr-up.txt shRNA_sT_vs_scr-down.txt shRNA_sT_vs_scr-all.txt -d$',' -o shRNA_sT_vs_scr.xls
#~/Tools/csv2xls-0.4/csv_to_xls.py treatment_Dox_vs_DMSO-up.txt treatment_Dox_vs_DMSO-down.txt treatment_Dox_vs_DMSO-all.txt -d$',' -o treatment_Dox_vs_DMSO.xls
#~/Tools/csv2xls-0.4/csv_to_xls.py shRNAsT.treatmentDox-up.txt shRNAsT.treatmentDox-down.txt shRNAsT.treatmentDox-all.txt -d$',' -o shRNAsT.treatmentDox.xls

Downstream analyis using R for piRNAs

d.raw<- read.delim2("exceRpt_piRNA_ReadCounts.txt",sep="\t", header=TRUE, row.names=1)

# Desired column order
desired_order <- c(
    "parental_cells_1", "parental_cells_2", "parental_cells_3",
    "untreated_1", "untreated_2",
    "scr_control_1", "scr_control_2", "scr_control_3",
    "DMSO_control_1", "DMSO_control_2", "DMSO_control_3",
    "scr_DMSO_control_1", "scr_DMSO_control_2", "scr_DMSO_control_3",
    "sT_knockdown_1", "sT_knockdown_2", "sT_knockdown_3"
)
# Reorder columns
d.raw <- d.raw[, desired_order]
setdiff(desired_order, colnames(d.raw))  # Shows missing or misnamed columns
#sapply(d.raw, is.numeric)
d.raw[] <- lapply(d.raw, as.numeric)
#d.raw[] <- lapply(d.raw, function(x) as.numeric(as.character(x)))
d.raw <- round(d.raw)
write.csv(d.raw, file ="d_raw.csv")
write.xlsx(d.raw, file = "d_raw.xlsx", rowNames = TRUE)

#Make the piRNA names shorter, e.g. "hsa_piR_016658|gb|DQ592931|Homo_sapiens:6:80508363:80508389:Plus" --> "hsa_piR_016658"
#paste -d',' f1_1 f2_ > d_raw_.csv
d.raw <- read.delim2("d_raw_.csv",sep=",", header=TRUE, row.names=1)
parental_or_EV = as.factor(c("parental","parental","parental", "EV","EV","EV","EV","EV","EV","EV","EV","EV","EV","EV","EV","EV","EV"))
#donor = as.factor(c("0505","1905", "0505","1905", "0505","1905", "0505","1905", "0505","1905", "0505","1905"))
batch = as.factor(c("Aug22","March25","March25", "Sep23","Sep23", "Sep23","Sep23","March25", "Sep23","Sep23","March25", "Sep23","Sep23","March25", "Sep23","Sep23","March25"))

replicates = as.factor(c("parental_cells","parental_cells","parental_cells",  "untreated","untreated",   "scr_control","scr_control","scr_control",  "DMSO_control","DMSO_control","DMSO_control",  "scr_DMSO_control", "scr_DMSO_control","scr_DMSO_control",  "sT_knockdown", "sT_knockdown", "sT_knockdown"))
ids = as.factor(c("parental_cells_1", "parental_cells_2", "parental_cells_3",
    "untreated_1", "untreated_2",
    "scr_control_1", "scr_control_2", "scr_control_3",
    "DMSO_control_1", "DMSO_control_2", "DMSO_control_3",
    "scr_DMSO_control_1", "scr_DMSO_control_2", "scr_DMSO_control_3",
    "sT_knockdown_1", "sT_knockdown_2", "sT_knockdown_3"))
cData = data.frame(row.names=colnames(d.raw), replicates=replicates, ids=ids, batch=batch, parental_or_EV=parental_or_EV)
dds<-DESeqDataSetFromMatrix(countData=d.raw, colData=cData, design=~replicates+batch)

# Filter low-count miRNAs
dds <- dds[ rowSums(counts(dds)) > 10, ]  #364-->124
rld <- rlogTransformation(dds)

# -- before pca --
png("pca.png", 1200, 800)
plotPCA(rld, intgroup=c("replicates"))
#plotPCA(rld, intgroup = c("replicates", "batch"))
#plotPCA(rld, intgroup = c("replicates", "ids"))
#plotPCA(rld, "batch")
dev.off()
png("pca2.png", 1200, 800)
#plotPCA(rld, intgroup=c("replicates"))
#plotPCA(rld, intgroup = c("replicates", "batch"))
#plotPCA(rld, intgroup = c("replicates", "ids"))
plotPCA(rld, "batch")
dev.off()

# Batch Effect Removal Methods:

#Applying batch effect correction techniques such as ComBat, SVA (Surrogate Variable Analysis) or limma::removeBatchEffect.

    #http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#how-do-i-use-vst-or-rlog-data-for-differential-testing
    mat <- assay(rld)
    mm <- model.matrix(~replicates, colData(rld))
    mat <- limma::removeBatchEffect(mat, batch=rld$batch, design=mm)
    assay(rld) <- mat

#- After batch effect removal, you should see a shift in the PCA plot — ideally, the samples should now cluster based on replicates or biological conditions rather than the batch.
#If the batch effect has been successfully removed:
#    * Before correction: You will likely see samples grouped by batch.
#    * After correction: You should see the samples grouped by biological condition (e.g., parental, EV, scr_control, etc.).

    # -- after pca --
    png("pca_after_batch_correction.png", 1200, 800)
    #plotPCA(rld, intgroup = c("replicates", "batch"))
    #plotPCA(rld, intgroup = c("replicates", "ids"))
    plotPCA(rld, intgroup=c("replicates"))
    dev.off()
    png("pca_after_batch_correction2.png", 1200, 800)
    plotPCA(rld, "batch")
    dev.off()

    # -- after heatmap --
    ## generate the pairwise comparison between samples
    png("heatmap_after_batch_correction.png", 1200, 800)
    distsRL <- dist(t(assay(rld)))
    mat <- as.matrix(distsRL)
    rownames(mat) <- colnames(mat) <- with(colData(dds),paste(replicates,batch, sep=":"))
    #rownames(mat) <- colnames(mat) <- with(colData(dds),paste(replicates,ids, sep=":"))
    hc <- hclust(distsRL)
    hmcol <- colorRampPalette(brewer.pal(9,"GnBu"))(100)
    heatmap.2(mat, Rowv=as.dendrogram(hc),symm=TRUE, trace="none",col = rev(hmcol), margin=c(13, 13))
    dev.off()

#### STEP2: DEGs ####
#- Heatmap untreated/wt vs parental; 1x for WaGa cell line
#- Volcano plot untreated/wt vs parental; 1x for WaGa cell line
#- Manhattan plot miRNAs; 1x for WaGa cell line
#- Distribution of different small RNA species untreated/wt and parental; 1x for WaGa cell line
#- Motif analysis: identify RNA-binding proteins that may regulate small RNA loading; 1x for WaGa cell line

#convert bam to bigwig using deepTools by feeding inverse of DESeq’s size Factor
sizeFactors(dds)
#NULL
dds <- estimateSizeFactors(dds)
sizeFactors(dds)
normalized_counts <- counts(dds, normalized=TRUE)
write.table(normalized_counts, file="normalized_counts.txt", sep="\t", quote=F, col.names=NA)
write.xlsx(normalized_counts, file = "normalized_counts.xlsx", rowNames = TRUE)

#---- untreated, scr_control, DMSO_control, scr_DMSO_control, sT_knockdown to parental_cells ----
dds<-DESeqDataSetFromMatrix(countData=d.raw, colData=cData, design=~replicates+batch)

dds$replicates <- relevel(dds$replicates, "parental_cells")
dds = DESeq(dds, betaPrior=FALSE)  #default betaPrior is FALSE
resultsNames(dds)
clist <- c("untreated_vs_parental_cells")

#NOTE that the results sent to Ute is |padj|<=0.1.
for (i in clist) {
    contrast = paste("replicates", i, sep="_")
    res = results(dds, name=contrast)
    res <- res[!is.na(res$log2FoldChange),]
    #https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#why-are-some-p-values-set-to-na
    res$padj <- ifelse(is.na(res$padj), 1, res$padj)
    res_df <- as.data.frame(res)
    write.csv(as.data.frame(res_df[order(res_df$pvalue),]), file = paste(i, "all.txt", sep="-"))
    up <- subset(res_df, padj<=0.05 & log2FoldChange>=2)
    down <- subset(res_df, padj<=0.05 & log2FoldChange<=-2)
    write.csv(as.data.frame(up[order(up$log2FoldChange,decreasing=TRUE),]), file = paste(i, "up.txt", sep="-"))
    write.csv(as.data.frame(down[order(abs(down$log2FoldChange),decreasing=TRUE),]), file = paste(i, "down.txt", sep="-"))
}

~/Tools/csv2xls-0.4/csv_to_xls.py \
untreated_vs_parental_cells-all.txt \
untreated_vs_parental_cells-up.txt \
untreated_vs_parental_cells-down.txt \
-d$',' -o untreated_vs_parental_cells.xls;

# ------------------- volcano_plot -------------------
library(ggplot2)
library(ggrepel)

geness_res <- read.csv(file = paste("untreated_vs_parental_cells", "all.txt", sep="-"), row.names=1)

external_gene_name <- rownames(geness_res)
geness_res <- cbind(geness_res, external_gene_name)
#top_g are from ids
top_g <- c("hsa_piR_000805","hsa_piR_001152","hsa_piR_001170","hsa_piR_001205","hsa_piR_009051","hsa_piR_010894","hsa_piR_012681","hsa_piR_012753","hsa_piR_016659","hsa_piR_017033","hsa_piR_017178","hsa_piR_018292","hsa_piR_018780","hsa_piR_019420","hsa_piR_020009","hsa_piR_020326","hsa_piR_020813","hsa_piR_020814","hsa_piR_020828")
subset(geness_res, external_gene_name %in% top_g & pvalue < 0.05 & (abs(geness_res$log2FoldChange) >= 2.0))
geness_res$Color <- "NS or log2FC < 2.0"
geness_res$Color[geness_res$pvalue < 0.05] <- "P < 0.05"
geness_res$Color[geness_res$padj < 0.05] <- "P-adj < 0.05"
geness_res$Color[abs(geness_res$log2FoldChange) < 2.0] <- "NS or log2FC < 2.0"

write.csv(geness_res, "untreated_vs_parental_cells_with_Category.csv")
geness_res$invert_P <- (-log10(geness_res$pvalue)) * sign(geness_res$log2FoldChange)

geness_res <- geness_res[, -1*ncol(geness_res)]
png("volcano_plot_piRNAs_untreated_vs_parental_cells.png",width=1200, height=1400)
#svg("untreated_vs_parental_cells.svg",width=12, height=14)
ggplot(geness_res,       aes(x = log2FoldChange, y = -log10(pvalue),           color = Color, label = external_gene_name)) +       geom_vline(xintercept = c(2.0, -2.0), lty = "dashed") +       geom_hline(yintercept = -log10(0.05), lty = "dashed") +       geom_point() +       labs(x = "log2(FC)", y = "Significance, -log10(P)", color = "Significance") +       scale_color_manual(values = c("P < 0.05"="orange","P-adj < 0.05"="red","NS or log2FC < 2.0"="darkgray"),guide = guide_legend(override.aes = list(size = 4))) + scale_y_continuous(expand = expansion(mult = c(0,0.05))) +       geom_text_repel(data = subset(geness_res, external_gene_name %in% top_g & pvalue < 0.05 & (abs(geness_res$log2FoldChange) >= 2.0)), size = 4, point.padding = 0.15, color = "black", min.segment.length = .1, box.padding = .2, lwd = 2) +       theme_bw(base_size = 16) +       theme(legend.position = "bottom")
dev.off()

# ------------------ differentially_expressed_piRNAs_heatmap -----------------
# prepare all_genes
rld <- rlogTransformation(dds)
mat <- assay(rld)
mm <- model.matrix(~replicates, colData(rld))
mat <- limma::removeBatchEffect(mat, batch=rld$batch, design=mm)
assay(rld) <- mat
RNASeq.NoCellLine <- assay(rld)

#Option4: manully defining
#for i in untreated_vs_parental_cells; do
#  echo "cut -d',' -f1-1 ${i}-up.txt > ${i}-up.id";
#  echo "cut -d',' -f1-1 ${i}-down.txt > ${i}-down.id";
#done
#cat *.id | sort -u > ids
##add Gene_Id in the first line, delete the ""
GOI <- read.csv("ids")$Gene_Id
datamat = RNASeq.NoCellLine[GOI, ]

# clustering the genes and draw heatmap
#datamat <- datamat[,-1]  #delete the sample "control MKL1"
datamat <- datamat[, 1:5]

colnames(datamat)[1] <- "parental cells 1"
colnames(datamat)[2] <- "parental cells 2"
colnames(datamat)[3] <- "parental cells 3"
colnames(datamat)[4] <- "untreated 1"
colnames(datamat)[5] <- "untreated 2"

write.csv(datamat, file ="gene_expression_keeping_replicates.txt")
write.xlsx(datamat, file = "gene_expression_keeping_replicates.xlsx", rowNames = TRUE)
#"ward.D"’, ‘"ward.D2"’,‘"single"’, ‘"complete"’, ‘"average"’ (= UPGMA), ‘"mcquitty"’(= WPGMA), ‘"median"’ (= WPGMC) or ‘"centroid"’ (= UPGMC)
hr <- hclust(as.dist(1-cor(t(datamat), method="pearson")), method="complete")
hc <- hclust(as.dist(1-cor(datamat, method="spearman")), method="complete")
mycl = cutree(hr, h=max(hr$height)/1.1)
mycol = c("YELLOW", "BLUE", "ORANGE", "CYAN", "GREEN", "MAGENTA", "GREY", "LIGHTCYAN", "RED",     "PINK", "DARKORANGE", "MAROON",  "LIGHTGREEN", "DARKBLUE",  "DARKRED",   "LIGHTBLUE", "DARKCYAN",  "DARKGREEN", "DARKMAGENTA");
mycol = mycol[as.vector(mycl)]

rownames(datamat) <- sub("\\|.*", "", rownames(datamat))

png("differentially_expressed_piRNAs_heatmap.png", width=800, height=800)
#svg("differentially_expressed_piRNAs_heatmap.svg", width=6, height=8)
heatmap.2(as.matrix(datamat),
    Rowv=as.dendrogram(hr),
    Colv=NA,
    dendrogram='row',
    labRow=row.names(datamat),
    scale='row',
    trace='none',
    col=bluered(75),
    RowSideColors=mycol,
    srtCol=20,
    lhei=c(1,4),
    cexRow=1.7,   # Increase row label font size
    cexCol=1.7,    # Increase column label font size
    margin=c(6, 12)
    )
dev.off()

# ----------- manhattan_plot -------------

# Load the required libraries
library(ggplot2)
library(dplyr)
library(tidyr)
library(ggrepel)  # For better label positioning

# Step 1: Compute RPM from raw counts (d.raw has piRNAs in rows, samples in columns)
d.raw_5 <- d.raw[, 1:5]  # assuming 5 samples
total_counts <- colSums(d.raw_5)
RPM <- sweep(d.raw_5, 2, total_counts, FUN = "/") * 1e6

# Step 2: Prepare long-format dataframe
RPM$piRNA <- rownames(RPM)
df <- pivot_longer(RPM, cols = -piRNA, names_to = "sample", values_to = "RPM")

# Step 3: Log-transform RPM
df <- df %>%
mutate(logRPM = log10(RPM + 1))

# Step 4: Add piRNA index for x-axis positioning
df <- df %>%
arrange(piRNA) %>%
group_by(sample) %>%
mutate(Position = row_number())

# Step 5: Identify top piRNAs based on mean RPM
top_pirnas <- df %>%
group_by(piRNA) %>%
summarise(mean_RPM = mean(RPM)) %>%
arrange(desc(mean_RPM)) %>%
head(5) %>%
pull(piRNA)  # Get the names of top 5 piRNAs

# Step 6: Assign color based on whether the piRNA is top or not
df$color <- ifelse(df$piRNA %in% top_pirnas, "red", "darkblue")

# Rename the sample labels for display
sample_labels <- c(
"parental_cells_1" = "Parental cell 1",
"parental_cells_2" = "Parental cell 2",
"parental_cells_3" = "Parental cell 3",
"untreated_1"      = "Untreated 1",
"untreated_2"      = "Untreated 2"
)

# Step 7: Plot
png("manhattan_plot_top_piRNAs_based_on_mean_RPM.png", width = 1200, height = 1200)
ggplot(df, aes(x = Position, y = logRPM, color = color)) +
scale_color_manual(values = c("red" = "red", "darkblue" = "darkblue")) +
geom_jitter(width = 0.4) +
geom_text_repel(
    data = df %>% filter(piRNA %in% top_pirnas),
    aes(label = piRNA),
    box.padding = 0.5,
    point.padding = 0.5,
    segment.color = 'gray50',
    size = 5,
    max.overlaps = 8,
    color = "black"
) +
labs(x = "", y = "log10(Read Per Million) (RPM)") +
facet_wrap(~sample, scales = "free_x", ncol = 5,
            labeller = labeller(sample = sample_labels)) +
theme_minimal() +
theme(
    axis.text.x = element_blank(),
    axis.ticks.x = element_blank(),
    legend.position = "none",
    text = element_text(size = 16),
    axis.title = element_text(size = 18),
    strip.text = element_text(size = 16, face = "bold"),
    panel.spacing = unit(1.5, "lines")  # <-- More space between plots
)
dev.off()

top_pirnas = c("hsa_piR_012681","hsa_piR_012753","hsa_piR_001152","hsa_piR_020813","hsa_piR_020828")

# Step 6: Assign color based on whether the piRNA is top or not
df$color <- ifelse(df$piRNA %in% top_pirnas, "red", "darkblue")

# Rename the sample labels for display
sample_labels <- c(
"parental_cells_1" = "Parental cell 1",
"parental_cells_2" = "Parental cell 2",
"parental_cells_3" = "Parental cell 3",
"untreated_1"      = "Untreated 1",
"untreated_2"      = "Untreated 2"
)

# Step 7: Plot
png("manhattan_plot_most_differentially_expressed_piRNAs.png", width = 1200, height = 1200)
ggplot(df, aes(x = Position, y = logRPM, color = color)) +
scale_color_manual(values = c("red" = "red", "darkblue" = "darkblue")) +
geom_jitter(width = 0.4) +
geom_text_repel(
    data = df %>% filter(piRNA %in% top_pirnas),
    aes(label = piRNA),
    box.padding = 0.5,
    point.padding = 0.5,
    segment.color = 'gray50',
    size = 5,
    max.overlaps = 8,
    color = "black"
) +
labs(x = "", y = "log10(Read Per Million) (RPM)") +
facet_wrap(~sample, scales = "free_x", ncol = 5,
            labeller = labeller(sample = sample_labels)) +
theme_minimal() +
theme(
    axis.text.x = element_blank(),
    axis.ticks.x = element_blank(),
    legend.position = "none",
    text = element_text(size = 16),
    axis.title = element_text(size = 18),
    strip.text = element_text(size = 16, face = "bold"),
    panel.spacing = unit(1.5, "lines")  # <-- More space between plots
)
dev.off()

mkdir piRNAs
mv *.png piRNAs
mv *.csv piRNAs
mv *.xls* piRNAs
mv *.id piRNAs
mv ids piRNAs
mv normalized_counts.txt piRNAs
mv *-all.txt piRNAs
mv *-up.txt piRNAs
mv *-down.txt piRNAs
mv gene_expression_keeping_replicates.txt piRNAs
cd piRNAs
mv untreated_vs_parental_cells.xls piRNA_untreated_vs_parental_cells.xls

Reporting

Please find attached the analysis results for small RNAs in the WaGa cell line. miRNAs:

* Heatmap comparing untreated/wt vs. parental (1x):
See differentially_expressed_miRNAs_heatmap.png

* Volcano plot comparing untreated/wt vs. parental (1x):
See volcano_plot_miRNAs_untreated_vs_parental_cells.png

* Manhattan plots highlighting top differentially expressed miRNAs (1x):
See manhattan_plot_most_differentially_expressed_miRNAs.png and manhattan_plot_top_miRNAs_based_on_mean_RPM.png

piRNAs:

* Heatmap comparing untreated/wt vs. parental (1x):
See differentially_expressed_piRNAs_heatmap.png

* Volcano plot comparing untreated/wt vs. parental (1x):
See volcano_plot_piRNAs_untreated_vs_parental_cells.png

* Manhattan plots highlighting top differentially expressed piRNAs (1x):
See manhattan_plot_most_differentially_expressed_piRNAs.png and manhattan_plot_top_piRNAs_based_on_mean_RPM.png

Additional

* Distribution of small RNA species (untreated/wt vs. parental, 1x):
See distribution_heatmap.png

* Differential expression tables:

    - miRNA_untreated_vs_parental_cells.xls
    - piRNA_untreated_vs_parental_cells.xls

    These files contain all differentially expressed miRNAs and piRNAs, respectively.

If you’d like the R code used to generate the plots, along with the raw data and full tables, just let me know—I’ll be happy to send it over.

Processing Data_Tam_RNAseq_2024_MHB_vs_Urine_ATCC19606

Leave a reply

Preparing raw data

They are wildtype strains grown in different medium.
Urine - human urine
AUM - artificial urine medium
MHB - Mueller-Hinton broth
Urine（人类尿液）：pH值、比重、温度、污染物、化学成分、微生物负荷。
AUM（人工尿液培养基）：pH值、营养成分、无菌性、渗透压、温度、污染物。
MHB（Mueller-Hinton培养基）：pH值、无菌性、营养成分、温度、渗透压、抗生素浓度。

mkdir raw_data; cd raw_data
ln -s ../X101SC24105589-Z01-J001/01.RawData/AUM-1/AUM-1_1.fq.gz AUM_r1_R1.fq.gz
ln -s ../X101SC24105589-Z01-J001/01.RawData/AUM-1/AUM-1_2.fq.gz AUM_r1_R2.fq.gz
ln -s ../X101SC24105589-Z01-J001/01.RawData/AUM-2/AUM-2_1.fq.gz AUM_r2_R1.fq.gz
ln -s ../X101SC24105589-Z01-J001/01.RawData/AUM-2/AUM-2_2.fq.gz AUM_r2_R2.fq.gz
ln -s ../X101SC24105589-Z01-J001/01.RawData/AUM-3/AUM-3_1.fq.gz AUM_r3_R1.fq.gz
ln -s ../X101SC24105589-Z01-J001/01.RawData/AUM-3/AUM-3_2.fq.gz AUM_r3_R2.fq.gz

ln -s ../X101SC24105589-Z01-J001/01.RawData/MHB-1/MHB-1_1.fq.gz MHB_r1_R1.fq.gz
ln -s ../X101SC24105589-Z01-J001/01.RawData/MHB-1/MHB-1_2.fq.gz MHB_r1_R2.fq.gz
ln -s ../X101SC24105589-Z01-J001/01.RawData/MHB-2/MHB-2_1.fq.gz MHB_r2_R1.fq.gz
ln -s ../X101SC24105589-Z01-J001/01.RawData/MHB-2/MHB-2_2.fq.gz MHB_r2_R2.fq.gz
ln -s ../X101SC24105589-Z01-J001/01.RawData/MHB-3/MHB-3_1.fq.gz MHB_r3_R1.fq.gz
ln -s ../X101SC24105589-Z01-J001/01.RawData/MHB-3/MHB-3_2.fq.gz MHB_r3_R2.fq.gz

ln -s ../X101SC24105589-Z01-J001/01.RawData/Urine-1/Urine-1_1.fq.gz Urine_r1_R1.fq.gz
ln -s ../X101SC24105589-Z01-J001/01.RawData/Urine-1/Urine-1_2.fq.gz Urine_r1_R2.fq.gz
ln -s ../X101SC24105589-Z01-J001/01.RawData/Urine-2/Urine-2_1.fq.gz Urine_r2_R1.fq.gz
ln -s ../X101SC24105589-Z01-J001/01.RawData/Urine-2/Urine-2_2.fq.gz Urine_r2_R2.fq.gz
ln -s ../X101SC24105589-Z01-J001/01.RawData/Urine-3/Urine-3_1.fq.gz Urine_r3_R1.fq.gz
ln -s ../X101SC24105589-Z01-J001/01.RawData/Urine-3/Urine-3_2.fq.gz Urine_r3_R2.fq.gz

(Optional) using trinity to find the most closely reference

Trinity --seqType fq --max_memory 50G --left trimmed/wt_r1_R1.fastq.gz  --right trimmed/wt_r1_R2.fastq.gz --CPU 12

#https://www.genome.jp/kegg/tables/br08606.html#prok
acb     KGB     Acinetobacter baumannii ATCC 17978  2007    GenBank
abm     KGB     Acinetobacter baumannii SDF     2008    GenBank
aby     KGB     Acinetobacter baumannii AYE     2008    GenBank
abc     KGB     Acinetobacter baumannii ACICU   2008    GenBank
abn     KGB     Acinetobacter baumannii AB0057  2008    GenBank
abb     KGB     Acinetobacter baumannii AB307-0294  2008    GenBank
abx     KGB     Acinetobacter baumannii 1656-2  2012    GenBank
abz     KGB     Acinetobacter baumannii MDR-ZJ06    2012    GenBank
abr     KGB     Acinetobacter baumannii MDR-TJ  2012    GenBank
abd     KGB     Acinetobacter baumannii TCDC-AB0715     2012    GenBank
abh     KGB     Acinetobacter baumannii TYTH-1  2012    GenBank
abad    KGB     Acinetobacter baumannii D1279779    2013    GenBank
abj     KGB     Acinetobacter baumannii BJAB07104   2013    GenBank
abab    KGB     Acinetobacter baumannii BJAB0715    2013    GenBank
abaj    KGB     Acinetobacter baumannii BJAB0868    2013    GenBank
abaz    KGB     Acinetobacter baumannii ZW85-1  2013    GenBank
abk     KGB     Acinetobacter baumannii AbH12O-A2   2014    GenBank
abau    KGB     Acinetobacter baumannii AB030   2014    GenBank
abaa    KGB     Acinetobacter baumannii AB031   2014    GenBank
abw     KGB     Acinetobacter baumannii AC29    2014    GenBank
abal    KGB     Acinetobacter baumannii LAC-4   2015    GenBank
#Note that the Acinetobacter baumannii strain ATCC 19606 chromosome, complete genome (GenBank: CP059040.1) was choosen as reference!

Downloading CP059040.fasta and CP059040.gff from GenBank

(Optional) Preparing CP059040.fasta, CP059040_gene.gff3 and CP059040.bed

#Reference genome: https://www.ncbi.nlm.nih.gov/nuccore/CP059040
cp /media/jhuang/Elements2/Data_Tam_RNASeq3/CP059040.fasta .     # Elements (Anna C.arnes)
cp /media/jhuang/Elements2/Data_Tam_RNASeq3/CP059040_gene.gff3 .
cp /media/jhuang/Elements2/Data_Tam_RNASeq3/CP059040_gene.gtf .
cp /media/jhuang/Elements2/Data_Tam_RNASeq3/CP059040.bed .
rsync -a -P CP059040.fasta jhuang@hamm:~/DATA/Data_Tam_RNAseq_2024/
rsync -a -P CP059040_gene.gff3 jhuang@hamm:~/DATA/Data_Tam_RNAseq_2024/
rsync -a -P CP059040.bed jhuang@hamm:~/DATA/Data_Tam_RNAseq_2024/
(base) jhuang@WS-2290C:/media/jhuang/Elements2/Data_Tam_RNASeq3$ find . -name "CP059040*"
./CP059040.fasta
./CP059040.bed
./CP059040.gb
./CP059040.gff3
./CP059040.gff3_backup
./CP059040_full.gb
./CP059040_gene.gff3
./CP059040_gene.gtf
./CP059040_gene_old.gff3
./CP059040_rRNA.gff3
./CP059040_rRNA_v.gff3

# ---- REF: Acinetobacter baumannii ATCC 17978 (DEBUG, gene_name failed) ----
#gffread -E -F -T GCA_000015425.1_ASM1542v1_genomic.gff -o GCA_000015425.1_ASM1542v1_genomic.gtf_
#grep "CDS" GCA_000015425.1_ASM1542v1_genomic.gtf_ > GCA_000015425.1_ASM1542v1_genomic.gtf
#sed -i -e "s/\tCDS\t/\texon\t/g" GCA_000015425.1_ASM1542v1_genomic.gtf
#gffread -E -F --bed GCA_000015425.1_ASM1542v1_genomic.gtf -o GCA_000015425.1_ASM1542v1_genomic.bed

grep "locus_tag" GCA_000015425.1_ASM1542v1_genomic.gtf_ > GCA_000015425.1_ASM1542v1_genomic.gtf
sed -i -e "s/\ttranscript\t/\texon\t/g" GCA_000015425.1_ASM1542v1_genomic.gtf # or using fc_count_type=transcript
sed -i -e "s/\tgene_name\t/\tName\t/g" GCA_000015425.1_ASM1542v1_genomic.gtf
gffread -E -F --bed GCA_000015425.1_ASM1542v1_genomic.gtf -o GCA_000015425.1_ASM1542v1_genomic.bed
#grep "gene_name" GCA_000015425.1_ASM1542v1_genomic.gtf | wc -l  #69=3887-3803

cp CP059040.gff3 CP059040_backup.gff3
sed -i -e "s/\tGenbank\tgene\t/\tGenbank_gene\t/g" CP059040.gff3
grep "Genbank_gene" CP059040.gff3 > CP059040_gene.gff3
sed -i -e "s/\tGenbank_gene\t/\tGenbank\tgene\t/g" CP059040_gene.gff3

#3796-3754=42--> they are pseudogene since grep "pseudogene" CP059040.gff3 | wc -l = 42
# --------------------------------------------------------------------------------------------------------------------------------------------------
# ---------- PREPARING gff3 file including gene_biotype=protein_coding+gene_biotype=tRNA = total(3754)) and gene_biotype=pseudogene(42) ------------
cp CP059040.gff3 CP059040_backup.gff3
sed -i -e "s/\tGenbank\tgene\t/\tGenbank_gene\t/g" CP059040.gff3
grep "Genbank_gene" CP059040.gff3 > CP059040_gene.gff3
sed -i -e "s/\tGenbank_gene\t/\tGenbank\tgene\t/g" CP059040_gene.gff3
grep "gene_biotype=pseudogene" CP059040.gff3_backup >> CP059040_gene.gff3    #-->3796

#The whole point of the GTF format was to standardise certain aspects that are left open in GFF. Hence, there are many different valid ways to encode the same information in a valid GFF format, and any parser or converter needs to be written specifically for the choices the author of the GFF file made. For example, a GTF file requires the gene ID attribute to be called "gene_id", while in GFF files, it may be "ID", "Gene", something different, or completely missing.
# from gff3 to gtf
sed -i -e "s/\tID=gene-/\tgene_id \"/g" CP059040_gene.gtf
sed -i -e "s/;/\"; /g" CP059040_gene.gtf
sed -i -e "s/=/=\"/g" CP059040_gene.gtf

#sed -i -e "s/\n/\"\n/g" CP059040_gene.gtf
#using editor instead!

#The following is GTF-format.
CP000521.1      Genbank exon    95      1492    .       +       .       transcript_id "gene0"; gene_id "gene0"; Name "A1S_0001"; gbkey "Gene"; gene_biotype "protein_coding"; locus_tag "A1S_0001";

#NZ_MJHA01000001.1       RefSeq  region  1       8663    .       +       .       ID=id0;Dbxref=taxon:575584;Name=unnamed1;collected-by=IG Schaub;collection-date=1948;country=USA: Vancouver;culture-collection=ATCC:19606;gbkey=Src;genome=plasmid;isolation-source=urine;lat-lon=37.53 N 75.4 W;map=unlocalized;mol_type=genomic DNA;nat-host=Homo sapiens;plasmid-name=unnamed1;strain=ATCC 19606;type-material=type strain of Acinetobacter baumannii
#NZ_MJHA01000001.1       RefSeq  gene    228     746     .       -       .       ID=gene0;Name=BIT33_RS00005;gbkey=Gene;gene_biotype=protein_coding;locus_tag=BIT33_RS00005;old_locus_tag=BIT33_18795
#NZ_MJHA01000001.1       Protein Homology        CDS     228     746     .       -       0       ID=cds0;Parent=gene0;Dbxref=Genbank:WP_000839337.1;Name=WP_000839337.1;gbkey=CDS;inference=COORDINATES: similar to AA sequence:RefSeq:WP_000839337.1;product=hypothetical protein;protein_id=WP_000839337.1;transl_table=11

##gff-version 3
##sequence-region CP059040.1 1 3980852
##species https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=470

gffread -E -F --bed CP059040.gff3 -o CP059040.bed    #-->3796
##prepare the GTF-format (see above) --> ERROR! ----> using CP059040.gff3
##stringtie adeIJ.abx_r1.sorted.bam -o adeIJ.abx_r1.sorted_transcripts.gtf -v -G /media/jhuang/Elements/Data_Tam_RNASeq3/CP059040.gff3 -A adeIJ.abx_r1.sorted.gene_abund.txt -C adeIJ.abx_r1.sorted.bam.cov_refs.gtf -e -b adeIJ.abx_r1.sorted_ballgown
#[01/21 10:57:46] Loading reference annotation (guides)..
#GFF warning: merging adjacent/overlapping segments of gene-H0N29_00815 on CP059040.1 (179715-179786, 179788-180810)
#[01/21 10:57:46] 3796 reference transcripts loaded.
#Default stack size for threads: 8388608
#WARNING: no reference transcripts found for genomic sequence "gi|1906906720|gb|CP059040.1|"! (mismatched reference names?)
#WARNING: no reference transcripts were found for the genomic sequences where reads were mapped!
#Please make sure the -G annotation file uses the same naming convention for the genome sequences.
#[01/21 10:58:30] All threads finished.

#  ERROR: failed to find the gene identifier attribute in the 9th column of the provided GTF file.
#  The specified gene identifier attribute is 'Name'
#  An example of attributes included in your GTF annotation is 'ID=exon-H0N29_00075-1;Parent=rna-H0N29_00075;gbkey=rRNA;locus_tag=H0N29_00075;product=16S ribosomal RNA'
#  The program has to termin

#  ERROR: failed to find the gene identifier attribute in the 9th column of the provided GTF file.
#  The specified gene identifier attribute is 'gene_biotype'
#  An example of attributes included in your GTF annotation is 'ID=exon-H0N29_00075-1;Parent=rna-H0N29_00075;gbkey=rRNA;locus_tag=H0N29_00075;product=16S ribosomal RNA'
#  The program has to terminate.

#grep "ID=cds-" CP059040.gff3 | wc -l
#grep "ID=exon-" CP059040.gff3 | wc -l
#grep "ID=gene-" CP059040.gff3 | wc -l   #the same as H0N29_18980/5=3796
grep "gbkey=" CP059040.gff3 | wc -l  7695
grep "ID=id-" CP059040.gff3 | wc -l  5
grep "locus_tag=" CP059040.gff3 | wc -l    7689
#...
cds   3701                             locus_tag=xxxx, no gene_biotype
exon   96                              locus_tag=xxxx, no gene_biotype
gene   3796                            locus_tag=xxxx, gene_biotype=xxxx,
id  (riboswitch+direct_repeat,5)       both no --> ignoring them!!  # grep "ID=id-" CP059040.gff3
rna    96                              locus_tag=xxxx, no gene_biotype
------------------
    7694

cp CP059040.gff3_backup CP059040.gff3
grep "^##" CP059040.gff3 > CP059040_gene.gff3
grep "ID=gene" CP059040.gff3 >> CP059040_gene.gff3
#!!!!VERY_IMPORTANT!!!!: change type '\tCDS\t' to '\texon\t'!
sed -i -e "s/\tgene\t/\texon\t/g" CP059040_gene.gff3

Preparing the directory trimmed

mkdir trimmed trimmed_unpaired;
for sample_id in AUM_r1 AUM_r2 AUM_r3 Urine_r1 Urine_r2 Urine_r3 MHB_r1 MHB_r2 MHB_r3; do \
for sample_id in MHB_r1 MHB_r2 MHB_r3; do \
        java -jar /home/jhuang/Tools/Trimmomatic-0.36/trimmomatic-0.36.jar PE -threads 100 raw_data/${sample_id}_R1.fq.gz raw_data/${sample_id}_R2.fq.gz trimmed/${sample_id}_R1.fq.gz trimmed_unpaired/${sample_id}_R1.fq.gz trimmed/${sample_id}_R2.fq.gz trimmed_unpaired/${sample_id}_R2.fq.gz ILLUMINACLIP:/home/jhuang/Tools/Trimmomatic-0.36/adapters/TruSeq3-PE-2.fa:2:30:10:8:TRUE LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 AVGQUAL:20; done 2> trimmomatic_pe.log;
done

Preparing samplesheet.csv

sample,fastq_1,fastq_2,strandedness
AUM_r1,AUM_r1_R1.fq.gz,AUM_r1_R2.fq.gz,auto
AUM_r2,AUM_r2_R1.fq.gz,AUM_r2_R2.fq.gz,auto
AUM_r3,AUM_r3_R1.fq.gz,AUM_r3_R2.fq.gz,auto
MHB_r1,MHB_r1_R1.fq.gz,MHB_r1_R2.fq.gz,auto
MHB_r2,MHB_r2_R1.fq.gz,MHB_r2_R2.fq.gz,auto
MHB_r3,MHB_r3_R1.fq.gz,MHB_r3_R2.fq.gz,auto
Urine_r1,Urine_r1_R1.fq.gz,Urine_r1_R2.fq.gz,auto
Urine_r2,Urine_r2_R1.fq.gz,Urine_r2_R2.fq.gz,auto
Urine_r3,Urine_r3_R1.fq.gz,Urine_r3_R2.fq.gz,auto

nextflow run

#Example1: http://xgenes.com/article/article-content/157/prepare-virus-gtf-for-nextflow-run/

docker pull nfcore/rnaseq
ln -s /home/jhuang/Tools/nf-core-rnaseq-3.12.0/ rnaseq

#Default: --gtf_group_features 'gene_id'  --gtf_extra_attributes 'gene_name' --featurecounts_group_type 'gene_biotype' --featurecounts_feature_type 'exon'
#(host_env) !NOT_WORKING! jhuang@WS-2290C:~/DATA/Data_Tam_RNAseq_2024$ /usr/local/bin/nextflow run rnaseq/main.nf --input samplesheet.csv --outdir results    --fasta "/home/jhuang/DATA/Data_Tam_RNAseq_2024/CP059040.fasta" --gff "/home/jhuang/DATA/Data_Tam_RNAseq_2024/CP059040.gff"        -profile docker -resume  --max_cpus 55 --max_memory 512.GB --max_time 2400.h    --save_align_intermeds --save_unaligned --save_reference    --aligner 'star_salmon'    --gtf_group_features 'gene_id'  --gtf_extra_attributes 'gene_name' --featurecounts_group_type 'gene_biotype' --featurecounts_feature_type 'transcript'

# -- DEBUG_1 (CDS --> exon in CP059040.gff) --
#Checking the record (see below) in results/genome/CP059040.gtf
#In ./results/genome/CP059040.gtf e.g. "CP059040.1      Genbank transcript      1       1398    .       +       .       transcript_id "gene-H0N29_00005"; gene_id "gene-H0N29_00005"; gene_name "dnaA"; Name "dnaA"; gbkey "Gene"; gene "dnaA"; gene_biotype "protein_coding"; locus_tag "H0N29_00005";"
#--featurecounts_feature_type 'transcript' returns only the tRNA results
#Since the tRNA records have "transcript and exon". In gene records, we have "transcript and CDS". replace the CDS with exon

grep -P "\texon\t" CP059040.gff | sort | wc -l    #96
grep -P "cmsearch\texon\t" CP059040.gff | wc -l    #=10  ignal recognition particle sRNA small typ, transfer-messenger RNA, 5S ribosomal RNA
grep -P "Genbank\texon\t" CP059040.gff | wc -l    #=12  16S and 23S ribosomal RNA
grep -P "tRNAscan-SE\texon\t" CP059040.gff | wc -l    #tRNA 74
wc -l star_salmon/AUM_r3/quant.genes.sf  #--featurecounts_feature_type 'transcript' results in 96 records!

grep -P "\tCDS\t" CP059040.gff | wc -l  #3701
sed 's/\tCDS\t/\texon\t/g' CP059040.gff > CP059040_m.gff
grep -P "\texon\t" CP059040_m.gff | sort | wc -l  #3797

# -- DEBUG_2: combination of 'CP059040_m.gff' and 'exon' results in ERROR, using 'transcript' instead!
--gff "/home/jhuang/DATA/Data_Tam_RNAseq_2024/CP059040_m.gff" --featurecounts_feature_type 'transcript'

# ---- SUCCESSFUL with directly downloaded gff3 and fasta from NCBI using docker after replacing 'CDS' with 'exon' ----
(host_env) /usr/local/bin/nextflow run rnaseq/main.nf --input samplesheet.csv --outdir results    --fasta "/home/jhuang/DATA/Data_Tam_RNAseq_2024_AUM_MHB_Urine/CP059040.fasta" --gff "/home/jhuang/DATA/Data_Tam_RNAseq_2024_AUM_MHB_Urine/CP059040_m.gff"        -profile docker -resume  --max_cpus 55 --max_memory 512.GB --max_time 2400.h    --save_align_intermeds --save_unaligned --save_reference    --aligner 'star_salmon'    --gtf_group_features 'gene_id'  --gtf_extra_attributes 'gene_name' --featurecounts_group_type 'gene_biotype' --featurecounts_feature_type 'transcript'

# -- DEBUG_3: make sure the header of fasta is the same to the *_m.gff file

Import data and pca-plot

#mamba activate r_env

#install.packages("ggfun")
# Import the required libraries
library("AnnotationDbi")
library("clusterProfiler")
library("ReactomePA")
library(gplots)
library(tximport)
library(DESeq2)
#library("org.Hs.eg.db")
library(dplyr)
library(tidyverse)
#install.packages("devtools")
#devtools::install_version("gtable", version = "0.3.0")
library(gplots)
library("RColorBrewer")
#install.packages("ggrepel")
library("ggrepel")
# install.packages("openxlsx")
library(openxlsx)
library(EnhancedVolcano)
library(DESeq2)
library(edgeR)

setwd("~/DATA/Data_Tam_RNAseq_2024_AUM_MHB_Urine_ATCC19606/results/star_salmon")
# Define paths to your Salmon output quantification files
files <- c("Urine_r1" = "./Urine_r1/quant.sf",
        "Urine_r2" = "./Urine_r2/quant.sf",
        "Urine_r3" = "./Urine_r3/quant.sf",
        "MHB_r1" = "./MHB_r1/quant.sf",
        "MHB_r2" = "./MHB_r2/quant.sf",
        "MHB_r3" = "./MHB_r3/quant.sf")
# Import the transcript abundance data with tximport
txi <- tximport(files, type = "salmon", txIn = TRUE, txOut = TRUE)
# Define the replicates and condition of the samples
replicate <- factor(c("r1", "r2", "r3", "r1", "r2", "r3"))
condition <- factor(c("Urine","Urine","Urine", "MHB","MHB","MHB"))
# Define the colData for DESeq2
colData <- data.frame(condition=condition, replicate=replicate, row.names=names(files))

# ------------------------
# 1️⃣ Setup and input files
# ------------------------

# Read in transcript-to-gene mapping
tx2gene <- read.table("salmon_tx2gene.tsv", header=FALSE, stringsAsFactors=FALSE)
colnames(tx2gene) <- c("transcript_id", "gene_id", "gene_name")

# Prepare tx2gene for gene-level summarization (remove gene_name if needed)
tx2gene_geneonly <- tx2gene[, c("transcript_id", "gene_id")]

# -------------------------------
# 2️⃣ Transcript-level counts
# -------------------------------
# Create DESeqDataSet directly from tximport (transcript-level)
dds_tx <- DESeqDataSetFromTximport(txi, colData=colData, design=~condition)
write.csv(counts(dds_tx), file="transcript_counts.csv")

# --------------------------------
# 3️⃣ Gene-level summarization
# --------------------------------
# Re-import Salmon data summarized at gene level
txi_gene <- tximport(files, type="salmon", tx2gene=tx2gene_geneonly, txOut=FALSE)

# Create DESeqDataSet for gene-level counts
dds <- DESeqDataSetFromTximport(txi_gene, colData=colData, design=~condition+replicate)

# --------------------------------
# 4️⃣ Raw counts table (with gene names)
# --------------------------------
# Extract raw gene-level counts
counts_data <- as.data.frame(counts(dds, normalized=FALSE))
counts_data$gene_id <- rownames(counts_data)

# Add gene names
tx2gene_unique <- unique(tx2gene[, c("gene_id", "gene_name")])
counts_data <- merge(counts_data, tx2gene_unique, by="gene_id", all.x=TRUE)

# Reorder columns: gene_id, gene_name, then counts
count_cols <- setdiff(colnames(counts_data), c("gene_id", "gene_name"))
counts_data <- counts_data[, c("gene_id", "gene_name", count_cols)]

# --------------------------------
# 5️⃣ Calculate CPM
# --------------------------------
library(edgeR)
library(openxlsx)

# Prepare count matrix for CPM calculation
count_matrix <- as.matrix(counts_data[, !(colnames(counts_data) %in% c("gene_id", "gene_name"))])

# Calculate CPM
#cpm_matrix <- cpm(count_matrix, normalized.lib.sizes=FALSE)
total_counts <- colSums(count_matrix)
cpm_matrix <- t(t(count_matrix) / total_counts) * 1e6
cpm_matrix <- as.data.frame(cpm_matrix)

# Add gene_id and gene_name back to CPM table
cpm_counts <- cbind(counts_data[, c("gene_id", "gene_name")], cpm_matrix)

# --------------------------------
# 6️⃣ Save outputs
# --------------------------------
write.csv(counts_data, "gene_raw_counts.csv", row.names=FALSE)
write.xlsx(counts_data, "gene_raw_counts.xlsx", row.names=FALSE)
write.xlsx(cpm_counts, "gene_cpm_counts.xlsx", row.names=FALSE)

PCA dim(counts(dds)) head(counts(dds), 10) rld <- rlogTransformation(dds)

# draw simple pca and heatmap
#mat <- assay(rld)
#mm <- model.matrix(~condition, colData(rld))
#mat <- limma::removeBatchEffect(mat, batch=rld$batch, design=mm)
#assay(rld) <- mat
# -- pca --
png("pca.png", 1200, 800)
plotPCA(rld, intgroup=c("condition"))
dev.off()
# -- heatmap --
png("heatmap.png", 1200, 800)
distsRL <- dist(t(assay(rld)))
mat <- as.matrix(distsRL)
hc <- hclust(distsRL)
hmcol <- colorRampPalette(brewer.pal(9,"GnBu"))(100)
heatmap.2(mat, Rowv=as.dendrogram(hc),symm=TRUE, trace="none",col = rev(hmcol), margin=c(13, 13))
dev.off()

Select the differentially expressed genes

#https://galaxyproject.eu/posts/2020/08/22/three-steps-to-galaxify-your-tool/
#https://www.biostars.org/p/282295/
#https://www.biostars.org/p/335751/
#> dds$condition
#[1] Urine Urine Urine MHB   MHB   MHB
#Levels: MHB Urine
#CONSOLE: mkdir star_salmon/degenes

setwd("degenes")
#---- relevel to control ----
dds$condition <- relevel(dds$condition, "MHB")
dds = DESeq(dds, betaPrior=FALSE)
resultsNames(dds)
clist <- c("Urine_vs_MHB")

for (i in clist) {
  contrast = paste("condition", i, sep="_")
  res = results(dds, name=contrast)
  res <- res[!is.na(res$log2FoldChange),]
  res_df <- as.data.frame(res)

  write.csv(as.data.frame(res_df[order(res_df$pvalue),]), file = paste(i, "all.txt", sep="-"))
  up <- subset(res_df, padj<=0.05 & log2FoldChange>=1.35)
  down <- subset(res_df, padj<=0.05 & log2FoldChange<=-1.35)
  write.csv(as.data.frame(up[order(up$log2FoldChange,decreasing=TRUE),]), file = paste(i, "up.txt", sep="-"))
  write.csv(as.data.frame(down[order(abs(down$log2FoldChange),decreasing=TRUE),]), file = paste(i, "down.txt", sep="-"))
}

# -- Under host-env --
grep -P "\tgene\t" CP059040.gff > CP059040_gene.gff
python3 ~/Scripts/replace_gene_names.py /home/jhuang/DATA/Data_Tam_RNAseq_2024_AUM_MHB_Urine_ATCC19606/CP059040_gene.gff Urine_vs_MHB-all.txt Urine_vs_MHB-all.csv
python3 ~/Scripts/replace_gene_names.py /home/jhuang/DATA/Data_Tam_RNAseq_2024_AUM_MHB_Urine_ATCC19606/CP059040_gene.gff Urine_vs_MHB-up.txt Urine_vs_MHB-up.csv
python3 ~/Scripts/replace_gene_names.py /home/jhuang/DATA/Data_Tam_RNAseq_2024_AUM_MHB_Urine_ATCC19606/CP059040_gene.gff Urine_vs_MHB-down.txt Urine_vs_MHB-down.csv

res <- read.csv("Urine_vs_MHB-all.csv")
# Replace empty GeneName with modified GeneID
res$GeneName <- ifelse(
  res$GeneName == "" | is.na(res$GeneName),
  gsub("gene-", "", res$GeneID),
  res$GeneName
)
duplicated_genes <- res[duplicated(res$GeneName), "GeneName"]

res <- res %>%
  group_by(GeneName) %>%
  slice_min(padj, with_ties = FALSE) %>%
  ungroup()
res <- as.data.frame(res)
# Sort res first by padj (ascending) and then by log2FoldChange (descending)
res <- res[order(res$padj, -res$log2FoldChange), ]

# Assuming res is your dataframe and already processed
# Filter up-regulated genes: log2FoldChange > 2 and padj < 1e-2
up_regulated <- res[res$log2FoldChange > 2 & res$padj < 1e-2, ]
# Filter down-regulated genes: log2FoldChange < -2 and padj < 1e-2
down_regulated <- res[res$log2FoldChange < -2 & res$padj < 1e-2, ]
# Create a new workbook
wb <- createWorkbook()
# Add the complete dataset as the first sheet
addWorksheet(wb, "Complete_Data")
writeData(wb, "Complete_Data", res)
# Add the up-regulated genes as the second sheet
addWorksheet(wb, "Up_Regulated")
writeData(wb, "Up_Regulated", up_regulated)
# Add the down-regulated genes as the third sheet
addWorksheet(wb, "Down_Regulated")
writeData(wb, "Down_Regulated", down_regulated)
# Save the workbook to a file
saveWorkbook(wb, "Gene_Expression_Urine_vs_MHB.xlsx", overwrite = TRUE)

# Set the 'GeneName' column as row.names
rownames(res) <- res$GeneName
# Drop the 'GeneName' column since it's now the row names
res$GeneName <- NULL
head(res)

## Ensure the data frame matches the expected format
## For example, it should have columns: log2FoldChange, padj, etc.
#res <- as.data.frame(res)
## Remove rows with NA in log2FoldChange (if needed)
#res <- res[!is.na(res$log2FoldChange),]

# Replace padj = 0 with a small value
res$padj[res$padj == 0] <- 1e-305

#library(EnhancedVolcano)
# Assuming res is already sorted and processed
png("Urine_vs_MHB.png", width=1200, height=2000)
#max.overlaps = 10
EnhancedVolcano(res,
                lab = rownames(res),
                x = 'log2FoldChange',
                y = 'padj',
                pCutoff = 1e-2,
                FCcutoff = 2,
                title = '',
                subtitleLabSize = 18,
                pointSize = 3.0,
                labSize = 5.0,
                colAlpha = 1,
                legendIconSize = 4.0,
                drawConnectors = TRUE,
                widthConnectors = 0.5,
                colConnectors = 'black',
                subtitle = expression("Urine versus MHB"))
dev.off()

KEGG and GO annotations in non-model organisms

https://www.biobam.com/functional-analysis/

Assign KEGG and GO Terms (see diagram above)

Since your organism is non-model, standard R databases (org.Hs.eg.db, etc.) won’t work. You’ll need to manually retrieve KEGG and GO annotations.
- Preparing file 1 eggnog_out.emapper.annotations.txt for the R-code below: (KEGG Terms): EggNog based on orthology and phylogenies
  
  EggNOG-mapper assigns both KEGG Orthology (KO) IDs and GO terms.
  
  Install EggNOG-mapper:
```
mamba create -n eggnog_env python=3.8 eggnog-mapper -c conda-forge -c bioconda  #eggnog-mapper_2.1.12
mamba activate eggnog_env
```
  Run annotation:
```
#diamond makedb --in eggnog6.prots.faa -d eggnog_proteins.dmnd
mkdir /home/jhuang/mambaforge/envs/eggnog_env/lib/python3.8/site-packages/data/
download_eggnog_data.py --dbname eggnog.db -y --data_dir /home/jhuang/mambaforge/envs/eggnog_env/lib/python3.8/site-packages/data/
#NOT_WORKING: emapper.py -i CP059040_gene.fasta -o eggnog_dmnd_out --cpu 60 -m diamond[hmmer,mmseqs] --dmnd_db /home/jhuang/REFs/eggnog_data/data/eggnog_proteins.dmnd
python ~/Scripts/update_fasta_header.py CP059040_protein_.fasta CP059040_protein.fasta
emapper.py -i CP059040_protein.fasta -o eggnog_out --cpu 60 --resume
#----> result annotations.tsv: Contains KEGG, GO, and other functional annotations.
#---->  470.IX87_14445:
    * 470 likely refers to the organism or strain (e.g., Acinetobacter baumannii ATCC 19606 or another related strain).
    * IX87_14445 would refer to a specific gene or protein within that genome.
```
  Extract KEGG KO IDs from annotations.emapper.annotations.
- Preparing file 2 blast2goannot.annot2 for the R-code below:
  - Basic (GO Terms from ‘Blast2GO 5 Basic’, saved in blast2go_annot.annot): Using Blast/Diamond + Blast2GO_GUI based on sequence alignment + GO mapping
  - ‘Load protein sequences’ (Tags: NONE, generated columns: Nr, SeqName) –>
  - Buttons ‘blast’ (Tags: BLASTED, generated columns: Description, Length, #Hits, e-Value, sim mean),
  - Button ‘mapping’ (Tags: MAPPED, generated columns: #GO, GO IDs, GO Names), “Mapping finished – Please proceed now to annotation.”
  - Button ‘annot’ (Tags: ANNOTATED, generated columns: Enzyme Codes, Enzyme Names), “Annotation finished.”
    - Used parameter ‘Annotation CutOff’: The Blast2GO Annotation Rule seeks to find the most specific GO annotations with a certain level of reliability. An annotation score is calculated for each candidate GO which is composed by the sequence similarity of the Blast Hit, the evidence code of the source GO and the position of the particular GO in the Gene Ontology hierarchy. This annotation score cutoff select the most specific GO term for a given GO branch which lies above this value.
    - Used parameter ‘GO Weight’ is a value which is added to Annotation Score of a more general/abstract Gene Ontology term for each of its more specific, original source GO terms. In this case, more general GO terms which summarise many original source terms (those ones directly associated to the Blast Hits) will have a higher Annotation Score.
  - Advanced (GO Terms from ‘Blast2GO 5 Basic’): Interpro based protein families / domains –> Button interpro
  - Button ‘interpro’ (Tags: INTERPRO, generated columns: InterPro IDs, InterPro GO IDs, InterPro GO Names) –> “InterProScan Finished – You can now merge the obtained GO Annotations.”
  - MERGE the results of InterPro GO IDs (advanced) to GO IDs (basic) and generate final GO IDs, saved in blast2go_annot.annot2
  - Button ‘interpro’/’Merge InterProScan GOs to Annotation’ –> “Merge (add and validate) all GO terms retrieved via InterProScan to the already existing GO annotation.” –> “Finished merging GO terms from InterPro with annotations. Maybe you want to run ANNEX (Annotation Augmentation).”
  - (NOT_USED) Button ‘annot’/’ANNEX’ –> “ANNEX finished. Maybe you want to do the next step: Enzyme Code Mapping.”
  - PREPARING go_terms and ecterms: annot* file:
  cut -f1-2 -d$’\t’ blast2go_annot.annot2 > blast2goannot.annot2

Perform KEGG and GO Enrichment in R

    #BiocManager::install("GO.db")
    #BiocManager::install("AnnotationDbi")

    # Load required libraries
    library(openxlsx)  # For Excel file handling
    library(dplyr)     # For data manipulation
    library(tidyr)
    library(stringr)
    library(clusterProfiler)  # For KEGG and GO enrichment analysis
    #library(org.Hs.eg.db)  # Replace with appropriate organism database
    library(GO.db)
    library(AnnotationDbi)

    setwd("~/DATA/Data_Tam_RNAseq_2024_AUM_MHB_Urine_ATCC19606/results/star_salmon/degenes")
    # Step 1: Load the blast2go annotation file with a check for missing columns
    annot_df <- read.table("/home/jhuang/b2gWorkspace_Tam_RNAseq_2024/blast2go_annot.annot2_",
                        header = FALSE, sep = "\t", stringsAsFactors = FALSE, fill = TRUE)

    # If the structure is inconsistent, we can make sure there are exactly 3 columns:
    colnames(annot_df) <- c("GeneID", "Term")
    # Step 2: Filter and aggregate GO and EC terms as before
    go_terms <- annot_df %>%
    filter(grepl("^GO:", Term)) %>%
    group_by(GeneID) %>%
    summarize(GOs = paste(Term, collapse = ","), .groups = "drop")
    ec_terms <- annot_df %>%
    filter(grepl("^EC:", Term)) %>%
    group_by(GeneID) %>%
    summarize(EC = paste(Term, collapse = ","), .groups = "drop")

    # Load the results
    res <- read.csv("Urine_vs_MHB-all.csv")   #up259, down138

    # Replace empty GeneName with modified GeneID
    res$GeneName <- ifelse(
        res$GeneName == "" | is.na(res$GeneName),
        gsub("gene-", "", res$GeneID),
        res$GeneName
    )

    # Remove duplicated genes by selecting the gene with the smallest padj
    duplicated_genes <- res[duplicated(res$GeneName), "GeneName"]

    res <- res %>%
    group_by(GeneName) %>%
    slice_min(padj, with_ties = FALSE) %>%
    ungroup()

    res <- as.data.frame(res)
    # Sort res first by padj (ascending) and then by log2FoldChange (descending)
    res <- res[order(res$padj, -res$log2FoldChange), ]
    # Read eggnog annotations
    eggnog_data <- read.delim("~/DATA/Data_Tam_RNAseq_2024_AUM_MHB_Urine_ATCC19606/eggnog_out.emapper.annotations.txt", header = TRUE, sep = "\t")
    # Remove the "gene-" prefix from GeneID in res to match eggnog 'query' format
    res$GeneID <- gsub("gene-", "", res$GeneID)
    # Merge eggnog data with res based on GeneID
    res <- res %>%
    left_join(eggnog_data, by = c("GeneID" = "query"))

    # DEBUG: NOT_NECESSARY, since res has already GeneName
    ##Convert row names to a new column 'GeneName' in res
    #res_with_geneName <- res %>%
    #mutate(GeneName = rownames(res)) %>%
    #as.data.frame()  # Ensure that it's a regular data frame without row names
    ## View the result
    #head(res_with_geneName)

    # Merge with the res dataframe
    # Perform the left joins and rename columns
    res_updated <- res %>%
    left_join(go_terms, by = "GeneID") %>%
    left_join(ec_terms, by = "GeneID") %>% dplyr::select(-EC.x, -GOs.x) %>% dplyr::rename(EC = EC.y, GOs = GOs.y)

    # DEBUG: NOT_NECESSARY, since 'GeneName' is already the first column.
    ## Reorder columns to move 'GeneName' as the first column in res_updated
    #res_updated <- res_updated %>%
    #select(GeneName, everything())

    ## Count the number of rows in the KEGG_ko, GOs, EC columns that have non-missing values
    #num_non_missing_KEGG_ko <- sum(res_updated$KEGG_ko != "-" & !is.na(res_updated$KEGG_ko))
    #print(num_non_missing_KEGG_ko)
    ##[1] 2030
    #num_non_missing_GOs <- sum(res_updated$GOs != "-" & !is.na(res_updated$GOs))
    #print(num_non_missing_GOs)
    ##[1] 2865 --> 2875
    #num_non_missing_EC <- sum(res_updated$EC != "-" & !is.na(res_updated$EC))
    #print(num_non_missing_EC)
    ##[1] 1701

    # Filter up-regulated genes
    up_regulated <- res_updated[res_updated$log2FoldChange > 2 & res_updated$padj < 0.01, ]
    # Filter down-regulated genes
    down_regulated <- res_updated[res_updated$log2FoldChange < -2 & res_updated$padj < 0.01, ]

    # Create a new workbook
    wb <- createWorkbook()
    # Add the complete dataset as the first sheet (with annotations)
    addWorksheet(wb, "Complete_Data")
    writeData(wb, "Complete_Data", res_updated)
    # Add the up-regulated genes as the second sheet (with annotations)
    addWorksheet(wb, "Up_Regulated")
    writeData(wb, "Up_Regulated", up_regulated)
    # Add the down-regulated genes as the third sheet (with annotations)
    addWorksheet(wb, "Down_Regulated")
    writeData(wb, "Down_Regulated", down_regulated)
    # Save the workbook to a file
    saveWorkbook(wb, "Gene_Expression_with_Annotations_Urine_vs_MHB.xlsx", overwrite = TRUE)

    # Set GeneName as row names after the join
    rownames(res_updated) <- res_updated$GeneName
    res_updated <- res_updated %>% dplyr::select(-GeneName)
    ## Set the 'GeneName' column as row.names
    #rownames(res_updated) <- res_updated$GeneName
    ## Drop the 'GeneName' column since it's now the row names
    #res_updated$GeneName <- NULL

    # ---- Perform KEGG enrichment analysis (up_regulated) ----
    gene_list_kegg_up <- up_regulated$KEGG_ko
    gene_list_kegg_up <- gsub("ko:", "", gene_list_kegg_up)
    kegg_enrichment_up <- enrichKEGG(gene = gene_list_kegg_up, organism = 'ko')
    # -- convert the GeneID (Kxxxxxx) to the true GeneID --
    # Step 0: Create KEGG to GeneID mapping
    kegg_to_geneid_up <- up_regulated %>%
    dplyr::select(KEGG_ko, GeneID) %>%
    filter(!is.na(KEGG_ko)) %>%  # Remove missing KEGG KO entries
    mutate(KEGG_ko = str_remove(KEGG_ko, "ko:"))  # Remove 'ko:' prefix if present
    # Step 1: Clean KEGG_ko values (separate multiple KEGG IDs)
    kegg_to_geneid_clean <- kegg_to_geneid_up %>%
    mutate(KEGG_ko = str_remove_all(KEGG_ko, "ko:")) %>%  # Remove 'ko:' prefixes
    separate_rows(KEGG_ko, sep = ",") %>%  # Ensure each KEGG ID is on its own row
    filter(KEGG_ko != "-") %>%  # Remove invalid KEGG IDs ("-")
    distinct()  # Remove any duplicate mappings
    # Step 2.1: Expand geneID column in kegg_enrichment_up
    expanded_kegg <- kegg_enrichment_up %>%
    as.data.frame() %>%
    separate_rows(geneID, sep = "/") %>%  # Split multiple KEGG IDs (Kxxxxx)
    left_join(kegg_to_geneid_clean, by = c("geneID" = "KEGG_ko"), relationship = "many-to-many") %>%  # Explicitly handle many-to-many
    distinct() %>%  # Remove duplicate matches
    group_by(ID) %>%
    summarise(across(everything(), ~ paste(unique(na.omit(.)), collapse = "/")), .groups = "drop")  # Re-collapse results
    #dplyr::glimpse(expanded_kegg)
    # Step 3.1: Replace geneID column in the original dataframe
    kegg_enrichment_up_df <- as.data.frame(kegg_enrichment_up)
    # Remove old geneID column and merge new one
    kegg_enrichment_up_df <- kegg_enrichment_up_df %>%
    dplyr::select(-geneID) %>%  # Remove old geneID column
    left_join(expanded_kegg %>% dplyr::select(ID, GeneID), by = "ID") %>%  # Merge new GeneID column
    dplyr::rename(geneID = GeneID)  # Rename column back to geneID

    # ---- Perform KEGG enrichment analysis (down_regulated) ----
    # Step 1: Extract KEGG KO terms from down-regulated genes
    gene_list_kegg_down <- down_regulated$KEGG_ko
    gene_list_kegg_down <- gsub("ko:", "", gene_list_kegg_down)
    # Step 2: Perform KEGG enrichment analysis
    kegg_enrichment_down <- enrichKEGG(gene = gene_list_kegg_down, organism = 'ko')
    # --- Convert KEGG gene IDs (Kxxxxxx) to actual GeneIDs ---
    # Step 3: Create KEGG to GeneID mapping from down_regulated dataset
    kegg_to_geneid_down <- down_regulated %>%
    dplyr::select(KEGG_ko, GeneID) %>%
    filter(!is.na(KEGG_ko)) %>%  # Remove missing KEGG KO entries
    mutate(KEGG_ko = str_remove(KEGG_ko, "ko:"))  # Remove 'ko:' prefix if present
    # Step 4: Clean KEGG_ko values (handle multiple KEGG IDs)
    kegg_to_geneid_down_clean <- kegg_to_geneid_down %>%
    mutate(KEGG_ko = str_remove_all(KEGG_ko, "ko:")) %>%  # Remove 'ko:' prefixes
    separate_rows(KEGG_ko, sep = ",") %>%  # Ensure each KEGG ID is on its own row
    filter(KEGG_ko != "-") %>%  # Remove invalid KEGG IDs ("-")
    distinct()  # Remove duplicate mappings
    # Step 5: Expand geneID column in kegg_enrichment_down
    expanded_kegg_down <- kegg_enrichment_down %>%
    as.data.frame() %>%
    separate_rows(geneID, sep = "/") %>%  # Split multiple KEGG IDs (Kxxxxx)
    left_join(kegg_to_geneid_down_clean, by = c("geneID" = "KEGG_ko"), relationship = "many-to-many") %>%  # Handle many-to-many mappings
    distinct() %>%  # Remove duplicate matches
    group_by(ID) %>%
    summarise(across(everything(), ~ paste(unique(na.omit(.)), collapse = "/")), .groups = "drop")  # Re-collapse results
    # Step 6: Replace geneID column in the original kegg_enrichment_down dataframe
    kegg_enrichment_down_df <- as.data.frame(kegg_enrichment_down) %>%
    dplyr::select(-geneID) %>%  # Remove old geneID column
    left_join(expanded_kegg_down %>% dplyr::select(ID, GeneID), by = "ID") %>%  # Merge new GeneID column
    dplyr::rename(geneID = GeneID)  # Rename column back to geneID
    # View the updated dataframe
    head(kegg_enrichment_down_df)

    # Create a new workbook
    wb <- createWorkbook()
    # Save enrichment results to the workbook
    addWorksheet(wb, "KEGG_Enrichment_Up")
    writeData(wb, "KEGG_Enrichment_Up", as.data.frame(kegg_enrichment_up_df))
    # Save enrichment results to the workbook
    addWorksheet(wb, "KEGG_Enrichment_Down")
    writeData(wb, "KEGG_Enrichment_Down", as.data.frame(kegg_enrichment_down_df))
    saveWorkbook(wb, "KEGG_Enrichment.xlsx", overwrite = TRUE)

    # ---- Perform GO enrichment analysis (TODO: extract the merged GO IDs from 'Blast2GO 5 Basic' and adapt the code below!)----

    # Define gene list (up-regulated genes)
    gene_list_go_up <- up_regulated$GeneID  # Extract the 149 up-regulated genes
    gene_list_go_down <- down_regulated$GeneID  # Extract the 65 down-regulated genes

    # Define background gene set (all genes in res)
    background_genes <- res_updated$GeneID  # Extract the 3646 background genes

    # Prepare GO annotation data from res
    go_annotation <- res_updated[, c("GOs","GeneID")]  # Extract relevant columns
    go_annotation <- go_annotation %>%
    tidyr::separate_rows(GOs, sep = ",")  # Split multiple GO terms into separate rows

    # Perform GO enrichment analysis, where pAdjustMethod is one of "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none"
    go_enrichment_up <- enricher(
        gene = gene_list_go_up,                # Up-regulated genes
        TERM2GENE = go_annotation,       # Custom GO annotation
        pvalueCutoff = 1.0,             # Significance threshold
        pAdjustMethod = "BH",
        universe = background_genes      # Define the background gene set
    )
    go_enrichment_up <- as.data.frame(go_enrichment_up)

    go_enrichment_down <- enricher(
        gene = gene_list_go_down,                # Up-regulated genes
        TERM2GENE = go_annotation,       # Custom GO annotation
        pvalueCutoff = 1.0,             # Significance threshold
        pAdjustMethod = "BH",
        universe = background_genes      # Define the background gene set
    )
    go_enrichment_down <- as.data.frame(go_enrichment_down)

    ## Remove the 'p.adjust' column since no adjusted methods have been applied!
    #go_enrichment_up <- go_enrichment_up[, !names(go_enrichment_up) %in% "p.adjust"]
    # Update the Description column with the term descriptions
    go_enrichment_up$Description <- sapply(go_enrichment_up$ID, function(go_id) {
    # Using select to get the term description
    term <- tryCatch({
        AnnotationDbi::select(GO.db, keys = go_id, columns = "TERM", keytype = "GOID")
    }, error = function(e) {
        message(paste("Error for GO term:", go_id))  # Print which GO ID caused the error
        return(data.frame(TERM = NA))  # In case of error, return NA
    })

    if (nrow(term) > 0) {
        return(term$TERM)
    } else {
        return(NA)  # If no description found, return NA
    }
    })
    ## Print the updated data frame
    #print(go_enrichment_up)

    ## Remove the 'p.adjust' column since no adjusted methods have been applied!
    #go_enrichment_down <- go_enrichment_down[, !names(go_enrichment_down) %in% "p.adjust"]
    # Update the Description column with the term descriptions
    go_enrichment_down$Description <- sapply(go_enrichment_down$ID, function(go_id) {
    # Using select to get the term description
    term <- tryCatch({
        AnnotationDbi::select(GO.db, keys = go_id, columns = "TERM", keytype = "GOID")
    }, error = function(e) {
        message(paste("Error for GO term:", go_id))  # Print which GO ID caused the error
        return(data.frame(TERM = NA))  # In case of error, return NA
    })

    if (nrow(term) > 0) {
        return(term$TERM)
    } else {
        return(NA)  # If no description found, return NA
    }
    })

    addWorksheet(wb, "GO_Enrichment_Up")
    writeData(wb, "GO_Enrichment_Up", as.data.frame(go_enrichment_up))

    addWorksheet(wb, "GO_Enrichment_Down")
    writeData(wb, "GO_Enrichment_Down", as.data.frame(go_enrichment_down))

    # Save the workbook with enrichment results
    saveWorkbook(wb, "KEGG_and_GO_Enrichments_Urine_vs_MHB.xlsx", overwrite = TRUE)

    #Error for GO term: GO:0006807: replace GO:0006807  obsolete nitrogen compound metabolic process
    #TODO: marked the color as yellow if the p.adjusted <= 0.05 in GO_enrichment!

Finalizing the KEGG and GO Enrichment table

    1. NOTE: geneIDs in KEGG_Enrichment have been already translated from ko to geneID in H0N29_*-format;
    2. NEED_MANUAL_DELETION: p.adjust values have been calculated, we have to filter all records in GO_Enrichment-results by |p.adjust|<=0.05.

Processing RNAseq_2025_WT_vs_ΔIJ_on_ATCC19606

Leave a reply

Vorgabe

#perform PCA analysis, Venn diagram analysis, as well as KEGG and GO annotations. We would also appreciate it if you could include CPM calculations for this dataset (gene_cpm_counts.xlsx). For comparative analysis, we are particularly interested in identifying DEGs between WT and ΔIJ across the different treatments and time points.

Preparing raw data

mkdir raw_data; cd raw_data
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/WT-17-1/WT-17-1_1.fq.gz WT-17-r1_R1.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/WT-17-1/WT-17-1_2.fq.gz WT-17-r1_R2.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/WT-17-2/WT-17-2_1.fq.gz WT-17-r2_R1.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/WT-17-2/WT-17-2_2.fq.gz WT-17-r2_R2.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/WT-17-3/WT-17-3_1.fq.gz WT-17-r3_R1.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/WT-17-3/WT-17-3_2.fq.gz WT-17-r3_R2.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/WT-24-1/WT-24-1_1.fq.gz WT-24-r1_R1.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/WT-24-1/WT-24-1_2.fq.gz WT-24-r1_R2.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/WT-24-2/WT-24-2_1.fq.gz WT-24-r2_R1.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/WT-24-2/WT-24-2_2.fq.gz WT-24-r2_R2.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/WT-24-3/WT-24-3_1.fq.gz WT-24-r3_R1.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/WT-24-3/WT-24-3_2.fq.gz WT-24-r3_R2.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/ΔIJ-17-1/ΔIJ-17-1_1.fq.gz deltaIJ-17-r1_R1.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/ΔIJ-17-1/ΔIJ-17-1_2.fq.gz deltaIJ-17-r1_R2.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/ΔIJ-17-2/ΔIJ-17-2_1.fq.gz deltaIJ-17-r2_R1.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/ΔIJ-17-2/ΔIJ-17-2_2.fq.gz deltaIJ-17-r2_R2.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/ΔIJ-17-3/ΔIJ-17-3_1.fq.gz deltaIJ-17-r3_R1.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/ΔIJ-17-3/ΔIJ-17-3_2.fq.gz deltaIJ-17-r3_R2.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/ΔIJ-24-1/ΔIJ-24-1_1.fq.gz deltaIJ-24-r1_R1.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/ΔIJ-24-1/ΔIJ-24-1_2.fq.gz deltaIJ-24-r1_R2.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/ΔIJ-24-2/ΔIJ-24-2_1.fq.gz deltaIJ-24-r2_R1.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/ΔIJ-24-2/ΔIJ-24-2_2.fq.gz deltaIJ-24-r2_R2.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/ΔIJ-24-3/ΔIJ-24-3_1.fq.gz deltaIJ-24-r3_R1.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/ΔIJ-24-3/ΔIJ-24-3_2.fq.gz deltaIJ-24-r3_R2.fq.gz

ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/preWT-17-1/preWT-17-1_1.fq.gz pre_WT-17-r1_R1.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/preWT-17-1/preWT-17-1_2.fq.gz pre_WT-17-r1_R2.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/preWT-17-2/preWT-17-2_1.fq.gz pre_WT-17-r2_R1.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/preWT-17-2/preWT-17-2_2.fq.gz pre_WT-17-r2_R2.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/preWT-17-3/preWT-17-3_1.fq.gz pre_WT-17-r3_R1.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/preWT-17-3/preWT-17-3_2.fq.gz pre_WT-17-r3_R2.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/preWT-24-1/preWT-24-1_1.fq.gz pre_WT-24-r1_R1.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/preWT-24-1/preWT-24-1_2.fq.gz pre_WT-24-r1_R2.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/preWT-24-2/preWT-24-2_1.fq.gz pre_WT-24-r2_R1.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/preWT-24-2/preWT-24-2_2.fq.gz pre_WT-24-r2_R2.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/preWT-24-3/preWT-24-3_1.fq.gz pre_WT-24-r3_R1.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/preWT-24-3/preWT-24-3_2.fq.gz pre_WT-24-r3_R2.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/preΔIJ-17-1/preΔIJ-17-1_1.fq.gz pre_deltaIJ-17-r1_R1.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/preΔIJ-17-1/preΔIJ-17-1_2.fq.gz pre_deltaIJ-17-r1_R2.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/preΔIJ-17-2/preΔIJ-17-2_1.fq.gz pre_deltaIJ-17-r2_R1.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/preΔIJ-17-2/preΔIJ-17-2_2.fq.gz pre_deltaIJ-17-r2_R2.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/preΔIJ-17-3/preΔIJ-17-3_1.fq.gz pre_deltaIJ-17-r3_R1.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/preΔIJ-17-3/preΔIJ-17-3_2.fq.gz pre_deltaIJ-17-r3_R2.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/preΔIJ-24-1/preΔIJ-24-1_1.fq.gz pre_deltaIJ-24-r1_R1.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/preΔIJ-24-1/preΔIJ-24-1_2.fq.gz pre_deltaIJ-24-r1_R2.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/preΔIJ-24-2/preΔIJ-24-2_1.fq.gz pre_deltaIJ-24-r2_R1.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/preΔIJ-24-2/preΔIJ-24-2_2.fq.gz pre_deltaIJ-24-r2_R2.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/preΔIJ-24-3/preΔIJ-24-3_1.fq.gz pre_deltaIJ-24-r3_R1.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/preΔIJ-24-3/preΔIJ-24-3_2.fq.gz pre_deltaIJ-24-r3_R2.fq.gz

ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/WT0_5-17-1/WT0_5-17-1_1.fq.gz 0_5_WT-17-r1_R1.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/WT0_5-17-1/WT0_5-17-1_2.fq.gz 0_5_WT-17-r1_R2.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/WT0_5-17-2/WT0_5-17-2_1.fq.gz 0_5_WT-17-r2_R1.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/WT0_5-17-2/WT0_5-17-2_2.fq.gz 0_5_WT-17-r2_R2.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/WT0_5-17-3/WT0_5-17-3_1.fq.gz 0_5_WT-17-r3_R1.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/WT0_5-17-3/WT0_5-17-3_2.fq.gz 0_5_WT-17-r3_R2.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/WT0_5-24-1/WT0_5-24-1_1.fq.gz 0_5_WT-24-r1_R1.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/WT0_5-24-1/WT0_5-24-1_2.fq.gz 0_5_WT-24-r1_R2.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/WT0_5-24-2/WT0_5-24-2_1.fq.gz 0_5_WT-24-r2_R1.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/WT0_5-24-2/WT0_5-24-2_2.fq.gz 0_5_WT-24-r2_R2.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/WT0_5-24-3/WT0_5-24-3_1.fq.gz 0_5_WT-24-r3_R1.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/WT0_5-24-3/WT0_5-24-3_2.fq.gz 0_5_WT-24-r3_R2.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/0_5ΔIJ-17-1/0_5ΔIJ-17-1_1.fq.gz 0_5_deltaIJ-17-r1_R1.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/0_5ΔIJ-17-1/0_5ΔIJ-17-1_2.fq.gz 0_5_deltaIJ-17-r1_R2.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/0_5ΔIJ-17-2/0_5ΔIJ-17-2_1.fq.gz 0_5_deltaIJ-17-r2_R1.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/0_5ΔIJ-17-2/0_5ΔIJ-17-2_2.fq.gz 0_5_deltaIJ-17-r2_R2.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/0_5ΔIJ-17-3/0_5ΔIJ-17-3_1.fq.gz 0_5_deltaIJ-17-r3_R1.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/0_5ΔIJ-17-3/0_5ΔIJ-17-3_2.fq.gz 0_5_deltaIJ-17-r3_R2.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/0_5ΔIJ-24-1/0_5ΔIJ-24-1_1.fq.gz 0_5_deltaIJ-24-r1_R1.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/0_5ΔIJ-24-1/0_5ΔIJ-24-1_2.fq.gz 0_5_deltaIJ-24-r1_R2.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/0_5ΔIJ-24-2/0_5ΔIJ-24-2_1.fq.gz 0_5_deltaIJ-24-r2_R1.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/0_5ΔIJ-24-2/0_5ΔIJ-24-2_2.fq.gz 0_5_deltaIJ-24-r2_R2.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/0_5ΔIJ-24-3/0_5ΔIJ-24-3_1.fq.gz 0_5_deltaIJ-24-r3_R1.fq.gz
ln -s ../RSMR00204/X101SC25062155-Z01/X101SC25062155-Z01-J001/01.RawData/0_5ΔIJ-24-3/0_5ΔIJ-24-3_2.fq.gz 0_5_deltaIJ-24-r3_R2.fq.gz

(Done) Downloading CP059040.fasta and CP059040.gff from GenBank

Preparing the directory trimmed

mkdir trimmed trimmed_unpaired;
for sample_id in WT-17-r1 WT-17-r2 WT-17-r3 WT-24-r1 WT-24-r2 WT-24-r3 deltaIJ-17-r1 deltaIJ-17-r2 deltaIJ-17-r3 deltaIJ-24-r1 deltaIJ-24-r2 deltaIJ-24-r3  pre_WT-17-r1 pre_WT-17-r2 pre_WT-17-r3 pre_WT-24-r1 pre_WT-24-r2 pre_WT-24-r3 pre_deltaIJ-17-r1 pre_deltaIJ-17-r2 pre_deltaIJ-17-r3 pre_deltaIJ-24-r1 pre_deltaIJ-24-r2 pre_deltaIJ-24-r3  0_5_WT-17-r1 0_5_WT-17-r2 0_5_WT-17-r3 0_5_WT-24-r1 0_5_WT-24-r2 0_5_WT-24-r3 0_5_deltaIJ-17-r1 0_5_deltaIJ-17-r2 0_5_deltaIJ-17-r3 0_5_deltaIJ-24-r1 0_5_deltaIJ-24-r2 0_5_deltaIJ-24-r3; do \
        java -jar /home/jhuang/Tools/Trimmomatic-0.36/trimmomatic-0.36.jar PE -threads 100 raw_data/${sample_id}_R1.fq.gz raw_data/${sample_id}_R2.fq.gz trimmed/${sample_id}_R1.fq.gz trimmed_unpaired/${sample_id}_R1.fq.gz trimmed/${sample_id}_R2.fq.gz trimmed_unpaired/${sample_id}_R2.fq.gz ILLUMINACLIP:/home/jhuang/Tools/Trimmomatic-0.36/adapters/TruSeq3-PE-2.fa:2:30:10:8:TRUE LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 AVGQUAL:20; done 2> trimmomatic_pe.log;
done

Preparing samplesheet.csv

sample,fastq_1,fastq_2,strandedness
WT_17_r1,WT-17-r1_R1.fq.gz,WT-17-r1_R2.fq.gz,auto
WT_17_r2,WT-17-r2_R1.fq.gz,WT-17-r2_R2.fq.gz,auto
WT_17_r3,WT-17-r3_R1.fq.gz,WT-17-r3_R2.fq.gz,auto
WT_24_r1,WT-24-r1_R1.fq.gz,WT-24-r1_R2.fq.gz,auto
WT_24_r2,WT-24-r2_R1.fq.gz,WT-24-r2_R2.fq.gz,auto
WT_24_r3,WT-24-r3_R1.fq.gz,WT-24-r3_R2.fq.gz,auto
deltaIJ_17_r1,deltaIJ-17-r1_R1.fq.gz,deltaIJ-17-r1_R2.fq.gz,auto
deltaIJ_17_r2,deltaIJ-17-r2_R1.fq.gz,deltaIJ-17-r2_R2.fq.gz,auto
deltaIJ_17_r3,deltaIJ-17-r3_R1.fq.gz,deltaIJ-17-r3_R2.fq.gz,auto
deltaIJ_24_r1,deltaIJ-24-r1_R1.fq.gz,deltaIJ-24-r1_R2.fq.gz,auto
deltaIJ_24_r2,deltaIJ-24-r2_R1.fq.gz,deltaIJ-24-r2_R2.fq.gz,auto
deltaIJ_24_r3,deltaIJ-24-r3_R1.fq.gz,deltaIJ-24-r3_R2.fq.gz,auto
pre_WT_17_r1,pre_WT-17-r1_R1.fq.gz,pre_WT-17-r1_R2.fq.gz,auto
pre_WT_17_r2,pre_WT-17-r2_R1.fq.gz,pre_WT-17-r2_R2.fq.gz,auto
pre_WT_17_r3,pre_WT-17-r3_R1.fq.gz,pre_WT-17-r3_R2.fq.gz,auto
pre_WT_24_r1,pre_WT-24-r1_R1.fq.gz,pre_WT-24-r1_R2.fq.gz,auto
pre_WT_24_r2,pre_WT-24-r2_R1.fq.gz,pre_WT-24-r2_R2.fq.gz,auto
pre_WT_24_r3,pre_WT-24-r3_R1.fq.gz,pre_WT-24-r3_R2.fq.gz,auto
pre_deltaIJ_17_r1,pre_deltaIJ-17-r1_R1.fq.gz,pre_deltaIJ-17-r1_R2.fq.gz,auto
pre_deltaIJ_17_r2,pre_deltaIJ-17-r2_R1.fq.gz,pre_deltaIJ-17-r2_R2.fq.gz,auto
pre_deltaIJ_17_r3,pre_deltaIJ-17-r3_R1.fq.gz,pre_deltaIJ-17-r3_R2.fq.gz,auto
pre_deltaIJ_24_r1,pre_deltaIJ-24-r1_R1.fq.gz,pre_deltaIJ-24-r1_R2.fq.gz,auto
pre_deltaIJ_24_r2,pre_deltaIJ-24-r2_R1.fq.gz,pre_deltaIJ-24-r2_R2.fq.gz,auto
pre_deltaIJ_24_r3,pre_deltaIJ-24-r3_R1.fq.gz,pre_deltaIJ-24-r3_R2.fq.gz,auto
0_5_WT_17_r1,0_5_WT-17-r1_R1.fq.gz,0_5_WT-17-r1_R2.fq.gz,auto
0_5_WT_17_r2,0_5_WT-17-r2_R1.fq.gz,0_5_WT-17-r2_R2.fq.gz,auto
0_5_WT_17_r3,0_5_WT-17-r3_R1.fq.gz,0_5_WT-17-r3_R2.fq.gz,auto
0_5_WT_24_r1,0_5_WT-24-r1_R1.fq.gz,0_5_WT-24-r1_R2.fq.gz,auto
0_5_WT_24_r2,0_5_WT-24-r2_R1.fq.gz,0_5_WT-24-r2_R2.fq.gz,auto
0_5_WT_24_r3,0_5_WT-24-r3_R1.fq.gz,0_5_WT-24-r3_R2.fq.gz,auto
0_5_deltaIJ_17_r1,0_5_deltaIJ-17-r1_R1.fq.gz,0_5_deltaIJ-17-r1_R2.fq.gz,auto
0_5_deltaIJ_17_r2,0_5_deltaIJ-17-r2_R1.fq.gz,0_5_deltaIJ-17-r2_R2.fq.gz,auto
0_5_deltaIJ_17_r3,0_5_deltaIJ-17-r3_R1.fq.gz,0_5_deltaIJ-17-r3_R2.fq.gz,auto
0_5_deltaIJ_24_r1,0_5_deltaIJ-24-r1_R1.fq.gz,0_5_deltaIJ-24-r1_R2.fq.gz,auto
0_5_deltaIJ_24_r2,0_5_deltaIJ-24-r2_R1.fq.gz,0_5_deltaIJ-24-r2_R2.fq.gz,auto
0_5_deltaIJ_24_r3,0_5_deltaIJ-24-r3_R1.fq.gz,0_5_deltaIJ-24-r3_R2.fq.gz,auto

nextflow run

#Example1: http://xgenes.com/article/article-content/157/prepare-virus-gtf-for-nextflow-run/

docker pull nfcore/rnaseq
ln -s /home/jhuang/Tools/nf-core-rnaseq-3.12.0/ rnaseq

#Default: --gtf_group_features 'gene_id'  --gtf_extra_attributes 'gene_name' --featurecounts_group_type 'gene_biotype' --featurecounts_feature_type 'exon'
#(host_env) !NOT_WORKING! jhuang@WS-2290C:~/DATA/Data_Tam_RNAseq_2024$ /usr/local/bin/nextflow run rnaseq/main.nf --input samplesheet.csv --outdir results    --fasta "/home/jhuang/DATA/Data_Tam_RNAseq_2024/CP059040.fasta" --gff "/home/jhuang/DATA/Data_Tam_RNAseq_2024/CP059040.gff"        -profile docker -resume  --max_cpus 55 --max_memory 512.GB --max_time 2400.h    --save_align_intermeds --save_unaligned --save_reference    --aligner 'star_salmon'    --gtf_group_features 'gene_id'  --gtf_extra_attributes 'gene_name' --featurecounts_group_type 'gene_biotype' --featurecounts_feature_type 'transcript'

# -- DEBUG_1 (CDS --> exon in CP059040.gff) --
#Checking the record (see below) in results/genome/CP059040.gtf
#In ./results/genome/CP059040.gtf e.g. "CP059040.1      Genbank transcript      1       1398    .       +       .       transcript_id "gene-H0N29_00005"; gene_id "gene-H0N29_00005"; gene_name "dnaA"; Name "dnaA"; gbkey "Gene"; gene "dnaA"; gene_biotype "protein_coding"; locus_tag "H0N29_00005";"
#--featurecounts_feature_type 'transcript' returns only the tRNA results
#Since the tRNA records have "transcript and exon". In gene records, we have "transcript and CDS". replace the CDS with exon

grep -P "\texon\t" CP059040.gff | sort | wc -l    #96
grep -P "cmsearch\texon\t" CP059040.gff | wc -l    #=10  ignal recognition particle sRNA small typ, transfer-messenger RNA, 5S ribosomal RNA
grep -P "Genbank\texon\t" CP059040.gff | wc -l    #=12  16S and 23S ribosomal RNA
grep -P "tRNAscan-SE\texon\t" CP059040.gff | wc -l    #tRNA 74
wc -l star_salmon/AUM_r3/quant.genes.sf  #--featurecounts_feature_type 'transcript' results in 96 records!

grep -P "\tCDS\t" CP059040.gff | wc -l  #3701
sed 's/\tCDS\t/\texon\t/g' CP059040.gff > CP059040_m.gff
grep -P "\texon\t" CP059040_m.gff | sort | wc -l  #3797

# -- DEBUG_2: combination of 'CP059040_m.gff' and 'exon' results in ERROR, using 'transcript' instead!
--gff "/home/jhuang/DATA/Data_Tam_RNAseq_2024/CP059040_m.gff" --featurecounts_feature_type 'transcript'

# ---- SUCCESSFUL with directly downloaded gff3 and fasta from NCBI using docker after replacing 'CDS' with 'exon' ----
mv trimmed/*.fq.gz .; rmdir trimmed
(host_env) /usr/local/bin/nextflow run rnaseq/main.nf --input samplesheet.csv --outdir results    --fasta "/home/jhuang/DATA/Data_Tam_RNAseq_2024_AUM_MHB_Urine_ATCC19606/CP059040.fasta" --gff "/home/jhuang/DATA/Data_Tam_RNAseq_2024_AUM_MHB_Urine_ATCC19606/CP059040_m.gff"        -profile docker -resume  --max_cpus 90 --max_memory 900.GB --max_time 2400.h    --save_align_intermeds --save_unaligned --save_reference    --aligner 'star_salmon'    --gtf_group_features 'gene_id'  --gtf_extra_attributes 'gene_name' --featurecounts_group_type 'gene_biotype' --featurecounts_feature_type 'transcript'

# -- DEBUG_3: make sure the header of fasta is the same to the *_m.gff file

Import data and pca-plot

#mamba activate r_env

#install.packages("ggfun")
# Import the required libraries
library("AnnotationDbi")
library("clusterProfiler")
library("ReactomePA")
library(gplots)
library(tximport)
library(DESeq2)
#library("org.Hs.eg.db")
library(dplyr)
library(tidyverse)
#install.packages("devtools")
#devtools::install_version("gtable", version = "0.3.0")
library(gplots)
library("RColorBrewer")
#install.packages("ggrepel")
library("ggrepel")
# install.packages("openxlsx")
library(openxlsx)
library(EnhancedVolcano)
library(DESeq2)
library(edgeR)

setwd("~/DATA/Data_Tam_RNAseq_2025_WT_deltaIJ_ATCC19606/results/star_salmon")
# Define paths to your Salmon output quantification files
files <- c("WT_17_r1" = "./WT_17_r1/quant.sf",
           "WT_17_r2" = "./WT_17_r2/quant.sf",
           "WT_17_r3" = "./WT_17_r3/quant.sf",
           "WT_24_r1" = "./WT_24_r1/quant.sf",
           "WT_24_r2" = "./WT_24_r2/quant.sf",
           "WT_24_r3" = "./WT_24_r3/quant.sf",
           "deltaIJ_17_r1" = "./deltaIJ_17_r1/quant.sf",
           "deltaIJ_17_r2" = "./deltaIJ_17_r2/quant.sf",
           "deltaIJ_17_r3" = "./deltaIJ_17_r3/quant.sf",
           "deltaIJ_24_r1" = "./deltaIJ_24_r1/quant.sf",
           "deltaIJ_24_r2" = "./deltaIJ_24_r2/quant.sf",
           "deltaIJ_24_r3" = "./deltaIJ_24_r3/quant.sf",
           "pre_WT_17_r1" = "./pre_WT_17_r1/quant.sf",
           "pre_WT_17_r2" = "./pre_WT_17_r2/quant.sf",
           "pre_WT_17_r3" = "./pre_WT_17_r3/quant.sf",
           "pre_WT_24_r1" = "./pre_WT_24_r1/quant.sf",
           "pre_WT_24_r2" = "./pre_WT_24_r2/quant.sf",
           "pre_WT_24_r3" = "./pre_WT_24_r3/quant.sf",
           "pre_deltaIJ_17_r1" = "./pre_deltaIJ_17_r1/quant.sf",
           "pre_deltaIJ_17_r2" = "./pre_deltaIJ_17_r2/quant.sf",
           "pre_deltaIJ_17_r3" = "./pre_deltaIJ_17_r3/quant.sf",
           "pre_deltaIJ_24_r1" = "./pre_deltaIJ_24_r1/quant.sf",
           "pre_deltaIJ_24_r2" = "./pre_deltaIJ_24_r2/quant.sf",
           "pre_deltaIJ_24_r3" = "./pre_deltaIJ_24_r3/quant.sf",
           "0_5_WT_17_r1" = "./0_5_WT_17_r1/quant.sf",
           "0_5_WT_17_r2" = "./0_5_WT_17_r2/quant.sf",
           "0_5_WT_17_r3" = "./0_5_WT_17_r3/quant.sf",
           "0_5_WT_24_r1" = "./0_5_WT_24_r1/quant.sf",
           "0_5_WT_24_r2" = "./0_5_WT_24_r2/quant.sf",
           "0_5_WT_24_r3" = "./0_5_WT_24_r3/quant.sf",
           "0_5_deltaIJ_17_r1" = "./0_5_deltaIJ_17_r1/quant.sf",
           "0_5_deltaIJ_17_r2" = "./0_5_deltaIJ_17_r2/quant.sf",
           "0_5_deltaIJ_17_r3" = "./0_5_deltaIJ_17_r3/quant.sf",
           "0_5_deltaIJ_24_r1" = "./0_5_deltaIJ_24_r1/quant.sf",
           "0_5_deltaIJ_24_r2" = "./0_5_deltaIJ_24_r2/quant.sf",
           "0_5_deltaIJ_24_r3" = "./0_5_deltaIJ_24_r3/quant.sf")
# Import the transcript abundance data with tximport
txi <- tximport(files, type = "salmon", txIn = TRUE, txOut = TRUE)
# Define the replicates and condition of the samples
replicate <- factor(c("r1", "r2", "r3", "r1", "r2", "r3", "r1", "r2", "r3", "r1", "r2", "r3",     "r1", "r2", "r3", "r1", "r2", "r3", "r1", "r2", "r3", "r1", "r2", "r3",      "r1", "r2", "r3", "r1", "r2", "r3", "r1", "r2", "r3", "r1", "r2", "r3"))
condition <- factor(c("WT_17","WT_17","WT_17","WT_24","WT_24","WT_24", "deltaIJ_17","deltaIJ_17","deltaIJ_17","deltaIJ_24","deltaIJ_24","deltaIJ_24",   "pre_WT_17","pre_WT_17","pre_WT_17","pre_WT_24","pre_WT_24","pre_WT_24", "pre_deltaIJ_17","pre_deltaIJ_17","pre_deltaIJ_17","pre_deltaIJ_24","pre_deltaIJ_24","pre_deltaIJ_24",   "0_5_WT_17","0_5_WT_17","0_5_WT_17","0_5_WT_24","0_5_WT_24","0_5_WT_24", "0_5_deltaIJ_17","0_5_deltaIJ_17","0_5_deltaIJ_17","0_5_deltaIJ_24","0_5_deltaIJ_24","0_5_deltaIJ_24"))
# Define the colData for DESeq2
colData <- data.frame(condition=condition, replicate=replicate, row.names=names(files))

# ------------------------
# 1️⃣ Setup and input files
# ------------------------

# Read in transcript-to-gene mapping
tx2gene <- read.table("salmon_tx2gene.tsv", header=FALSE, stringsAsFactors=FALSE)
colnames(tx2gene) <- c("transcript_id", "gene_id", "gene_name")

# Prepare tx2gene for gene-level summarization (remove gene_name if needed)
tx2gene_geneonly <- tx2gene[, c("transcript_id", "gene_id")]

# -------------------------------
# 2️⃣ Transcript-level counts
# -------------------------------
# Create DESeqDataSet directly from tximport (transcript-level)
dds_tx <- DESeqDataSetFromTximport(txi, colData=colData, design=~condition)
write.csv(counts(dds_tx), file="transcript_counts.csv")

# --------------------------------
# 3️⃣ Gene-level summarization
# --------------------------------
# Re-import Salmon data summarized at gene level
txi_gene <- tximport(files, type="salmon", tx2gene=tx2gene_geneonly, txOut=FALSE)

# Create DESeqDataSet for gene-level counts
dds <- DESeqDataSetFromTximport(txi_gene, colData=colData, design=~condition+replicate)

# --------------------------------
# 4️⃣ Raw counts table (with gene names)
# --------------------------------
# Extract raw gene-level counts
counts_data <- as.data.frame(counts(dds, normalized=FALSE))
counts_data$gene_id <- rownames(counts_data)

# Add gene names
tx2gene_unique <- unique(tx2gene[, c("gene_id", "gene_name")])
counts_data <- merge(counts_data, tx2gene_unique, by="gene_id", all.x=TRUE)

# Reorder columns: gene_id, gene_name, then counts
count_cols <- setdiff(colnames(counts_data), c("gene_id", "gene_name"))
counts_data <- counts_data[, c("gene_id", "gene_name", count_cols)]

# --------------------------------
# 5️⃣ Calculate CPM
# --------------------------------
library(edgeR)
library(openxlsx)

# Prepare count matrix for CPM calculation
count_matrix <- as.matrix(counts_data[, !(colnames(counts_data) %in% c("gene_id", "gene_name"))])

# Calculate CPM
#cpm_matrix <- cpm(count_matrix, normalized.lib.sizes=FALSE)
total_counts <- colSums(count_matrix)
cpm_matrix <- t(t(count_matrix) / total_counts) * 1e6
cpm_matrix <- as.data.frame(cpm_matrix)

# Add gene_id and gene_name back to CPM table
cpm_counts <- cbind(counts_data[, c("gene_id", "gene_name")], cpm_matrix)

# --------------------------------
# 6️⃣ Save outputs (CPM calculations required to send!)
# --------------------------------
write.csv(counts_data, "gene_raw_counts.csv", row.names=FALSE)
write.xlsx(counts_data, "gene_raw_counts.xlsx", row.names=FALSE)
write.xlsx(cpm_counts, "gene_cpm_counts.xlsx", row.names=FALSE)

PCA

dim(counts(dds))
head(counts(dds), 10)

library(DESeq2)
library(RColorBrewer)
library(gplots)
library(ggplot2)

# Load or generate DESeqDataSet object: dds
# dds <- DESeqDataSetFromMatrix(...)  # <- already assumed

# Apply rlog transformation
rld <- rlogTransformation(dds)

# Define condition names in correct order
condition <- factor(c(
  "WT_17","WT_17","WT_17",
  "WT_24","WT_24","WT_24",
  "deltaIJ_17","deltaIJ_17","deltaIJ_17",
  "deltaIJ_24","deltaIJ_24","deltaIJ_24",
  "pre_WT_17","pre_WT_17","pre_WT_17",
  "pre_WT_24","pre_WT_24","pre_WT_24",
  "pre_deltaIJ_17","pre_deltaIJ_17","pre_deltaIJ_17",
  "pre_deltaIJ_24","pre_deltaIJ_24","pre_deltaIJ_24",
  "0_5_WT_17","0_5_WT_17","0_5_WT_17",
  "0_5_WT_24","0_5_WT_24","0_5_WT_24",
  "0_5_deltaIJ_17","0_5_deltaIJ_17","0_5_deltaIJ_17",
  "0_5_deltaIJ_24","0_5_deltaIJ_24","0_5_deltaIJ_24"
))

# Replace with descriptive condition names
condition <- factor(condition,
  levels = c(
    "WT_17", "deltaIJ_17", "WT_24", "deltaIJ_24",
    "pre_WT_17", "pre_deltaIJ_17", "pre_WT_24", "pre_deltaIJ_24",
    "0_5_WT_17", "0_5_deltaIJ_17", "0_5_WT_24", "0_5_deltaIJ_24"
  ),
  labels = c(
    "WT-17", "ΔIJ-17",
    "WT-24", "ΔIJ-24",
    "preWT-17", "preΔIJ-17",
    "preWT-24", "preΔIJ-24",
    "0_5WT-17", "0_5ΔIJ-17",
    "0_5WT-24", "0_5ΔIJ-24"
  )
)

# Assign to rld
colData(rld)$condition <- condition

# Define colors (12 distinct ones)
condition_colors <- c(
  "#1f78b4", "#33a02c", "#a6cee3", "#b2df8a",
  "#fb9a99", "#e31a1c", "#fdbf6f", "#ff7f00",
  "#cab2d6", "#6a3d9a", "#ffff99", "#b15928"
)

names(condition_colors) <- levels(condition)

# Plot PCA
png("pca_colored.png", width=1200, height=800)
pcaData <- plotPCA(rld, intgroup="condition", returnData=TRUE)
percentVar <- round(100 * attr(pcaData, "percentVar"))

ggplot(pcaData, aes(PC1, PC2, color=condition)) +
  geom_point(size=4) +
  scale_color_manual(values=condition_colors) +
  xlab(paste0("PC1: ", percentVar[1], "% variance")) +
  ylab(paste0("PC2: ", percentVar[2], "% variance")) +
  theme_bw() +
  theme(axis.text = element_text(size=12), legend.text = element_text(size=10))
dev.off()

# Heatmap of sample distances
png("heatmap.png", width=1200, height=800)
distsRL <- dist(t(assay(rld)))
mat <- as.matrix(distsRL)
hc <- hclust(distsRL)
hmcol <- colorRampPalette(brewer.pal(9,"GnBu"))(100)
heatmap.2(mat, Rowv=as.dendrogram(hc), Colv=as.dendrogram(hc),
          trace="none", symm=TRUE, col=rev(hmcol),
          margin=c(13, 13), labRow=condition, labCol=condition)
dev.off()

Select the differentially expressed genes

#https://galaxyproject.eu/posts/2020/08/22/three-steps-to-galaxify-your-tool/
#https://www.biostars.org/p/282295/
#https://www.biostars.org/p/335751/
#> dds$condition
#CONSOLE: mkdir star_salmon/degenes

setwd("degenes")
#---- relevel to control ----
dds$condition <- relevel(dds$condition, "WT_17")
dds = DESeq(dds, betaPrior=FALSE)
resultsNames(dds)
clist <- c("deltaIJ_17_vs_WT_17")

dds$condition <- relevel(dds$condition, "WT_24")
dds = DESeq(dds, betaPrior=FALSE)
resultsNames(dds)
clist <- c("deltaIJ_24_vs_WT_24")

dds$condition <- relevel(dds$condition, "pre_WT_17")
dds = DESeq(dds, betaPrior=FALSE)
resultsNames(dds)
clist <- c("pre_deltaIJ_17_vs_pre_WT_17")

dds$condition <- relevel(dds$condition, "pre_WT_24")
dds = DESeq(dds, betaPrior=FALSE)
resultsNames(dds)
clist <- c("pre_deltaIJ_24_vs_pre_WT_24")

dds$condition <- relevel(dds$condition, "0_5_WT_17")
dds = DESeq(dds, betaPrior=FALSE)
resultsNames(dds)
clist <- c("0_5_deltaIJ_17_vs_0_5_WT_17")

dds$condition <- relevel(dds$condition, "0_5_WT_24")
dds = DESeq(dds, betaPrior=FALSE)
resultsNames(dds)
clist <- c("0_5_deltaIJ_24_vs_0_5_WT_24")

for (i in clist) {
  contrast = paste("condition", i, sep="_")
  res = results(dds, name=contrast)
  res <- res[!is.na(res$log2FoldChange),]
  res_df <- as.data.frame(res)

  write.csv(as.data.frame(res_df[order(res_df$pvalue),]), file = paste(i, "all.txt", sep="-"))
  up <- subset(res_df, padj<=0.05 & log2FoldChange>=2)
  down <- subset(res_df, padj<=0.05 & log2FoldChange<=-2)
  write.csv(as.data.frame(up[order(up$log2FoldChange,decreasing=TRUE),]), file = paste(i, "up.txt", sep="-"))
  write.csv(as.data.frame(down[order(abs(down$log2FoldChange),decreasing=TRUE),]), file = paste(i, "down.txt", sep="-"))
}

# -- Under host-env --
grep -P "\tgene\t" CP059040.gff > CP059040_gene.gff
python3 ~/Scripts/replace_gene_names.py /home/jhuang/DATA/Data_Tam_RNAseq_2024_AUM_MHB_Urine_ATCC19606/CP059040_gene.gff deltaIJ_17_vs_WT_17-all.txt deltaIJ_17_vs_WT_17-all.csv
python3 ~/Scripts/replace_gene_names.py /home/jhuang/DATA/Data_Tam_RNAseq_2024_AUM_MHB_Urine_ATCC19606/CP059040_gene.gff deltaIJ_17_vs_WT_17-up.txt deltaIJ_17_vs_WT_17-up.csv
python3 ~/Scripts/replace_gene_names.py /home/jhuang/DATA/Data_Tam_RNAseq_2024_AUM_MHB_Urine_ATCC19606/CP059040_gene.gff deltaIJ_17_vs_WT_17-down.txt deltaIJ_17_vs_WT_17-down.csv
python3 ~/Scripts/replace_gene_names.py /home/jhuang/DATA/Data_Tam_RNAseq_2024_AUM_MHB_Urine_ATCC19606/CP059040_gene.gff deltaIJ_24_vs_WT_24-all.txt deltaIJ_24_vs_WT_24-all.csv
python3 ~/Scripts/replace_gene_names.py /home/jhuang/DATA/Data_Tam_RNAseq_2024_AUM_MHB_Urine_ATCC19606/CP059040_gene.gff deltaIJ_24_vs_WT_24-up.txt deltaIJ_24_vs_WT_24-up.csv
python3 ~/Scripts/replace_gene_names.py /home/jhuang/DATA/Data_Tam_RNAseq_2024_AUM_MHB_Urine_ATCC19606/CP059040_gene.gff deltaIJ_24_vs_WT_24-down.txt deltaIJ_24_vs_WT_24-down.csv

python3 ~/Scripts/replace_gene_names.py /home/jhuang/DATA/Data_Tam_RNAseq_2024_AUM_MHB_Urine_ATCC19606/CP059040_gene.gff pre_deltaIJ_17_vs_pre_WT_17-all.txt pre_deltaIJ_17_vs_pre_WT_17-all.csv
python3 ~/Scripts/replace_gene_names.py /home/jhuang/DATA/Data_Tam_RNAseq_2024_AUM_MHB_Urine_ATCC19606/CP059040_gene.gff pre_deltaIJ_17_vs_pre_WT_17-up.txt pre_deltaIJ_17_vs_pre_WT_17-up.csv
python3 ~/Scripts/replace_gene_names.py /home/jhuang/DATA/Data_Tam_RNAseq_2024_AUM_MHB_Urine_ATCC19606/CP059040_gene.gff pre_deltaIJ_17_vs_pre_WT_17-down.txt pre_deltaIJ_17_vs_pre_WT_17-down.csv
python3 ~/Scripts/replace_gene_names.py /home/jhuang/DATA/Data_Tam_RNAseq_2024_AUM_MHB_Urine_ATCC19606/CP059040_gene.gff pre_deltaIJ_24_vs_pre_WT_24-all.txt pre_deltaIJ_24_vs_pre_WT_24-all.csv
python3 ~/Scripts/replace_gene_names.py /home/jhuang/DATA/Data_Tam_RNAseq_2024_AUM_MHB_Urine_ATCC19606/CP059040_gene.gff pre_deltaIJ_24_vs_pre_WT_24-up.txt pre_deltaIJ_24_vs_pre_WT_24-up.csv
python3 ~/Scripts/replace_gene_names.py /home/jhuang/DATA/Data_Tam_RNAseq_2024_AUM_MHB_Urine_ATCC19606/CP059040_gene.gff pre_deltaIJ_24_vs_pre_WT_24-down.txt pre_deltaIJ_24_vs_pre_WT_24-down.csv

python3 ~/Scripts/replace_gene_names.py /home/jhuang/DATA/Data_Tam_RNAseq_2024_AUM_MHB_Urine_ATCC19606/CP059040_gene.gff 0_5_deltaIJ_17_vs_0_5_WT_17-all.txt 0_5_deltaIJ_17_vs_0_5_WT_17-all.csv
python3 ~/Scripts/replace_gene_names.py /home/jhuang/DATA/Data_Tam_RNAseq_2024_AUM_MHB_Urine_ATCC19606/CP059040_gene.gff 0_5_deltaIJ_17_vs_0_5_WT_17-up.txt 0_5_deltaIJ_17_vs_0_5_WT_17-up.csv
python3 ~/Scripts/replace_gene_names.py /home/jhuang/DATA/Data_Tam_RNAseq_2024_AUM_MHB_Urine_ATCC19606/CP059040_gene.gff 0_5_deltaIJ_17_vs_0_5_WT_17-down.txt 0_5_deltaIJ_17_vs_0_5_WT_17-down.csv
python3 ~/Scripts/replace_gene_names.py /home/jhuang/DATA/Data_Tam_RNAseq_2024_AUM_MHB_Urine_ATCC19606/CP059040_gene.gff 0_5_deltaIJ_24_vs_0_5_WT_24-all.txt 0_5_deltaIJ_24_vs_0_5_WT_24-all.csv
python3 ~/Scripts/replace_gene_names.py /home/jhuang/DATA/Data_Tam_RNAseq_2024_AUM_MHB_Urine_ATCC19606/CP059040_gene.gff 0_5_deltaIJ_24_vs_0_5_WT_24-up.txt 0_5_deltaIJ_24_vs_0_5_WT_24-up.csv
python3 ~/Scripts/replace_gene_names.py /home/jhuang/DATA/Data_Tam_RNAseq_2024_AUM_MHB_Urine_ATCC19606/CP059040_gene.gff 0_5_deltaIJ_24_vs_0_5_WT_24-down.txt 0_5_deltaIJ_24_vs_0_5_WT_24-down.csv

res <- read.csv("deltaIJ_17_vs_WT_17-all.csv")
res <- read.csv("deltaIJ_24_vs_WT_24-all.csv")
res <- read.csv("pre_deltaIJ_17_vs_pre_WT_17-all.csv")
res <- read.csv("pre_deltaIJ_24_vs_pre_WT_24-all.csv")
res <- read.csv("0_5_deltaIJ_17_vs_0_5_WT_17-all.csv")
res <- read.csv("0_5_deltaIJ_24_vs_0_5_WT_24-all.csv")
# Replace empty GeneName with modified GeneID
res$GeneName <- ifelse(
  res$GeneName == "" | is.na(res$GeneName),
  gsub("gene-", "", res$GeneID),
  res$GeneName
)
duplicated_genes <- res[duplicated(res$GeneName), "GeneName"]

res <- res %>%
  group_by(GeneName) %>%
  slice_min(padj, with_ties = FALSE) %>%
  ungroup()
res <- as.data.frame(res)
# Sort res first by padj (ascending) and then by log2FoldChange (descending)
res <- res[order(res$padj, -res$log2FoldChange), ]

up_regulated <- res[res$log2FoldChange >= 2 & res$padj <= 5e-2, ]
down_regulated <- res[res$log2FoldChange <= -2 & res$padj <= 5e-2, ]
wb <- createWorkbook()
addWorksheet(wb, "Complete_Data")
writeData(wb, "Complete_Data", res)
addWorksheet(wb, "Up_Regulated")
writeData(wb, "Up_Regulated", up_regulated)
addWorksheet(wb, "Down_Regulated")
writeData(wb, "Down_Regulated", down_regulated)
# Save the workbook to a file
saveWorkbook(wb, "Gene_Expression_ΔIJ-17_vs_WT-17.xlsx", overwrite = TRUE)
saveWorkbook(wb, "Gene_Expression_ΔIJ-24_vs_WT-24.xlsx", overwrite = TRUE)
saveWorkbook(wb, "Gene_Expression_preΔIJ-17_vs_preWT-17.xlsx", overwrite = TRUE)
saveWorkbook(wb, "Gene_Expression_preΔIJ-24_vs_preWT-24.xlsx", overwrite = TRUE)
saveWorkbook(wb, "Gene_Expression_0_5ΔIJ-17_vs_0_5WT-17.xlsx", overwrite = TRUE)
saveWorkbook(wb, "Gene_Expression_0_5ΔIJ-24_vs_0_5WT-24.xlsx", overwrite = TRUE)

# Set the 'GeneName' column as row.names
rownames(res) <- res$GeneName
# Drop the 'GeneName' column since it's now the row names
res$GeneName <- NULL
head(res)

## Ensure the data frame matches the expected format
## For example, it should have columns: log2FoldChange, padj, etc.
#res <- as.data.frame(res)
## Remove rows with NA in log2FoldChange (if needed)
#res <- res[!is.na(res$log2FoldChange),]

# Replace padj = 0 with a small value
#res$padj[res$padj == 0] <- 1e-305

#library(EnhancedVolcano)
# Assuming res is already sorted and processed
png("ΔIJ-17_vs_WT-17.png", width=1000, height=1200)
png("ΔIJ-24_vs_WT-24.png", width=1000, height=1200)
png("preΔIJ-17_vs_preWT-17.png", width=1000, height=1200)
png("preΔIJ-24_vs_preWT-24.png", width=1000, height=1200)
png("0_5ΔIJ-17_vs_0_5WT-17.png", width=1000, height=1200)
png("0_5ΔIJ-24_vs_0_5WT-24.png", width=1000, height=1200)
#max.overlaps = 10
EnhancedVolcano(res,
                lab = rownames(res),
                x = 'log2FoldChange',
                y = 'padj',
                pCutoff = 5e-2,
                FCcutoff = 2,
                title = '',
                subtitleLabSize = 18,
                pointSize = 3.0,
                labSize = 5.0,
                colAlpha = 1,
                legendIconSize = 4.0,
                drawConnectors = TRUE,
                widthConnectors = 0.5,
                colConnectors = 'black',
                subtitle = expression("0_5ΔIJ-24 versus 0_5WT-24"))
dev.off()

Venn diagram

    #To visualize gene expression overlaps across your conditions, Venn diagrams are useful — but only when comparing 2–5 groups at a time. Given your conditions and comparisons, here’s the best strategy:

    #✅ Best Venn Diagram Setup Options
    #You’re comparing wild-type (WT) and ΔIJ mutant strains under different conditions (no treatment, treatment A, treatment B) and time points (17h, 24h). To avoid overcrowded or unreadable plots, group comparisons by specific contrasts:

    #Option 1: Treatment Effect at One Time Point (ΔIJ vs WT)
    #Compare ΔIJ vs WT at 17h or 24h, under all 3 treatments (None, A, B):

    # Venn: “Treatment-dependent differences (ΔIJ vs WT) at 17h”
    #* ΔIJ vs WT (no treatment) – ΔIJ-17 vs WT-17
    #* ΔIJ vs WT (treatment A) – preΔIJ-17 vs preWT-17
    #* ΔIJ vs WT (treatment B) – 0_5ΔIJ-17 vs WT0_5-17

    #👉 3-way Venn diagram: Shows overlap in DEGs between different treatment conditions for the ΔIJ effect at a single time point.

    # Install and load required packages
    if (!require("VennDiagram")) install.packages("VennDiagram")
    if (!require("openxlsx")) install.packages("openxlsx")
    library(VennDiagram)
    library(openxlsx)

    # Set working directory
    setwd("/mnt/md1/DATA/Data_Tam_RNAseq_2025_WT_deltaIJ_ATCC19606/results/star_salmon/degenes")

    # Read upregulated gene lists at 17h
    df_no_treatment <- read.csv("deltaIJ_17_vs_WT_17-up.txt", header = TRUE)
    genes_no_treatment <- df_no_treatment[[1]]

    df_treatA <- read.csv("pre_deltaIJ_17_vs_pre_WT_17-up.txt", header = TRUE)
    genes_treatA <- df_treatA[[1]]

    df_treatB <- read.csv("0_5_deltaIJ_17_vs_0_5_WT_17-up.txt", header = TRUE)
    genes_treatB <- df_treatB[[1]]

    # Clean gene names (optional, in case of extra characters like quotes)
    genes_no_treatment <- gsub("^\"|\"$", "", genes_no_treatment)
    genes_treatA <- gsub("^\"|\"$", "", genes_treatA)
    genes_treatB <- gsub("^\"|\"$", "", genes_treatB)

    # Create a list for Venn
    venn_list <- list(
      "No_Treatment" = genes_no_treatment,
      "Treatment_A" = genes_treatA,
      "Treatment_B" = genes_treatB
    )

    # Save Venn diagram
    venn.diagram(
      x = venn_list,
      filename = "venn_17h_upregulated_treatments.png",
      imagetype = "png",
      output = TRUE,
      col = "transparent",
      fill = c("#66c2a5", "#fc8d62", "#8da0cb"),
      alpha = 0.5,
      cex = 1.5,
      cat.cex = 1.4,
      cat.pos = 0,
      cat.dist = 0.05,
      main = "Upregulated Genes (ΔIJ vs WT, 17h)",
      main.cex = 1.5
    )

    # Intersections
    only_no <- setdiff(genes_no_treatment, union(genes_treatA, genes_treatB))
    only_A <- setdiff(genes_treatA, union(genes_no_treatment, genes_treatB))
    only_B <- setdiff(genes_treatB, union(genes_no_treatment, genes_treatA))

    no_A <- intersect(genes_no_treatment, genes_treatA)
    no_B <- intersect(genes_no_treatment, genes_treatB)
    A_B <- intersect(genes_treatA, genes_treatB)

    no_A_B <- Reduce(intersect, list(genes_no_treatment, genes_treatA, genes_treatB))

    # Remove overlapping from pairwise (keep only those not in 3-way)
    no_A <- setdiff(no_A, no_A_B)
    no_B <- setdiff(no_B, no_A_B)
    A_B <- setdiff(A_B, no_A_B)

    # Write to Excel
    wb <- createWorkbook()
    addWorksheet(wb, "Only_No_Treatment")
    addWorksheet(wb, "Only_Treatment_A")
    addWorksheet(wb, "Only_Treatment_B")
    addWorksheet(wb, "No_Treatment_AND_Treatment_A")
    addWorksheet(wb, "No_Treatment_AND_Treatment_B")
    addWorksheet(wb, "Treatment_A_AND_Treatment_B")
    addWorksheet(wb, "All_Three")

    writeData(wb, "Only_No_Treatment", only_no)
    writeData(wb, "Only_Treatment_A", only_A)
    writeData(wb, "Only_Treatment_B", only_B)
    writeData(wb, "No_Treatment_AND_Treatment_A", no_A)
    writeData(wb, "No_Treatment_AND_Treatment_B", no_B)
    writeData(wb, "Treatment_A_AND_Treatment_B", A_B)
    writeData(wb, "All_Three", no_A_B)

    saveWorkbook(wb, "upregulated_17h_intersections.xlsx", overwrite = TRUE)

    #--

    if (!require("VennDiagram")) install.packages("VennDiagram")
    if (!require("openxlsx")) install.packages("openxlsx")
    library(VennDiagram)
    library(openxlsx)

    setwd("/mnt/md1/DATA/Data_Tam_RNAseq_2025_WT_deltaIJ_ATCC19606/results/star_salmon/degenes")

    # Helper function
    process_and_save_venn <- function(label, files, outfile_prefix) {
      gene_lists <- list()

      # Read gene lists
      for (name in names(files)) {
        df <- read.csv(files[[name]], header = TRUE)
        genes <- gsub("^\"|\"$", "", df[[1]])
        gene_lists[[name]] <- genes
      }

      # Plot Venn
      venn.diagram(
        x = gene_lists,
        filename = paste0(outfile_prefix, ".png"),
        imagetype = "png",
        output = TRUE,
        col = "transparent",
        fill = c("#66c2a5", "#fc8d62", "#8da0cb"),
        alpha = 0.5,
        cex = 1.5,
        cat.cex = 1.4,
        #cat.pos = 0,
        #cat.dist = 0.05,
        cat.pos = c(-30, 30, 135),   # Move labels around the circles
        cat.dist = c(0.04, 0.04, 0.04),  # Push labels further outside
        main = label,
        main.cex = 1.5
      )

      # Intersections
      A <- gene_lists[[1]]
      B <- gene_lists[[2]]
      C <- gene_lists[[3]]

      only_A <- setdiff(A, union(B, C))
      only_B <- setdiff(B, union(A, C))
      only_C <- setdiff(C, union(A, B))

      AB <- setdiff(intersect(A, B), intersect(intersect(A, B), C))
      AC <- setdiff(intersect(A, C), intersect(intersect(A, C), B))
      BC <- setdiff(intersect(B, C), intersect(intersect(A, B), C))

      ABC <- Reduce(intersect, list(A, B, C))

      # Save Excel
      wb <- createWorkbook()
      addWorksheet(wb, "Only_A")
      addWorksheet(wb, "Only_B")
      addWorksheet(wb, "Only_C")
      addWorksheet(wb, "A_and_B")
      addWorksheet(wb, "A_and_C")
      addWorksheet(wb, "B_and_C")
      addWorksheet(wb, "All_Three")

      writeData(wb, "Only_A", only_A)
      writeData(wb, "Only_B", only_B)
      writeData(wb, "Only_C", only_C)
      writeData(wb, "A_and_B", AB)
      writeData(wb, "A_and_C", AC)
      writeData(wb, "B_and_C", BC)
      writeData(wb, "All_Three", ABC)

      saveWorkbook(wb, paste0(outfile_prefix, ".xlsx"), overwrite = TRUE)
    }

    ### === UPREGULATED GENES === ###
    process_and_save_venn(
      label = "Upregulated Genes (ΔIJ vs WT, 17h)",
      files = list(
        "No_Treatment" = "deltaIJ_17_vs_WT_17-up.txt",
        "Treatment_A" = "pre_deltaIJ_17_vs_pre_WT_17-up.txt",
        "Treatment_B" = "0_5_deltaIJ_17_vs_0_5_WT_17-up.txt"
      ),
      outfile_prefix = "venn_upregulated_17h"
    )

    process_and_save_venn(
      label = "Upregulated Genes (ΔIJ vs WT, 24h)",
      files = list(
        "No_Treatment" = "deltaIJ_24_vs_WT_24-up.txt",
        "Treatment_A" = "pre_deltaIJ_24_vs_pre_WT_24-up.txt",
        "Treatment_B" = "0_5_deltaIJ_24_vs_0_5_WT_24-up.txt"
      ),
      outfile_prefix = "venn_upregulated_24h"
    )

    ### === DOWNREGULATED GENES === ###
    process_and_save_venn(
      label = "Downregulated Genes (ΔIJ vs WT, 17h)",
      files = list(
        "No_Treatment" = "deltaIJ_17_vs_WT_17-down.txt",
        "Treatment_A" = "pre_deltaIJ_17_vs_pre_WT_17-down.txt",
        "Treatment_B" = "0_5_deltaIJ_17_vs_0_5_WT_17-down.txt"
      ),
      outfile_prefix = "venn_downregulated_17h"
    )

    process_and_save_venn(
      label = "Downregulated Genes (ΔIJ vs WT, 24h)",
      files = list(
        "No_Treatment" = "deltaIJ_24_vs_WT_24-down.txt",
        "Treatment_A" = "pre_deltaIJ_24_vs_pre_WT_24-down.txt",
        "Treatment_B" = "0_5_deltaIJ_24_vs_0_5_WT_24-down.txt"
      ),
      outfile_prefix = "venn_downregulated_24h"
    )

KEGG and GO annotations in non-model organisms

https://www.biobam.com/functional-analysis/

Assign KEGG and GO Terms (see diagram above)

Since your organism is non-model, standard R databases (org.Hs.eg.db, etc.) won’t work. You’ll need to manually retrieve KEGG and GO annotations.
- Preparing file 1 eggnog_out.emapper.annotations.txt for the R-code below: (KEGG Terms): EggNog based on orthology and phylogenies
  
  EggNOG-mapper assigns both KEGG Orthology (KO) IDs and GO terms.
  
  Install EggNOG-mapper:
```
mamba create -n eggnog_env python=3.8 eggnog-mapper -c conda-forge -c bioconda  #eggnog-mapper_2.1.12
mamba activate eggnog_env
```
  Run annotation:
```
#diamond makedb --in eggnog6.prots.faa -d eggnog_proteins.dmnd
mkdir /home/jhuang/mambaforge/envs/eggnog_env/lib/python3.8/site-packages/data/
download_eggnog_data.py --dbname eggnog.db -y --data_dir /home/jhuang/mambaforge/envs/eggnog_env/lib/python3.8/site-packages/data/
#NOT_WORKING: emapper.py -i CP059040_gene.fasta -o eggnog_dmnd_out --cpu 60 -m diamond[hmmer,mmseqs] --dmnd_db /home/jhuang/REFs/eggnog_data/data/eggnog_proteins.dmnd
python ~/Scripts/update_fasta_header.py CP059040_protein_.fasta CP059040_protein.fasta
emapper.py -i CP059040_protein.fasta -o eggnog_out --cpu 60 --resume
#----> result annotations.tsv: Contains KEGG, GO, and other functional annotations.
#---->  470.IX87_14445:
    * 470 likely refers to the organism or strain (e.g., Acinetobacter baumannii ATCC 19606 or another related strain).
    * IX87_14445 would refer to a specific gene or protein within that genome.
```
  Extract KEGG KO IDs from annotations.emapper.annotations.
- Preparing file 2 blast2goannot.annot2 for the R-code below:
  - Basic (GO Terms from ‘Blast2GO 5 Basic’, saved in blast2go_annot.annot): Using Blast/Diamond + Blast2GO_GUI based on sequence alignment + GO mapping
  - ‘Load protein sequences’ (Tags: NONE, generated columns: Nr, SeqName) –>
  - Buttons ‘blast’ (Tags: BLASTED, generated columns: Description, Length, #Hits, e-Value, sim mean),
  - Button ‘mapping’ (Tags: MAPPED, generated columns: #GO, GO IDs, GO Names), “Mapping finished – Please proceed now to annotation.”
  - Button ‘annot’ (Tags: ANNOTATED, generated columns: Enzyme Codes, Enzyme Names), “Annotation finished.”
    - Used parameter ‘Annotation CutOff’: The Blast2GO Annotation Rule seeks to find the most specific GO annotations with a certain level of reliability. An annotation score is calculated for each candidate GO which is composed by the sequence similarity of the Blast Hit, the evidence code of the source GO and the position of the particular GO in the Gene Ontology hierarchy. This annotation score cutoff select the most specific GO term for a given GO branch which lies above this value.
    - Used parameter ‘GO Weight’ is a value which is added to Annotation Score of a more general/abstract Gene Ontology term for each of its more specific, original source GO terms. In this case, more general GO terms which summarise many original source terms (those ones directly associated to the Blast Hits) will have a higher Annotation Score.
  - Advanced (GO Terms from ‘Blast2GO 5 Basic’): Interpro based protein families / domains –> Button interpro
  - Button ‘interpro’ (Tags: INTERPRO, generated columns: InterPro IDs, InterPro GO IDs, InterPro GO Names) –> “InterProScan Finished – You can now merge the obtained GO Annotations.”
  - MERGE the results of InterPro GO IDs (advanced) to GO IDs (basic) and generate final GO IDs, saved in blast2go_annot.annot2
  - Button ‘interpro’/’Merge InterProScan GOs to Annotation’ –> “Merge (add and validate) all GO terms retrieved via InterProScan to the already existing GO annotation.” –> “Finished merging GO terms from InterPro with annotations. Maybe you want to run ANNEX (Annotation Augmentation).”
  - (NOT_USED) Button ‘annot’/’ANNEX’ –> “ANNEX finished. Maybe you want to do the next step: Enzyme Code Mapping.”
  - PREPARING go_terms and ecterms: annot* file:
  cut -f1-2 -d$’\t’ blast2go_annot.annot2 > blast2goannot.annot2

Perform KEGG and GO Enrichment in R

    #BiocManager::install("GO.db")
    #BiocManager::install("AnnotationDbi")

    # Load required libraries
    library(openxlsx)  # For Excel file handling
    library(dplyr)     # For data manipulation
    library(tidyr)
    library(stringr)
    library(clusterProfiler)  # For KEGG and GO enrichment analysis
    #library(org.Hs.eg.db)  # Replace with appropriate organism database
    library(GO.db)
    library(AnnotationDbi)

    setwd("~/DATA/Data_Tam_RNAseq_2025_WT_deltaIJ_ATCC19606//results/star_salmon/degenes")
    # Step 1: Load the blast2go annotation file with a check for missing columns
    annot_df <- read.table("/home/jhuang/b2gWorkspace_Tam_RNAseq_2024/blast2go_annot.annot2_",
                        header = FALSE, sep = "\t", stringsAsFactors = FALSE, fill = TRUE)

    # If the structure is inconsistent, we can make sure there are exactly 3 columns:
    colnames(annot_df) <- c("GeneID", "Term")
    # Step 2: Filter and aggregate GO and EC terms as before
    go_terms <- annot_df %>%
    filter(grepl("^GO:", Term)) %>%
    group_by(GeneID) %>%
    summarize(GOs = paste(Term, collapse = ","), .groups = "drop")
    ec_terms <- annot_df %>%
    filter(grepl("^EC:", Term)) %>%
    group_by(GeneID) %>%
    summarize(EC = paste(Term, collapse = ","), .groups = "drop")

    # Load the results
    #res <- read.csv("deltaIJ_17_vs_WT_17-all.csv")   #up11, down3
    #res <- read.csv("deltaIJ_24_vs_WT_24-all.csv")   #up0, down2
    #res <- read.csv("pre_deltaIJ_17_vs_pre_WT_17-all.csv")  #up238, down90
    #res <- read.csv("pre_deltaIJ_24_vs_pre_WT_24-all.csv")  #up83, down64
    #res <- read.csv("0_5_deltaIJ_17_vs_0_5_WT_17-all.csv")  #up74, down14
    res <- read.csv("0_5_deltaIJ_24_vs_0_5_WT_24-all.csv")  #up1, down3

    # Replace empty GeneName with modified GeneID
    res$GeneName <- ifelse(
        res$GeneName == "" | is.na(res$GeneName),
        gsub("gene-", "", res$GeneID),
        res$GeneName
    )

    # Remove duplicated genes by selecting the gene with the smallest padj
    duplicated_genes <- res[duplicated(res$GeneName), "GeneName"]

    res <- res %>%
    group_by(GeneName) %>%
    slice_min(padj, with_ties = FALSE) %>%
    ungroup()

    res <- as.data.frame(res)
    # Sort res first by padj (ascending) and then by log2FoldChange (descending)
    res <- res[order(res$padj, -res$log2FoldChange), ]
    # Read eggnog annotations
    eggnog_data <- read.delim("~/DATA/Data_Tam_RNAseq_2024_AUM_MHB_Urine_ATCC19606/eggnog_out.emapper.annotations.txt", header = TRUE, sep = "\t")
    # Remove the "gene-" prefix from GeneID in res to match eggnog 'query' format
    res$GeneID <- gsub("gene-", "", res$GeneID)
    # Merge eggnog data with res based on GeneID
    res <- res %>%
    left_join(eggnog_data, by = c("GeneID" = "query"))

    # DEBUG: NOT_NECESSARY, since res has already GeneName
    ##Convert row names to a new column 'GeneName' in res
    #res_with_geneName <- res %>%
    #mutate(GeneName = rownames(res)) %>%
    #as.data.frame()  # Ensure that it's a regular data frame without row names
    ## View the result
    #head(res_with_geneName)

    # Merge with the res dataframe
    # Perform the left joins and rename columns
    res_updated <- res %>%
    left_join(go_terms, by = "GeneID") %>%
    left_join(ec_terms, by = "GeneID") %>% dplyr::select(-EC.x, -GOs.x) %>% dplyr::rename(EC = EC.y, GOs = GOs.y)

    # DEBUG: NOT_NECESSARY, since 'GeneName' is already the first column.
    ## Reorder columns to move 'GeneName' as the first column in res_updated
    #res_updated <- res_updated %>%
    #select(GeneName, everything())

    ## Count the number of rows in the KEGG_ko, GOs, EC columns that have non-missing values
    #num_non_missing_KEGG_ko <- sum(res_updated$KEGG_ko != "-" & !is.na(res_updated$KEGG_ko))
    #print(num_non_missing_KEGG_ko)
    ##[1] 2030
    #num_non_missing_GOs <- sum(res_updated$GOs != "-" & !is.na(res_updated$GOs))
    #print(num_non_missing_GOs)
    ##[1] 2865 --> 2875
    #num_non_missing_EC <- sum(res_updated$EC != "-" & !is.na(res_updated$EC))
    #print(num_non_missing_EC)
    ##[1] 1701

    # Filter up-regulated genes
    up_regulated <- res_updated[res_updated$log2FoldChange > 2 & res_updated$padj < 0.05, ]
    # Filter down-regulated genes
    down_regulated <- res_updated[res_updated$log2FoldChange < -2 & res_updated$padj < 0.05, ]

    # Create a new workbook
    wb <- createWorkbook()
    # Add the complete dataset as the first sheet (with annotations)
    addWorksheet(wb, "Complete_Data")
    writeData(wb, "Complete_Data", res_updated)
    # Add the up-regulated genes as the second sheet (with annotations)
    addWorksheet(wb, "Up_Regulated")
    writeData(wb, "Up_Regulated", up_regulated)
    # Add the down-regulated genes as the third sheet (with annotations)
    addWorksheet(wb, "Down_Regulated")
    writeData(wb, "Down_Regulated", down_regulated)
    # Save the workbook to a file
    saveWorkbook(wb, "Gene_Expression_with_Annotations_0_5ΔIJ-24_vs_0_5WT-24.xlsx", overwrite = TRUE)

    # Set GeneName as row names after the join
    rownames(res_updated) <- res_updated$GeneName
    res_updated <- res_updated %>% dplyr::select(-GeneName)
    ## Set the 'GeneName' column as row.names
    #rownames(res_updated) <- res_updated$GeneName
    ## Drop the 'GeneName' column since it's now the row names
    #res_updated$GeneName <- NULL

    # ---- Perform KEGG enrichment analysis (up_regulated) ----
    gene_list_kegg_up <- up_regulated$KEGG_ko
    gene_list_kegg_up <- gsub("ko:", "", gene_list_kegg_up)
    kegg_enrichment_up <- enrichKEGG(gene = gene_list_kegg_up, organism = 'ko')
    # -- convert the GeneID (Kxxxxxx) to the true GeneID --
    # Step 0: Create KEGG to GeneID mapping
    kegg_to_geneid_up <- up_regulated %>%
    dplyr::select(KEGG_ko, GeneID) %>%
    filter(!is.na(KEGG_ko)) %>%  # Remove missing KEGG KO entries
    mutate(KEGG_ko = str_remove(KEGG_ko, "ko:"))  # Remove 'ko:' prefix if present
    # Step 1: Clean KEGG_ko values (separate multiple KEGG IDs)
    kegg_to_geneid_clean <- kegg_to_geneid_up %>%
    mutate(KEGG_ko = str_remove_all(KEGG_ko, "ko:")) %>%  # Remove 'ko:' prefixes
    separate_rows(KEGG_ko, sep = ",") %>%  # Ensure each KEGG ID is on its own row
    filter(KEGG_ko != "-") %>%  # Remove invalid KEGG IDs ("-")
    distinct()  # Remove any duplicate mappings
    # Step 2.1: Expand geneID column in kegg_enrichment_up
    expanded_kegg <- kegg_enrichment_up %>%
    as.data.frame() %>%
    separate_rows(geneID, sep = "/") %>%  # Split multiple KEGG IDs (Kxxxxx)
    left_join(kegg_to_geneid_clean, by = c("geneID" = "KEGG_ko"), relationship = "many-to-many") %>%  # Explicitly handle many-to-many
    distinct() %>%  # Remove duplicate matches
    group_by(ID) %>%
    summarise(across(everything(), ~ paste(unique(na.omit(.)), collapse = "/")), .groups = "drop")  # Re-collapse results
    #dplyr::glimpse(expanded_kegg)
    # Step 3.1: Replace geneID column in the original dataframe
    kegg_enrichment_up_df <- as.data.frame(kegg_enrichment_up)
    # Remove old geneID column and merge new one
    kegg_enrichment_up_df <- kegg_enrichment_up_df %>%
    dplyr::select(-geneID) %>%  # Remove old geneID column
    left_join(expanded_kegg %>% dplyr::select(ID, GeneID), by = "ID") %>%  # Merge new GeneID column
    dplyr::rename(geneID = GeneID)  # Rename column back to geneID

    # ---- Perform KEGG enrichment analysis (down_regulated) ----
    # Step 1: Extract KEGG KO terms from down-regulated genes
    gene_list_kegg_down <- down_regulated$KEGG_ko
    gene_list_kegg_down <- gsub("ko:", "", gene_list_kegg_down)
    # Step 2: Perform KEGG enrichment analysis
    kegg_enrichment_down <- enrichKEGG(gene = gene_list_kegg_down, organism = 'ko')
    # --- Convert KEGG gene IDs (Kxxxxxx) to actual GeneIDs ---
    # Step 3: Create KEGG to GeneID mapping from down_regulated dataset
    kegg_to_geneid_down <- down_regulated %>%
    dplyr::select(KEGG_ko, GeneID) %>%
    filter(!is.na(KEGG_ko)) %>%  # Remove missing KEGG KO entries
    mutate(KEGG_ko = str_remove(KEGG_ko, "ko:"))  # Remove 'ko:' prefix if present
    # Step 4: Clean KEGG_ko values (handle multiple KEGG IDs)
    kegg_to_geneid_down_clean <- kegg_to_geneid_down %>%
    mutate(KEGG_ko = str_remove_all(KEGG_ko, "ko:")) %>%  # Remove 'ko:' prefixes
    separate_rows(KEGG_ko, sep = ",") %>%  # Ensure each KEGG ID is on its own row
    filter(KEGG_ko != "-") %>%  # Remove invalid KEGG IDs ("-")
    distinct()  # Remove duplicate mappings
    # Step 5: Expand geneID column in kegg_enrichment_down
    expanded_kegg_down <- kegg_enrichment_down %>%
    as.data.frame() %>%
    separate_rows(geneID, sep = "/") %>%  # Split multiple KEGG IDs (Kxxxxx)
    left_join(kegg_to_geneid_down_clean, by = c("geneID" = "KEGG_ko"), relationship = "many-to-many") %>%  # Handle many-to-many mappings
    distinct() %>%  # Remove duplicate matches
    group_by(ID) %>%
    summarise(across(everything(), ~ paste(unique(na.omit(.)), collapse = "/")), .groups = "drop")  # Re-collapse results
    # Step 6: Replace geneID column in the original kegg_enrichment_down dataframe
    kegg_enrichment_down_df <- as.data.frame(kegg_enrichment_down) %>%
    dplyr::select(-geneID) %>%  # Remove old geneID column
    left_join(expanded_kegg_down %>% dplyr::select(ID, GeneID), by = "ID") %>%  # Merge new GeneID column
    dplyr::rename(geneID = GeneID)  # Rename column back to geneID
    # View the updated dataframe
    head(kegg_enrichment_down_df)

    # Create a new workbook
    wb <- createWorkbook()
    # Save enrichment results to the workbook
    addWorksheet(wb, "KEGG_Enrichment_Up")
    writeData(wb, "KEGG_Enrichment_Up", as.data.frame(kegg_enrichment_up_df))
    # Save enrichment results to the workbook
    addWorksheet(wb, "KEGG_Enrichment_Down")
    writeData(wb, "KEGG_Enrichment_Down", as.data.frame(kegg_enrichment_down_df))
    #saveWorkbook(wb, "KEGG_Enrichment.xlsx", overwrite = TRUE)

    # ---- Perform GO enrichment analysis (TODO: extract the merged GO IDs from 'Blast2GO 5 Basic' and adapt the code below!)----

    # Define gene list (up-regulated genes)
    gene_list_go_up <- up_regulated$GeneID  # Extract the 149 up-regulated genes
    gene_list_go_down <- down_regulated$GeneID  # Extract the 65 down-regulated genes

    # Define background gene set (all genes in res)
    background_genes <- res_updated$GeneID  # Extract the 3646 background genes

    # Prepare GO annotation data from res
    go_annotation <- res_updated[, c("GOs","GeneID")]  # Extract relevant columns
    go_annotation <- go_annotation %>%
    tidyr::separate_rows(GOs, sep = ",")  # Split multiple GO terms into separate rows

    # Perform GO enrichment analysis, where pAdjustMethod is one of "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none"
    go_enrichment_up <- enricher(
        gene = gene_list_go_up,                # Up-regulated genes
        TERM2GENE = go_annotation,       # Custom GO annotation
        pvalueCutoff = 1.0,             # Significance threshold
        pAdjustMethod = "BH",
        universe = background_genes      # Define the background gene set
    )
    go_enrichment_up <- as.data.frame(go_enrichment_up)

    go_enrichment_down <- enricher(
        gene = gene_list_go_down,                # Up-regulated genes
        TERM2GENE = go_annotation,       # Custom GO annotation
        pvalueCutoff = 1.0,             # Significance threshold
        pAdjustMethod = "BH",
        universe = background_genes      # Define the background gene set
    )
    go_enrichment_down <- as.data.frame(go_enrichment_down)

    ## Remove the 'p.adjust' column since no adjusted methods have been applied!
    #go_enrichment_up <- go_enrichment_up[, !names(go_enrichment_up) %in% "p.adjust"]
    # Update the Description column with the term descriptions
    go_enrichment_up$Description <- sapply(go_enrichment_up$ID, function(go_id) {
    # Using select to get the term description
    term <- tryCatch({
        AnnotationDbi::select(GO.db, keys = go_id, columns = "TERM", keytype = "GOID")
    }, error = function(e) {
        message(paste("Error for GO term:", go_id))  # Print which GO ID caused the error
        return(data.frame(TERM = NA))  # In case of error, return NA
    })

    if (nrow(term) > 0) {
        return(term$TERM)
    } else {
        return(NA)  # If no description found, return NA
    }
    })
    ## Print the updated data frame
    #print(go_enrichment_up)

    ## Remove the 'p.adjust' column since no adjusted methods have been applied!
    #go_enrichment_down <- go_enrichment_down[, !names(go_enrichment_down) %in% "p.adjust"]
    # Update the Description column with the term descriptions
    go_enrichment_down$Description <- sapply(go_enrichment_down$ID, function(go_id) {
    # Using select to get the term description
    term <- tryCatch({
        AnnotationDbi::select(GO.db, keys = go_id, columns = "TERM", keytype = "GOID")
    }, error = function(e) {
        message(paste("Error for GO term:", go_id))  # Print which GO ID caused the error
        return(data.frame(TERM = NA))  # In case of error, return NA
    })

    if (nrow(term) > 0) {
        return(term$TERM)
    } else {
        return(NA)  # If no description found, return NA
    }
    })

    addWorksheet(wb, "GO_Enrichment_Up")
    writeData(wb, "GO_Enrichment_Up", as.data.frame(go_enrichment_up))

    addWorksheet(wb, "GO_Enrichment_Down")
    writeData(wb, "GO_Enrichment_Down", as.data.frame(go_enrichment_down))

    # Save the workbook with enrichment results
    saveWorkbook(wb, "KEGG_and_GO_Enrichments_0_5ΔIJ-24_vs_0_5WT-24.xlsx", overwrite = TRUE)

    #Error for GO term: GO:0006807: replace GO:0006807  obsolete nitrogen compound metabolic process
    #TODO: marked the color as yellow if the p.adjusted <= 0.05 in GO_enrichment!

Finalizing the KEGG and GO Enrichment table

    1. NOTE: geneIDs in KEGG_Enrichment have been already translated from ko to geneID in H0N29_*-format;
    2. NEED_MANUAL_DELETION: p.adjust values have been calculated, we have to filter all records in GO_Enrichment-results by |p.adjust|<=0.05.
    3. DELETE_ALL_q-values, since sometimes the qvalues missing!: If only one record --> no q-values: the missing qvalue is expected here — you can't calculate q-values with only one p-value. The p.adjust (e.g. Benjamini-Hochberg FDR) is still valid because it technically works even with a single p-value, but qvalue requires more data.

MicrobiotaProcess_PCA_Group3-4.R processing Data_Karoline_16S_2025

Leave a reply

# https://bioconductor.org/packages/release/bioc/vignettes/MicrobiotaProcess/inst/doc//MicrobiotaProcess.html

# -----------------------------------
# ---- prepare the R environment ----
#Rscript MicrobiotaProcess.R
#NOTE: exit R script, then login again R-environment; rm -rf Phyloseq*_cache
#mkdir figures
#rmarkdown::render('Phyloseq.Rmd',output_file='Phyloseq.html')
#source("MicrobiotaProcess_Group3_vs_Group4.R")

# with #alpha = 2.0, running the following script further!

# -----------------------------
# ---- 3.1. bridges other tools
##https://github.com/YuLab-SMU/MicrobiotaProcess
##https://www.bioconductor.org/packages/release/bioc/vignettes/MicrobiotaProcess/inst/doc/MicrobiotaProcess.html
##https://chiliubio.github.io/microeco_tutorial/intro.html#framework
##https://yiluheihei.github.io/microbiomeMarker/reference/plot_cladogram.html
#BiocManager::install("MicrobiotaProcess")
#install.packages("microeco")
#install.packages("ggalluvial")
#install.packages("ggh4x")

library(MicrobiotaProcess)
library(microeco)
library(ggalluvial)
library(ggh4x)
library(gghalves)

## Convert the phyloseq object to a MicrobiotaProcess object
#mp <- as.MicrobiotaProcess(ps.ng.tax)

#mt <- phyloseq2microeco(ps.ng.tax) #--> ERROR
#abundance_table <- mt$abun_table
#taxonomy_table <- mt$tax_table

#ps.ng.tax_abund <- phyloseq::filter_taxa(ps.ng.tax, function(x) sum(x > total*0.01) > 0, TRUE)
#ps.ng.tax_most = phyloseq::filter_taxa(ps.ng.tax_rel, function(x) mean(x) > 0.001, TRUE)

##OPTION1 (NOT_USED): take all samples, prepare ps.ng.tax_abund --> mpse_abund
##mpse <- ps.ng.tax %>% as.MPSE()
#mpse_abund <- ps.ng.tax_abund %>% as.MPSE()

##OPTION2 (USED!): take partial samples, prepare ps.ng.tax or ps.ng.tax_abund (2 replacements!)--> ps.ng.tax_sel --> mpse_abund
ps.ng.tax_sel <- ps.ng.tax_abund

##otu_table(ps.ng.tax_sel) <- otu_table(ps.ng.tax)[,c("1","2","5","6","7",  "15","16","17","18","19","20",  "29","30","31","32",  "40","41","42","43","44","46")]
##NOTE: Only choose Group2, Group4, Group6, Group8
#> ps.ng.tax_sel
#otu_table()   OTU Table:         [ 37465 taxa and 29 samples ]
#sample_data() Sample Data:       [ 29 samples by 10 sample variables ]
#tax_table()   Taxonomy Table:    [ 37465 taxa by 7 taxonomic ranks ]
#phy_tree()    Phylogenetic Tree: [ 37465 tips and 37461 internal nodes ]
#-Group4: "21","22","23","24","25","26","27","28",
#-Group8: ,  "47","48","49","50","52","53","55"
otu_table(ps.ng.tax_sel) <- otu_table(ps.ng.tax_abund)[,c("sample-C3","sample-C4","sample-C5","sample-C6","sample-C7",  "sample-E4","sample-E5","sample-E6","sample-E7","sample-E8")]
mpse_abund <- ps.ng.tax_sel %>% as.MPSE()
# A MPSE-tibble (MPSE object) abstraction: 2,352 × 20
# NOTE mpse_abund contains 20 variables: OTU, Sample, Abundance, BarcodeSequence, LinkerPrimerSequence, FileInput, Group,
#   Sex_age

, pre_post_stroke , Conc , Vol_50ng , Vol_PCR , Description , # Domain , Phylum , Class , Order , Family , Genus , Species # ———————————– # —- 3.2. alpha diversity analysis # Rarefied species richness + RareAbundance mpse_abund %<>% mp_rrarefy() # ‘chunks’ represent the split number of each sample to calculate alpha # diversity, default is 400. e.g. If a sample has total 40000 # reads, if chunks is 400, it will be split to 100 sub-samples # (100, 200, 300,…, 40000), then alpha diversity index was # calculated based on the sub-samples. # ‘.abundance’ the column name of abundance, if the ‘.abundance’ is not be # rarefied calculate rarecurve, user can specific ‘force=TRUE’. mpse_abund %<>% mp_cal_rarecurve( .abundance = RareAbundance, chunks = 400 ) # The RareAbundanceRarecurve column will be added the colData slot # automatically (default action=”add”) #NOTE mpse_abund contains 22 varibles = 20 varibles + RareAbundance + RareAbundanceRarecurve # default will display the confidence interval around smooth. # se=TRUE # NOTE that two colors #c(“#00A087FF”, “#3C5488FF”) for .group = pre_post_stroke; four colors c(“#1f78b4”, “#33a02c”, “#e31a1c”, “#6a3d9a”) for .group = Group; p1 <- mpse_abund %>% mp_plot_rarecurve( .rare = RareAbundanceRarecurve, .alpha = Observe, ) p2 <- mpse_abund %>% mp_plot_rarecurve( .rare = RareAbundanceRarecurve, .alpha = Observe, .group = Group ) + scale_color_manual(values=c(“#1f78b4”, “#e31a1c”)) + scale_fill_manual(values=c(“#1f78b4”, “#e31a1c”), guide=”none”) # combine the samples belong to the same groups if plot.group=TRUE p3 <- mpse_abund %>% mp_plot_rarecurve( .rare = RareAbundanceRarecurve, .alpha = “Observe”, .group = Group, plot.group = TRUE ) + scale_color_manual(values=c(“#1f78b4”, “#e31a1c”)) + scale_fill_manual(values=c(“#1f78b4”, “#e31a1c”),guide=”none”) png(“rarefaction_of_samples_or_groups.png”, width=1080, height=600) p1 + p2 + p3 dev.off() # —————————————— # 3.3. calculate alpha index and visualization library(ggplot2) library(MicrobiotaProcess) mpse_abund %<>% mp_cal_alpha(.abundance=RareAbundance) mpse_abund #NOTE mpse_abund contains 28 varibles = 22 varibles + Observe , Chao1 , ACE , Shannon , Simpson , Pielou f1 <- mpse_abund %>% mp_plot_alpha( .group=Group, .alpha=c(Observe, Chao1, ACE, Shannon, Simpson, Pielou) ) + scale_fill_manual(values=c(“#1f78b4”, “#e31a1c”), guide=”none”) + scale_color_manual(values=c(“#1f78b4”, “#e31a1c”), guide=”none”) f2 <- mpse_abund %>% mp_plot_alpha( .alpha=c(Observe, Chao1, ACE, Shannon, Simpson, Pielou) ) #ps.ng.tax_sel contais only pre samples –> f1 cannot be generated! png(“alpha_diversity_comparison.png”, width=1400, height=600) f1 / f2 dev.off() # ——————————————- # 3.4. The visualization of taxonomy abundance (Class) mpse_abund %<>% mp_cal_abundance( # for each samples .abundance = RareAbundance ) %>% mp_cal_abundance( # for each groups .abundance=RareAbundance, .group=Group ) mpse_abund #NOTE mpse_abund contains 29 varibles = 28 varibles + RelRareAbundanceBySample # visualize the relative abundance of top 20 phyla for each sample. # .group=time, p1 <- mpse_abund %>% mp_plot_abundance( .abundance=RareAbundance, taxa.class = Class, topn = 20, relative = TRUE ) # visualize the abundance (rarefied) of top 20 phyla for each sample. # .group=time, p2 <- mpse_abund %>% mp_plot_abundance( .abundance=RareAbundance, taxa.class = Class, topn = 20, relative = FALSE ) png(“relative_abundance_and_abundance.png”, width= 1200, height=600) #NOT PRODUCED! p1 / p2 dev.off() #—- h1 <- mpse_abund %>% mp_plot_abundance( .abundance = RareAbundance, .group = Group, taxa.class = Class, relative = TRUE, topn = 20, geom = ‘heatmap’, features.dist = ‘euclidean’, features.hclust = ‘average’, sample.dist = ‘bray’, sample.hclust = ‘average’ ) h2 <- mpse_abund %>% mp_plot_abundance( .abundance = RareAbundance, .group = Group, taxa.class = Class, relative = FALSE, topn = 20, geom = ‘heatmap’, features.dist = ‘euclidean’, features.hclust = ‘average’, sample.dist = ‘bray’, sample.hclust = ‘average’ ) # the character (scale or theme) of figure can be adjusted by set_scale_theme # refer to the mp_plot_dist png(“relative_abundance_and_abundance_heatmap.png”, width= 1200, height=600) aplot::plot_list(gglist=list(h1, h2), tag_levels=”A”) dev.off() # visualize the relative abundance of top 20 class for each .group (Group) p3 <- mpse_abund %>% mp_plot_abundance( .abundance=RareAbundance, .group=Group, taxa.class = Class, topn = 20, plot.group = TRUE ) # visualize the abundance of top 20 phyla for each .group (time) p4 <- mpse_abund %>% mp_plot_abundance( .abundance=RareAbundance, .group= Group, taxa.class = Class, topn = 20, relative = FALSE, plot.group = TRUE ) png(“relative_abundance_and_abundance_groups.png”, width= 1000, height=1000) p3 / p4 dev.off() # ————————— # 3.5. Beta diversity analysis # ——————————————— # 3.5.1 The distance between samples or groups # standardization # mp_decostand wraps the decostand of vegan, which provides # many standardization methods for community ecology. # default is hellinger, then the abundance processed will # be stored to the assays slot. mpse_abund %<>% mp_decostand(.abundance=Abundance) mpse_abund #NOTE mpse_abund contains 30 varibles = 29 varibles + hellinger # calculate the distance between the samples. # the distance will be generated a nested tibble and added to the # colData slot. mpse_abund %<>% mp_cal_dist(.abundance=hellinger, distmethod=”bray”) mpse_abund #NOTE mpse_abund contains 31 varibles = 30 varibles + bray # mp_plot_dist provides there methods to visualize the distance between the samples or groups # when .group is not provided, the dot heatmap plot will be return p1 <- mpse_abund %>% mp_plot_dist(.distmethod = bray) png(“distance_between_samples.png”, width= 1000, height=1000) p1 dev.off() # when .group is provided, the dot heatmap plot with group information will be return. p2 <- mpse_abund %>% mp_plot_dist(.distmethod = bray, .group = Group) # The scale or theme of dot heatmap plot can be adjusted using set_scale_theme function. p2 %>% set_scale_theme( x = scale_fill_manual( values=c(“#1f78b4”, “#e31a1c”), #c(“orange”, “deepskyblue”), guide = guide_legend( keywidth = 1, keyheight = 0.5, title.theme = element_text(size=8), label.theme = element_text(size=6) ) ), aes_var = Group # specific the name of variable ) %>% set_scale_theme( x = scale_color_gradient( guide = guide_legend(keywidth = 0.5, keyheight = 0.5) ), aes_var = bray ) %>% set_scale_theme( x = scale_size_continuous( range = c(0.1, 3), guide = guide_legend(keywidth = 0.5, keyheight = 0.5) ), aes_var = bray ) png(“distance_between_samples_with_group_info.png”, width= 1000, height=1000) p2 dev.off() # when .group is provided and group.test is TRUE, the comparison of different groups will be returned # Assuming p3 is a ggplot object after mp_plot_dist call p3 <- mpse_abund %>% mp_plot_dist(.distmethod = bray, .group = Group, group.test = TRUE, textsize = 6) + theme( axis.title.x = element_text(size = 14), # Customize x-axis label face = “bold” axis.title.y = element_text(size = 14), # Customize y-axis label axis.text.x = element_text(size = 14), # Customize x-axis ticks axis.text.y = element_text(size = 14) # Customize y-axis ticks ) # Save the plot with the new theme settings png(“Comparison_of_Bray_Distances_Group3-4.png”, width = 1000, height = 1000) print(p3) # Ensure that p3 is explicitly printed in the device dev.off() # Extract Bray-Curtis Distance Values and save them in a Excel-table. library(dplyr) library(tidyr) library(openxlsx) # Define the sample numbers vector sample_numbers <- c("sample-C3","sample-C4","sample-C5","sample-C6","sample-C7", "sample-E4","sample-E5","sample-E6","sample-E7","sample-E8") # Consolidate the list of tibbles using the actual sample numbers bray_data <- bind_rows( lapply(seq_along(mpse_abund$bray), function(i) { tibble( Sample1 = sample_numbers[i], # Use actual sample number Sample2 = mpse_abund$bray[[i]]$braySampley, BrayDistance = mpse_abund$bray[[i]]$bray ) }), .id = "PairID" ) # Print the data frame to check the output print(bray_data) # Write the data frame to an Excel file write.xlsx(bray_data, file = "Bray_Curtis_Distances.xlsx") #DELETE the column "PairID" in Excel file # ----------------------- # 3.5.2 The PCoA analysis #install.packages("corrr") library(corrr) #install.packages("ggside") library(ggside) mpse_abund %<>% mp_cal_pcoa(.abundance=hellinger, distmethod=”bray”) # The dimensions of ordination analysis will be added the colData slot (default). mpse_abund mpse_abund %>% print(width=380, n=2) #NOTE mpse_abund contains 34 varibles = 31 varibles + `PCo1 (30.16%)` , `PCo2 (15.75%)` , `PCo3 (10.53%)` #BUG why 36 variables in mpse_abund %>% print(width=380, n=1) [RareAbundanceBySample , RareAbundanceByGroup ] #> methods(class=class(mpse_abund)) # [1] [ [[<- [<- # [4] $ $<- arrange # [7] as_tibble as.data.frame as.phyloseq #[10] coerce coerce<- colData<- #[13] distinct filter group_by #[16] left_join mp_adonis mp_aggregate_clade #[19] mp_aggregate mp_anosim mp_balance_clade #[22] mp_cal_abundance mp_cal_alpha mp_cal_cca #[25] mp_cal_clust mp_cal_dca mp_cal_dist #[28] mp_cal_nmds mp_cal_pca mp_cal_pcoa #[31] mp_cal_pd_metric mp_cal_rarecurve mp_cal_rda #[34] mp_cal_upset mp_cal_venn mp_decostand #[37] mp_diff_analysis mp_diff_clade mp_envfit #[40] mp_extract_abundance mp_extract_assays mp_extract_dist #[43] mp_extract_feature mp_extract_internal_attr mp_extract_rarecurve #[46] mp_extract_refseq mp_extract_sample mp_extract_taxonomy #[49] mp_extract_tree mp_filter_taxa mp_mantel #[52] mp_mrpp mp_plot_abundance mp_plot_alpha #[55] mp_plot_diff_boxplot mp_plot_diff_res mp_plot_dist #[58] mp_plot_ord mp_plot_rarecurve mp_plot_upset #[61] mp_plot_venn mp_rrarefy mp_select_as_tip #[64] mp_stat_taxa mutate otutree #[67] otutree<- print pull #[70] refsequence refsequence<- rename #[73] rownames<- select show # [ reached getOption("max.print") -- omitted 6 entries ] #see '?methods' for accessing help and source code # We also can perform adonis or anosim to check whether it is significant to the dissimilarities of groups. mpse_abund %<>% mp_adonis(.abundance=hellinger, .formula=~Group, distmethod=”bray”, permutations=9999, action=”add”) mpse_abund %>% mp_extract_internal_attr(name=adonis) #NOTE mpse_abund contains 34 varibles, no new variable, it has been saved in mpse_abund and can be extracted with “mpse_abund %>% mp_extract_internal_attr(name=’adonis’)” #The result of adonis has been saved to the internal attribute ! #It can be extracted using this-object %>% mp_extract_internal_attr(name=’adonis’) #The object contained internal attribute: PCoA ADONIS #Permutation test for adonis under reduced model #Terms added sequentially (first to last) #Permutation: free #Number of permutations: 9999 # #vegan::adonis2(formula = .formula, data = sampleda, permutations = permutations, method = distmethod) # Df SumOfSqs R2 F Pr(>F) #Group 1 0.23448 0.22659 3.5158 5e-04 *** #Residual 12 0.80032 0.77341 #Total 13 1.03480 1.00000 #— #Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 # (“1″,”2″,”5″,”6″,”7”, “15”,”16″,”17″,”18″,”19″,”20″, “29”,”30″,”31″,”32″, “40”,”41″,”42″,”43″,”44″,”46″) #div.df2[div.df2 == “Group1”] <- "aged.post" #div.df2[div.df2 == "Group3"] <- "young.post" #div.df2[div.df2 == "Group5"] <- "aged.post" #div.df2[div.df2 == "Group7"] <- "young.post" # ("8","9","10","12","13","14", "21","22","23","24","25","26","27","28", "33","34","35","36","37","38","39","51", "47","48","49","50","52","53","55") #div.df2[div.df2 == "Group2"] <- "aged.pre" #div.df2[div.df2 == "Group4"] <- "young.pre" #div.df2[div.df2 == "Group6"] <- "aged.pre" #div.df2[div.df2 == "Group8"] <- "young.pre" #Group1: f.aged and post #Group2: f.aged and pre #Group3: f.young and post #Group4: f.young and pre #Group5: m.aged and post #Group6: m.aged and pre #Group7: m.young and post #Group8: m.young and pre #[,c("1","2","5","6","7", "8","9","10","12","13","14")] #[,c("15","16","17","18","19","20", "21","22","23","24","25","26","27","28")] #[,c("29","30","31","32", "33","34","35","36","37","38","39","51")] #[,c("40","41","42","43","44","46", "47","48","49","50","52","53","55")] #For the first set: #a6cee3: This is a light blue color, somewhat pastel and soft. #b2df8a: A soft, pale green, similar to a light lime. #fb9a99: A soft pink, slightly peachy or salmon-like. #cab2d6: A pale purple, reminiscent of lavender or a light mauve. #For the second set: #1f78b4: This is a strong, vivid blue, close to cobalt or a medium-dark blue. #33a02c: A medium forest green, vibrant and leafy. #e31a1c: A bright red, very vivid, similar to fire engine red. #6a3d9a: This would be described as a deep purple, akin to a dark lavender or plum. p1 <- mpse_abund %>% mp_plot_ord( .ord = pcoa, .group = Group, .color = Group, .size = 4, # increase point size! .alpha = 1, ellipse = TRUE, show.legend = FALSE ) + scale_fill_manual( values = c(“#1f78b4”, “#e31a1c”), guide = guide_legend( keywidth = 1.6, keyheight = 1.6, label.theme = element_text(size = 16) ) ) + scale_color_manual( values = c(“#1f78b4”, “#e31a1c”), guide = guide_legend( keywidth = 1.6, keyheight = 1.6, label.theme = element_text(size = 16) ) ) + theme( axis.text = element_text(size = 20), axis.title = element_text(size = 22), legend.text = element_text(size = 20), legend.title = element_text(size = 22), plot.title = element_text(size = 24, face = “bold”), plot.subtitle = element_text(size = 20) ) png(“PCoA_Group3-4.png”, width = 1200, height = 1000) p1 dev.off() pdf(“PCoA_Group3-4.pdf”) p1 dev.off() p2 <- mpse_abund %>% mp_plot_ord( .ord = pcoa, .group = Group, .color = Group, .size = Shannon, .alpha = Observe, ellipse = TRUE, show.legend = FALSE ) + scale_fill_manual( values = c(“#1f78b4”, “#e31a1c”), guide = guide_legend( keywidth = 0.6, keyheight = 0.6, label.theme = element_text(size = 16) ) ) + scale_color_manual( values = c(“#1f78b4”, “#e31a1c”), guide = guide_legend( keywidth = 0.6, keyheight = 0.6, label.theme = element_text(size = 16) ) ) + scale_size_continuous( range = c(2, 6), # increase size range! guide = guide_legend( keywidth = 0.6, keyheight = 0.6, label.theme = element_text(size = 16) ) ) + theme( axis.text = element_text(size = 20), axis.title = element_text(size = 22), legend.text = element_text(size = 20), legend.title = element_text(size = 22), plot.title = element_text(size = 24, face = “bold”), plot.subtitle = element_text(size = 20) ) png(“PCoA2_Group3-4.png”, width = 1200, height = 1000) p2 dev.off() pdf(“PCoA2_Group3-4.pdf”) p2 dev.off() # Extract sample names and add ShortLabel to colData colData(mpse_abund)$ShortLabel <- gsub("sample-", "", mpse_abund@colData@rownames) p3 <- mpse_abund %>% mp_plot_ord( .ord = pcoa, .group = Group, .color = Group, .size = Shannon, .alpha = Observe, ellipse = TRUE, show.legend = FALSE ) + geom_text_repel(aes(label = ShortLabel), size = 5, max.overlaps = 100) + scale_fill_manual( values = c(“#1f78b4”, “#e31a1c”), guide = guide_legend( keywidth = 0.6, keyheight = 0.6, label.theme = element_text(size = 16) ) ) + scale_color_manual( values = c(“#1f78b4”, “#e31a1c”), guide = guide_legend( keywidth = 0.6, keyheight = 0.6, label.theme = element_text(size = 16) ) ) + scale_size_continuous( range = c(2, 6), # increase size range! guide = guide_legend( keywidth = 0.6, keyheight = 0.6, label.theme = element_text(size = 16) ) ) + theme( axis.text = element_text(size = 20), axis.title = element_text(size = 22), legend.text = element_text(size = 20), legend.title = element_text(size = 22), plot.title = element_text(size = 24, face = “bold”), plot.subtitle = element_text(size = 20) ) png(“PCoA3_Group3-4.png”, width = 1200, height = 1000) p3 dev.off() pdf(“PCoA3_Group3-4.pdf”) p3 dev.off()

Phyloseq.Rmd processing Data_Karoline_16S_2025

Leave a reply

author: ""
date: '`r format(Sys.time(), "%d %m %Y")`'
header-includes:
   - \usepackage{color, fancyvrb}
output:
  rmdformats::readthedown:
    highlight: kate
    number_sections : yes    
  pdf_document: 
    toc: yes
    toc_depth: 2
    number_sections : yes
---

```{r, echo=FALSE, warning=FALSE}
## Global options
# TODO: reproduce the html with the additional figure/SVN-files for editing.
# IMPORTANT NOTE: needs before "mkdir figures"
#NEEDs to be often close R and start R, then new rendering --> working!
#rmarkdown::render('Phyloseq.Rmd',output_file='Phyloseq.html')
#install.packages("heatmaply")
#install.packages("gplots")
#BiocManager::install("phyloseq")
#library(phyloseq)
#DEBUG a package conflict: using phyloseq::tax_table() or detach(package:MicrobiotaProcess, unload=TRUE)
```

```{r load-packages, include=FALSE}

#install.packages(c("picante", "rmdformats"))
#mamba install -c conda-forge freetype libpng harfbuzz fribidi
#mamba install -c conda-forge r-systemfonts r-svglite r-kableExtra freetype fontconfig harfbuzz fribidi libpng
library(knitr)
library(rmdformats)
library(readxl)
library(dplyr)
library(kableExtra)
library(openxlsx)
library(DESeq2)

options(max.print="75")
knitr::opts_chunk$set(fig.width=8, 
                      fig.height=6, 
                      eval=TRUE, 
                      cache=TRUE,
                      echo=TRUE,
                      prompt=FALSE,
                      tidy=FALSE,
                      comment=NA,
                      message=FALSE,
                      warning=FALSE)
opts_knit$set(width=85)
# Phyloseq R library
#* Phyloseq web site : https://joey711.github.io/phyloseq/index.html
#* See in particular tutorials for
#    - importing data: https://joey711.github.io/phyloseq/import-data.html
#    - heat maps: https://joey711.github.io/phyloseq/plot_heatmap-examples.html
```

# Data  

Import raw data and assign sample key:

```{r, echo=FALSE, warning=FALSE}
#extend qiime2_metadata_for_qza_to_phyloseq.tsv with Diet and Flora
#setwd("~/DATA/Data_Laura_16S_2/core_diversity_e4753")
#map_corrected <- read.csv("qiime2_metadata_for_qza_to_phyloseq.tsv", sep="\t", row.names=1)
#knitr::kable(map_corrected) %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
```

# Prerequisites to be installed

* R : https://pbil.univ-lyon1.fr/CRAN/
* R studio : https://www.rstudio.com/products/rstudio/download/#download

```R
install.packages("dplyr")     # To manipulate dataframes
install.packages("readxl")    # To read Excel files into R
install.packages("ggplot2")   # for high quality graphics
install.packages("heatmaply")
source("https://bioconductor.org/biocLite.R")
biocLite("phyloseq")
```

```{r libraries, echo=TRUE, message=FALSE}
#mamba install -c conda-forge r-ggplot2 r-vegan r-data.table
#BiocManager::install("microbiome")
#install.packages("ggpubr")
#install.packages("heatmaply")
library("readxl") # necessary to import the data from Excel file
library("ggplot2") # graphics
library("picante")
library("microbiome") # data analysis and visualisation
library("phyloseq") # also the basis of data object. Data analysis and visualisation
library("ggpubr") # publication quality figures, based on ggplot2
library("dplyr") # data handling, filter and reformat data frames
library("RColorBrewer") # nice color options
library("heatmaply")
library(vegan)
library(gplots)
```

# Read the data and create phyloseq objects

Three tables are needed

* OTU
* Taxonomy
* Samples

```{r, echo=FALSE, warning=FALSE}

    library(tidyr)

    # For QIIME1
    #ps.ng.tax <- import_biom("./exported_table/feature-table.biom", "./exported-tree/tree.nwk")

    # For QIIME2
    #install.packages("remotes")
    #remotes::install_github("jbisanz/qiime2R")
    #"core_metrics_results/rarefied_table.qza", rarefying performed in the code, therefore import the raw table.
    library(qiime2R)
    ps.ng.tax <- qza_to_phyloseq(
      features =  "dada2_tests2/test_7_f240_r240/table.qza",
      tree = "rooted-tree.qza",
      metadata = "qiime2_metadata_for_qza_to_phyloseq.tsv"
    )
    # or
    #biom convert \
    #      -i ./exported_table/feature-table.biom \
    #      -o ./exported_table/feature-table-v1.biom \
    #      --to-json
    #ps.ng.tax <- import_biom("./exported_table/feature-table-v1.biom", treefilename="./exported-tree/tree.nwk")

    sample <- read.csv("./qiime2_metadata_for_qza_to_phyloseq.tsv", sep="\t", row.names=1)
    SAM = sample_data(sample, errorIfNULL = T)
    #rownames(SAM) <- c("1","2","3","5","6","7","8","9","10","12","13","14","15","16","17","18","19","20","21","22","23","24","25","26","27","28","29","30","31","32","33","34","35","36","37","38","39","40","41","42","43","44","46","47","48","49","50","51","52","53","55")

    #> setdiff(rownames(SAM), sample_names(ps.ng.tax))
    #[1] "sample-L9" should be removed since the low reads

    ps.ng.tax <- merge_phyloseq(ps.ng.tax, SAM)
    print(ps.ng.tax)

    taxonomy <- read.delim("exported-taxonomy/taxonomy.tsv", sep="\t", header=TRUE)
    #head(taxonomy)
    # Separate taxonomy string into separate ranks
    taxonomy_df <- taxonomy %>% separate(Taxon, into = c("Domain","Phylum","Class","Order","Family","Genus","Species"), sep = ";", fill = "right", extra = "drop")
    # Use Feature.ID as rownames
    rownames(taxonomy_df) <- taxonomy_df$Feature.ID
    taxonomy_df <- taxonomy_df[, -c(1, ncol(taxonomy_df))]  # Drop Feature.ID and Confidence
    # Create tax_table
    tax_table_final <- phyloseq::tax_table(as.matrix(taxonomy_df))
    # Merge tax_table with existing phyloseq object
    ps.ng.tax <- merge_phyloseq(ps.ng.tax, tax_table_final)
    # Check
    ps.ng.tax

    #colnames(phyloseq::tax_table(ps.ng.tax)) <- c("Domain","Phylum","Class","Order","Family","Genus","Species")
    saveRDS(ps.ng.tax, "./ps.ng.tax.rds")
```

Visualize data
```{r, echo=TRUE, warning=FALSE}
  sample_names(ps.ng.tax)
  rank_names(ps.ng.tax)
  sample_variables(ps.ng.tax)

  # Define sample names once
  samples <- c(
    "sample-A1","sample-A2","sample-A3","sample-A4","sample-A5","sample-A6","sample-A7","sample-A8","sample-A9","sample-A10","sample-A11",
    "sample-B1","sample-B2","sample-B3","sample-B4","sample-B5","sample-B6","sample-B7","sample-B8","sample-B9","sample-B10","sample-B11","sample-B12","sample-B13","sample-B14","sample-B15","sample-B16",
    "sample-C1","sample-C2","sample-C3","sample-C4","sample-C5","sample-C6","sample-C7","sample-C8","sample-C9","sample-C10",
    "sample-E1","sample-E2","sample-E3","sample-E4","sample-E5","sample-E6","sample-E7","sample-E8","sample-E9","sample-E10",
    "sample-F1","sample-F2","sample-F3","sample-F4","sample-F5",
    "sample-G1","sample-G2","sample-G3","sample-G4","sample-G5","sample-G6",
    "sample-H1","sample-H2","sample-H3","sample-H4","sample-H5","sample-H6",
    "sample-I1","sample-I2","sample-I3","sample-I4","sample-I5","sample-I6",
    "sample-J1","sample-J2","sample-J3","sample-J4","sample-J10","sample-J11",  #RESIZED: "sample-J5","sample-J6","sample-J7","sample-J8","sample-J9",
    "sample-K7","sample-K8","sample-K9","sample-K10",  #RESIZED: "sample-K1","sample-K2","sample-K3","sample-K4","sample-K5","sample-K6",  "sample-K11","sample-K12","sample-K13","sample-K14","sample-K15",
    "sample-L1","sample-L7","sample-L8","sample-L10",  #RESIZED: "sample-L2","sample-L3","sample-L4","sample-L5","sample-L6",  "sample-L11","sample-L12","sample-L13","sample-L14","sample-L15",
    "sample-M1","sample-M2","sample-M3","sample-M4","sample-M5","sample-M6","sample-M7","sample-M8",
    "sample-N1","sample-N2","sample-N3","sample-N4","sample-N5","sample-N6","sample-N7","sample-N8","sample-N9","sample-N10",
    "sample-O1","sample-O2","sample-O3","sample-O4","sample-O5","sample-O6","sample-O7","sample-O8"
  )
  ps.ng.tax <- prune_samples(samples, ps.ng.tax)

  sample_names(ps.ng.tax)
  rank_names(ps.ng.tax)
  sample_variables(ps.ng.tax)
```

Normalize number of reads in each sample using median sequencing depth.
```{r, echo=TRUE, warning=FALSE}
# RAREFACTION
set.seed(9242)  # This will help in reproducing the filtering and nomalisation.
ps.ng.tax <- rarefy_even_depth(ps.ng.tax, sample.size = 6389)
total <- 6389

# NORMALIZE number of reads in each sample using median sequencing depth.
total = median(sample_sums(ps.ng.tax))
#> total
#[1] 42369
standf = function(x, t=total) round(t * (x / sum(x)))
ps.ng.tax = transform_sample_counts(ps.ng.tax, standf)
ps.ng.tax_rel <- microbiome::transform(ps.ng.tax, "compositional") 

saveRDS(ps.ng.tax, "./ps.ng.tax.rds")
hmp.meta <- meta(ps.ng.tax)
hmp.meta$sam_name <- rownames(hmp.meta)
```

# Prepare ps.ng.tax_rel, ps.ng.tax_abund, ps.ng.tax_abund_rel from ps.ng.tax

```{r, echo=FALSE, warning=FALSE}
#MOVE_FROM_ABOVE: The number of reads used for normalization is **`r sprintf("%.0f", total)`**. 
#A basic heatmap using the default parameters.
#  plot_heatmap(ps.ng.tax, method = "NMDS", distance = "bray")
#NOTE that giving the correct OTU numbers in the text (1%, 0.5%, ...)!!!
```

For the heatmaps, we focus on the most abundant OTUs by first converting counts to relative abundances within each sample. We then filter to retain only OTUs whose mean relative abundance across all samples exceeds 0.1% (0.001). We are left with 199 OTUs which makes the reading much more easy.

```{r, echo=FALSE, warning=FALSE}

# Custom function to plot a heatmap with the specified sample order
#plot_heatmap_custom <- function(ps, sample_order, method = "NMDS", distance = "bray") {

# --Filtering strategy 1: This filters taxa based on raw counts (ps.ng.tax). For each taxon (across samples), it checks if it has a count that exceeds 1% of the total in at least one sample.     Description: We consider the most abundant OTUs for heatmaps. For example one can only take OTUs that represent at least 1% of reads in at least one sample. Remember we normalized all the sampples to median number of reads (total).  We are left with only 382 OTUS which makes the reading much more easy.
#ps.ng.tax_abund <- phyloseq::filter_taxa(ps.ng.tax, function(x) sum(x > total*0.01) > 0, TRUE)

# --Filtering strategy 2: This filters taxa based on relative abundances (ps.ng.tax_rel). It keeps only those taxa whose mean relative abundance across samples exceeds 0.1%.
# 1) Convert to relative abundances
ps.ng.tax_rel <- transform_sample_counts(ps.ng.tax, function(x) x / sum(x))

# 2) Get the logical vector of which OTUs to keep (based on relative abundance)
keep_vector <- phyloseq::filter_taxa(
  ps.ng.tax_rel,
  function(x) mean(x) > 0.001,
  prune = FALSE
)

# 3) Use the TRUE/FALSE vector to subset absolute abundance data
ps.ng.tax_abund <- prune_taxa(names(keep_vector)[keep_vector], ps.ng.tax)

# 4) Normalize the final subset to relative abundances per sample
ps.ng.tax_abund_rel <- transform_sample_counts(
  ps.ng.tax_abund,
  function(x) x / sum(x)
)
```

# Heatmaps

```{r, echo=FALSE, warning=FALSE}
datamat_ = as.data.frame(otu_table(ps.ng.tax_abund))

#datamat <- datamat_[c("1","2","5","6","7",  "8","9","10","12","13","14",    "15","16","17","18","19","20",  "21","22","23","24","25","26","27","28",    "29","30","31","32",  "33","34","35","36","37","38","39","51",    "40","41","42","43","44","46",  "47","48","49","50","52","53","55")]
datamat <- datamat_[c(
    "sample-A1","sample-A2","sample-A3","sample-A4","sample-A5","sample-A6","sample-A7","sample-A8","sample-A9","sample-A10","sample-A11",
    "sample-B1","sample-B2","sample-B3","sample-B4","sample-B5","sample-B6","sample-B7","sample-B8","sample-B9","sample-B10","sample-B11","sample-B12","sample-B13","sample-B14","sample-B15","sample-B16",
    "sample-C1","sample-C2","sample-C3","sample-C4","sample-C5","sample-C6","sample-C7","sample-C8","sample-C9","sample-C10",
    "sample-E1","sample-E2","sample-E3","sample-E4","sample-E5","sample-E6","sample-E7","sample-E8","sample-E9","sample-E10",
    "sample-F1","sample-F2","sample-F3","sample-F4","sample-F5",
    "sample-G1","sample-G2","sample-G3","sample-G4","sample-G5","sample-G6",
    "sample-H1","sample-H2","sample-H3","sample-H4","sample-H5","sample-H6",
    "sample-I1","sample-I2","sample-I3","sample-I4","sample-I5","sample-I6",
    "sample-J1","sample-J2","sample-J3","sample-J4","sample-J10","sample-J11",  #RESIZED: "sample-J5","sample-J6","sample-J7","sample-J8","sample-J9",
    "sample-K7","sample-K8","sample-K9","sample-K10",  #RESIZED: "sample-K1","sample-K2","sample-K3","sample-K4","sample-K5","sample-K6",  "sample-K11","sample-K12","sample-K13","sample-K14","sample-K15",
    "sample-L1","sample-L7","sample-L8","sample-L10",  #RESIZED: "sample-L2","sample-L3","sample-L4","sample-L5","sample-L6",  "sample-L11","sample-L12","sample-L13","sample-L14","sample-L15",
    "sample-M1","sample-M2","sample-M3","sample-M4","sample-M5","sample-M6","sample-M7","sample-M8",
    "sample-N1","sample-N2","sample-N3","sample-N4","sample-N5","sample-N6","sample-N7","sample-N8","sample-N9","sample-N10",
    "sample-O1","sample-O2","sample-O3","sample-O4","sample-O5","sample-O6","sample-O7","sample-O8"
  )]

hr <- hclust(as.dist(1-cor(t(datamat), method="pearson")), method="complete")
hc <- hclust(as.dist(1-cor(datamat, method="spearman")), method="complete")
mycl = cutree(hr, h=max(hr$height)/1.08)
mycol = c("YELLOW", "DARKBLUE", "DARKORANGE", "DARKMAGENTA", "DARKCYAN", "DARKRED",  "MAROON", "DARKGREEN", "LIGHTBLUE", "PINK", "MAGENTA", "LIGHTCYAN","LIGHTGREEN", "BLUE", "ORANGE", "CYAN", "RED", "GREEN");

mycol = mycol[as.vector(mycl)]
sampleCols <- rep('GREY',ncol(datamat))
#names(sampleCols) <- c("Group1", "Group1", "Group1", "Group1", "Group1",   "Group2", "Group2",   "Group3", "Group3", "Group3",  ...)

# Define 14 colors
my_colors <- c("#a6cee3", "#1f78b4", "#b2df8a", "#33a02c",
                "#fb9a99", "#e31a1c", "#fdbf6f", "#ff7f00",
                "#cab2d6", "#6a3d9a", "#ffff99", "#b15928",
                "#8dd3c7", "#fb8072")
# Example column names
colnames(datamat) <- c(
    "sample-A1","sample-A2","sample-A3","sample-A4","sample-A5","sample-A6","sample-A7","sample-A8","sample-A9","sample-A10","sample-A11",
    "sample-B1","sample-B2","sample-B3","sample-B4","sample-B5","sample-B6","sample-B7","sample-B8","sample-B9","sample-B10","sample-B11","sample-B12","sample-B13","sample-B14","sample-B15","sample-B16",
    "sample-C1","sample-C2","sample-C3","sample-C4","sample-C5","sample-C6","sample-C7","sample-C8","sample-C9","sample-C10",
    "sample-E1","sample-E2","sample-E3","sample-E4","sample-E5","sample-E6","sample-E7","sample-E8","sample-E9","sample-E10",
    "sample-F1","sample-F2","sample-F3","sample-F4","sample-F5",
    "sample-G1","sample-G2","sample-G3","sample-G4","sample-G5","sample-G6",
    "sample-H1","sample-H2","sample-H3","sample-H4","sample-H5","sample-H6",
    "sample-I1","sample-I2","sample-I3","sample-I4","sample-I5","sample-I6",
    "sample-J1","sample-J2","sample-J3","sample-J4","sample-J10","sample-J11",  #RESIZED: "sample-J5","sample-J6","sample-J7","sample-J8","sample-J9",
    "sample-K7","sample-K8","sample-K9","sample-K10",  #RESIZED: "sample-K1","sample-K2","sample-K3","sample-K4","sample-K5","sample-K6",  "sample-K11","sample-K12","sample-K13","sample-K14","sample-K15",
    "sample-L1","sample-L7","sample-L8","sample-L10",  #RESIZED: "sample-L2","sample-L3","sample-L4","sample-L5","sample-L6",  "sample-L11","sample-L12","sample-L13","sample-L14","sample-L15",
    "sample-M1","sample-M2","sample-M3","sample-M4","sample-M5","sample-M6","sample-M7","sample-M8",
    "sample-N1","sample-N2","sample-N3","sample-N4","sample-N5","sample-N6","sample-N7","sample-N8","sample-N9","sample-N10",
    "sample-O1","sample-O2","sample-O3","sample-O4","sample-O5","sample-O6","sample-O7","sample-O8"
  )
# (replace with your actual column names)

# Remove "sample-" prefix for easier matching
sample_names <- sub("^sample-", "", colnames(datamat))

# Create a function to match sample IDs to groups
assign_group <- function(sample_id) {
  # First letter indicates group
  prefix <- substr(sample_id, 1, 1)
  switch(prefix,
         "A" = 1,
         "B" = 2,
         "C" = 3,
         "E" = 4,
         "F" = 5,
         "G" = 6,
         "H" = 7,
         "I" = 8,
         "J" = 9,
         "K" = 10,
         "L" = 11,
         "M" = 12,
         "N" = 13,
         "O" = 14,
         NA)
}
# Assign group numbers to samples
group_numbers <- sapply(sample_names, assign_group)
# Assign colors based on group numbers
sampleCols <- my_colors[group_numbers]
# Check results
print(sampleCols)
#'#a6cee3', '#1f78b4', '#b2df8a', '#33a02c', '#fb9a99', '#e31a1c', '#cab2d6', '#6a3d9a'

#bluered(75)
#color_pattern <- colorRampPalette(c("blue", "white", "red"))(100)
library(RColorBrewer)
custom_palette <- colorRampPalette(brewer.pal(9, "Blues"))
heatmap_colors <- custom_palette(100)
#colors <- heatmap_color_default(100)
png("figures/heatmap.png", width=1200, height=2400)
#par(mar=c(2, 2, 2, 2))  , lwid=1    lhei=c(0.7, 10)) # Adjust height of color keys   keysize=0.3,
heatmap.2(as.matrix(datamat),Rowv=as.dendrogram(hr),Colv = NA, dendrogram = 'row',
            scale='row',trace='none',col=heatmap_colors, cexRow=1.2, cexCol=1.5,
            RowSideColors = mycol, ColSideColors = sampleCols, srtCol=15, labRow=row.names(datamat), key=TRUE, margins=c(10, 15), lhei=c(0.7, 15), lwid=c(1,8))
dev.off()
```
```{r, echo=TRUE, warning=FALSE, fig.cap="Heatmap", out.width = '100%', fig.align= "center"}
knitr::include_graphics("./figures/heatmap.png")
```

\pagebreak

```{r, echo=FALSE, warning=FALSE}
  library(stringr)
#FITTING1: 
#for id in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100  101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199; do
#for id in 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300; do
#for id in 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382; do
#  echo "phyloseq::tax_table(ps.ng.tax_abund_rel)[${id},\"Domain\"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[${id},\"Domain\"], \"__\")[[1]][2]"
#  echo "phyloseq::tax_table(ps.ng.tax_abund_rel)[${id},\"Phylum\"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[${id},\"Phylum\"], \"__\")[[1]][2]"
#  echo "phyloseq::tax_table(ps.ng.tax_abund_rel)[${id},\"Class\"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[${id},\"Class\"], \"__\")[[1]][2]"
#  echo "phyloseq::tax_table(ps.ng.tax_abund_rel)[${id},\"Order\"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[${id},\"Order\"], \"__\")[[1]][2]"
#  echo "phyloseq::tax_table(ps.ng.tax_abund_rel)[${id},\"Family\"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[${id},\"Family\"], \"__\")[[1]][2]"
#  echo "phyloseq::tax_table(ps.ng.tax_abund_rel)[${id},\"Genus\"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[${id},\"Genus\"], \"__\")[[1]][2]"
#  echo "phyloseq::tax_table(ps.ng.tax_abund_rel)[${id},\"Species\"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[${id},\"Species\"], \"__\")[[1]][2]"
#done

phyloseq::tax_table(ps.ng.tax_abund_rel)[1,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[1,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[1,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[1,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[1,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[1,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[1,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[1,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[1,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[1,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[1,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[1,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[1,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[1,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[2,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[2,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[2,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[2,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[2,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[2,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[2,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[2,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[2,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[2,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[2,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[2,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[2,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[2,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[3,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[3,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[3,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[3,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[3,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[3,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[3,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[3,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[3,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[3,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[3,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[3,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[3,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[3,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[4,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[4,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[4,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[4,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[4,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[4,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[4,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[4,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[4,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[4,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[4,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[4,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[4,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[4,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[5,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[5,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[5,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[5,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[5,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[5,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[5,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[5,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[5,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[5,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[5,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[5,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[5,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[5,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[6,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[6,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[6,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[6,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[6,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[6,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[6,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[6,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[6,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[6,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[6,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[6,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[6,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[6,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[7,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[7,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[7,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[7,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[7,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[7,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[7,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[7,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[7,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[7,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[7,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[7,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[7,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[7,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[8,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[8,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[8,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[8,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[8,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[8,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[8,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[8,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[8,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[8,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[8,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[8,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[8,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[8,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[9,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[9,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[9,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[9,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[9,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[9,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[9,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[9,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[9,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[9,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[9,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[9,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[9,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[9,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[10,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[10,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[10,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[10,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[10,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[10,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[10,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[10,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[10,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[10,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[10,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[10,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[10,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[10,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[11,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[11,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[11,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[11,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[11,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[11,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[11,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[11,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[11,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[11,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[11,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[11,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[11,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[11,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[12,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[12,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[12,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[12,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[12,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[12,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[12,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[12,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[12,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[12,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[12,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[12,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[12,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[12,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[13,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[13,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[13,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[13,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[13,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[13,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[13,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[13,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[13,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[13,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[13,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[13,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[13,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[13,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[14,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[14,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[14,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[14,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[14,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[14,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[14,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[14,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[14,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[14,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[14,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[14,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[14,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[14,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[15,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[15,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[15,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[15,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[15,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[15,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[15,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[15,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[15,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[15,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[15,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[15,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[15,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[15,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[16,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[16,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[16,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[16,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[16,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[16,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[16,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[16,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[16,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[16,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[16,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[16,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[16,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[16,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[17,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[17,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[17,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[17,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[17,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[17,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[17,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[17,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[17,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[17,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[17,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[17,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[17,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[17,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[18,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[18,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[18,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[18,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[18,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[18,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[18,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[18,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[18,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[18,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[18,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[18,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[18,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[18,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[19,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[19,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[19,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[19,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[19,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[19,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[19,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[19,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[19,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[19,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[19,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[19,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[19,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[19,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[20,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[20,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[20,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[20,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[20,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[20,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[20,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[20,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[20,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[20,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[20,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[20,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[20,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[20,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[21,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[21,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[21,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[21,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[21,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[21,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[21,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[21,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[21,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[21,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[21,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[21,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[21,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[21,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[22,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[22,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[22,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[22,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[22,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[22,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[22,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[22,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[22,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[22,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[22,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[22,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[22,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[22,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[23,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[23,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[23,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[23,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[23,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[23,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[23,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[23,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[23,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[23,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[23,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[23,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[23,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[23,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[24,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[24,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[24,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[24,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[24,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[24,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[24,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[24,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[24,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[24,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[24,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[24,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[24,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[24,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[25,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[25,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[25,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[25,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[25,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[25,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[25,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[25,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[25,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[25,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[25,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[25,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[25,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[25,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[26,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[26,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[26,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[26,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[26,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[26,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[26,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[26,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[26,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[26,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[26,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[26,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[26,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[26,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[27,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[27,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[27,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[27,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[27,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[27,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[27,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[27,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[27,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[27,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[27,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[27,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[27,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[27,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[28,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[28,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[28,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[28,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[28,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[28,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[28,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[28,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[28,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[28,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[28,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[28,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[28,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[28,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[29,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[29,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[29,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[29,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[29,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[29,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[29,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[29,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[29,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[29,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[29,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[29,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[29,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[29,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[30,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[30,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[30,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[30,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[30,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[30,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[30,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[30,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[30,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[30,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[30,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[30,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[30,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[30,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[31,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[31,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[31,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[31,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[31,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[31,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[31,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[31,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[31,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[31,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[31,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[31,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[31,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[31,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[32,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[32,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[32,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[32,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[32,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[32,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[32,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[32,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[32,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[32,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[32,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[32,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[32,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[32,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[33,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[33,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[33,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[33,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[33,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[33,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[33,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[33,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[33,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[33,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[33,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[33,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[33,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[33,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[34,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[34,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[34,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[34,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[34,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[34,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[34,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[34,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[34,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[34,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[34,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[34,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[34,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[34,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[35,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[35,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[35,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[35,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[35,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[35,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[35,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[35,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[35,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[35,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[35,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[35,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[35,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[35,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[36,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[36,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[36,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[36,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[36,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[36,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[36,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[36,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[36,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[36,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[36,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[36,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[36,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[36,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[37,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[37,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[37,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[37,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[37,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[37,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[37,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[37,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[37,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[37,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[37,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[37,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[37,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[37,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[38,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[38,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[38,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[38,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[38,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[38,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[38,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[38,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[38,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[38,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[38,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[38,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[38,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[38,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[39,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[39,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[39,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[39,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[39,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[39,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[39,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[39,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[39,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[39,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[39,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[39,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[39,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[39,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[40,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[40,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[40,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[40,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[40,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[40,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[40,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[40,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[40,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[40,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[40,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[40,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[40,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[40,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[41,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[41,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[41,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[41,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[41,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[41,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[41,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[41,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[41,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[41,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[41,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[41,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[41,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[41,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[42,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[42,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[42,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[42,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[42,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[42,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[42,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[42,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[42,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[42,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[42,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[42,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[42,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[42,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[43,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[43,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[43,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[43,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[43,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[43,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[43,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[43,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[43,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[43,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[43,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[43,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[43,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[43,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[44,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[44,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[44,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[44,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[44,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[44,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[44,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[44,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[44,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[44,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[44,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[44,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[44,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[44,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[45,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[45,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[45,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[45,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[45,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[45,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[45,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[45,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[45,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[45,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[45,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[45,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[45,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[45,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[46,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[46,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[46,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[46,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[46,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[46,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[46,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[46,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[46,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[46,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[46,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[46,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[46,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[46,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[47,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[47,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[47,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[47,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[47,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[47,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[47,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[47,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[47,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[47,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[47,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[47,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[47,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[47,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[48,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[48,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[48,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[48,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[48,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[48,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[48,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[48,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[48,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[48,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[48,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[48,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[48,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[48,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[49,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[49,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[49,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[49,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[49,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[49,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[49,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[49,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[49,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[49,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[49,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[49,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[49,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[49,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[50,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[50,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[50,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[50,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[50,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[50,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[50,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[50,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[50,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[50,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[50,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[50,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[50,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[50,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[51,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[51,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[51,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[51,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[51,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[51,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[51,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[51,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[51,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[51,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[51,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[51,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[51,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[51,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[52,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[52,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[52,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[52,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[52,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[52,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[52,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[52,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[52,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[52,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[52,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[52,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[52,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[52,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[53,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[53,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[53,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[53,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[53,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[53,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[53,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[53,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[53,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[53,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[53,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[53,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[53,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[53,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[54,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[54,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[54,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[54,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[54,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[54,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[54,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[54,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[54,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[54,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[54,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[54,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[54,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[54,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[55,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[55,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[55,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[55,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[55,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[55,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[55,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[55,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[55,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[55,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[55,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[55,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[55,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[55,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[56,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[56,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[56,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[56,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[56,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[56,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[56,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[56,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[56,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[56,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[56,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[56,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[56,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[56,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[57,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[57,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[57,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[57,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[57,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[57,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[57,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[57,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[57,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[57,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[57,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[57,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[57,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[57,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[58,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[58,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[58,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[58,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[58,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[58,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[58,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[58,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[58,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[58,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[58,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[58,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[58,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[58,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[59,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[59,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[59,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[59,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[59,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[59,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[59,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[59,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[59,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[59,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[59,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[59,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[59,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[59,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[60,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[60,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[60,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[60,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[60,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[60,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[60,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[60,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[60,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[60,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[60,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[60,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[60,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[60,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[61,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[61,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[61,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[61,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[61,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[61,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[61,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[61,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[61,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[61,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[61,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[61,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[61,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[61,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[62,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[62,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[62,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[62,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[62,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[62,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[62,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[62,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[62,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[62,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[62,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[62,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[62,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[62,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[63,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[63,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[63,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[63,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[63,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[63,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[63,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[63,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[63,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[63,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[63,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[63,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[63,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[63,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[64,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[64,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[64,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[64,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[64,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[64,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[64,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[64,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[64,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[64,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[64,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[64,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[64,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[64,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[65,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[65,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[65,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[65,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[65,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[65,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[65,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[65,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[65,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[65,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[65,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[65,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[65,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[65,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[66,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[66,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[66,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[66,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[66,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[66,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[66,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[66,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[66,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[66,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[66,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[66,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[66,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[66,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[67,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[67,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[67,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[67,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[67,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[67,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[67,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[67,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[67,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[67,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[67,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[67,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[67,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[67,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[68,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[68,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[68,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[68,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[68,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[68,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[68,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[68,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[68,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[68,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[68,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[68,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[68,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[68,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[69,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[69,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[69,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[69,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[69,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[69,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[69,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[69,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[69,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[69,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[69,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[69,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[69,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[69,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[70,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[70,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[70,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[70,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[70,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[70,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[70,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[70,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[70,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[70,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[70,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[70,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[70,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[70,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[71,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[71,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[71,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[71,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[71,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[71,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[71,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[71,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[71,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[71,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[71,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[71,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[71,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[71,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[72,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[72,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[72,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[72,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[72,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[72,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[72,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[72,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[72,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[72,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[72,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[72,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[72,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[72,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[73,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[73,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[73,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[73,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[73,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[73,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[73,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[73,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[73,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[73,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[73,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[73,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[73,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[73,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[74,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[74,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[74,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[74,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[74,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[74,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[74,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[74,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[74,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[74,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[74,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[74,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[74,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[74,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[75,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[75,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[75,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[75,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[75,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[75,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[75,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[75,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[75,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[75,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[75,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[75,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[75,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[75,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[76,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[76,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[76,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[76,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[76,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[76,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[76,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[76,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[76,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[76,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[76,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[76,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[76,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[76,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[77,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[77,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[77,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[77,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[77,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[77,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[77,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[77,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[77,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[77,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[77,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[77,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[77,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[77,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[78,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[78,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[78,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[78,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[78,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[78,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[78,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[78,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[78,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[78,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[78,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[78,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[78,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[78,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[79,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[79,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[79,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[79,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[79,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[79,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[79,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[79,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[79,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[79,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[79,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[79,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[79,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[79,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[80,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[80,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[80,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[80,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[80,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[80,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[80,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[80,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[80,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[80,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[80,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[80,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[80,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[80,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[81,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[81,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[81,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[81,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[81,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[81,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[81,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[81,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[81,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[81,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[81,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[81,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[81,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[81,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[82,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[82,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[82,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[82,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[82,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[82,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[82,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[82,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[82,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[82,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[82,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[82,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[82,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[82,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[83,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[83,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[83,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[83,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[83,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[83,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[83,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[83,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[83,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[83,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[83,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[83,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[83,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[83,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[84,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[84,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[84,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[84,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[84,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[84,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[84,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[84,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[84,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[84,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[84,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[84,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[84,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[84,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[85,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[85,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[85,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[85,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[85,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[85,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[85,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[85,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[85,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[85,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[85,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[85,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[85,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[85,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[86,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[86,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[86,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[86,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[86,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[86,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[86,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[86,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[86,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[86,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[86,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[86,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[86,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[86,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[87,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[87,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[87,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[87,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[87,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[87,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[87,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[87,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[87,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[87,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[87,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[87,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[87,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[87,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[88,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[88,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[88,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[88,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[88,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[88,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[88,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[88,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[88,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[88,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[88,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[88,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[88,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[88,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[89,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[89,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[89,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[89,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[89,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[89,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[89,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[89,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[89,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[89,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[89,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[89,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[89,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[89,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[90,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[90,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[90,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[90,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[90,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[90,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[90,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[90,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[90,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[90,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[90,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[90,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[90,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[90,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[91,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[91,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[91,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[91,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[91,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[91,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[91,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[91,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[91,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[91,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[91,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[91,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[91,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[91,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[92,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[92,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[92,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[92,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[92,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[92,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[92,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[92,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[92,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[92,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[92,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[92,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[92,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[92,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[93,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[93,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[93,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[93,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[93,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[93,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[93,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[93,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[93,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[93,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[93,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[93,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[93,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[93,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[94,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[94,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[94,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[94,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[94,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[94,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[94,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[94,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[94,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[94,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[94,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[94,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[94,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[94,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[95,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[95,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[95,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[95,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[95,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[95,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[95,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[95,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[95,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[95,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[95,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[95,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[95,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[95,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[96,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[96,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[96,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[96,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[96,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[96,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[96,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[96,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[96,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[96,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[96,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[96,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[96,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[96,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[97,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[97,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[97,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[97,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[97,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[97,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[97,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[97,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[97,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[97,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[97,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[97,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[97,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[97,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[98,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[98,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[98,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[98,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[98,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[98,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[98,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[98,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[98,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[98,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[98,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[98,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[98,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[98,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[99,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[99,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[99,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[99,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[99,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[99,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[99,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[99,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[99,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[99,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[99,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[99,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[99,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[99,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[100,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[100,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[100,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[100,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[100,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[100,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[100,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[100,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[100,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[100,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[100,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[100,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[100,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[100,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[101,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[101,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[101,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[101,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[101,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[101,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[101,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[101,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[101,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[101,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[101,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[101,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[101,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[101,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[102,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[102,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[102,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[102,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[102,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[102,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[102,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[102,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[102,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[102,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[102,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[102,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[102,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[102,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[103,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[103,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[103,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[103,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[103,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[103,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[103,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[103,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[103,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[103,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[103,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[103,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[103,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[103,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[104,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[104,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[104,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[104,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[104,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[104,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[104,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[104,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[104,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[104,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[104,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[104,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[104,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[104,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[105,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[105,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[105,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[105,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[105,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[105,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[105,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[105,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[105,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[105,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[105,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[105,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[105,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[105,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[106,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[106,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[106,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[106,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[106,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[106,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[106,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[106,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[106,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[106,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[106,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[106,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[106,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[106,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[107,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[107,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[107,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[107,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[107,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[107,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[107,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[107,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[107,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[107,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[107,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[107,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[107,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[107,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[108,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[108,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[108,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[108,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[108,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[108,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[108,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[108,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[108,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[108,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[108,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[108,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[108,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[108,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[109,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[109,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[109,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[109,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[109,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[109,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[109,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[109,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[109,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[109,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[109,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[109,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[109,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[109,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[110,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[110,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[110,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[110,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[110,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[110,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[110,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[110,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[110,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[110,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[110,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[110,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[110,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[110,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[111,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[111,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[111,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[111,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[111,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[111,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[111,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[111,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[111,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[111,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[111,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[111,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[111,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[111,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[112,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[112,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[112,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[112,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[112,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[112,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[112,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[112,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[112,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[112,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[112,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[112,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[112,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[112,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[113,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[113,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[113,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[113,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[113,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[113,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[113,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[113,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[113,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[113,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[113,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[113,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[113,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[113,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[114,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[114,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[114,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[114,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[114,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[114,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[114,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[114,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[114,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[114,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[114,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[114,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[114,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[114,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[115,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[115,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[115,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[115,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[115,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[115,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[115,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[115,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[115,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[115,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[115,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[115,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[115,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[115,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[116,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[116,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[116,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[116,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[116,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[116,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[116,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[116,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[116,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[116,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[116,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[116,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[116,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[116,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[117,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[117,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[117,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[117,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[117,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[117,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[117,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[117,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[117,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[117,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[117,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[117,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[117,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[117,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[118,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[118,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[118,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[118,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[118,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[118,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[118,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[118,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[118,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[118,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[118,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[118,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[118,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[118,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[119,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[119,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[119,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[119,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[119,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[119,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[119,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[119,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[119,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[119,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[119,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[119,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[119,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[119,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[120,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[120,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[120,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[120,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[120,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[120,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[120,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[120,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[120,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[120,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[120,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[120,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[120,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[120,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[121,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[121,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[121,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[121,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[121,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[121,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[121,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[121,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[121,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[121,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[121,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[121,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[121,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[121,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[122,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[122,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[122,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[122,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[122,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[122,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[122,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[122,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[122,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[122,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[122,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[122,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[122,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[122,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[123,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[123,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[123,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[123,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[123,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[123,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[123,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[123,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[123,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[123,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[123,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[123,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[123,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[123,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[124,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[124,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[124,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[124,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[124,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[124,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[124,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[124,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[124,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[124,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[124,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[124,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[124,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[124,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[125,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[125,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[125,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[125,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[125,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[125,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[125,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[125,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[125,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[125,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[125,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[125,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[125,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[125,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[126,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[126,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[126,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[126,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[126,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[126,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[126,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[126,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[126,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[126,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[126,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[126,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[126,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[126,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[127,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[127,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[127,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[127,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[127,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[127,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[127,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[127,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[127,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[127,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[127,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[127,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[127,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[127,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[128,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[128,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[128,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[128,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[128,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[128,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[128,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[128,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[128,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[128,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[128,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[128,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[128,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[128,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[129,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[129,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[129,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[129,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[129,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[129,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[129,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[129,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[129,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[129,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[129,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[129,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[129,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[129,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[130,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[130,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[130,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[130,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[130,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[130,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[130,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[130,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[130,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[130,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[130,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[130,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[130,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[130,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[131,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[131,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[131,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[131,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[131,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[131,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[131,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[131,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[131,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[131,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[131,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[131,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[131,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[131,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[132,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[132,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[132,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[132,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[132,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[132,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[132,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[132,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[132,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[132,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[132,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[132,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[132,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[132,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[133,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[133,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[133,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[133,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[133,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[133,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[133,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[133,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[133,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[133,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[133,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[133,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[133,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[133,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[134,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[134,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[134,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[134,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[134,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[134,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[134,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[134,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[134,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[134,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[134,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[134,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[134,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[134,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[135,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[135,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[135,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[135,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[135,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[135,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[135,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[135,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[135,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[135,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[135,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[135,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[135,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[135,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[136,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[136,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[136,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[136,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[136,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[136,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[136,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[136,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[136,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[136,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[136,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[136,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[136,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[136,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[137,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[137,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[137,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[137,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[137,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[137,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[137,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[137,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[137,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[137,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[137,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[137,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[137,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[137,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[138,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[138,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[138,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[138,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[138,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[138,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[138,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[138,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[138,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[138,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[138,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[138,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[138,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[138,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[139,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[139,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[139,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[139,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[139,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[139,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[139,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[139,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[139,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[139,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[139,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[139,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[139,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[139,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[140,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[140,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[140,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[140,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[140,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[140,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[140,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[140,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[140,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[140,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[140,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[140,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[140,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[140,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[141,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[141,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[141,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[141,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[141,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[141,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[141,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[141,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[141,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[141,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[141,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[141,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[141,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[141,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[142,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[142,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[142,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[142,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[142,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[142,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[142,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[142,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[142,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[142,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[142,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[142,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[142,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[142,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[143,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[143,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[143,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[143,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[143,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[143,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[143,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[143,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[143,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[143,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[143,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[143,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[143,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[143,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[144,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[144,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[144,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[144,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[144,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[144,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[144,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[144,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[144,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[144,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[144,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[144,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[144,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[144,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[145,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[145,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[145,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[145,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[145,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[145,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[145,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[145,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[145,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[145,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[145,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[145,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[145,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[145,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[146,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[146,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[146,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[146,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[146,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[146,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[146,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[146,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[146,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[146,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[146,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[146,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[146,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[146,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[147,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[147,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[147,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[147,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[147,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[147,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[147,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[147,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[147,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[147,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[147,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[147,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[147,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[147,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[148,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[148,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[148,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[148,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[148,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[148,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[148,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[148,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[148,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[148,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[148,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[148,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[148,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[148,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[149,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[149,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[149,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[149,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[149,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[149,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[149,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[149,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[149,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[149,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[149,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[149,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[149,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[149,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[150,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[150,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[150,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[150,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[150,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[150,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[150,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[150,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[150,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[150,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[150,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[150,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[150,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[150,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[151,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[151,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[151,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[151,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[151,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[151,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[151,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[151,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[151,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[151,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[151,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[151,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[151,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[151,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[152,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[152,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[152,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[152,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[152,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[152,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[152,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[152,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[152,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[152,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[152,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[152,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[152,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[152,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[153,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[153,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[153,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[153,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[153,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[153,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[153,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[153,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[153,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[153,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[153,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[153,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[153,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[153,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[154,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[154,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[154,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[154,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[154,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[154,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[154,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[154,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[154,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[154,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[154,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[154,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[154,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[154,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[155,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[155,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[155,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[155,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[155,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[155,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[155,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[155,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[155,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[155,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[155,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[155,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[155,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[155,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[156,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[156,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[156,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[156,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[156,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[156,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[156,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[156,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[156,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[156,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[156,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[156,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[156,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[156,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[157,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[157,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[157,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[157,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[157,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[157,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[157,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[157,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[157,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[157,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[157,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[157,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[157,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[157,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[158,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[158,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[158,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[158,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[158,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[158,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[158,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[158,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[158,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[158,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[158,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[158,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[158,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[158,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[159,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[159,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[159,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[159,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[159,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[159,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[159,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[159,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[159,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[159,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[159,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[159,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[159,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[159,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[160,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[160,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[160,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[160,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[160,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[160,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[160,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[160,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[160,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[160,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[160,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[160,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[160,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[160,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[161,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[161,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[161,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[161,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[161,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[161,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[161,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[161,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[161,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[161,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[161,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[161,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[161,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[161,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[162,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[162,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[162,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[162,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[162,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[162,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[162,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[162,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[162,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[162,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[162,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[162,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[162,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[162,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[163,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[163,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[163,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[163,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[163,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[163,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[163,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[163,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[163,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[163,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[163,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[163,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[163,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[163,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[164,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[164,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[164,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[164,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[164,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[164,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[164,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[164,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[164,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[164,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[164,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[164,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[164,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[164,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[165,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[165,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[165,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[165,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[165,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[165,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[165,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[165,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[165,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[165,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[165,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[165,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[165,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[165,"Species"], "__")[[1]][2]

phyloseq::tax_table(ps.ng.tax_abund_rel)[166,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[166,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[166,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[166,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[166,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[166,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[166,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[166,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[166,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[166,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[166,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[166,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[166,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[166,"Species"], "__")[[1]][2]

phyloseq::tax_table(ps.ng.tax_abund_rel)[167,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[167,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[167,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[167,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[167,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[167,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[167,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[167,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[167,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[167,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[167,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[167,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[167,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[167,"Species"], "__")[[1]][2]

phyloseq::tax_table(ps.ng.tax_abund_rel)[167,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[167,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[167,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[167,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[167,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[167,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[167,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[167,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[167,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[167,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[167,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[167,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[167,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[167,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[168,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[168,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[168,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[168,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[168,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[168,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[168,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[168,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[168,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[168,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[168,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[168,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[168,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[168,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[169,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[169,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[169,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[169,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[169,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[169,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[169,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[169,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[169,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[169,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[169,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[169,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[169,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[169,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[170,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[170,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[170,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[170,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[170,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[170,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[170,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[170,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[170,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[170,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[170,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[170,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[170,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[170,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[171,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[171,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[171,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[171,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[171,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[171,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[171,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[171,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[171,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[171,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[171,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[171,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[171,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[171,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[172,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[172,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[172,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[172,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[172,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[172,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[172,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[172,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[172,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[172,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[172,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[172,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[172,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[172,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[173,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[173,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[173,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[173,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[173,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[173,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[173,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[173,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[173,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[173,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[173,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[173,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[173,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[173,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[174,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[174,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[174,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[174,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[174,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[174,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[174,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[174,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[174,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[174,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[174,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[174,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[174,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[174,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[175,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[175,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[175,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[175,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[175,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[175,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[175,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[175,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[175,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[175,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[175,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[175,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[175,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[175,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[176,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[176,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[176,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[176,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[176,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[176,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[176,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[176,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[176,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[176,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[176,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[176,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[176,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[176,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[177,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[177,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[177,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[177,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[177,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[177,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[177,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[177,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[177,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[177,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[177,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[177,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[177,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[177,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[178,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[178,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[178,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[178,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[178,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[178,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[178,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[178,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[178,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[178,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[178,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[178,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[178,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[178,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[179,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[179,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[179,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[179,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[179,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[179,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[179,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[179,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[179,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[179,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[179,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[179,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[179,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[179,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[180,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[180,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[180,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[180,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[180,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[180,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[180,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[180,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[180,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[180,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[180,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[180,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[180,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[180,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[181,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[181,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[181,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[181,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[181,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[181,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[181,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[181,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[181,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[181,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[181,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[181,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[181,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[181,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[182,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[182,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[182,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[182,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[182,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[182,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[182,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[182,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[182,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[182,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[182,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[182,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[182,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[182,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[183,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[183,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[183,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[183,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[183,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[183,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[183,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[183,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[183,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[183,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[183,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[183,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[183,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[183,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[184,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[184,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[184,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[184,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[184,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[184,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[184,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[184,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[184,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[184,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[184,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[184,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[184,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[184,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[185,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[185,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[185,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[185,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[185,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[185,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[185,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[185,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[185,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[185,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[185,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[185,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[185,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[185,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[186,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[186,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[186,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[186,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[186,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[186,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[186,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[186,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[186,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[186,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[186,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[186,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[186,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[186,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[187,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[187,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[187,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[187,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[187,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[187,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[187,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[187,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[187,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[187,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[187,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[187,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[187,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[187,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[188,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[188,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[188,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[188,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[188,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[188,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[188,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[188,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[188,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[188,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[188,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[188,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[188,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[188,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[189,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[189,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[189,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[189,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[189,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[189,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[189,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[189,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[189,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[189,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[189,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[189,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[189,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[189,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[190,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[190,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[190,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[190,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[190,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[190,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[190,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[190,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[190,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[190,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[190,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[190,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[190,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[190,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[191,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[191,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[191,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[191,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[191,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[191,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[191,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[191,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[191,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[191,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[191,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[191,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[191,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[191,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[192,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[192,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[192,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[192,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[192,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[192,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[192,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[192,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[192,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[192,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[192,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[192,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[192,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[192,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[193,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[193,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[193,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[193,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[193,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[193,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[193,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[193,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[193,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[193,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[193,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[193,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[193,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[193,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[194,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[194,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[194,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[194,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[194,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[194,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[194,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[194,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[194,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[194,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[194,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[194,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[194,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[194,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[195,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[195,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[195,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[195,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[195,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[195,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[195,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[195,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[195,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[195,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[195,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[195,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[195,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[195,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[196,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[196,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[196,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[196,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[196,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[196,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[196,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[196,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[196,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[196,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[196,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[196,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[196,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[196,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[197,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[197,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[197,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[197,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[197,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[197,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[197,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[197,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[197,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[197,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[197,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[197,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[197,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[197,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[198,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[198,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[198,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[198,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[198,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[198,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[198,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[198,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[198,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[198,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[198,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[198,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[198,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[198,"Species"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[199,"Domain"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[199,"Domain"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[199,"Phylum"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[199,"Phylum"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[199,"Class"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[199,"Class"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[199,"Order"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[199,"Order"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[199,"Family"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[199,"Family"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[199,"Genus"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[199,"Genus"], "__")[[1]][2]
phyloseq::tax_table(ps.ng.tax_abund_rel)[199,"Species"] <- str_split(phyloseq::tax_table(ps.ng.tax_abund_rel)[199,"Species"], "__")[[1]][2]

```

# Taxonomic summary

## Bar plots in phylum level

```{r, fig.width=16, fig.height=8, echo=TRUE, warning=FALSE}
  #aes(color="Phylum", fill="Phylum") --> aes()
  #ggplot(data=data, aes(x=Sample, y=Abundance, fill=Phylum)) 
  my_colors <- c("darkblue", "darkgoldenrod1", "darkseagreen", "darkorchid", "darkolivegreen1", "lightskyblue", "darkgreen", "deeppink", "khaki2", "firebrick", "brown1", "darkorange1", "cyan1", "royalblue4", "darksalmon", "darkblue","royalblue4", "dodgerblue3", "steelblue1", "lightskyblue", "darkseagreen", "darkgoldenrod1", "darkseagreen", "darkorchid", "darkolivegreen1", "brown1", "darkorange1", "cyan1", "darkgrey")
  plot_bar(ps.ng.tax_abund_rel, fill="Phylum") + geom_bar(aes(), stat="identity", position="stack") +
  scale_fill_manual(values = my_colors) + theme(axis.text = element_text(size = 7, colour="black")) + theme(legend.position="bottom") + guides(fill=guide_legend(nrow=2))                                  #6 instead of theme.size
```
```{r, echo=FALSE, warning=FALSE}
  #png("abc.png")
  #knitr::include_graphics("./Phyloseq_files/figure-html/unnamed-chunk-7-1.png")
  #dev.off()
```

\pagebreak
Regroup together pre vs post stroke samples and normalize number of reads in each group using median sequencing depth.

```{r, echo=TRUE, warning=FALSE}
  ps.ng.tax_abund_rel_pre_post_stroke <- merge_samples(ps.ng.tax_abund_rel, "pre_post_stroke")
  #PENDING: The effect weighted twice by sum(x), is the same to the effect weighted once directly from absolute abundance?!
  ps.ng.tax_abund_rel_pre_post_stroke_ = transform_sample_counts(ps.ng.tax_abund_rel_pre_post_stroke, function(x) x / sum(x))
  #plot_bar(ps.ng.tax_abund_relSampleType_, fill = "Phylum") + geom_bar(aes(color=Phylum, fill=Phylum), stat="identity", position="stack")
  plot_bar(ps.ng.tax_abund_rel_pre_post_stroke_, fill="Phylum") + geom_bar(aes(), stat="identity", position="stack") +
  scale_fill_manual(values = my_colors) + theme(axis.text = element_text(size = 7, colour="black"))
```

```{r, echo=FALSE, warning=FALSE}

  #FITTING6: regulate the bar height if it has replicates: 11+16+10+10+5+6+6+6+11+15+14+8+10+8=136

  ps.ng.tax_abund_rel_weighted <- data.table::copy(ps.ng.tax_abund_rel)

  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-A1")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-A1")]/11
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-A10")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-A10")]/11
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-A11")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-A11")]/11
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-A2")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-A2")]/11
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-A3")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-A3")]/11
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-A4")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-A4")]/11
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-A5")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-A5")]/11
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-A6")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-A6")]/11
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-A7")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-A7")]/11
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-A8")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-A8")]/11
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-A9")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-A9")]/11

  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-B1")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-B1")]/16
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-B10")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-B10")]/16
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-B11")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-B11")]/16
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-B12")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-B12")]/16
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-B13")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-B13")]/16
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-B14")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-B14")]/16
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-B15")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-B15")]/16
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-B16")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-B16")]/16
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-B2")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-B2")]/16
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-B3")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-B3")]/16
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-B4")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-B4")]/16
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-B5")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-B5")]/16
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-B6")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-B6")]/16
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-B7")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-B7")]/16
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-B8")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-B8")]/16
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-B9")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-B9")]/16

  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-C1")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-C1")]/10
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-C10")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-C10")]/10
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-C2")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-C2")]/10
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-C3")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-C3")]/10
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-C4")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-C4")]/10
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-C5")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-C5")]/10
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-C6")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-C6")]/10
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-C7")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-C7")]/10
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-C8")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-C8")]/10
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-C9")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-C9")]/10

  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-E1")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-E1")]/10
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-E10")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-E10")]/10
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-E2")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-E2")]/10
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-E3")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-E3")]/10
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-E4")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-E4")]/10
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-E5")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-E5")]/10
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-E6")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-E6")]/10
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-E7")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-E7")]/10
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-E8")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-E8")]/10
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-E9")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-E9")]/10

  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-F1")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-F1")]/5
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-F2")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-F2")]/5
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-F3")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-F3")]/5
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-F4")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-F4")]/5
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-F5")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-F5")]/5

  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-G1")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-G1")]/6
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-G2")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-G2")]/6
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-G3")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-G3")]/6
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-G4")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-G4")]/6
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-G5")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-G5")]/6
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-G6")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-G6")]/6

  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-H1")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-H1")]/6
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-H2")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-H2")]/6
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-H3")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-H3")]/6
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-H4")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-H4")]/6
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-H5")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-H5")]/6
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-H6")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-H6")]/6

  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-I1")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-I1")]/6
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-I2")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-I2")]/6
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-I3")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-I3")]/6
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-I4")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-I4")]/6
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-I5")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-I5")]/6
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-I6")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-I6")]/6

  #RESIZED:
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-J1")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-J1")]/6
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-J2")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-J2")]/6
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-J3")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-J3")]/6
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-J4")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-J4")]/6
  #otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-J5")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-J5")]/11
  #otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-J6")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-J6")]/11
  #otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-J7")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-J7")]/11
  #otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-J8")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-J8")]/11
  #otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-J9")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-J9")]/11
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-J10")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-J10")]/6
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-J11")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-J11")]/6

  #RESIZED:
  #otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-K1")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-K1")]/15
  #otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-K2")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-K2")]/15
  #otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-K3")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-K3")]/15
  #otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-K4")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-K4")]/15
  #otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-K5")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-K5")]/15
  #otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-K6")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-K6")]/15
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-K7")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-K7")]/4
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-K8")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-K8")]/4
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-K9")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-K9")]/4
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-K10")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-K10")]/4
  #otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-K11")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-K11")]/15
  #otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-K12")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-K12")]/15
  #otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-K13")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-K13")]/15
  #otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-K14")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-K14")]/15
  #otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-K15")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-K15")]/15

  #RESIZED:
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-L1")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-L1")]/4
  #otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-L2")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-L2")]/14
  #otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-L3")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-L3")]/14
  #otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-L4")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-L4")]/14
  #otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-L5")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-L5")]/14
  #otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-L6")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-L6")]/14
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-L7")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-L7")]/4
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-L8")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-L8")]/4
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-L10")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-L10")]/4
  #otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-L11")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-L11")]/14
  #otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-L12")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-L12")]/14
  #otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-L13")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-L13")]/14
  #otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-L14")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-L14")]/14
  #otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-L15")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-L15")]/14

  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-M1")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-M1")]/8
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-M2")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-M2")]/8
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-M3")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-M3")]/8
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-M4")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-M4")]/8
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-M5")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-M5")]/8
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-M6")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-M6")]/8
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-M7")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-M7")]/8
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-M8")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-M8")]/8

  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-N1")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-N1")]/10
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-N10")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-N10")]/10
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-N2")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-N2")]/10
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-N3")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-N3")]/10
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-N4")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-N4")]/10
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-N5")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-N5")]/10
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-N6")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-N6")]/10
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-N7")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-N7")]/10
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-N8")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-N8")]/10
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-N9")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-N9")]/10

  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-O1")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-O1")]/8
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-O2")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-O2")]/8
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-O3")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-O3")]/8
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-O4")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-O4")]/8
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-O5")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-O5")]/8
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-O6")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-O6")]/8
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-O7")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-O7")]/8
  otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-O8")] <- otu_table(ps.ng.tax_abund_rel)[,c("sample-O8")]/8

  sum(otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-O1")])
  #[1] 0.125
  sum(otu_table(ps.ng.tax_abund_rel)[,c("sample-O1")])
  #[1] 1
```

\pagebreak
Use color according to phylum. Do separate panels Stroke and Sex_age.

```{r, echo=FALSE, warning=FALSE}
  #plot_bar(ps.ng.tax_abund_relswab_, x="Phylum", fill = "Phylum", facet_grid = Patient~RoundDay) + geom_bar(aes(color=Phylum, fill=Phylum), stat="identity", position="stack") + theme(axis.text = element_text(size = theme.size, colour="black"))
  plot_bar(ps.ng.tax_abund_rel_weighted, x="Phylum", fill="Phylum", facet_grid = pre_post_stroke~Sex_age) + geom_bar(aes(), stat="identity", position="stack") +
  scale_fill_manual(values = my_colors) + theme(axis.text = element_text(size = 7, colour="black"), axis.text.x=element_blank(), axis.ticks=element_blank()) + theme(legend.position="bottom") + guides(fill=guide_legend(nrow=2))
```

## Bar plots in class level

```{r, fig.width=16, fig.height=8, echo=TRUE, warning=FALSE}
  my_colors <- c("darkblue", "darkgoldenrod1", "darkseagreen", "darkorchid", "darkolivegreen1", "lightskyblue", "darkgreen", "deeppink", "khaki2", "firebrick", "brown1", "darkorange1", "cyan1", "royalblue4", "darksalmon", "darkblue","royalblue4", "dodgerblue3", "steelblue1", "lightskyblue", "darkseagreen", "darkgoldenrod1", "darkseagreen", "darkorchid", "darkolivegreen1", "brown1", "darkorange1", "cyan1", "darkgrey")
  plot_bar(ps.ng.tax_abund_rel, fill="Class") + geom_bar(aes(), stat="identity", position="stack") +
  scale_fill_manual(values = my_colors) + theme(axis.text = element_text(size = 7, colour="black")) + theme(legend.position="bottom") + guides(fill=guide_legend(nrow=3))
```

Regroup together pre vs post stroke samples and normalize number of reads in each group using median sequencing depth.
```{r, echo=TRUE, warning=FALSE}
  plot_bar(ps.ng.tax_abund_rel_pre_post_stroke_, fill="Class") + geom_bar(aes(), stat="identity", position="stack") +
  scale_fill_manual(values = my_colors) + theme(axis.text = element_text(size = 7, colour="black"))
```
\pagebreak

Use color according to class. Do separate panels Stroke and Sex_age.
```{r, echo=TRUE, warning=FALSE}
  #NOTE: MANALLY RUNNING the CODE by COPYING the CODE under R-console for the 6 blocks, then show them with knitr::include_graphics
  sum(otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-O1")])
  plot_bar(ps.ng.tax_abund_rel_weighted, x="Class", fill="Class", facet_grid = pre_post_stroke~Sex_age) + geom_bar(aes(), stat="identity", position="stack") +
  scale_fill_manual(values = my_colors) + theme(axis.text = element_text(size = 7, colour="black"), axis.text.x=element_blank(), axis.ticks=element_blank()) + theme(legend.position="bottom") + guides(fill=guide_legend(nrow=3))
```

```{r, echo=FALSE, warning=FALSE}
        ps_df <- phyloseq::psmelt(ps.ng.tax_abund_rel_weighted)
        # 准备数据
        ps_summary <- ps_df %>%
          # 1. 只保留这三个 condition
          filter(pre_post_stroke %in% c("pre.antibiotics", "baseline", "pre.stroke")) %>%

          # 2. 聚合
          group_by(Sex_age, pre_post_stroke, Class) %>%
          summarise(Abundance = sum(Abundance), .groups = "drop") %>%

          # 3. 设置 factor 顺序和重命名
          mutate(
            # 替换 Sex_age 名称
            Sex_age = recode(Sex_age,
                            "female.aged" = "Female (Aged)",
                            "male.aged"   = "Male (Aged)",
                            "male.young"  = "Male (Young)"),
            Sex_age = factor(Sex_age, levels = c("Male (Aged)", "Female (Aged)", "Male (Young)")),

            # 替换 condition 名称
            pre_post_stroke = recode(pre_post_stroke,
                                    "pre.antibiotics" = "Before Antibiotics",
                                    "baseline"        = "Baseline",
                                    "pre.stroke"      = "Before Stroke"),
            pre_post_stroke = factor(pre_post_stroke,
                                    levels = c("Before Antibiotics", "Baseline", "Before Stroke")),

            Class = factor(Class)
          )

        # 确保颜色数匹配
        class_levels <- levels(ps_summary$Class)
        color_map <- setNames(my_colors[seq_along(class_levels)], class_levels)

        # 绘图
        p <- ggplot(ps_summary, aes(x = Sex_age, y = Abundance, fill = Class)) +
          geom_bar(stat = "identity", position = "stack", width = 0.55) +  # 更窄的柱子
          facet_grid(pre_post_stroke ~ ., scales = "free_x", drop = TRUE) +
          scale_fill_manual(values = color_map, drop = FALSE) +
          theme_minimal(base_size = 11) +
          theme(
            axis.text.x = element_text(angle = 45, hjust = 1, size = 9, colour = "black"),
            axis.title = element_text(size = 11),
            strip.text = element_text(size = 10, face = "bold"),
            legend.position = "right",               # ✅ legend 放右边
            legend.title = element_blank(),
            panel.grid.major.x = element_blank(),
            panel.grid.minor = element_blank()
          ) +
          guides(fill = guide_legend(ncol = 1)) +     # 竖排图例
          labs(
            x = "Sex and Age Group",
            y = "Relative Abundance",
            title = "Taxonomic Class Composition by Group and Condition"
          )

        # 保存为 PNG 文件
        ggsave(
          filename = "./figures/Separate_Stroke_and_SexAge_on_Class.png",
          plot = p,
          width = 8,
          height = 6,
          dpi = 200
        )

        knitr::include_graphics("./figures/Separate_Stroke_and_SexAge_on_Class.png")
```

```{r, echo=FALSE, warning=FALSE}
        ps_df <- phyloseq::psmelt(ps.ng.tax_abund_rel_weighted)
        # 数据处理，只保留 "Before Stroke"
        ps_summary <- ps_df %>%
          filter(pre_post_stroke == "pre.stroke") %>%
          group_by(Sex_age, Class) %>%
          summarise(Abundance = sum(Abundance), .groups = "drop") %>%
          mutate(
            Sex_age = recode(Sex_age,
              "female.aged" = "Female (Aged)",
              "male.aged" = "Male (Aged)",
              "male.young" = "Male (Young)"
            ),
            Sex_age = factor(Sex_age, levels = c("Male (Aged)", "Female (Aged)", "Male (Young)")),
            Class = factor(Class)
          )

        # 映射颜色
        class_levels <- levels(ps_summary$Class)
        color_map <- setNames(my_colors[seq_along(class_levels)], class_levels)

        # 绘图
        p <- ggplot(ps_summary, aes(x = Sex_age, y = Abundance, fill = Class)) +
          geom_bar(stat = "identity", position = "stack", width = 0.55) +
          scale_fill_manual(values = color_map, drop = FALSE) +
          theme_minimal(base_size = 11) +
          theme(
            axis.text.x = element_text(angle = 45, hjust = 1, size = 9, colour = "black"),
            axis.title = element_text(size = 11),
            legend.position = "right",
            legend.title = element_blank(),
            panel.grid.major.x = element_blank(),
            panel.grid.minor = element_blank()
          ) +
          labs(
            x = "Sex and Age Group",
            y = "Relative Abundance",
            title = "Class Composition - Before Stroke"
          ) +
          guides(fill = guide_legend(ncol = 2))

        # 保存图像
        ggsave(
          filename = "./figures/Before_Stroke_Class_Composition.png",
          plot = p,
          width = 8,
          height = 5,
          dpi = 200
        )

        # 插入图像到报告
        knitr::include_graphics("./figures/Before_Stroke_Class_Composition.png")
```

## Bar plots in order level

```{r, fig.width=16, fig.height=8, echo=TRUE, warning=FALSE}
  my_colors <- c("darkblue", "darkgoldenrod1", "darkseagreen", "darkorchid", "darkolivegreen1", "lightskyblue", "darkgreen", "deeppink", "khaki2", "firebrick", "brown1", "darkorange1", "cyan1", "royalblue4", "darksalmon", "darkblue","royalblue4", "dodgerblue3", "steelblue1", "lightskyblue", "darkseagreen", "darkgoldenrod1", "darkseagreen", "darkorchid", "darkolivegreen1", "brown1", "darkorange1", "cyan1", "darkgrey")
  plot_bar(ps.ng.tax_abund_rel, fill="Order") + geom_bar(aes(), stat="identity", position="stack") +
  scale_fill_manual(values = my_colors) + theme(axis.text = element_text(size = 7, colour="black")) + theme(legend.position="bottom") + guides(fill=guide_legend(nrow=4))
```

Regroup together pre vs post stroke and normalize number of reads in each group using median sequencing depth.
```{r, echo=TRUE, warning=FALSE}
  plot_bar(ps.ng.tax_abund_rel_pre_post_stroke_, fill="Order") + geom_bar(aes(), stat="identity", position="stack") +
  scale_fill_manual(values = my_colors) + theme(axis.text = element_text(size = 7, colour="black"))
```
\pagebreak

Use color according to order. Do separate panels Stroke and Sex_age.
```{r, echo=FALSE, warning=FALSE}

  #FITTING7: regulate the bar height if it has replicates
  sum(otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-O1")])
  plot_bar(ps.ng.tax_abund_rel_weighted, x="Order", fill="Order", facet_grid = pre_post_stroke~Sex_age) + geom_bar(aes(), stat="identity", position="stack") +
  scale_fill_manual(values = my_colors) + theme(axis.text = element_text(size = 7, colour="black"), axis.text.x=element_blank(), axis.ticks=element_blank()) + theme(legend.position="bottom") + guides(fill=guide_legend(nrow=4))
```

```{r, echo=FALSE, warning=FALSE}
        ps_df <- phyloseq::psmelt(ps.ng.tax_abund_rel_weighted)
        # 准备数据
        ps_summary <- ps_df %>%
          # 1. 只保留这三个 condition
          filter(pre_post_stroke %in% c("pre.antibiotics", "baseline", "pre.stroke")) %>%

          # 2. 聚合
          group_by(Sex_age, pre_post_stroke, Order) %>%
          summarise(Abundance = sum(Abundance), .groups = "drop") %>%

          # 3. 设置 factor 顺序和重命名
          mutate(
            # 替换 Sex_age 名称
            Sex_age = recode(Sex_age,
                            "female.aged" = "Female (Aged)",
                            "male.aged"   = "Male (Aged)",
                            "male.young"  = "Male (Young)"),
            Sex_age = factor(Sex_age, levels = c("Male (Aged)", "Female (Aged)", "Male (Young)")),

            # 替换 condition 名称
            pre_post_stroke = recode(pre_post_stroke,
                                    "pre.antibiotics" = "Before Antibiotics",
                                    "baseline"        = "Baseline",
                                    "pre.stroke"      = "Before Stroke"),
            pre_post_stroke = factor(pre_post_stroke,
                                    levels = c("Before Antibiotics", "Baseline", "Before Stroke")),

            Order = factor(Order)
          )

        # 确保颜色数匹配
        class_levels <- levels(ps_summary$Order)
        color_map <- setNames(my_colors[seq_along(class_levels)], class_levels)

        # 绘图
        p <- ggplot(ps_summary, aes(x = Sex_age, y = Abundance, fill = Order)) +
          geom_bar(stat = "identity", position = "stack", width = 0.55) +  # 更窄的柱子
          facet_grid(pre_post_stroke ~ ., scales = "free_x", drop = TRUE) +
          scale_fill_manual(values = color_map, drop = FALSE) +
          theme_minimal(base_size = 11) +
          theme(
            axis.text.x = element_text(angle = 45, hjust = 1, size = 9, colour = "black"),
            axis.title = element_text(size = 11),
            strip.text = element_text(size = 10, face = "bold"),
            legend.position = "right",               # ✅ legend 放右边
            legend.title = element_blank(),
            panel.grid.major.x = element_blank(),
            panel.grid.minor = element_blank()
          ) +
          guides(fill = guide_legend(ncol = 1)) +     # 竖排图例
          labs(
            x = "Sex and Age Group",
            y = "Relative Abundance",
            title = "Taxonomic Order Composition by Group and Condition"
          )

        # 保存为 PNG 文件
        ggsave(
          filename = "./figures/Separate_Stroke_and_SexAge_on_Order.png",
          plot = p,
          width = 8,
          height = 6,
          dpi = 200
        )

        knitr::include_graphics("./figures/Separate_Stroke_and_SexAge_on_Order.png")
```

```{r, echo=FALSE, warning=FALSE}
        ps_df <- phyloseq::psmelt(ps.ng.tax_abund_rel_weighted)
        # 数据处理，只保留 "Before Stroke"
        ps_summary <- ps_df %>%
          filter(pre_post_stroke == "pre.stroke") %>%
          group_by(Sex_age, Order) %>%
          summarise(Abundance = sum(Abundance), .groups = "drop") %>%
          mutate(
            Sex_age = recode(Sex_age,
              "female.aged" = "Female (Aged)",
              "male.aged" = "Male (Aged)",
              "male.young" = "Male (Young)"
            ),
            Sex_age = factor(Sex_age, levels = c("Male (Aged)", "Female (Aged)", "Male (Young)")),
            Order = factor(Order)
          )

        # 映射颜色
        class_levels <- levels(ps_summary$Order)
        color_map <- setNames(my_colors[seq_along(class_levels)], class_levels)

        # 绘图
        p <- ggplot(ps_summary, aes(x = Sex_age, y = Abundance, fill = Order)) +
          geom_bar(stat = "identity", position = "stack", width = 0.55) +
          scale_fill_manual(values = color_map, drop = FALSE) +
          theme_minimal(base_size = 11) +
          theme(
            axis.text.x = element_text(angle = 45, hjust = 1, size = 9, colour = "black"),
            axis.title = element_text(size = 11),
            legend.position = "right",
            legend.title = element_blank(),
            panel.grid.major.x = element_blank(),
            panel.grid.minor = element_blank()
          ) +
          labs(
            x = "Sex and Age Group",
            y = "Relative Abundance",
            title = "Order Composition - Before Stroke"
          ) +
          guides(fill = guide_legend(ncol = 2))

        # 保存图像
        ggsave(
          filename = "./figures/Before_Stroke_Order_Composition.png",
          plot = p,
          width = 8,
          height = 5,
          dpi = 200
        )

        # 插入图像到报告
        knitr::include_graphics("./figures/Before_Stroke_Order_Composition.png")
```

## Bar plots in family level

```{r, fig.width=16, fig.height=8, echo=TRUE, warning=FALSE}
  my_colors <- c(
          "#FF0000", "#000000", "#0000FF", "#C0C0C0", "#FFFFFF", "#FFFF00", "#00FFFF", "#FFA500", "#00FF00", "#808080", "#FF00FF", "#800080", "#FDD017", "#0000A0", "#3BB9FF", "#008000", "#800000", "#ADD8E6", "#F778A1", "#800517", "#736F6E", "#F52887", "#C11B17", "#5CB3FF", "#A52A2A", "#FF8040", "#2B60DE", "#736AFF", "#1589FF", "#98AFC7", "#8D38C9", "#307D7E", "#F6358A", "#151B54", "#6D7B8D", "#FDEEF4", "#FF0080", "#F88017", "#2554C7", "#FFF8C6", "#D4A017", "#306EFF", "#151B8D", "#9E7BFF", "#EAC117", "#E0FFFF", "#15317E", "#6C2DC7", "#FBB917", "#FCDFFF", "#15317E", "#254117", "#FAAFBE", "#357EC7"
        )
  plot_bar(ps.ng.tax_abund_rel, fill="Family") + geom_bar(aes(), stat="identity", position="stack") +
  scale_fill_manual(values = my_colors) + theme(axis.text = element_text(size = 7, colour="black")) + theme(legend.position="bottom") + guides(fill=guide_legend(nrow=8))
```

Regroup together pre vs post stroke samples and normalize number of reads in each group using median sequencing depth.
```{r, echo=TRUE, warning=FALSE}
  plot_bar(ps.ng.tax_abund_rel_pre_post_stroke_, fill="Family") + geom_bar(aes(), stat="identity", position="stack") +
  scale_fill_manual(values = my_colors) + theme(axis.text = element_text(size = 7, colour="black"))
```
\pagebreak

Use color according to family. Do separate panels Stroke and Sex_age.
```{r, echo=TRUE, warning=FALSE}
  sum(otu_table(ps.ng.tax_abund_rel_weighted)[,c("sample-O1")])
  plot_bar(ps.ng.tax_abund_rel_weighted, x="Family", fill="Family", facet_grid = pre_post_stroke~Sex_age) + geom_bar(aes(), stat="identity", position="stack") +
  scale_fill_manual(values = my_colors) + theme(axis.text = element_text(size = 7, colour="black"), axis.text.x=element_blank(), axis.ticks=element_blank()) + theme(legend.position="bottom") + guides(fill=guide_legend(nrow=8))
```

```{r, echo=FALSE, warning=FALSE}
        ps_df <- phyloseq::psmelt(ps.ng.tax_abund_rel_weighted)
        # 准备数据
        ps_summary <- ps_df %>%
          # 1. 只保留这三个 condition
          filter(pre_post_stroke %in% c("pre.antibiotics", "baseline", "pre.stroke")) %>%

          # 2. 聚合
          group_by(Sex_age, pre_post_stroke, Family) %>%
          summarise(Abundance = sum(Abundance), .groups = "drop") %>%

          # 3. 设置 factor 顺序和重命名
          mutate(
            # 替换 Sex_age 名称
            Sex_age = recode(Sex_age,
                            "female.aged" = "Female (Aged)",
                            "male.aged"   = "Male (Aged)",
                            "male.young"  = "Male (Young)"),
            Sex_age = factor(Sex_age, levels = c("Male (Aged)", "Female (Aged)", "Male (Young)")),

            # 替换 condition 名称
            pre_post_stroke = recode(pre_post_stroke,
                                    "pre.antibiotics" = "Before Antibiotics",
                                    "baseline"        = "Baseline",
                                    "pre.stroke"      = "Before Stroke"),
            pre_post_stroke = factor(pre_post_stroke,
                                    levels = c("Before Antibiotics", "Baseline", "Before Stroke")),

            Family = factor(Family)
          )

        # 确保颜色数匹配
        class_levels <- levels(ps_summary$Family)
        color_map <- setNames(my_colors[seq_along(class_levels)], class_levels)

        # 绘图
        p <- ggplot(ps_summary, aes(x = Sex_age, y = Abundance, fill = Family)) +
          geom_bar(stat = "identity", position = "stack", width = 0.55) +  # 更窄的柱子
          facet_grid(pre_post_stroke ~ ., scales = "free_x", drop = TRUE) +
          scale_fill_manual(values = color_map, drop = FALSE) +
          theme_minimal(base_size = 11) +
          theme(
            axis.text.x = element_text(angle = 45, hjust = 1, size = 9, colour = "black"),
            axis.title = element_text(size = 11),
            strip.text = element_text(size = 10, face = "bold"),
            legend.position = "right",               # ✅ legend 放右边
            legend.title = element_blank(),
            panel.grid.major.x = element_blank(),
            panel.grid.minor = element_blank()
          ) +
          guides(fill = guide_legend(ncol = 2)) +     # 竖排图例
          labs(
            x = "Sex and Age Group",
            y = "Relative Abundance",
            title = "Taxonomic Family Composition by Group and Condition"
          )

        # 保存为 PNG 文件
        ggsave(
          filename = "./figures/Separate_Stroke_and_SexAge_on_Family.png",
          plot = p,
          width = 9,
          height = 6,
          dpi = 200
        )

        knitr::include_graphics("./figures/Separate_Stroke_and_SexAge_on_Family.png")
```

```{r, echo=FALSE, warning=FALSE}
        ps_df <- phyloseq::psmelt(ps.ng.tax_abund_rel_weighted)
        # 数据处理，只保留 "Before Stroke"
        ps_summary <- ps_df %>%
          filter(pre_post_stroke == "pre.stroke") %>%
          group_by(Sex_age, Family) %>%
          summarise(Abundance = sum(Abundance), .groups = "drop") %>%
          mutate(
            Sex_age = recode(Sex_age,
              "female.aged" = "Female (Aged)",
              "male.aged" = "Male (Aged)",
              "male.young" = "Male (Young)"
            ),
            Sex_age = factor(Sex_age, levels = c("Male (Aged)", "Female (Aged)", "Male (Young)")),
            Family = factor(Family)
          )

        # 映射颜色
        class_levels <- levels(ps_summary$Family)
        color_map <- setNames(my_colors[seq_along(class_levels)], class_levels)

        # 绘图
        p <- ggplot(ps_summary, aes(x = Sex_age, y = Abundance, fill = Family)) +
          geom_bar(stat = "identity", position = "stack", width = 0.55) +
          scale_fill_manual(values = color_map, drop = FALSE) +
          theme_minimal(base_size = 11) +
          theme(
            axis.text.x = element_text(angle = 45, hjust = 1, size = 9, colour = "black"),
            axis.title = element_text(size = 11),
            legend.position = "right",
            legend.title = element_blank(),
            panel.grid.major.x = element_blank(),
            panel.grid.minor = element_blank()
          ) +
          labs(
            x = "Sex and Age Group",
            y = "Relative Abundance",
            title = "Family Composition - Before Stroke"
          ) +
          guides(fill = guide_legend(ncol = 2))

        # 保存图像
        ggsave(
          filename = "./figures/Before_Stroke_Family_Composition.png",
          plot = p,
          width = 8,
          height = 5,
          dpi = 200
        )

        # 插入图像到报告
        knitr::include_graphics("./figures/Before_Stroke_Family_Composition.png")
```

\pagebreak

# Alpha diversity
Plot Chao1 richness estimator, Observed OTUs, Shannon index, and Phylogenetic diversity. 
Regroup together samples from the same group.
```{r, echo=FALSE, warning=FALSE}
# using rarefied data
#FITTING2: CONSOLE: 
#gunzip table_even4753.biom.gz
#alpha_diversity.py -i table_even42369.biom --metrics chao1,observed_otus,shannon,PD_whole_tree -o adiv_even.txt -t ../clustering/rep_set.tre
#gunzip table_even4753.biom.gz
#alpha_diversity.py -i table_even4753.biom --metrics chao1,observed_otus,shannon,PD_whole_tree -o adiv_even.txt -t ../clustering_stool/rep_set.tre
#gunzip table_even4753.biom.gz
#alpha_diversity.py -i table_even4753.biom --metrics chao1,observed_otus,shannon,PD_whole_tree -o adiv_even.txt -t ../clustering_swab/rep_set.tre
```

```{r, echo=TRUE, warning=FALSE}
#fig.width=9, fig.height=6,
#QIIME1 hmp.div_qiime <- read.csv("adiv_even.txt", sep="\t")
#QIIME1 colnames(hmp.div_qiime) <- c("sam_name", "chao1", "observed_otus", "shannon", "PD_whole_tree")
#QIIME1 row.names(hmp.div_qiime) <- hmp.div_qiime$sam_name
#QIIME1 div.df <- merge(hmp.div_qiime, hmp.meta, by = "sam_name")
#QIIME1 div.df2 <- div.df[, c("Group", "chao1", "shannon", "observed_otus", "PD_whole_tree")]
#QIIME1 colnames(div.df2) <- c("Group", "Chao-1", "Shannon", "OTU", "Phylogenetic Diversity")
#QIIME1 options(max.print=999999)
#QIIME1 #27     H47 830.5000 5.008482 319               10.60177
#QIIME1 #FITTING4: if occuring "Computation failed in `stat_signif()`:not enough 'y' observations"
#QIIME1 #means: the patient H47 contains only one sample, it should be removed for the statistical p-values calculations.
#QIIME1 #delete H47(1)
#QIIME1 #div.df2 <- div.df2[-c(3), ]
#QIIME1 #div.df2 <- div.df2[-c(55,54, 45,40,39,27,26,25,1), ]

# for QIIME2: Lesen der Metriken
shannon <- read.table("exported_alpha/shannon/alpha-diversity.tsv", header=TRUE, sep="\t")
faith_pd <- read.table("exported_alpha/faith_pd/alpha-diversity.tsv", header=TRUE, sep="\t")
observed <- read.table("exported_alpha/observed_features/alpha-diversity.tsv", header=TRUE, sep="\t")
#chao1 <- read.table("exported_alpha/chao1/alpha-diversity.tsv", header=TRUE, sep="\t")    #TODO: Check the correctness of chao1-calculation.

# Umbenennen für Klarheit
colnames(shannon) <- c("sam_name", "shannon")
colnames(faith_pd) <- c("sam_name", "PD_whole_tree")
colnames(observed) <- c("sam_name", "observed_otus")
#colnames(chao1) <- c("sam_name", "chao1")

# Merge alles in ein DataFrame
div.df <- Reduce(function(x, y) merge(x, y, by="sam_name"),
                  list(shannon, faith_pd, observed))

# Meta-Daten einfügen
div.df <- merge(div.df, hmp.meta, by="sam_name")

# Reformat
div.df2 <- div.df[, c("sam_name", "Group", "shannon", "observed_otus", "PD_whole_tree")]
colnames(div.df2) <- c("Sample name", "Group", "Shannon", "OTU", "Phylogenetic Diversity")
write.csv(div.df2, file="alpha_diversities.txt")
knitr::kable(div.df2) %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))

#https://uc-r.github.io/t_test
#We can perform the test with t.test and transform our data and we can also perform the nonparametric test with the wilcox.test function.
stat.test.Shannon <- compare_means(
 Shannon ~ Group, data = div.df2,
 method = "t.test"
)
knitr::kable(stat.test.Shannon) %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))

div_df_melt <- reshape2::melt(div.df2)
#head(div_df_melt)

#https://plot.ly/r/box-plots/#horizontal-boxplot
#http://www.sthda.com/english/wiki/print.php?id=177
#https://rpkgs.datanovia.com/ggpubr/reference/as_ggplot.html
#http://www.sthda.com/english/articles/24-ggpubr-publication-ready-plots/82-ggplot2-easy-way-to-change-graphical-parameters/
#https://plot.ly/r/box-plots/#horizontal-boxplot
#library("gridExtra")
#par(mfrow=c(4,1))
p <- ggboxplot(div_df_melt, x = "Group", y = "value",
              facet.by = "variable", 
              scales = "free",
              width = 0.5,
              fill = "gray", legend= "right")
#ggpar(p, xlab = FALSE, ylab = FALSE)
lev <- levels(factor(div_df_melt$Group)) # get the variables
#FITTING4: delete H47(1) in lev
#lev <- lev[-c(3)]
# make a pairwise list that we want to compare.
#my_stat_compare_means
#https://stackoverflow.com/questions/47839988/indicating-significance-with-ggplot2-in-a-boxplot-with-multiple-groups
L.pairs <- combn(seq_along(lev), 2, simplify = FALSE, FUN = function(i) lev[i]) #%>% filter(p.signif != "ns")
my_stat_compare_means  <- function (mapping = NULL, data = NULL, method = NULL, paired = FALSE, 
    method.args = list(), ref.group = NULL, comparisons = NULL, 
    hide.ns = FALSE, label.sep = ", ", label = NULL, label.x.npc = "left", 
    label.y.npc = "top", label.x = NULL, label.y = NULL, tip.length = 0.03, 
    symnum.args = list(), geom = "text", position = "identity", 
    na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, ...) 
{
    if (!is.null(comparisons)) {
        method.info <- ggpubr:::.method_info(method)
        method <- method.info$method
        method.args <- ggpubr:::.add_item(method.args, paired = paired)
        if (method == "wilcox.test") 
            method.args$exact <- FALSE
        pms <- list(...)
        size <- ifelse(is.null(pms$size), 0.3, pms$size)
        color <- ifelse(is.null(pms$color), "black", pms$color)
        map_signif_level <- FALSE
        if (is.null(label)) 
            label <- "p.format"
        if (ggpubr:::.is_p.signif_in_mapping(mapping) | (label %in% "p.signif")) {
            if (ggpubr:::.is_empty(symnum.args)) {
                map_signif_level <- c(`****` = 1e-04, `***` = 0.001, 
                  `**` = 0.01, `*` = 0.05, ns = 1)
            } else {
               map_signif_level <- symnum.args
            } 
            if (hide.ns) 
                names(map_signif_level)[5] <- " "
        }
        step_increase <- ifelse(is.null(label.y), 0.12, 0)
        ggsignif::geom_signif(comparisons = comparisons, y_position = label.y, 
            test = method, test.args = method.args, step_increase = step_increase, 
            size = size, color = color, map_signif_level = map_signif_level, 
            tip_length = tip.length, data = data)
    } else {
        mapping <- ggpubr:::.update_mapping(mapping, label)
        layer(stat = StatCompareMeans, data = data, mapping = mapping, 
            geom = geom, position = position, show.legend = show.legend, 
            inherit.aes = inherit.aes, params = list(label.x.npc = label.x.npc, 
                label.y.npc = label.y.npc, label.x = label.x, 
                label.y = label.y, label.sep = label.sep, method = method, 
                method.args = method.args, paired = paired, ref.group = ref.group, 
                symnum.args = symnum.args, hide.ns = hide.ns, 
                na.rm = na.rm, ...))
    }
}

# Rotate the x-axis labels to 45 degrees and adjust their position
p <- p + theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust=1, size=8))
#comparisons = list(c("Group1", "Group2"), c("Group3", "Group4")),
p2 <- p + 
stat_compare_means(
  method="t.test",
  comparisons = list(),
  label = "p.signif",
  symnum.args <- list(cutpoints = c(0, 0.0001, 0.001, 0.01, 0.05, 1), symbols = c("****", "***", "**", "*", "ns"))
)
#comparisons = L.pairs,
#symnum.args <- list(cutpoints = c(0, 0.0001, 0.001, 0.01, 0.05), symbols = c("****", "***", "**", "*")),
#stat_pvalue_manual
print(p2)
#https://stackoverflow.com/questions/20500706/saving-multiple-ggplots-from-ls-into-one-and-separate-files-in-r
#FITTING3: mkdir figures
ggsave("./figures/alpha_diversity_Group.png", device="png", height = 10, width = 15)
ggsave("./figures/alpha_diversity_Group.svg", device="svg", height = 10, width = 15)

#NOTE: Run this Phyloseq.Rmd, then run the code of MicrobiotaProcess.R to manually generate PCoA.png, then run this Phyloseq.Rmd!
#NOTE: AT_FIRST_DEACTIVATE_THIS_LINE: knitr::include_graphics("./figures/PCoA.png")

```

```{r, echo=FALSE, warning=FALSE, fig.cap="Alpha diversity", out.width = '100%', fig.align= "center"}
## MANUALLY selected alpha diversities unter host-env after 'cp alpha_diversities.txt selected_alpha_diversities.txt'
#knitr::include_graphics("./figures/alpha_diversity_Group.png")
#selected_alpha_diversities<-read.csv("selected_alpha_diversities.txt",sep="\t")
#knitr::kable(selected_alpha_diversities) %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
```

# Beta diversity (Bray-Curtis distance)

## Group1 vs Group2
```{r, echo=FALSE, warning=FALSE, out.width = '100%', fig.align= "center"}
#fig.cap="Beta diversity",

#for QIIME1: file:///home/jhuang/DATA/Data_Marius_16S/core_diversity_e42369/bdiv_even42369_Group/weighted_unifrac_boxplots/Group_Stats.txt

# -- for QIIME2: MANUALLY filter permanova-pairwise.csv and save as permanova-pairwise_.csv
# #grep "Permutations" exported_beta_group/permanova-pairwise.csv > permanova-pairwise_.csv
# #grep "Group1,Group2" exported_beta_group/permanova-pairwise.csv >> permanova-pairwise_.csv
# #grep "Group3,Group4" exported_beta_group/permanova-pairwise.csv >> permanova-pairwise_.csv
# beta_diversity_group_stats<-read.csv("permanova-pairwise_.csv",sep=",")
# #beta_diversity_group_stats <- beta_diversity_group_stats[beta_diversity_group_stats$Group.1 == "Group1" & beta_diversity_group_stats$Group.2 == "Group2", ]
# #beta_diversity_group_stats <- beta_diversity_group_stats[beta_diversity_group_stats$Group.1 == "Group3" & beta_diversity_group_stats$Group.2 == "Group4", ]
# knitr::kable(beta_diversity_group_stats) %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))

#NOTE: Run this Phyloseq.Rmd, then run the code of MicrobiotaProcess.R to manually generate Comparison_of_Bray_Distances_Group1_vs_Group2.png and Comparison_of_Bray_Distances_Group3_vs_Group4.png, then run this Phyloseq.Rmd!

#knitr::include_graphics("./figures/Comparison_of_Bray_Distances_Group1_vs_Group2.png")

```

## Group3 vs Group4
```{r, echo=FALSE, warning=FALSE, out.width = '100%', fig.align= "center"}
#knitr::include_graphics("./figures/Comparison_of_Bray_Distances_Group3_vs_Group4.png")
```

# The PCoA analysis

## Group1 vs Group2
```{r, echo=FALSE, warning=FALSE, out.width = '100%', fig.align= "center"}
#knitr::include_graphics("./figures/PCoA2_Group1_vs_Group2.png")
```

## Group3 vs Group4
```{r, echo=FALSE, warning=FALSE, out.width = '100%', fig.align= "center"}
#knitr::include_graphics("./figures/PCoA2_Group3_vs_Group4.png")
```

## Groups 1, 2, 3 and 4
```{r, echo=FALSE, warning=FALSE, out.width = '100%', fig.align= "center"}
#knitr::include_graphics("./figures/PCoA2_Group1-Group4.png")
```

## Groups 9,10, 11, 12,13, and 14
```{r, echo=FALSE, warning=FALSE, out.width = '100%', fig.align= "center"}
#knitr::include_graphics("./figures/PCoA2_Group9-Group14.png")
```

# Differential abundance analysis

Differential abundance analysis aims to find the differences in the abundance of each taxa between two groups of samples, assigning a significance value to each comparison.

## Group1 vs Group2

```{r, echo=TRUE, warning=FALSE}
#ps.ng.tax [ 2633 taxa and 136 samples] and ps.ng.tax_abund (absolute abundance)  [382 taxa and 136 samples],  ps.ng.tax_abund_rel (relative abundance)  [382 taxa and 136 samples], either ps.ng.tax and ps.ng.tax_abund can be used here!
ps.ng.tax_abund_sel1 <- data.table::copy(ps.ng.tax_abund)
otu_table(ps.ng.tax_abund_sel1) <- otu_table(ps.ng.tax_abund)[,c("sample-A1","sample-A2","sample-A3","sample-A4","sample-A5","sample-A6","sample-A7","sample-A8","sample-A9","sample-A10","sample-A11",   "sample-B1","sample-B2","sample-B3","sample-B4","sample-B5","sample-B6","sample-B7","sample-B8","sample-B9","sample-B10","sample-B11","sample-B12","sample-B13","sample-B14","sample-B15","sample-B16")]
diagdds = phyloseq_to_deseq2(ps.ng.tax_abund_sel1, ~Group)
diagdds$Group <- relevel(diagdds$Group, "Group2")
diagdds = DESeq(diagdds, test="Wald", fitType="parametric")
resultsNames(diagdds)

res = results(diagdds, cooksCutoff = FALSE)
alpha = 0.05
sigtab = res[which(res$padj < alpha), ]
sigtab = cbind(as(sigtab, "data.frame"), as(phyloseq::tax_table(ps.ng.tax_abund_sel1)[rownames(sigtab), ], "matrix"))
#sigtab <- sigtab[rownames(sigtab) %in% rownames(phyloseq::tax_table(ps.ng.tax_abund)), ]
kable(sigtab) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))

library("ggplot2")
theme_set(theme_bw())
scale_fill_discrete <- function(palname = "Set1", ...) {
    scale_fill_brewer(palette = palname, ...)
}
x = tapply(sigtab$log2FoldChange, sigtab$Order, function(x) max(x))
x = sort(x)
sigtab$Order = factor(as.character(sigtab$Order), levels=names(x))
x = tapply(sigtab$log2FoldChange, sigtab$Family, function(x) max(x))
x = sort(x)
sigtab$Family = factor(as.character(sigtab$Family), levels=names(x))
ggplot(sigtab, aes(x=log2FoldChange, y=Family, color=Order)) + geom_point(aes(size=padj)) + scale_size_continuous(name="padj",range=c(8,4))+
  theme(axis.text.x = element_text(angle = -25, hjust = 0, vjust=0.5))
```

```{r, echo=FALSE, warning=FALSE, out.width = '100%', fig.align= "center"}
#knitr::include_graphics("./figures/diff_analysis_Group1_vs_Group2.png")
```

## Group3 vs Group4

```{r, echo=TRUE, warning=FALSE}
ps.ng.tax_abund_sel2 <- data.table::copy(ps.ng.tax_abund)
otu_table(ps.ng.tax_abund_sel2) <- otu_table(ps.ng.tax_abund)[,c("sample-C1","sample-C2","sample-C3","sample-C4","sample-C5","sample-C6","sample-C7","sample-C8","sample-C9","sample-C10",   "sample-E1","sample-E2","sample-E3","sample-E4","sample-E5","sample-E6","sample-E7","sample-E8","sample-E9","sample-E10")]
diagdds = phyloseq_to_deseq2(ps.ng.tax_abund_sel2, ~Group)
diagdds$Group <- relevel(diagdds$Group, "Group4")
diagdds = DESeq(diagdds, test="Wald", fitType="parametric")
resultsNames(diagdds)

res = results(diagdds, cooksCutoff = FALSE)
alpha = 0.05
sigtab = res[which(res$padj < alpha), ]
sigtab = cbind(as(sigtab, "data.frame"), as(phyloseq::tax_table(ps.ng.tax_abund_sel2)[rownames(sigtab), ], "matrix"))

kable(sigtab) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))

library("ggplot2")
theme_set(theme_bw())
scale_fill_discrete <- function(palname = "Set1", ...) {
    scale_fill_brewer(palette = palname, ...)
}
x = tapply(sigtab$log2FoldChange, sigtab$Order, function(x) max(x))
x = sort(x)
sigtab$Order = factor(as.character(sigtab$Order), levels=names(x))
x = tapply(sigtab$log2FoldChange, sigtab$Family, function(x) max(x))
x = sort(x)
sigtab$Family = factor(as.character(sigtab$Family), levels=names(x))
ggplot(sigtab, aes(x=log2FoldChange, y=Family, color=Order)) + geom_point(aes(size=padj)) + scale_size_continuous(name="padj",range=c(8,4))+
  theme(axis.text.x = element_text(angle = -25, hjust = 0, vjust=0.5))
```

```{r, echo=FALSE, warning=FALSE, out.width = '200%', fig.align= "center"}
#knitr::include_graphics("./figures/diff_analysis_Group3_vs_Group4.png")
```

```{r, echo=FALSE, warning=FALSE}
## The table below shows the raw counts of the 199 OTUs across all samples, with OTUs as rows and samples as columns.
#kable(otu_table(ps.ng.tax)) %>%
#kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
```

```{r, echo=FALSE, warning=FALSE}
## The table below shows the taxonomic assignments of the 199 OTUs, with OTUs as rows and their corresponding taxonomic ranks as columns.
# ~/Tools/csv2xls-0.4/csv_to_xls.py otu_table.csv tax_table.csv -d',' -o otu_tax.xls;
#kable(taxonomy_df) %>%
#  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
```

```{r, echo=FALSE, warning=FALSE}
## The sample L9 retained only 413 sequences after the complete preprocessing workflow, which includes filtering, denoising, merging, and chimera removal and was excluded from downstream analyses.
# # Read the TSV file
# ~/Tools/csv2xls-0.4/csv_to_xls.py denoising-stats.csv -d$'\t' -o preprocessing_stats.xls;
# denoising_stats <- read.csv("denoising-stats.csv", sep="\t")
# # Display the table
# kable(denoising_stats, caption = "Preprocessing statistics for each sample") %>%
#   kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
```

Somatic Variation Detection

Leave a reply

🔬 癌症驱动基因与突变分析流程总览

一、常见癌症的驱动基因（Driver Genes）

癌症类型	常见驱动基因	说明
肺癌（NSCLC）	EGFR, KRAS, ALK, TP53, BRAF	EGFR 和 KRAS 突变常用于靶向治疗判断
肝癌（HCC）	TP53, CTNNB1, AXIN1, TERT	TERT 启动子突变常见
胃癌	TP53, ARID1A, PIK3CA, CDH1	与 WNT/PI3K 通路相关
大肠癌（CRC）	APC, KRAS, TP53, PIK3CA	Wnt 通路异常（APC 突变）
乳腺癌	BRCA1/2, PIK3CA, TP53, ERBB2(HER2)	BRCA 突变相关于遗传性乳癌
黑色素瘤	BRAF, NRAS, NF1	BRAF-V600E 为靶向治疗重要突变
白血病（AML）	FLT3, NPM1, DNMT3A, IDH1/2	多个基因可能共突变

🧪 二、常用的癌症突变检测工具和数据库

🛠️ 工具类（基因组分析）

工具名	用途/特点
Mutect2 (GATK)	检测肿瘤样本中的体细胞突变（与正常对照）
Strelka2	高灵敏度检测 SNV 和 InDel
VarScan2	检测突变和 CNV，适合低深度数据
SomaticSniper	老牌体细胞突变检测工具
FACETS	拷贝数变异与纯度/杂合度估计

🧬 数据库类（已知突变注释）

数据库/平台	功能
COSMIC	全球最大癌症突变数据库
TCGA (via GDC)	大规模癌症全基因组数据平台（含表达、突变等）
cBioPortal	可视化 TCGA、ICGC 数据；浏览癌症基因突变
OncoKB	癌症突变功能注释库，适合靶向药物关联
ClinVar	提供临床意义注释，如是否为致病突变

📘 三、推荐突变分析流程

样本准备 
    ↓
比对 (BWA) 
    ↓
去除重复 (Picard) 
    ↓
突变检测 (Mutect2, Strelka2, VarScan2) 
    ↓
注释 (ANNOVAR / VEP) 
    ↓
功能解释 (OncoKB, COSMIC, cBioPortal)

🧬 四、突变检测与注释：实际操作（从 BAM 到注释 VCF）

✅ 输入需求：

肿瘤 BAM 文件（推荐同时有正常对照 BAM）
参考基因组（例如 hg38.fasta）
索引文件（.bai, .fai）
工具安装：GATK, VEP, 或 ANNOVAR

🛠️ 常见工具推荐

工具	特点	输入要求
Mutect2 (GATK)	最佳的肿瘤–正常突变检测	肿瘤 + 正常 BAM
Strelka2	快速、准确检测 SNV/InDel	肿瘤 + 正常 BAM
VarScan2	可支持 tumor-only 模式	mpileup / VCF
LoFreq	高分辨率 SNV 检测	肿瘤 BAM（可选）

🧪 示例：使用 GATK Mutect2 检测体细胞突变

🔹 Tumor–Normal 模式（推荐）

gatk Mutect2 \
  -R hg38.fasta \
  -I tumor.bam \
  -I normal.bam \
  -tumor TumorSample \
  -normal NormalSample \
  --germline-resource af-only-gnomad.vcf.gz \
  --panel-of-normals pon.vcf.gz \
  -O somatic_raw.vcf

gatk FilterMutectCalls \
  -V somatic_raw.vcf \
  -R hg38.fasta \
  -O somatic_filtered.vcf

🔹 Tumor-Only 模式（无对照）

gatk Mutect2 \
  -R hg38.fasta \
  -I tumor.bam \
  -tumor TumorSample \
  --germline-resource af-only-gnomad.vcf.gz \
  -O somatic_raw.vcf

🧠 突变注释工具

1. Ensembl VEP

vep -i somatic_filtered.vcf \
    -o annotated.vep.vcf \
    --vcf \
    --cache \
    --offline \
    --assembly GRCh38 \
    --dir_cache /path/to/.vep \
    --everything

✅ 适用于通用功能预测、插件丰富（如 COSMIC、gnomAD）

2. ANNOVAR

convert2annovar.pl -format vcf4 somatic_filtered.vcf > input.avinput

table_annovar.pl input.avinput humandb/ \
   -buildver hg38 \
   -out annotated_annovar \
   -remove \
   -protocol refGene,clinvar_20240129,cosmic70,gnomad_genome \
   -operation g,f,f,f \
   -nastring . \
   -vcfinput

✅ 适合癌症、临床相关变异注释（COSMIC、ClinVar、gnomAD）

🚦 工具选择建议

使用场景	推荐工具
全面功能预测	VEP
癌症为重点	ANNOVAR
兼顾两者	同时使用更好

MCV病毒中的LT与sT蛋白功能

Leave a reply

🧬 一、MCV LT蛋白（Large T antigen）

LT蛋白是MCV生命周期中的核心蛋白之一，主要负责启动病毒DNA复制，同时干扰宿主细胞周期。

✅ LT蛋白的主要功能：

1. 启动病毒DNA复制

识别并结合病毒基因组的复制起始位点（origin of replication）
解链DNA起始区域（即 DNA melting）
招募宿主的DNA复制因子（如DNA聚合酶）
实现病毒DNA在宿主细胞内的有效复制

2. 调控病毒转录

控制不同阶段病毒基因的表达，调节病毒生命周期

3. 干扰宿主细胞周期

结合宿主细胞的细胞周期调控蛋白（如pRb）
迫使宿主细胞进入S期，为病毒复制提供有利环境

4. 在致癌中的潜在作用

Merkel细胞癌（MCC）中常见截短型LT蛋白（truncated LT）
虽然失去复制功能，但仍能结合抑癌蛋白（如pRb）
有助于癌变发生

📌 小结

完整LT蛋白：主要用于病毒复制和病毒生命周期调控
截短LT蛋白：不再复制病毒，但可能通过抑癌机制导致癌变

🧬 二、MCV sT蛋白（small T antigen）

sT蛋白是一个多功能调节因子，在病毒致癌潜能中发挥关键作用，主要通过影响细胞信号通路、蛋白降解机制等方式。

✅ sT蛋白的主要功能：

1. 抑制蛋白磷酸酶2A（PP2A）

PP2A 是抑癌因子
sT 抑制 PP2A → 激活 MAPK、AKT/mTOR 等生长信号通路
促进细胞持续增殖，有利于病毒复制与癌变

2. 促进细胞转化与肿瘤形成

在实验模型中，sT可诱导细胞转化（如软琼脂克隆形成）
是Merkel细胞癌的关键致癌因子之一

3. 干扰蛋白降解系统

与 UBE2C（泛素连接酶）等相互作用
干扰蛋白降解和细胞周期调控

4. 稳定并增强LT蛋白表达

抑制LT蛋白降解，延长其在细胞内的寿命
增强病毒复制与致癌潜能

🧩 小结

抑制PP2A：激活生长信号，促进增殖
稳定LT：增强病毒功能
干扰蛋白降解：打破细胞稳态，助推癌变

🧬 三、PP2A 与 UBE2C 简介（人类基因）

1. PP2A（Protein Phosphatase 2A）

属于人类基因，由PPP2CA和PPP2CB编码
是一种关键的抑癌蛋白磷酸酶复合物
主要功能：调控细胞周期、DNA修复、细胞凋亡
被sT蛋白抑制，有利于病毒复制和细胞癌变

2. UBE2C（Ubiquitin-Conjugating Enzyme E2C）

是人类基因，编码泛素结合酶
参与细胞周期蛋白的泛素化降解
推动细胞从有丝分裂中期进入后期
在癌症中常见高表达，MCV sT可能上调其活性

🔁 四、细胞周期与S期简介

细胞周期主要阶段：

G1期：合成RNA与蛋白质，细胞生长
S期：DNA复制阶段，染色体由单体变为姐妹染色单体
G2期：检查DNA复制错误，准备分裂
M期：有丝分裂，生成两个子细胞
G0期：静止期，不再进入增殖周期

🦠 病毒如何利用S期？

MCV等DNA病毒没有自己的复制系统
必须依赖宿主细胞在S期时提供的复制因子（如DNA聚合酶）
MCV LT蛋白通过抑制pRb等抑癌蛋白，解除细胞周期限制
强迫细胞进入S期，为病毒DNA复制创造有利条件

📌 比喻说明：

细胞 = 工厂
S期 = 工厂开足马力复制DNA
病毒 = 插队的外来订单
LT蛋白 = 强迫工厂加班的经理

五、为什么 Large T 被称为“抗原”而不是“蛋白质”

“Large T”通常指的是 大T抗原（Large T antigen），这是源自多瘤病毒（如 SV40）的一种蛋白质。虽然它本质上是蛋白质，但我们在文献和科研中常称之为“抗原”，原因如下：

1. “抗原”是从其生物学功能角度命名的

“抗原”这个词在这里并不是指它在免疫系统中引发免疫反应的功能（尽管它可以），而是历史上首次被识别时，它是在病毒感染的宿主细胞中被抗体识别出来的。
当时科学家通过免疫学方法发现了这种蛋白，因此称它为“抗原”（antigen）。

2. 它的命名源于病毒学历史传统

SV40 病毒的研究中，科学家识别出几种由病毒编码的重要蛋白，如：
- Large T antigen
- Small t antigen
它们以 “T” 命名是因为它们在肿瘤形成（Tumor）中起作用。
“Antigen” 是早期免疫检测常用的术语，沿用至今。

3. 它的功能远不止是“蛋白质”

Large T antigen 是病毒复制和转化细胞（使其癌变）所必需的多功能蛋白。
它能与宿主细胞多种关键蛋白（如 p53、Rb 蛋白）相互作用，调控细胞周期，干扰肿瘤抑制因子。
因此，它常常作为分子标志物或实验工具蛋白在细胞生物学、癌症研究中被广泛使用。

4. 总结

虽然“Large T antigen”在本质上是一个蛋白质，但由于其最初通过免疫方式被发现，并且它在病毒学和细胞生物学中具有重要的功能性和标志性作用，因此沿用了“抗原（antigen）”的名称。这是一种历史命名和功能导向命名的结合。