manhattan_plot_Carmen_custom_labels_WaGa.R
manhattan_plot_Carmen_custom_labels_MKL-1.R
For example, MKL-1 Cell Line miRNA Analysis Results are as follows.
* Raw count data (d_raw_MKL-1.xlsx): Contains the raw, unnormalized read counts for all miRNAs.
* Mapping heatmap (mapping_heatmap3_MKL-1.pdf)
* Volcano plot (MKL.1_wt_EV_vs_MKL.1_wt_cells.png and .svg)
* PCA plot (pca_MKL-1.png)
* Manhattan plot and data (manhattan_plot_MKL1_vs_EV.png, .svg, and manhattan_plot_MKL1_data.xlsx)
-
Input data
WaGa wt cells (nf774* (Considering to be deleted, due to possibly be an outlier, but in the current version, it is still included in the analysis), nf961, nf962) WaGa wt_EV_RNA (nf657* (The sample was EXCLUDED, since it is obviously a outlier, not clustered with the other 2 samples), nf930, nf935) WaGa_sT_DMSO_EV_RNA (nf931, nf936, nf971) WaGa_sT_Dox_EV_RNA (nf932, nf937, nf972) WaGa_scr_DMSO_EV_RNA (nf933, nf938, nf973) WaGa_scr_Dox_EV_RNA (nf934, nf939, nf974) # --> In total, 17 samples MKL-1 wt cells (nf780*, nf796*, nf797*) MKL-1 wt_EV_RNA (nf655* (The sample was EXCLUDED), 2404, 2608) MKL-1_sT_DMSO_EV_RNA (2608, 2701, 2802) MKL-1_sT_Dox_EV_RNA (2608, 2701, 2802) MKL-1_scr_DMSO_EV_RNA (2608, 2701, 2802) MKL-1_scr_Dox_EV_RNA (2608, 2701, 2802) # --> In total, 18 samples #Note that the real paths are as follows: #./20260506_AV243904_0073_A/2404_MKL1_wt_EVs/2404_MKL1_wt_EVs_R1.fastq.gz, ./20260506_AV243904_0073_A/2608_MKL1_wt_EVs/2608_MKL1_wt_EVs_R1.fastq.gz #./20260506_AV243904_0073_A/2608_MKL1_sT_DMSO/2608_MKL1_sT_DMSO_R1.fastq.gz, ./20260506_AV243904_0073_A/2701_MKL1_sT_DMSO/2701_MKL1_sT_DMSO_R1.fastq.gz, ./20260506_AV243904_0073_A/2802_MKL1_sT_DMSO/2802_MKL1_sT_DMSO_R1.fastq.gz #./20260506_AV243904_0073_A/2608_MKL1_sT_Dox/2608_MKL1_sT_Dox_R1.fastq.gz, ./20260506_AV243904_0073_A/2701_MKL1_sT_Dox/2701_MKL1_sT_Dox_R1.fastq.gz, ./20260506_AV243904_0073_A/2802_MKL1_sT_Dox/2802_MKL1_sT_Dox_R1.fastq.gz #./20260506_AV243904_0073_A/2608_MKL1_scr_DMSO/2608_MKL1_scr_DMSO_R1.fastq.gz, ./20260506_AV243904_0073_A/2701_MKL1_scr_DMSO/2701_MKL1_scr_DMSO_R1.fastq.gz, ./20260506_AV243904_0073_A/2802_MKL1_scr_DMSO/2802_MKL1_scr_DMSO_R1.fastq.gz #./20260506_AV243904_0073_A/2608_MKL1_scr_Dox/2608_MKL1_scr_Dox_R1.fastq.gz, ./20260506_AV243904_0073_A/2701_MKL1_scr_Dox/2701_MKL1_scr_Dox_R1.fastq.gz, ./20260506_AV243904_0073_A/2802_MKL1_scr_Dox/2802_MKL1_scr_Dox_R1.fastq.gz -
Adapter trimming
#some common adapter sequences from different kits for reference: # - TruSeq Small RNA (Illumina): TGGAATTCTCGGGTGCCAAGG # - Small RNA Kits V1 (Illumina): TCGTATGCCGTCTTCTGCTTGT # - Small RNA Kits V1.5 (Illumina): ATCTCGTATGCCGTCTTCTGCTTG # - NEXTflex Small RNA Sequencing Kit v3 for Illumina Platforms (Bioo Scientific): TGGAATTCTCGGGTGCCAAGG # - LEXOGEN Small RNA-Seq Library Prep Kit (Illumina): TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC * mkdir Data_Ute_smallRNA_via_exceRpt_workspace/trimmed; cd Data_Ute_smallRNA_via_exceRpt_workspace/trimmed echo "------------------------------------ cutadapting nf774 -----------------------------------" >> LOG cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o nf774.fastq.gz ~/DATA/Data_Ute/Data_Ute_smallRNA_4/230623_newDemulti_smallRNAs/220617_NB501882_0371_AH7572BGXM_smallRNA_Ute_newDemulti/2022_nf_ute_smallRNA/nf774/0403_WaGa_wt_S1_R1_001.fastq.gz >> LOG echo "------------------------------------ cutadapting nf657 -----------------------------------" >> LOG cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o nf657.fastq.gz ~/DATA/Data_Ute/Data_Ute_smallRNA_4/230623_newDemulti_smallRNAs/210817_NB501882_0294_AHW5Y2BGXJ_smallRNA_Ute_newDemulti/2021_nf_ute_smallRNA/nf657/WaGa_derived_EV_miRNA_S2_R1_001.fastq.gz >> LOG echo "------------------------------------ cutadapting nf655 -----------------------------------" >> LOG cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o nf655.fastq.gz ~/DATA/Data_Ute/Data_Ute_smallRNA_4/230623_newDemulti_smallRNAs/210817_NB501882_0294_AHW5Y2BGXJ_smallRNA_Ute_newDemulti/2021_nf_ute_smallRNA/nf655/MKL_1_derived_EV_miRNA_S1_R1_001.fastq.gz >> LOG echo "------------------------------------ cutadapting nf780 -----------------------------------" >> LOG cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o nf780.fastq.gz ~/DATA/Data_Ute/Data_Ute_smallRNA_4/230623_newDemulti_smallRNAs/220617_NB501882_0371_AH7572BGXM_smallRNA_Ute_newDemulti/2022_nf_ute_smallRNA/nf780/0505_MKL1_wt_S2_R1_001.fastq.gz >> LOG echo "------------------------------------ cutadapting nf796 -----------------------------------" >> LOG cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o nf796.fastq.gz ~/DATA/Data_Ute/Data_Ute_smallRNA_4/230623_newDemulti_smallRNAs/221216_NB501882_0404_AHLVNMBGXM_smallRNA_Ute_newDemulti/2022_nf_ute_smallRNA/nf796/MKL-1_wt_1_S1_R1_001.fastq.gz >> LOG echo "------------------------------------ cutadapting nf797 -----------------------------------" >> LOG cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o nf797.fastq.gz ~/DATA/Data_Ute/Data_Ute_smallRNA_4/230623_newDemulti_smallRNAs/221216_NB501882_0404_AHLVNMBGXM_smallRNA_Ute_newDemulti/2022_nf_ute_smallRNA/nf797/MKL-1_wt_2_S2_R1_001.fastq.gz >> LOG echo "------------------------------------ cutadapting nf930 -----------------------------------" >> LOG cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o nf930.fastq.gz ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf930/01_0505_WaGa_wt_EV_RNA_S1_R1_001.fastq.gz >> LOG echo "------------------------------------ cutadapting nf931 -----------------------------------" >> LOG cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o nf931.fastq.gz ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf931/02_0505_WaGa_sT_DMSO_EV_RNA_S2_R1_001.fastq.gz >> LOG echo "------------------------------------ cutadapting nf932 -----------------------------------" >> LOG cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o nf932.fastq.gz ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf932/03_0505_WaGa_sT_Dox_EV_RNA_S3_R1_001.fastq.gz >> LOG echo "------------------------------------ cutadapting nf933 -----------------------------------" >> LOG cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o nf933.fastq.gz ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf933/04_0505_WaGa_scr_DMSO_EV_RNA_S4_R1_001.fastq.gz >> LOG echo "------------------------------------ cutadapting nf934 -----------------------------------" >> LOG cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o nf934.fastq.gz ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf934/05_0505_WaGa_scr_Dox_EV_RNA_S5_R1_001.fastq.gz >> LOG echo "------------------------------------ cutadapting nf935 -----------------------------------" >> LOG cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o nf935.fastq.gz ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf935/06_1905_WaGa_wt_EV_RNA_S6_R1_001.fastq.gz >> LOG echo "------------------------------------ cutadapting nf936 -----------------------------------" >> LOG cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o nf936.fastq.gz ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf936/07_1905_WaGa_sT_DMSO_EV_RNA_S7_R1_001.fastq.gz >> LOG echo "------------------------------------ cutadapting nf937 -----------------------------------" >> LOG cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o nf937.fastq.gz ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf937/08_1905_WaGa_sT_Dox_EV_RNA_S8_R1_001.fastq.gz >> LOG echo "------------------------------------ cutadapting nf938 -----------------------------------" >> LOG cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o nf938.fastq.gz ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf938/09_1905_WaGa_scr_DMSO_EV_RNA_S9_R1_001.fastq.gz >> LOG echo "------------------------------------ cutadapting nf939 -----------------------------------" >> LOG cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o nf939.fastq.gz ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf939/10_1905_WaGa_scr_Dox_EV_RNA_S10_R1_001.fastq.gz >> LOG echo "------------------------------------ cutadapting nf940 -----------------------------------" >> LOG cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o nf940.fastq.gz ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf940/11_control_MKL1_S11_R1_001.fastq.gz >> LOG echo "------------------------------------ cutadapting nf941 -----------------------------------" >> LOG cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o nf941.fastq.gz ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf941/12_control_WaGa_S12_R1_001.fastq.gz >> LOG echo "------------------------------------ cutadapting nf961 -----------------------------------" >> LOG cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o nf961.fastq.gz ~/DATA/Data_Ute/Data_Ute_smallRNA_7/250411_VH00358_135_AAGKGLHM5/nf961/WaGaWTcells_1_S1_R1_001.fastq.gz >> LOG echo "------------------------------------ cutadapting nf962 -----------------------------------" >> LOG cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o nf962.fastq.gz ~/DATA/Data_Ute/Data_Ute_smallRNA_7/250411_VH00358_135_AAGKGLHM5/nf962/WaGaWTcells_2_S2_R1_001.fastq.gz >> LOG echo "------------------------------------ cutadapting nf971 -----------------------------------" >> LOG cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o nf971.fastq.gz ~/DATA/Data_Ute/Data_Ute_smallRNA_7/250411_VH00358_135_AAGKGLHM5/nf971/2001_WaGa_sT_DMSO_S3_R1_001.fastq.gz >> LOG echo "------------------------------------ cutadapting nf972 -----------------------------------" >> LOG cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o nf972.fastq.gz ~/DATA/Data_Ute/Data_Ute_smallRNA_7/250411_VH00358_135_AAGKGLHM5/nf972/2001_WaGa_sT_Dox_S4_R1_001.fastq.gz >> LOG echo "------------------------------------ cutadapting nf973 -----------------------------------" >> LOG cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o nf973.fastq.gz ~/DATA/Data_Ute/Data_Ute_smallRNA_7/250411_VH00358_135_AAGKGLHM5/nf973/2001_WaGa_scr_DMSO_S5_R1_001.fastq.gz >> LOG echo "------------------------------------ cutadapting nf974 -----------------------------------" >> LOG cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o nf974.fastq.gz ~/DATA/Data_Ute/Data_Ute_smallRNA_7/250411_VH00358_135_AAGKGLHM5/nf974/2001_WaGa_scr_Dox_S6_R1_001.fastq.gz >> LOG echo "------------------------------------ cutadapting 2404_MKL1_wt_EVs -----------------------------------" >> LOG cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o 2404_MKL1_wt_EVs.fastq.gz ~/DATA/Data_Ute_smallRNA/20260506_AV243904_0073_A/2404_MKL1_wt_EVs/2404_MKL1_wt_EVs_R1.fastq.gz >> LOG echo "------------------------------------ cutadapting 2608_MKL1_wt_EVs -----------------------------------" >> LOG cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o 2608_MKL1_wt_EVs.fastq.gz ~/DATA/Data_Ute_smallRNA/20260506_AV243904_0073_A/2608_MKL1_wt_EVs/2608_MKL1_wt_EVs_R1.fastq.gz >> LOG echo "------------------------------------ cutadapting 2608_MKL1_sT_DMSO -----------------------------------" >> LOG cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o 2608_MKL1_sT_DMSO.fastq.gz ~/DATA/Data_Ute_smallRNA/20260506_AV243904_0073_A/2608_MKL1_sT_DMSO/2608_MKL1_sT_DMSO_R1.fastq.gz >> LOG echo "------------------------------------ cutadapting 2701_MKL1_sT_DMSO -----------------------------------" >> LOG cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o 2701_MKL1_sT_DMSO.fastq.gz ~/DATA/Data_Ute_smallRNA/20260506_AV243904_0073_A/2701_MKL1_sT_DMSO/2701_MKL1_sT_DMSO_R1.fastq.gz >> LOG echo "------------------------------------ cutadapting 2802_MKL1_sT_DMSO -----------------------------------" >> LOG cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o 2802_MKL1_sT_DMSO.fastq.gz ~/DATA/Data_Ute_smallRNA/20260506_AV243904_0073_A/2802_MKL1_sT_DMSO/2802_MKL1_sT_DMSO_R1.fastq.gz >> LOG echo "------------------------------------ cutadapting 2608_MKL1_sT_Dox -----------------------------------" >> LOG cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o 2608_MKL1_sT_Dox.fastq.gz ~/DATA/Data_Ute_smallRNA/20260506_AV243904_0073_A/2608_MKL1_sT_Dox/2608_MKL1_sT_Dox_R1.fastq.gz >> LOG echo "------------------------------------ cutadapting 2701_MKL1_sT_Dox -----------------------------------" >> LOG cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o 2701_MKL1_sT_Dox.fastq.gz ~/DATA/Data_Ute_smallRNA/20260506_AV243904_0073_A/2701_MKL1_sT_Dox/2701_MKL1_sT_Dox_R1.fastq.gz >> LOG echo "------------------------------------ cutadapting 2802_MKL1_sT_Dox -----------------------------------" >> LOG cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o 2802_MKL1_sT_Dox.fastq.gz ~/DATA/Data_Ute_smallRNA/20260506_AV243904_0073_A/2802_MKL1_sT_Dox/2802_MKL1_sT_Dox_R1.fastq.gz >> LOG echo "------------------------------------ cutadapting 2608_MKL1_scr_DMSO -----------------------------------" >> LOG cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o 2608_MKL1_scr_DMSO.fastq.gz ~/DATA/Data_Ute_smallRNA/20260506_AV243904_0073_A/2608_MKL1_scr_DMSO/2608_MKL1_scr_DMSO_R1.fastq.gz >> LOG echo "------------------------------------ cutadapting 2701_MKL1_scr_DMSO -----------------------------------" >> LOG cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o 2701_MKL1_scr_DMSO.fastq.gz ~/DATA/Data_Ute_smallRNA/20260506_AV243904_0073_A/2701_MKL1_scr_DMSO/2701_MKL1_scr_DMSO_R1.fastq.gz >> LOG echo "------------------------------------ cutadapting 2802_MKL1_scr_DMSO -----------------------------------" >> LOG cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o 2802_MKL1_scr_DMSO.fastq.gz ~/DATA/Data_Ute_smallRNA/20260506_AV243904_0073_A/2802_MKL1_scr_DMSO/2802_MKL1_scr_DMSO_R1.fastq.gz >> LOG echo "------------------------------------ cutadapting 2608_MKL1_scr_Dox -----------------------------------" >> LOG cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o 2608_MKL1_scr_Dox.fastq.gz ~/DATA/Data_Ute_smallRNA/20260506_AV243904_0073_A/2608_MKL1_scr_Dox/2608_MKL1_scr_Dox_R1.fastq.gz >> LOG echo "------------------------------------ cutadapting 2701_MKL1_scr_Dox -----------------------------------" >> LOG cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o 2701_MKL1_scr_Dox.fastq.gz ~/DATA/Data_Ute_smallRNA/20260506_AV243904_0073_A/2701_MKL1_scr_Dox/2701_MKL1_scr_Dox_R1.fastq.gz >> LOG echo "------------------------------------ cutadapting 2802_MKL1_scr_Dox -----------------------------------" >> LOG cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o 2802_MKL1_scr_Dox.fastq.gz ~/DATA/Data_Ute_smallRNA/20260506_AV243904_0073_A/2802_MKL1_scr_Dox/2802_MKL1_scr_Dox_R1.fastq.gz >> LOG -
Install exceRpt (https://github.gersteinlab.org/exceRpt/)
docker pull rkitchen/excerpt mkdir MyexceRptDatabase cd /mnt/nvme0n1p1/MyexceRptDatabase wget http://org.gersteinlab.excerpt.s3-website-us-east-1.amazonaws.com/exceRptDB_v4_hg38_lowmem.tgz tar -xvf exceRptDB_v4_hg38_lowmem.tgz #http://org.gersteinlab.excerpt.s3-website-us-east-1.amazonaws.com/exceRptDB_v4_hg19_lowmem.tgz #http://org.gersteinlab.excerpt.s3-website-us-east-1.amazonaws.com/exceRptDB_v4_hg38_lowmem.tgz #http://org.gersteinlab.excerpt.s3-website-us-east-1.amazonaws.com/exceRptDB_v4_mm10_lowmem.tgz wget http://org.gersteinlab.excerpt.s3-website-us-east-1.amazonaws.com/exceRptDB_v4_EXOmiRNArRNA.tgz tar -xvf exceRptDB_v4_EXOmiRNArRNA.tgz wget http://org.gersteinlab.excerpt.s3-website-us-east-1.amazonaws.com/exceRptDB_v4_EXOGenomes.tgz tar -xvf exceRptDB_v4_EXOGenomes.tgz # List extracted hg38 directory structure find hg38 -type f | sed 's|^hg38/||' | sort > extracted_hg38.txt comm -3 extracted_hg38.txt <(tar -tf exceRptDB_v4_hg38_lowmem.tgz | grep '^hg38/' | sed 's|^hg38/||' | sort) # --> DIR hg38 tar -tf exceRptDB_v4_EXOmiRNArRNA.tgz # --> DIR ribosomeDatabase, NCBI_taxonomy_taxdump, miRBase tar -tf exceRptDB_v4_EXOGenomes.tgz # --> Genomes_BacteriaFungiMammalPlantProtistVirus -
Run exceRpt
#[---- REAL_RUNNING_COMPLETE_DB ---->] #NOTE that if not renamed in the input files, then have to RENAME all files recursively by removing "_cutadapted.fastq" in all names in _CORE_RESULTS_v4.6.3.tgz (first unzip, removing, then zip, mv to ../results_g). cd trimmed for file in *.fastq.gz; do echo "mv \"$file\" \"${file/.fastq/}\"" done mkdir results for sample in nf780 nf796 nf797 nf655 nf774 nf961 nf962 nf657 nf930 nf935 nf931 nf936 nf971 nf932 nf937 nf972 nf933 nf938 nf973 nf934 nf939 nf974; do docker run -v ~/DATA/Data_Ute_smallRNA_via_exceRpt_workspace/trimmed:/exceRptInput \ -v ~/DATA/Data_Ute_smallRNA_via_exceRpt_workspace/results:/exceRptOutput \ -v /mnt/nvme0n1p1/MyexceRptDatabase:/exceRpt_DB \ -t rkitchen/excerpt \ INPUT_FILE_PATH=/exceRptInput/${sample}.gz MAIN_ORGANISM_GENOME_ID=hg38 N_THREADS=50 JAVA_RAM='200G' MAP_EXOGENOUS=on done for sample in 2404_MKL1_wt_EVs 2608_MKL1_wt_EVs 2608_MKL1_sT_DMSO 2701_MKL1_sT_DMSO 2802_MKL1_sT_DMSO 2608_MKL1_sT_Dox 2701_MKL1_sT_Dox 2802_MKL1_sT_Dox 2608_MKL1_scr_DMSO 2701_MKL1_scr_DMSO 2802_MKL1_scr_DMSO 2608_MKL1_scr_Dox 2701_MKL1_scr_Dox 2802_MKL1_scr_Dox; do docker run -v ~/DATA/Data_Ute_smallRNA_via_exceRpt_workspace/trimmed:/exceRptInput \ -v ~/DATA/Data_Ute_smallRNA_via_exceRpt_workspace/results:/exceRptOutput \ -v /mnt/nvme3n1p1/MyexceRptDatabase:/exceRpt_DB \ -t rkitchen/excerpt \ INPUT_FILE_PATH=/exceRptInput/${sample}.gz MAIN_ORGANISM_GENOME_ID=hg38 N_THREADS=50 JAVA_RAM='200G' MAP_EXOGENOUS=on done #DEBUG the excerpt env docker inspect rkitchen/excerpt:latest # Without /bin/bash → May run and exit immediately #docker run -it rkitchen/excerpt # With /bin/bash → Stays open for interaction docker run -it --entrypoint /bin/bash rkitchen/excerpt -
Processing exceRpt output from multiple samples
cd ~/DATA/Data_Ute_smallRNA_via_exceRpt_workspace/exceRpt-master mamba activate r_env mamba install -c conda-forge -c bioconda \ bioconductor-marray \ bioconductor-rgraphviz \ r-plyr r-gplots r-reshape2 r-ggplot2 r-scales r-openxlsx r-rcurl r-xml \ -y mamba install -c conda-forge -c bioconda \ r-plyr r-gplots r-reshape2 r-ggplot2 r-scales r-openxlsx \ bioconductor-marray bioconductor-rgraphviz \ -y #mkdir summaries heatmap_all_WaGa+4_MKL-1 mkdir results_WaGa_EXCLUDED results_MKL-1 summaries_WaGa summaries_MKL-1 heatmap_WaGa heatmap_MKL-1 #! EXCLUDE some isolates since they have total different pattern or due to bad quality --> outliner, until now only one sample, namely nf657 from WaGa wt EV: sudo mv results/nf657* results_WaGa_EXCLUDED/ sudo mv results/nf780* results_MKL-1/ sudo mv results/nf796* results_MKL-1/ sudo mv results/nf797* results_MKL-1/ sudo mv results/nf655* results_MKL-1/ for sample in 2404_MKL1_wt_EVs 2608_MKL1_wt_EVs 2608_MKL1_sT_DMSO 2701_MKL1_sT_DMSO 2802_MKL1_sT_DMSO 2608_MKL1_sT_Dox 2701_MKL1_sT_Dox 2802_MKL1_sT_Dox 2608_MKL1_scr_DMSO 2701_MKL1_scr_DMSO 2802_MKL1_scr_DMSO 2608_MKL1_scr_Dox 2701_MKL1_scr_Dox 2802_MKL1_scr_Dox; do echo "sudo mv results/${sample}* results_MKL-1/" done #Following our initial QC, I noticed that one of the MKL-1 wt-EV samples (nf655) is a clear outlier, clustering far apart from the other two wt-EV replicates in the PCoA plots. I recommend removing nf655 from the downstream MKL-1 analysis, which is similar to our earlier analysis for MKL-1, in which we removed the outlier nf657. Please see the attached figures for reference. mv results_MKL-1/nf655* results_MKL-1_EXCLUDED/ (r_env) jhuang@WS-2290C:~/DATA/Data_Ute_smallRNA_via_exceRpt_workspace/exceRpt-master$ R #WARNING: need to reload the R-script after each change of the script. source("mergePipelineRuns_functions.R") processSamplesInDir("../results_WaGa/", "../summaries_WaGa") processSamplesInDir("../results_MKL-1/", "../summaries_MKL-1") #mkdir heatmap_WaGa; cp summaries_WaGa/*.RData heatmap_WaGa; rm heatmap_WaGa/exceRpt_sampleGroupDefinitions.txt; source("mergePipelineRuns_functions_addSampleGroupInfo_WaGa.R") processSamplesInDir("../results_WaGa/", "../heatmap_WaGa") #mkdir heatmap_MKL-1; cp summaries_MKL-1/*.RData heatmap_MKL-1; rm heatmap_MKL-1/exceRpt_sampleGroupDefinitions.txt; source("mergePipelineRuns_functions_addSampleGroupInfo_MKL-1.R") processSamplesInDir("../results_MKL-1/", "../heatmap_MKL-1") #!!!!! IMPORTANT: REPORT heatmap_MKL-1/exceRpt_DiagnosticPlots.pdf and heatmap_MKL-1/mapping_heatmap3.pdf (They are almost the same, mapping_heatmap3.pdf is better due to bigger font size) !!!! #CONSIDERING_TO_DEL_nf774 since it is very far to another two samples (MAYBE BETTER NOT DO THIS, SINCE I HAVE TO GENERATE PCA- and MANHATTAN PLOTS!!): now the sample nf774 was kept in the WaGa results. #~/Tools/csv2xls-0.4/csv_to_xls.py exceRpt_miRNA_ReadsPerMillion.txt exceRpt_tRNA_ReadsPerMillion.txt exceRpt_piRNA_ReadsPerMillion.txt -d$'\t' -o exceRpt_results_detailed.xls # Report summaries_WaGa/exceRpt_mapping_heatmaps_WaGa.xlsx or summaries_MKL-1/exceRpt_mapping_heatmaps_MKL-1.xlsx; # summaries_WaGa/exceRpt_results_detailed_WaGa.xls or summaries_MKL-1/exceRpt_results_detailed_MKL-1.xls; # heatmap_WaGa/mapping_heatmap3_WaGa.pdf or heatmap_MKL-1/mapping_heatmap3_MKL-1.pdf -
Downstream analyis using R for miRNAs (17 WaGa samples)
#Input file #exceRpt_miRNA_ReadCounts.txt #exceRpt_piRNA_ReadCounts.txt ## WaGa experimental groups (scr = scramble control; sT = target knockdown) #WaGa_scr_DMSO_EV (nf933, nf938, nf973) #WaGa_scr_Dox_EV (nf934, nf939, nf974) #WaGa_sT_DMSO_EV (nf931, nf936, nf971) #WaGa_sT_Dox_EV (nf932, nf937, nf972) # ## WaGa wild-type controls #WaGa_wt_cells (nf774, nf961, nf962) #WaGa_wt_EV (nf930, nf935) cd ~/DATA/Data_Ute_smallRNA_via_exceRpt_workspace/summaries_WaGa mamba activate r_env R #BiocManager::install("AnnotationDbi") #BiocManager::install("clusterProfiler") #BiocManager::install(c("ReactomePA","org.Hs.eg.db")) #BiocManager::install("limma") #BiocManager::install("sva") #install.packages("writexl") #install.packages("openxlsx") library("AnnotationDbi") library("clusterProfiler") library("ReactomePA") library("org.Hs.eg.db") library(DESeq2) library(gplots) library(limma) library(sva) #library(writexl) #d.raw_with_rownames <- cbind(RowNames = rownames(d.raw), d.raw); write_xlsx(d.raw, path = "d_raw.xlsx"); library(openxlsx) d.raw<- read.delim2("exceRpt_miRNA_ReadCounts.txt",sep="\t", header=TRUE, row.names=1) # Desired column order desired_order <- c( "nf933", "nf938", "nf973", "nf934", "nf939", "nf974", "nf931", "nf936", "nf971", "nf932", "nf937", "nf972", "nf774", "nf961", "nf962", "nf930", "nf935" ) # Reorder columns d.raw <- d.raw[, desired_order] setdiff(desired_order, colnames(d.raw)) # Shows missing or misnamed columns #sapply(d.raw, is.numeric) d.raw[] <- lapply(d.raw, as.numeric) #d.raw[] <- lapply(d.raw, function(x) as.numeric(as.character(x))) d.raw <- round(d.raw) write.csv(d.raw, file ="d_raw.csv") write.xlsx(d.raw, file = "d_raw.xlsx", rowNames = TRUE) # ------ Code sent to Ute ------ #d.raw <- read.delim2("d_raw.csv",sep=",", header=TRUE, row.names=1) Cell_or_EV = as.factor(c("EV","EV","EV", "EV","EV","EV", "EV","EV","EV", "EV","EV","EV", "Cell","Cell","Cell", "EV","EV")) replicates = as.factor(c("WaGa_scr_DMSO_EV","WaGa_scr_DMSO_EV","WaGa_scr_DMSO_EV", "WaGa_scr_Dox_EV","WaGa_scr_Dox_EV","WaGa_scr_Dox_EV", "WaGa_sT_DMSO_EV","WaGa_sT_DMSO_EV","WaGa_sT_DMSO_EV", "WaGa_sT_Dox_EV","WaGa_sT_Dox_EV","WaGa_sT_Dox_EV", "WaGa_wt_cells", "WaGa_wt_cells","WaGa_wt_cells", "WaGa_wt_EV", "WaGa_wt_EV")) ids = as.factor(c( "nf933", "nf938", "nf973", "nf934", "nf939", "nf974", "nf931", "nf936", "nf971", "nf932", "nf937", "nf972", "nf774", "nf961", "nf962", "nf930", "nf935")) cData = data.frame(row.names=colnames(d.raw), replicates=replicates, ids=ids, Cell_or_EV=Cell_or_EV) dds<-DESeqDataSetFromMatrix(countData=d.raw, colData=cData, design=~replicates) # Filter low-count miRNAs dds <- dds[ rowSums(counts(dds)) > 10, ] rld <- rlogTransformation(dds) # -- before pca -- png("pca.png", 1200, 800) plotPCA(rld, intgroup=c("replicates")) #plotPCA(rld, intgroup = c("replicates", "batch")) #plotPCA(rld, intgroup = c("replicates", "ids")) #plotPCA(rld, "batch") dev.off() png("pca2.png", 1200, 800) #plotPCA(rld, intgroup=c("replicates")) #plotPCA(rld, intgroup = c("replicates", "batch")) plotPCA(rld, intgroup = c("replicates", "ids")) #plotPCA(rld, "batch") dev.off() # Batch Effect Removal Methods (Non-batch effect removal applied!) #### STEP2: DEGs #### #- Heatmap untreated/wt vs parental; 1x for WaGa cell line #- Volcano plot untreated/wt vs parental; 1x for WaGa cell line #- Manhattan plot miRNAs; 1x for WaGa cell line #- Distribution of different small RNA species untreated/wt and parental; 1x for WaGa cell line #- Motif analysis: identify RNA-binding proteins that may regulate small RNA loading; 1x for WaGa cell line #convert bam to bigwig using deepTools by feeding inverse of DESeq’s size Factor sizeFactors(dds) #NULL dds <- estimateSizeFactors(dds) sizeFactors(dds) normalized_counts <- counts(dds, normalized=TRUE) write.table(normalized_counts, file="normalized_counts.txt", sep="\t", quote=F, col.names=NA) write.xlsx(normalized_counts, file = "normalized_counts.xlsx", rowNames = TRUE) dds<-DESeqDataSetFromMatrix(countData=d.raw, colData=cData, design=~replicates) dds$replicates <- relevel(dds$replicates, "WaGa_wt_cells") dds = DESeq(dds, betaPrior=FALSE) #default betaPrior is FALSE resultsNames(dds) clist <- c("WaGa_wt_EV_vs_WaGa_wt_cells") #NOTE that the results sent to Ute is |padj|<=0.1. for (i in clist) { contrast = paste("replicates", i, sep="_") res = results(dds, name=contrast) res <- res[!is.na(res$log2FoldChange),] #https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#why-are-some-p-values-set-to-na res$padj <- ifelse(is.na(res$padj), 1, res$padj) res_df <- as.data.frame(res) write.csv(as.data.frame(res_df[order(res_df$pvalue),]), file = paste(i, "all.txt", sep="-")) up <- subset(res_df, padj<=0.05 & log2FoldChange>=2) down <- subset(res_df, padj<=0.05 & log2FoldChange<=-2) write.csv(as.data.frame(up[order(up$log2FoldChange,decreasing=TRUE),]), file = paste(i, "up.txt", sep="-")) write.csv(as.data.frame(down[order(abs(down$log2FoldChange),decreasing=TRUE),]), file = paste(i, "down.txt", sep="-")) } ~/Tools/csv2xls-0.4/csv_to_xls.py \ WaGa_wt_EV_vs_WaGa_wt_cells-all.txt \ WaGa_wt_EV_vs_WaGa_wt_cells-up.txt \ WaGa_wt_EV_vs_WaGa_wt_cells-down.txt \ -d$',' -o WaGa_wt_EV_vs_WaGa_wt_cells.xls; # ------------------- volcano_plot ------------------- library(ggplot2) library(ggrepel) geness_res <- read.csv(file = paste("WaGa_wt_EV_vs_WaGa_wt_cells", "all.txt", sep="-"), row.names=1) external_gene_name <- rownames(geness_res) geness_res <- cbind(geness_res, external_gene_name) #top_g are from ids top_g <- c("hsa-let-7b-5p","hsa-let-7g-5p","hsa-let-7i-5p","hsa-miR-103a-3p","hsa-miR-107","hsa-miR-1224-5p","hsa-miR-122-5p","hsa-miR-1226-5p","hsa-miR-1246","hsa-miR-127-3p","hsa-miR-1290","hsa-miR-130a-3p","hsa-miR-139-3p","hsa-miR-141-3p","hsa-miR-143-3p","hsa-miR-148b-3p","hsa-miR-155-5p","hsa-miR-15a-5p","hsa-miR-17-5p","hsa-miR-184","hsa-miR-18a-3p","hsa-miR-18a-5p","hsa-miR-190a-5p","hsa-miR-191-5p","hsa-miR-193b-5p","hsa-miR-197-5p","hsa-miR-200a-3p","hsa-miR-200b-5p","hsa-miR-206","hsa-miR-20a-5p","hsa-miR-210-3p","hsa-miR-2110","hsa-miR-21-5p","hsa-miR-218-5p","hsa-miR-219a-1-3p","hsa-miR-221-3p","hsa-miR-23b-3p","hsa-miR-27a-3p","hsa-miR-27b-3p","hsa-miR-27b-5p","hsa-miR-28-3p","hsa-miR-30a-5p","hsa-miR-30c-5p","hsa-miR-30e-5p","hsa-miR-3127-5p","hsa-miR-3131","hsa-miR-3180|hsa-miR-3180-3p","hsa-miR-320a","hsa-miR-320b","hsa-miR-320c","hsa-miR-320d","hsa-miR-330-3p","hsa-miR-335-3p","hsa-miR-33b-5p","hsa-miR-340-5p","hsa-miR-342-5p","hsa-miR-3605-5p","hsa-miR-361-3p","hsa-miR-365a-5p","hsa-miR-374b-5p","hsa-miR-378i","hsa-miR-379-5p","hsa-miR-3940-5p","hsa-miR-409-3p","hsa-miR-411-5p","hsa-miR-423-3p","hsa-miR-423-5p","hsa-miR-4286","hsa-miR-429","hsa-miR-432-5p","hsa-miR-4326","hsa-miR-451a","hsa-miR-4520-3p","hsa-miR-454-3p","hsa-miR-4646-5p","hsa-miR-4667-5p","hsa-miR-4748","hsa-miR-483-5p","hsa-miR-486-5p","hsa-miR-5010-5p","hsa-miR-504-3p","hsa-miR-5187-5p","hsa-miR-590-3p","hsa-miR-6128","hsa-miR-625-5p","hsa-miR-6726-5p","hsa-miR-6730-5p","hsa-miR-676-3p","hsa-miR-6767-5p","hsa-miR-6777-5p","hsa-miR-6780a-5p","hsa-miR-6794-5p","hsa-miR-6817-3p","hsa-miR-708-5p","hsa-miR-7-5p","hsa-miR-766-5p","hsa-miR-7854-3p","hsa-miR-873-3p","hsa-miR-885-3p","hsa-miR-92b-5p","hsa-miR-93-5p","hsa-miR-937-3p","hsa-miR-9-5p","hsa-miR-98-5p") subset(geness_res, external_gene_name %in% top_g & pvalue < 0.05 & (abs(geness_res$log2FoldChange) >= 2.0)) geness_res$Color <- "NS or log2FC < 2.0" geness_res$Color[geness_res$pvalue < 0.05] <- "P < 0.05" geness_res$Color[geness_res$padj < 0.05] <- "P-adj < 0.05" geness_res$Color[abs(geness_res$log2FoldChange) < 2.0] <- "NS or log2FC < 2.0" write.csv(geness_res, "WaGa_wt_EV_vs_WaGa_wt_cells_with_Category.csv") geness_res$invert_P <- (-log10(geness_res$pvalue)) * sign(geness_res$log2FoldChange) geness_res <- geness_res[, -1*ncol(geness_res)] png("WaGa_wt_EV_vs_WaGa_wt_cells.png",width=1200, height=1400) #svg("WaGa_wt_EV_vs_WaGa_wt_cells.svg",width=12, height=14) ggplot(geness_res, aes(x = log2FoldChange, y = -log10(pvalue), color = Color, label = external_gene_name)) + geom_vline(xintercept = c(2.0, -2.0), lty = "dashed") + geom_hline(yintercept = -log10(0.05), lty = "dashed") + geom_point() + labs(x = "log2(FC)", y = "Significance, -log10(P)", color = "Significance") + scale_color_manual(values = c("P < 0.05"="orange","P-adj < 0.05"="red","NS or log2FC < 2.0"="darkgray"),guide = guide_legend(override.aes = list(size = 4))) + scale_y_continuous(expand = expansion(mult = c(0,0.05))) + geom_text_repel(data = subset(geness_res, external_gene_name %in% top_g & pvalue < 0.05 & (abs(geness_res$log2FoldChange) >= 2.0)), size = 4, point.padding = 0.15, color = "black", min.segment.length = .1, box.padding = .2, lwd = 2) + theme_bw(base_size = 16) + theme(legend.position = "bottom") dev.off() # ---------------------------------------- # ----------- manhattan_plot ------------- Rscript manhattan_plot_Carmen_custom_labels.R #exceRpt_miRNA_ReadCounts.txt -
Downstream analyis using R for miRNAs (17 MKL-1 samples)
#Input file #exceRpt_miRNA_ReadCounts.txt #exceRpt_piRNA_ReadCounts.txt #MKL-1_sT_DMSO_EV ("X2608_MKL1_sT_DMSO","X2701_MKL1_sT_DMSO","X2802_MKL1_sT_DMSO") #MKL-1_sT_Dox_EV ("X2608_MKL1_sT_Dox","X2701_MKL1_sT_Dox","X2802_MKL1_sT_Dox") #MKL-1_scr_DMSO_EV ("X2608_MKL1_scr_DMSO","X2701_MKL1_scr_DMSO","X2802_MKL1_scr_DMSO") #MKL-1_scr_Dox_EV ()"X2608_MKL1_scr_Dox","X2701_MKL1_scr_Dox","X2802_MKL1_scr_Dox") #MKL-1_wt_cells ("nf780","nf796","nf797") #MKL-1_wt_EV ("X2404_MKL1_wt_EVs","X2608_MKL1_wt_EVs") cd ~/DATA/Data_Ute_smallRNA_via_exceRpt_workspace/summaries_MKL-1 mamba activate r_env R #BiocManager::install("AnnotationDbi") #BiocManager::install("clusterProfiler") #BiocManager::install(c("ReactomePA","org.Hs.eg.db")) #BiocManager::install("limma") #BiocManager::install("sva") #install.packages("writexl") #install.packages("openxlsx") library("AnnotationDbi") library("clusterProfiler") library("ReactomePA") library("org.Hs.eg.db") library(DESeq2) library(gplots) library(limma) library(sva) #library(writexl) #d.raw_with_rownames <- cbind(RowNames = rownames(d.raw), d.raw); write_xlsx(d.raw, path = "d_raw.xlsx"); library(openxlsx) d.raw<- read.delim2("exceRpt_miRNA_ReadCounts.txt",sep="\t", header=TRUE, row.names=1) # Desired column order desired_order <- c( "X2608_MKL1_sT_DMSO","X2701_MKL1_sT_DMSO","X2802_MKL1_sT_DMSO", "X2608_MKL1_sT_Dox","X2701_MKL1_sT_Dox","X2802_MKL1_sT_Dox", "X2608_MKL1_scr_DMSO","X2701_MKL1_scr_DMSO","X2802_MKL1_scr_DMSO", "X2608_MKL1_scr_Dox","X2701_MKL1_scr_Dox","X2802_MKL1_scr_Dox", "nf780","nf796","nf797", "X2404_MKL1_wt_EVs","X2608_MKL1_wt_EVs" ) # Reorder columns d.raw <- d.raw[, desired_order] setdiff(desired_order, colnames(d.raw)) # Shows missing or misnamed columns #sapply(d.raw, is.numeric) d.raw[] <- lapply(d.raw, as.numeric) #d.raw[] <- lapply(d.raw, function(x) as.numeric(as.character(x))) d.raw <- round(d.raw) write.csv(d.raw, file ="d_raw.csv") write.xlsx(d.raw, file = "d_raw.xlsx", rowNames = TRUE) #d.raw <- read.delim2("d_raw.csv",sep=",", header=TRUE, row.names=1) Cell_or_EV = as.factor(c("EV","EV","EV", "EV","EV","EV", "EV","EV","EV", "EV","EV","EV", "Cell","Cell","Cell", "EV","EV")) replicates = as.factor(c("MKL-1_sT_DMSO_EV","MKL-1_sT_DMSO_EV","MKL-1_sT_DMSO_EV", "MKL-1_sT_Dox_EV","MKL-1_sT_Dox_EV","MKL-1_sT_Dox_EV", "MKL-1_scr_DMSO_EV","MKL-1_scr_DMSO_EV","MKL-1_scr_DMSO_EV", "MKL-1_scr_Dox_EV","MKL-1_scr_Dox_EV","MKL-1_scr_Dox_EV", "MKL-1_wt_cells", "MKL-1_wt_cells","MKL-1_wt_cells", "MKL-1_wt_EV","MKL-1_wt_EV")) ids = as.factor(c("X2608_MKL1_sT_DMSO","X2701_MKL1_sT_DMSO","X2802_MKL1_sT_DMSO", "X2608_MKL1_sT_Dox","X2701_MKL1_sT_Dox","X2802_MKL1_sT_Dox", "X2608_MKL1_scr_DMSO","X2701_MKL1_scr_DMSO","X2802_MKL1_scr_DMSO", "X2608_MKL1_scr_Dox","X2701_MKL1_scr_Dox","X2802_MKL1_scr_Dox", "nf780","nf796","nf797", "X2404_MKL1_wt_EVs","X2608_MKL1_wt_EVs")) cData = data.frame(row.names=colnames(d.raw), replicates=replicates, ids=ids, Cell_or_EV=Cell_or_EV) dds<-DESeqDataSetFromMatrix(countData=d.raw, colData=cData, design=~replicates) # Filter low-count miRNAs dds <- dds[ rowSums(counts(dds)) > 10, ] rld <- rlogTransformation(dds) # -- before pca -- png("pca.png", 1200, 800) plotPCA(rld, intgroup=c("replicates")) #plotPCA(rld, intgroup = c("replicates", "batch")) #plotPCA(rld, intgroup = c("replicates", "ids")) #plotPCA(rld, "batch") dev.off() png("pca2.png", 1200, 800) #plotPCA(rld, intgroup=c("replicates")) #plotPCA(rld, intgroup = c("replicates", "batch")) plotPCA(rld, intgroup = c("replicates", "ids")) #plotPCA(rld, "batch") dev.off() # Batch Effect Removal Methods (Non-batch effect removal applied!) #### STEP2: DEGs #### #- Heatmap untreated/wt vs parental; 1x for WaGa cell line #- Volcano plot untreated/wt vs parental; 1x for WaGa cell line #- Manhattan plot miRNAs; 1x for WaGa cell line #- Distribution of different small RNA species untreated/wt and parental; 1x for WaGa cell line #- Motif analysis: identify RNA-binding proteins that may regulate small RNA loading; 1x for WaGa cell line #convert bam to bigwig using deepTools by feeding inverse of DESeq’s size Factor sizeFactors(dds) #NULL dds <- estimateSizeFactors(dds) sizeFactors(dds) normalized_counts <- counts(dds, normalized=TRUE) write.table(normalized_counts, file="normalized_counts.txt", sep="\t", quote=F, col.names=NA) write.xlsx(normalized_counts, file = "normalized_counts.xlsx", rowNames = TRUE) dds<-DESeqDataSetFromMatrix(countData=d.raw, colData=cData, design=~replicates) dds$replicates <- relevel(dds$replicates, "MKL-1_wt_cells") dds = DESeq(dds, betaPrior=FALSE) #default betaPrior is FALSE resultsNames(dds) clist <- c("MKL.1_wt_EV_vs_MKL.1_wt_cells") #NOTE that the results sent to Ute is |padj|<=0.1. for (i in clist) { contrast = paste("replicates", i, sep="_") res = results(dds, name=contrast) res <- res[!is.na(res$log2FoldChange),] #https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#why-are-some-p-values-set-to-na res$padj <- ifelse(is.na(res$padj), 1, res$padj) res_df <- as.data.frame(res) write.csv(as.data.frame(res_df[order(res_df$pvalue),]), file = paste(i, "all.txt", sep="-")) up <- subset(res_df, padj<=0.05 & log2FoldChange>=2) down <- subset(res_df, padj<=0.05 & log2FoldChange<=-2) write.csv(as.data.frame(up[order(up$log2FoldChange,decreasing=TRUE),]), file = paste(i, "up.txt", sep="-")) write.csv(as.data.frame(down[order(abs(down$log2FoldChange),decreasing=TRUE),]), file = paste(i, "down.txt", sep="-")) } ~/Tools/csv2xls-0.4/csv_to_xls.py \ MKL.1_wt_EV_vs_MKL.1_wt_cells-all.txt \ MKL.1_wt_EV_vs_MKL.1_wt_cells-up.txt \ MKL.1_wt_EV_vs_MKL.1_wt_cells-down.txt \ -d$',' -o MKL.1_wt_EV_vs_MKL.1_wt_cells.xls; # ------------------- volcano_plot ------------------- library(ggplot2) library(ggrepel) geness_res <- read.csv(file = paste("MKL.1_wt_EV_vs_MKL.1_wt_cells", "all.txt", sep="-"), row.names=1) external_gene_name <- rownames(geness_res) geness_res <- cbind(geness_res, external_gene_name) #top_g are from ids top_g <- c("hsa-miR-203a-3p","hsa-miR-6850-5p","hsa-miR-4511","hsa-miR-5187-5p","hsa-miR-133b","hsa-miR-1246","hsa-miR-625-3p","hsa-miR-6741-3p","hsa-miR-192-5p","hsa-miR-10b-5p","hsa-miR-885-5p","hsa-miR-30e-3p","hsa-miR-101-3p","hsa-miR-1307-5p","hsa-miR-95-3p","hsa-miR-889-3p","hsa-miR-206","hsa-miR-301a-3p","hsa-miR-1-3p","hsa-let-7c-5p","hsa-miR-196a-5p","hsa-let-7f-5p","hsa-let-7e-5p","hsa-miR-30c-5p","hsa-miR-30a-3p","hsa-miR-146b-5p","hsa-miR-25-3p","hsa-miR-182-5p","hsa-miR-98-5p","hsa-let-7a-5p","hsa-miR-149-5p","hsa-miR-148a-3p","hsa-miR-873-3p","hsa-miR-19b-3p","hsa-miR-320c","hsa-miR-375","hsa-miR-30a-5p","hsa-miR-877-5p","hsa-miR-34a-5p","hsa-miR-324-5p","hsa-miR-652-3p","hsa-miR-342-5p","hsa-miR-7706","hsa-miR-361-3p","hsa-miR-361-5p","hsa-miR-1180-3p","hsa-miR-217","hsa-miR-1307-3p","hsa-miR-1908-5p","hsa-miR-15b-5p","hsa-miR-92b-5p","hsa-miR-484","hsa-miR-197-3p","hsa-miR-200c-3p","hsa-miR-671-5p","hsa-miR-339-5p","hsa-miR-1301-3p","hsa-miR-769-5p","hsa-miR-328-3p","hsa-miR-93-5p","hsa-miR-103a-3p") subset(geness_res, external_gene_name %in% top_g & pvalue < 0.05 & (abs(geness_res$log2FoldChange) >= 2.0)) geness_res$Color <- "NS or log2FC < 2.0" geness_res$Color[geness_res$pvalue < 0.05] <- "P < 0.05" geness_res$Color[geness_res$padj < 0.05] <- "P-adj < 0.05" geness_res$Color[abs(geness_res$log2FoldChange) < 2.0] <- "NS or log2FC < 2.0" write.csv(geness_res, "MKL.1_wt_EV_vs_MKL.1_wt_cells_with_Category.csv") geness_res$invert_P <- (-log10(geness_res$pvalue)) * sign(geness_res$log2FoldChange) geness_res <- geness_res[, -1*ncol(geness_res)] png("MKL.1_wt_EV_vs_MKL.1_wt_cells.png",width=1200, height=1400) #svg("MKL.1_wt_EV_vs_MKL.1_wt_cells.svg",width=12, height=14) ggplot(geness_res, aes(x = log2FoldChange, y = -log10(pvalue), color = Color, label = external_gene_name)) + geom_vline(xintercept = c(2.0, -2.0), lty = "dashed") + geom_hline(yintercept = -log10(0.05), lty = "dashed") + geom_point() + labs(x = "log2(FC)", y = "Significance, -log10(P)", color = "Significance") + scale_color_manual(values = c("P < 0.05"="orange","P-adj < 0.05"="red","NS or log2FC < 2.0"="darkgray"),guide = guide_legend(override.aes = list(size = 4))) + scale_y_continuous(expand = expansion(mult = c(0,0.05))) + geom_text_repel(data = subset(geness_res, external_gene_name %in% top_g & pvalue < 0.05 & (abs(geness_res$log2FoldChange) >= 2.0)), size = 4, point.padding = 0.15, color = "black", min.segment.length = .1, box.padding = .2, lwd = 2) + theme_bw(base_size = 16) + theme(legend.position = "bottom") dev.off() # ---------------------------------------- # ----------- manhattan_plot ------------- Rscript manhattan_plot_Carmen_custom_labels.R #exceRpt_miRNA_ReadCounts.txt