RNA-seq on sage

TODO on sage

Check the alignment: the reads align very poorly to the annotation sent from Munich, so use the reference X14112 instead and find the CMV-GFP in the genome. Use alignment to determine the overall alignment rate to X14112 and chrHsv1_s17.
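
A minimal sketch of how the overall alignment rate could be read per sample, assuming STAR is the aligner: STAR writes a `Log.final.out` file whose `Uniquely mapped reads %` and `% of reads mapped to multiple loci` lines can be summed. The function name `star_overall_rate` is hypothetical and not part of the pipeline.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the overall alignment rate (unique + multi-mapped, in %)
# from a STAR Log.final.out file given as the first argument.
star_overall_rate() {
    local log_file="$1"
    # STAR separates each label from its value with "|" followed by a tab.
    awk -F'|' '
        /Uniquely mapped reads %/ || /% of reads mapped to multiple loci/ {
            gsub(/[ \t%]/, "", $2)   # strip whitespace and the percent sign
            total += $2
        }
        END { printf "%.2f\n", total }
    ' "$log_file"
}
```

Calling it on each log of a run, for example on the files under `results_chrHsv1/star_salmon/log/` (an assumed output location), would give one number per sample that can be compared between the X14112 and chrHsv1_s17 runs.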

commands on sage
```
#under sage
ln -s /home/jhuang/Tools/nf-core-rnaseq-3.12.0/ rnaseq
[jhuang@sage Data_Caroline_RNAseq_wt_timecourse]$ nextflow run rnaseq/main.nf --input samplesheet_wt_timecourse.csv --outdir results_GRCh38 --genome GRCh38 -profile test_full -resume --max_memory 256.GB --max_time 2400.h --aligner 'star_salmon' --skip_multiqc

[jhuang@sage Data_Caroline_RNAseq_wt_timecourse]$ nextflow run rnaseq/main.nf --input samplesheet_wt_timecourse.csv --outdir results_chrHsv1 --fasta chrHsv1_s17.fasta --gtf chrHsv1_s17.gtf -profile test_full -resume --max_memory 256.GB --max_time 2400.h --save_reference --aligner 'star_salmon' --gtf_extra_attributes 'gene_id' --gtf_group_features 'transcript_id' --featurecounts_group_type 'gene_id' --featurecounts_feature_type 'transcript' --skip_rseqc --skip_dupradar --skip_preseq --skip_biotype_qc --skip_deseq2_qc --skip_multiqc

[jhuang@sage Data_Caroline_RNAseq_brain_organoids]$ nextflow run rnaseq/main.nf --input samplesheet_brain_organoids.12.csv --outdir results_GRCh38 --genome GRCh38 -profile test_full -resume --max_memory 256.GB --max_time 2400.h --aligner 'star_salmon' --skip_multiqc

[jhuang@sage Data_Caroline_RNAseq_brain_organoids]$ nextflow run rnaseq/main.nf --input samplesheet_brain_organoids.12.csv --outdir results_chrHsv1 --fasta chrHsv1_s17.fasta --gtf chrHsv1_s17.gtf -profile test_full -resume --max_memory 256.GB --max_time 2400.h --save_reference --aligner 'star_salmon' --gtf_extra_attributes 'gene_id' --gtf_group_features 'transcript_id' --featurecounts_group_type 'gene_id' --featurecounts_feature_type 'transcript' --skip_rseqc --skip_dupradar --skip_preseq --skip_biotype_qc --skip_deseq2_qc --skip_multiqc

#Processing *.umi_extract.fastq.gz
(rnaseq) [jhuang@sage Data_Manja_RNAseq_Organoids_Virus]$ nextflow run rnaseq/main.nf --input samplesheet.umi_extract.csv --outdir results_chrHsv1 --fasta chrHsv1_s17.fasta --gtf chrHsv1_s17.gtf -profile test_full -resume --max_memory 256.GB --max_time 2400.h --save_reference --aligner 'star_salmon' --gtf_extra_attributes 'gene_id' --gtf_group_features 'transcript_id' --featurecounts_group_type 'gene_id' --featurecounts_feature_type 'transcript' --skip_rseqc --skip_dupradar --skip_preseq --skip_biotype_qc --skip_deseq2_qc --skip_multiqc

#Processing raw data prepared with umi protocol
#The named group in --umitools_bc_pattern was stripped from this page; <umi_1> is assumed here and below (umi_tools expects group names of the form umi_N)
(rnaseq) [jhuang@sage Data_Manja_RNAseq_Organoids_Virus]$ nextflow run rnaseq/main.nf --input samplesheet.csv --outdir results_chrHsv1 --fasta chrHsv1_s17.fasta --gtf chrHsv1_s17.gtf --with_umi --umitools_extract_method "regex" --umitools_bc_pattern "^(?P<umi_1>.{12}).*" --umitools_dedup_stats -profile test_full -resume --max_memory 256.GB --max_time 2400.h --save_reference --aligner 'star_salmon' --gtf_extra_attributes 'gene_id' --gtf_group_features 'transcript_id' --featurecounts_group_type 'gene_id' --featurecounts_feature_type 'transcript' --skip_rseqc --skip_dupradar --skip_preseq --skip_biotype_qc --skip_deseq2_qc --skip_multiqc --min_mapped_reads 0

#Debug the following error: added "--minAssignedFrags 0 \\" to the "salmon quant" options in modules/nf-core/salmon/quant/main.nf and added "--min_mapped_reads 0" to the nextflow command above
#hits: 0; hits per frag: 0
#[2023-10-20 11:35:22.944] [jointLog] [warning] salmon was only able to assign 0 fragments to transcripts in the index, but the minimum number of required assigned fragments (--minAssignedFrags) was 1. This could be indicative of a mismatch between the reference and sample, or a very bad sample. You can change the --minAssignedFrags parameter to force salmon to quantify with fewer assigned fragments (must have at least 1).

(rnaseq) [jhuang@sage Data_Denise_LT_RNAseq]$ nextflow run rnaseq/main.nf --input samplesheet.csv --outdir results_GRCh38 --genome GRCh38 -profile test_full -resume --max_memory 256.GB --max_time 2400.h --save_align_intermeds --save_unaligned --aligner 'star_salmon' --skip_multiqc

(rnaseq) [jhuang@sage Data_Samira_RNAseq]$ nextflow run rnaseq/main.nf --input samplesheet.csv --outdir results_GRCh38 --genome GRCh38 -profile test_full -resume --max_memory 256.GB --max_time 2400.h --save_align_intermeds --save_unaligned --aligner 'star_salmon'

(rnaseq) [jhuang@sage Data_Manja_RNAseq_Organoids]$ nextflow run rnaseq/main.nf --input samplesheet.csv --outdir results_GRCh38 --genome GRCh38 --with_umi --umitools_extract_method "regex" --umitools_bc_pattern "^(?P<umi_1>.{12}).*" -profile test_full -resume --max_memory 256.GB --max_time 2400.h --save_align_intermeds --save_unaligned --save_reference --aligner 'star_salmon' --pseudo_aligner 'salmon'

(rnaseq) [jhuang@sage Data_Manja_RNAseq_Organoids_Virus]$ nextflow run rnaseq/main.nf --input samplesheet.csv --outdir results_chrHsv1_s17 --fasta "/home/jhuang/DATA/Data_Manja_RNAseq_Organoids_Virus/chrHsv1_s17.fasta" --gtf "/home/jhuang/DATA/Data_Manja_RNAseq_Organoids_Virus/chrHsv1_s17.gtf" --with_umi --umitools_extract_method "regex" --umitools_bc_pattern "^(?P<umi_1>.{12}).*" --umitools_dedup_stats --skip_rseqc --skip_dupradar --skip_preseq -profile test_full -resume --max_memory 256.GB --max_time 2400.h --save_align_intermeds --save_unaligned --save_reference --aligner 'star_salmon' --gtf_extra_attributes 'gene_id' --gtf_group_features 'transcript_id' --featurecounts_group_type 'gene_id' --featurecounts_feature_type 'transcript' --skip_multiqc

#Run MultiQC manually with the pipeline's config
ln -s ~/Tools/rnaseq/assets/multiqc_config.yaml multiqc_config.yaml
multiqc -f --config multiqc_config.yaml . 2>&1
rm multiqc_config.yaml
```
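
The `--umitools_bc_pattern` regex in the UMI commands above captures the first 12 bases of each read (`.{12}`) and matches the remainder with `.*`. As a rough illustration only, assuming those 12 bases are the UMI, the split it implies can be sketched in plain shell. The function name `split_umi` and the example read are hypothetical, and this does not reproduce what umi_tools does to the FASTQ header or quality line.

```shell
#!/usr/bin/env bash
# Hypothetical illustration: split a read into its leading UMI and the remaining sequence.
# $1 = read sequence, $2 = UMI length (defaults to 12, as in the pattern .{12})
split_umi() {
    local read_seq="$1"
    local umi_len="${2:-12}"
    local umi rest
    umi=$(printf '%s' "$read_seq" | cut -c1-"$umi_len")        # first umi_len bases
    rest=$(printf '%s' "$read_seq" | cut -c"$((umi_len + 1))"-)  # everything after the UMI
    printf '%s\t%s\n' "$umi" "$rest"
}
```

For example, `split_umi ACGTACGTACGTTTTTGGGG` prints `ACGTACGTACGT` and `TTTTGGGG` separated by a tab, which is a quick way to sanity-check that the UMI length matches the library protocol before starting a long run.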

reference on sage

/home/jhuang/REFs/Homo_sapiens/Ensembl/GRCh38
/home/jhuang/REFs/Homo_sapiens/hg38-blacklist.bed

