- Merge re-sequenced FastQ files (cat)
- Sub-sample FastQ files and auto-infer strandedness (fq, Salmon)
- Read QC (FastQC)
- UMI extraction (UMI-tools)
- Adapter and quality trimming (Trim Galore!)
- Removal of genome contaminants (BBSplit)
- Removal of ribosomal RNA (SortMeRNA)
- Choice of multiple alignment and quantification routes:
  - STAR -> Salmon
  - STAR -> RSEM
  - HISAT2 -> NO QUANTIFICATION
- Sort and index alignments (SAMtools)
- UMI-based deduplication (UMI-tools)
- Duplicate read marking (picard MarkDuplicates)
- Transcript assembly and quantification (StringTie)
- Create bigWig coverage files (BEDTools, bedGraphToBigWig)
- Extensive quality control:
  - RSeQC
  - Qualimap
  - dupRadar
  - Preseq
  - DESeq2
- Pseudo-alignment and quantification (Salmon; optional)
- Present QC for raw read, alignment, gene biotype, sample similarity, and strand-specificity checks (MultiQC, R)
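The strandedness step in the list above decides the library type from how reads are oriented relative to the annotated transcripts. The following is a minimal sketch of that decision only. It assumes you already have counts of reads consistent with a forward-stranded and a reverse-stranded protocol, and the function name and both thresholds (0.8 and 0.1) are hypothetical, illustrative values, not the pipeline's documented defaults.

```python
def infer_strandedness(forward: int, reverse: int,
                       stranded_threshold: float = 0.8,
                       unstranded_threshold: float = 0.1) -> str:
    """Classify a library from read-orientation counts.

    forward / reverse: reads whose orientation matches a forward-stranded or a
    reverse-stranded protocol (hypothetical inputs). Both thresholds are
    illustrative assumptions, not nf-core/rnaseq defaults.
    """
    total = forward + reverse
    if total == 0:
        return "undetermined"
    frac_fwd = forward / total
    frac_rev = reverse / total
    if frac_fwd >= stranded_threshold:
        return "forward"
    if frac_rev >= stranded_threshold:
        return "reverse"
    # Roughly equal orientation fractions point to an unstranded protocol.
    if abs(frac_fwd - frac_rev) <= unstranded_threshold:
        return "unstranded"
    return "undetermined"


print(infer_strandedness(950, 50))   # forward
print(infer_strandedness(510, 490))  # unstranded
```

Returning "undetermined" for intermediate fractions (e.g. 70/30) is a deliberate choice in this sketch: such samples are better flagged for inspection than silently forced into one class.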
umi_tools extract:
Flexible removal of UMI sequences from FASTQ reads.
UMIs are removed from the read sequence and appended to the read name. Any other barcode, for example a library barcode, is left on the read. Reads can also be filtered by quality or against a barcode whitelist.
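A minimal sketch of what extraction does to a single FASTQ record, assuming a simplified barcode pattern at the 5' end where `N` marks a UMI base and `X` marks a base to keep on the read (UMI-tools' real patterns, regex mode and cell barcodes are not covered). The `_` separator is assumed to match UMI-tools' default; `FastqRecord` and `extract_umi` are hypothetical names, not UMI-tools' API.

```python
from typing import NamedTuple


class FastqRecord(NamedTuple):
    name: str  # read name without the leading '@'
    seq: str
    qual: str


def extract_umi(record: FastqRecord, pattern: str, sep: str = "_") -> FastqRecord:
    """Move UMI bases (pattern 'N') from the 5' end of the read into its name.

    Bases marked 'X' stay on the read; everything after the pattern is kept.
    """
    if len(record.seq) < len(pattern):
        raise ValueError("read shorter than barcode pattern")
    umi, kept_seq, kept_qual = [], [], []
    for i, symbol in enumerate(pattern):
        if symbol == "N":
            umi.append(record.seq[i])
        elif symbol == "X":
            kept_seq.append(record.seq[i])
            kept_qual.append(record.qual[i])
        else:
            raise ValueError(f"unsupported pattern symbol: {symbol!r}")
    rest = len(pattern)
    # Only the identifier (before the first space) gets the UMI appended;
    # any comment after the space is preserved unchanged.
    ident, space, comment = record.name.partition(" ")
    new_name = f"{ident}{sep}{''.join(umi)}{space}{comment}"
    return FastqRecord(new_name,
                       "".join(kept_seq) + record.seq[rest:],
                       "".join(kept_qual) + record.qual[rest:])


rec = FastqRecord("read1 1:N:0", "ACGTACGTTTGGCCAA", "IIIIIIIIJJJJJJJJ")
print(extract_umi(rec, "NNNNNNNN"))
```

Putting the UMI in the read name, rather than in a side file, is what lets the later grouping and deduplication commands work directly on the aligned BAM.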
The remaining commands (group, dedup, and count/count_tab) use the UMIs to identify PCR duplicates and support different levels of analysis depending on the user's needs. Several UMI deduplication methods are available; the recommended one is directional.
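The directional method links UMIs observed at the same position when they differ at exactly one base and the more abundant UMI a satisfies count(a) >= 2 * count(b) - 1 relative to the less abundant b, so that rare one-off variants are treated as sequencing errors of a common UMI. The sketch below is a simplified illustration of that rule, not UMI-tools' own code; `directional_groups` is a hypothetical name, and it assumes all UMIs have equal length.

```python
from collections import deque
from typing import Dict, List, Set


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length UMIs."""
    return sum(x != y for x, y in zip(a, b))


def directional_groups(counts: Dict[str, int]) -> List[List[str]]:
    """Group UMIs seen at one genomic position using the directional rule.

    Edge a -> b exists if hamming(a, b) == 1 and counts[a] >= 2 * counts[b] - 1.
    Starting from the most abundant unassigned UMI, every UMI reachable along
    such edges joins its group.
    """
    umis = sorted(counts, key=lambda u: (-counts[u], u))
    assigned: Set[str] = set()
    groups: List[List[str]] = []
    for root in umis:
        if root in assigned:
            continue
        group = [root]
        assigned.add(root)
        queue = deque([root])
        while queue:  # breadth-first search along directed edges
            node = queue.popleft()
            for other in umis:
                if (other not in assigned
                        and hamming(node, other) == 1
                        and counts[node] >= 2 * counts[other] - 1):
                    assigned.add(other)
                    group.append(other)
                    queue.append(other)
        groups.append(group)
    return groups


print(directional_groups({"ACGT": 120, "ACGA": 3, "TTTT": 40}))
```

Each returned group stands for one original molecule. With counts of 10 and 8, two one-mismatch UMIs stay separate because 10 < 2 * 8 - 1.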
umi_tools dedup:
Groups PCR duplicates and deduplicates reads to yield one read per group.
Use this when you want to remove PCR duplicates before any downstream analysis.
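A minimal sketch of the deduplication idea: alignments that share chromosome, 5' position, strand and UMI are treated as one molecule, and only one representative is kept. To stay short it matches UMIs exactly (the behaviour of UMI-tools' unique method rather than the recommended directional one) and keeps the highest-MAPQ alignment as an illustrative choice. `Alignment` and `dedup` are hypothetical names; the UMI is assumed to be the suffix of the read name after the last `_`.

```python
from typing import Dict, List, NamedTuple, Tuple


class Alignment(NamedTuple):
    read_name: str  # e.g. "r1_ACGT", UMI appended by the extract step
    chrom: str
    pos: int        # 5' mapping position
    strand: str     # '+' or '-'
    mapq: int


def dedup(alignments: List[Alignment], sep: str = "_") -> List[Alignment]:
    """Keep one alignment per (chrom, pos, strand, UMI) group.

    UMIs must match exactly; the representative is the highest-MAPQ alignment.
    """
    best: Dict[Tuple[str, int, str, str], Alignment] = {}
    for aln in alignments:
        umi = aln.read_name.rsplit(sep, 1)[-1]
        key = (aln.chrom, aln.pos, aln.strand, umi)
        if key not in best or aln.mapq > best[key].mapq:
            best[key] = aln
    return list(best.values())


reads = [
    Alignment("r1_AAAA", "chr1", 100, "+", 30),
    Alignment("r2_AAAA", "chr1", 100, "+", 60),  # duplicate of r1, higher MAPQ
    Alignment("r3_CCCC", "chr1", 100, "+", 60),  # same position, different UMI
    Alignment("r4_AAAA", "chr1", 100, "-", 60),  # same UMI, other strand
]
print([a.read_name for a in dedup(reads)])
```

Swapping the exact-match key for groups produced by a directional clustering step would turn this into a closer analogue of the recommended method.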
Removal of genome contaminants (BBSplit)
BBSplit is a read-binning tool for metagenomes and contaminated libraries.
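BBSplit maps reads against several references at once (it is built on BBMap) and bins each read by the reference it matches best. The sketch below swaps in a much simpler k-mer-overlap score purely to illustrate the binning decision; it is not BBSplit's algorithm. The reference names, sequences and `k = 5` are hypothetical toy values.

```python
from typing import Dict, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All k-mers (substrings of length k) in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references: Dict[str, str], k: int) -> Dict[str, Set[str]]:
    """Precompute the k-mer set of each reference once."""
    return {name: kmers(seq, k) for name, seq in references.items()}


def bin_read(read: str, index: Dict[str, Set[str]], k: int) -> str:
    """Assign a read to the reference sharing the most k-mers with it.

    Returns 'ambiguous' on a tie and 'unassigned' if no k-mer is shared.
    """
    read_kmers = kmers(read, k)
    scores = {name: len(read_kmers & ref) for name, ref in index.items()}
    best = max(scores.values(), default=0)
    if best == 0:
        return "unassigned"
    winners = [name for name, score in scores.items() if score == best]
    return winners[0] if len(winners) == 1 else "ambiguous"


K = 5
index = build_index({"host": "ACGTACGTTAGCATCGATCG",
                     "contaminant": "TTTTGGGGCCCCAAAATTTT"}, K)
print(bin_read("ACGTTAGCAT", index, K))  # host
```

Reads binned to a contaminant reference would be dropped, while reads binned to the primary genome continue through the pipeline.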
Removal of ribosomal RNA (SortMeRNA)
Transcript assembly and quantification (StringTie)
Extensive quality control:
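The preseq package is aimed at predicting and estimating the complexity of a genomic sequencing library, i.e. how many distinct (non-duplicate) reads it contains and how many new distinct reads additional sequencing would yield.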
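Library complexity can be made concrete with the expected number of distinct reads at a smaller sequencing depth. If distinct read i was seen x_i times among N total reads, a random subsample of n reads (without replacement) contains it with probability 1 - C(N - x_i, n) / C(N, n). The sketch below sums that probability; it only interpolates below the observed depth and is not preseq's method for extrapolating to deeper sequencing (lc_extrap). `expected_distinct` is a hypothetical name.

```python
from math import comb
from typing import List


def expected_distinct(counts: List[int], n: int) -> float:
    """Expected number of distinct reads in a subsample of n reads.

    counts[i] is how often distinct read i was observed in the full library.
    Exact big-integer binomials keep the sketch simple but slow for real data.
    """
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the total number of reads")
    denom = comb(total, n)
    return sum(1 - comb(total - x, n) / denom for x in counts)


# Library: read "a" seen twice, reads "b" and "c" once each (4 reads total).
for depth in range(5):
    print(depth, round(expected_distinct([2, 1, 1], depth), 3))
```

A curve that flattens quickly as depth grows indicates a low-complexity library dominated by duplicates.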
Herpesviruses are double-stranded DNA viruses.
Herpes simplex viruses (human herpesviruses types 1 and 2) are common causes of recurrent infections involving the skin, mouth, lips, eyes, and …
Salmon uses an expectation–maximization (EM) algorithm to assign reads that are compatible with more than one transcript, for example when two copies of a gene exist.
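A toy version of the EM idea: each read lists the transcripts it is compatible with. The E-step splits each read across those transcripts in proportion to their current abundance estimates, and the M-step re-estimates abundances from the resulting expected counts. Salmon's real inference is considerably richer (it accounts for effective transcript length and bias models, among other things); `em_assign` and the transcript names are hypothetical.

```python
from typing import Dict, List, Set


def em_assign(read_sets: List[Set[str]], n_iter: int = 200) -> Dict[str, float]:
    """Expected read counts per transcript from a basic EM.

    read_sets: for each read, the set of transcripts it is compatible with.
    Transcript lengths and biases are ignored in this sketch.
    """
    if not read_sets:
        return {}
    transcripts = sorted(set().union(*read_sets))
    abundance = {t: 1 / len(transcripts) for t in transcripts}
    counts = {t: 0.0 for t in transcripts}
    for _ in range(n_iter):
        counts = {t: 0.0 for t in transcripts}
        # E-step: split each read across its compatible transcripts in
        # proportion to the current abundance estimates.
        for compatible in read_sets:
            norm = sum(abundance[t] for t in compatible)
            for t in compatible:
                counts[t] += abundance[t] / norm
        # M-step: new abundances are the normalized expected counts.
        total = sum(counts.values())
        abundance = {t: c / total for t, c in counts.items()}
    return counts


# Two reads unique to copy A, two reads shared by copies A and B.
print(em_assign([{"A"}, {"A"}, {"A", "B"}, {"A", "B"}]))
```

Because B has no uniquely mapping reads, the shared reads drift to A over the iterations; with one unique read per copy and one shared read, the shared read would stay split evenly.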