RNAseq Refinement of Viral Sequences with StringTie

StringTie is a common tool for using RNAseq data to correct or enhance the annotation of a reference genome, including viral genomes. It is designed primarily for transcript assembly and quantification from RNAseq read alignments, so it refines the transcript models rather than the genome sequence itself.

Here’s a basic pipeline using StringTie with a reference virus sequence:

  1. Mapping/Aligning Reads to the Reference:

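    • First align the RNAseq reads to the reference genome with a splice-aware aligner such as STAR or HISAT2. StringTie expects coordinate-sorted alignments, so have STAR write a sorted BAM directly. The directory passed to --genomeDir must hold an index built beforehand with STAR --runMode genomeGenerate.

      STAR --runThreadN 4 --genomeDir /path/to/genomeDir --readFilesIn /path/to/rnaseq.fastq --outSAMtype BAM SortedByCoordinate --outFileNamePrefix /path/to/output_prefix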
  2. Transcript Assembly using StringTie:

    • With the alignment file produced (it must be sorted by coordinate, usually as BAM), you can use StringTie to assemble the transcripts. The -l option sets the name prefix for the assembled transcripts (here, virus).

      stringtie -p 4 -o output.gtf -l virus /path/to/aligned_data.bam
  3. Compare and Correct the Reference:

    • If you have a reference annotation, you can compare the assembled transcripts against the annotations to identify novel transcripts or refine existing ones.
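    • GffCompare can be used to compare and evaluate the accuracy of the assembled transcripts against the reference annotation. In its annotated output, each assembled transcript carries a class code, for example = for an exact intron-chain match to a reference transcript and u for an intergenic, potentially novel transcript.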

      gffcompare -r reference_annotation.gtf -G -o comparison_output output.gtf
  4. Visual Inspection (Optional but Recommended):

    • You can visualize the alignments and the assembled transcripts using a genome browser like IGV (Integrative Genomics Viewer). This can give you insights into regions with alternative splicing, novel exons, or discrepancies between your data and the reference annotation.
  5. Further Analysis:

    • Based on the assembled transcripts, you can proceed to differential expression analysis, identification of novel transcripts, or even SNP/variant detection if needed.
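Steps 1 to 3 above can be chained in a small shell wrapper. The following is only a sketch under a few assumptions: the function names run and refine_annotation and the DRY_RUN variable are hypothetical, all outputs are assumed to go into one directory, and STAR is asked for a coordinate-sorted BAM, which it names Aligned.sortedByCoord.out.bam under the given prefix. The tool flags are the ones used in the steps above.

```shell
#!/usr/bin/env bash
# Sketch only: chains alignment, assembly, and comparison (steps 1-3).
# Function names and DRY_RUN are hypothetical, not part of any tool.

# Print a command, then execute it unless DRY_RUN=1.
run() {
    printf '+ %s\n' "$*"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"
    fi
}

# Usage: refine_annotation GENOME_DIR READS_FASTQ REF_GTF OUT_DIR [THREADS]
refine_annotation() {
    local genome_dir=$1 reads=$2 ref_gtf=$3 out_dir=$4 threads=${5:-4}
    # Name STAR gives a coordinate-sorted BAM written under the prefix "OUT_DIR/".
    local bam="${out_dir}/Aligned.sortedByCoord.out.bam"

    run mkdir -p "$out_dir" || return 1

    # Step 1: align reads and write a coordinate-sorted BAM.
    run STAR --runThreadN "$threads" --genomeDir "$genome_dir" \
        --readFilesIn "$reads" --outSAMtype BAM SortedByCoordinate \
        --outFileNamePrefix "${out_dir}/" || return 1

    # Step 2: assemble transcripts from the sorted alignments.
    run stringtie -p "$threads" -l virus -o "${out_dir}/output.gtf" "$bam" || return 1

    # Step 3: compare the assembly against the reference annotation.
    run gffcompare -r "$ref_gtf" -G -o "${out_dir}/comparison_output" \
        "${out_dir}/output.gtf" || return 1
}
```

Calling it with DRY_RUN=1, for example DRY_RUN=1 refine_annotation /path/to/genomeDir /path/to/rnaseq.fastq reference_annotation.gtf results, prints the four commands without running them, which is a cheap way to check the paths before starting a real run.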

For the entire process, you’d need:

  • Reference genome (for STAR or HISAT2 alignment).
  • RNAseq FASTQ files.
  • (Optionally) Reference annotation for comparison.
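Before starting a run, it can help to confirm that these inputs and the tools are actually in place. A minimal sketch, assuming the STAR index lives in a directory and the reads are in a single uncompressed FASTQ file; the function names check_inputs and check_tools are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: sanity checks before running the pipeline.
# check_inputs and check_tools are hypothetical helper names.

# Usage: check_inputs GENOME_DIR READS_FASTQ [REF_GTF]
# Returns non-zero if a required input is missing or empty.
check_inputs() {
    local genome_dir=$1 reads=$2 ref_gtf=${3:-}
    local status=0

    [ -d "$genome_dir" ] || { echo "missing genome index directory: $genome_dir" >&2; status=1; }
    [ -s "$reads" ] || { echo "missing or empty FASTQ: $reads" >&2; status=1; }

    # The annotation is optional, so only check it when one was given.
    if [ -n "$ref_gtf" ] && [ ! -s "$ref_gtf" ]; then
        echo "missing or empty annotation: $ref_gtf" >&2
        status=1
    fi
    return "$status"
}

# Usage: check_tools TOOL...
# Returns non-zero if any named tool is not on PATH.
check_tools() {
    local tool status=0
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || { echo "not on PATH: $tool" >&2; status=1; }
    done
    return "$status"
}
```

For example, check_inputs /path/to/genomeDir /path/to/rnaseq.fastq reference_annotation.gtf && check_tools STAR stringtie gffcompare succeeds only when everything listed above is available.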
