Normalization of RNA-seq and ChIP-seq data

gene_x

Tags: RNA-seq, ChIP-seq

Normalization methods for RNA-seq data

  1. DESeq (RLE - Relative Log Expression):

    • Goal: To normalize for differences in library size (sequencing depth) and in the composition of read counts between samples.
    • Method: Uses the median-of-ratios method. For each gene, the geometric mean of its read counts across all samples serves as a pseudo-reference. Each sample's counts are divided by this reference, and the median of those ratios across genes is the sample's size factor. Raw counts are then divided by the size factor.
    • Example: Suppose two samples, A and B, have raw read counts for three genes X, Y, Z as follows: A: X=10, Y=20, Z=40; B: X=20, Y=40, Z=80. The geometric means are about 14.1, 28.3, and 56.6, so every ratio is about 0.71 in A and about 1.41 in B. The size factors are therefore about 0.71 for A and 1.41 for B (B's is twice A's, since all counts in B are double those in A). After dividing by the size factors, both samples have the same normalized counts: about 14.1, 28.3, and 56.6.
  2. TMM (Trimmed Mean of M-values):

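    • Goal: To normalize for differences in library size and RNA composition, while being less sensitive to extreme values such as a few very highly expressed genes.
    • Method: Takes one sample as the reference and computes, for each gene, the log expression ratio to the reference (M-value) and the average log expression (A-value). After trimming the most extreme M- and A-values, a weighted mean of the remaining M-values gives the log scaling factor.
    • Example: Continuing with the previous example, every gene shows the same two-fold ratio between B and A, which is fully explained by the library sizes (70 vs. 140 reads). TMM therefore detects no composition bias, and normalization amounts to dividing the counts in B by 2 relative to A.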
  3. Upper Quantile:

    • Goal: To adjust for differences in library size based on the upper quantile of counts.
    • Method: Scales the read counts so that an upper quantile of the counts (typically the 75th percentile) is the same across samples.
    • Example: In the previous example, the 75th percentile of the counts is 30 for A and 60 for B (using linear interpolation), so dividing the counts in B by 2 puts both samples on the same scale.
  4. Total Counts:

    • Goal: To normalize based on total counts across samples.
    • Method: Divides each read count by the total number of reads in the sample.
    • Example: If A had 100 total reads and B had 200, the counts in B would be divided by 2 for normalization.
  5. RPKM (Reads Per Kilobase of transcript per Million mapped reads):

    • Goal: To normalize for gene length and total read count.
    • Method: For each gene, divide the read count by the gene length (in kilobases) and then by the total number of reads (in millions).
    • Example: If gene X is 2 kb long and has 10 reads in sample A (with 100 total reads, i.e. 0.0001 million), RPKM for X = (10 / 2) / 0.0001 = 50,000.
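
The arithmetic behind several of the methods above can be checked with a few lines of Python. This is a minimal sketch on the toy counts for genes X, Y, Z, not the implementation used by DESeq or any other package; all function names are made up for illustration, and dropping zero counts before taking the upper quartile is an assumption. Median-of-ratios size factors are defined relative to the geometric mean across samples, so only their ratio between samples (here a factor of 2) matters.

```python
import math
from statistics import median, quantiles

# Toy counts from the example: genes X, Y, Z in samples A and B.
counts = {
    "A": {"X": 10, "Y": 20, "Z": 40},
    "B": {"X": 20, "Y": 40, "Z": 80},
}
genes = ["X", "Y", "Z"]


def median_of_ratios_size_factors(counts, genes):
    """Size factor per sample: median over genes of count / geometric mean."""
    samples = list(counts)
    geo_means = {}
    for g in genes:
        values = [counts[s][g] for s in samples]
        # Genes with a zero count in any sample are skipped,
        # because their geometric mean would be zero.
        if all(v > 0 for v in values):
            geo_means[g] = math.exp(sum(math.log(v) for v in values) / len(values))
    return {
        s: median(counts[s][g] / geo_means[g] for g in geo_means) for s in samples
    }


def library_sizes_of(counts):
    """Total count per sample (the scaling used by total-count normalization)."""
    return {s: sum(c.values()) for s, c in counts.items()}


def upper_quartile(sample_counts):
    """75th percentile of the non-zero counts, with linear interpolation."""
    values = sorted(v for v in sample_counts.values() if v > 0)
    return quantiles(values, n=4, method="inclusive")[2]


def rpkm(read_count, gene_length_bp, total_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count / (gene_length_bp / 1_000) / (total_reads / 1_000_000)


size_factors = median_of_ratios_size_factors(counts, genes)
normalized = {
    s: {g: counts[s][g] / size_factors[s] for g in genes} for s in counts
}
library_sizes = library_sizes_of(counts)
```

As a usage note, `rpkm(10, 2000, 100_000)` evaluates to 50, because 100,000 reads correspond to 0.1 million. The total passed to `rpkm` must be the raw number of mapped reads, not a value already expressed in millions.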

Normalization methods for ChIP-seq Data

For ChIP-seq data, it is also crucial to consider factors such as normalization against an input control and peak calling. Complete tools that incorporate normalization as part of the ChIP-seq analysis pipeline include:

  1. MACS (Model-based Analysis of ChIP-Seq):

    • Description: A widely used tool for identifying transcription factor binding sites and regions of histone modification.
    • Features: Provides robust peak calling with a focus on identifying precise locations of binding sites.
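    • Normalization Method: Scales the treatment and control libraries to a common sequencing depth and estimates a local background rate (a dynamic Poisson parameter, lambda) around each candidate peak, which accounts for local biases such as chromatin structure or GC content.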
  2. SICER (Spatial clustering approach for the Identification of ChIP-Enriched Regions):

    • Description: A tool designed to identify broad regions of enrichment that are typically associated with histone modifications.
    • Features: Particularly useful for datasets where the regions of enrichment are distributed in broader domains rather than sharp peaks.
    • Normalization Method: Uses a spatial clustering approach to differentiate between true signals and background noise, accounting for both local and global variations.
  3. ChIPQC:

    • Description: A Bioconductor package providing quality control and normalization functionalities for ChIP-seq data.
    • Features: Offers comprehensive analysis of ChIP-seq quality, including diagnostic plots and summary statistics.
    • Normalization Method: Provides tools for normalization, though it primarily focuses on quality control aspects. Users may integrate ChIPQC with other packages for more advanced normalization procedures.
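
To make the idea of input control normalization concrete, the following is a minimal sketch assuming reads have already been counted in fixed genomic bins. It scales the input to the ChIP library's total depth and reports a log2 enrichment per bin. It is not the algorithm of MACS, SICER, or ChIPQC; the function names, the bin counts, and the pseudocount of 1 are all hypothetical choices for illustration.

```python
import math


def scale_input_to_chip(chip_bins, input_bins):
    """Scale input counts so their total matches the ChIP total (depth scaling)."""
    factor = sum(chip_bins) / sum(input_bins)
    return [count * factor for count in input_bins]


def log2_enrichment(chip_bins, input_bins, pseudocount=1.0):
    """Per-bin log2 ratio of ChIP over depth-scaled input.

    The pseudocount avoids division by zero in empty bins; its value
    is an arbitrary choice for this sketch.
    """
    scaled_input = scale_input_to_chip(chip_bins, input_bins)
    return [
        math.log2((c + pseudocount) / (i + pseudocount))
        for c, i in zip(chip_bins, scaled_input)
    ]


# Hypothetical bin counts: the third bin carries a ChIP signal.
chip = [5, 6, 50, 7, 4]
ctrl = [10, 12, 14, 12, 8]

scaled_ctrl = scale_input_to_chip(chip, ctrl)
enrichment = log2_enrichment(chip, ctrl)
```

Scaling by total depth is the simplest choice. It can understate enrichment when a large share of the ChIP reads fall in true peaks, which is one reason some tools estimate the scaling from background regions instead.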

There are several standalone methods and packages specifically designed for normalization of ChIP-seq data, considering the unique characteristics of these experiments. Below are some of the popular ones:

  1. csaw:

    • Description: A Bioconductor package that provides functions for normalization and differential binding analysis in ChIP-seq data.
    • Features: It is designed for analyzing broad genomic regions and is effective even in the presence of strong sample-to-sample variability.
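    • Normalization Method: Reads are counted in sliding windows and modeled with a negative binomial distribution. Normalization factors are typically computed with TMM, for example on counts from large background bins to remove composition bias.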
  2. DiffBind:

    • Description: Another Bioconductor package that performs differential binding analysis on ChIP-seq data.
    • Features: It provides extensive functionalities for quality control, normalization, and downstream analysis.
    • Normalization Method: It supports several normalization methods including total count scaling, RPKM, and DESeq normalization.
  3. ChIPnorm:

    • Description: A standalone R package for normalization of ChIP-seq data.
    • Features: It is specifically designed for normalization of ChIP-seq data against input controls.
    • Normalization Method: It uses quantile normalization to correct for the distribution of read counts.
  4. deepTools:

    • Description: A suite of Python tools used for quality control, normalization, and visualization of deep-sequencing data.
    • Features: It includes a wide variety of tools for assessing correlation between samples, visualizing data, and normalization.
    • Normalization Method: When generating coverage tracks, it supports read-depth normalization methods such as RPKM, CPM, BPM, and RPGC (1x genome coverage). Some of its plotting tools can additionally log-transform values (log1p).
  5. DANPOS:

    • Description: A package for dynamic analysis of nucleosome and protein-DNA binding with high resolution.
    • Features: It is designed for analyzing positional patterns of regulatory elements in ChIP-seq data.
    • Normalization Method: It includes methods for normalizing sequencing depth and background noise.
  6. SPP:

    • Description: An R package for analyzing ChIP-seq data with a focus on identifying quality metrics.
    • Features: It includes functionalities for creating quality control plots, assessing cross-correlation, and normalizing read counts.
    • Normalization Method: It provides cross-correlation based normalization.
  7. SeqNorm:

    • Description: A standalone tool for normalizing ChIP-seq data.
    • Features: It is designed to normalize ChIP-seq datasets to a common reference, improving comparability.
    • Normalization Method: It uses a scaling factor based on non-enriched regions.
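
Two of the ideas named in this list, a scaling factor estimated from non-enriched regions and quantile normalization, can be sketched in a few lines. The code below assumes binned read counts and a precomputed flag marking enriched bins. It is an illustration of the general techniques only, not the implementation of SeqNorm, ChIPnorm, or any other tool listed; the function names and data are hypothetical, and ties in the quantile step are broken by position for simplicity.

```python
from statistics import median


def background_scaling_factor(sample_bins, reference_bins, enriched):
    """Median sample/reference ratio over bins not flagged as enriched."""
    ratios = [
        s / r
        for s, r, is_enriched in zip(sample_bins, reference_bins, enriched)
        if not is_enriched and s > 0 and r > 0
    ]
    return median(ratios)


def quantile_normalize(samples):
    """Force all samples onto one distribution: the mean of sorted values per rank.

    `samples` is a list of equal-length lists, one list per sample.
    """
    n_bins = len(samples[0])
    sorted_samples = [sorted(sample) for sample in samples]
    rank_means = [
        sum(s[rank] for s in sorted_samples) / len(samples)
        for rank in range(n_bins)
    ]
    normalized = []
    for sample in samples:
        # Indices of the bins ordered from lowest to highest count.
        order = sorted(range(n_bins), key=lambda i: sample[i])
        out = [0.0] * n_bins
        for rank, bin_index in enumerate(order):
            out[bin_index] = rank_means[rank]
        normalized.append(out)
    return normalized


# Hypothetical data: the third bin is a peak, the rest is background.
reference = [10, 12, 80, 11, 9]
sample = [20, 24, 90, 22, 18]
enriched = [False, False, True, False, False]

factor = background_scaling_factor(sample, reference, enriched)
sample_scaled = [count / factor for count in sample]

quantile_normalized = quantile_normalize([[1, 3, 2], [10, 30, 20]])
```

In this toy case the background suggests the sample was sequenced twice as deeply as the reference, so after scaling the peak bin is lower in the sample than in the reference. Total-count scaling would place the factor differently because the peak reads contribute to the library size.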

These tools offer a variety of normalization methods tailored for the specific challenges posed by ChIP-seq data. Users can choose the one that best fits their experimental design and analysis needs.

© 2023 XGenes.com Impressum