There are several alternative R packages and tools to perform motif enrichment analysis for RNA-binding proteins (RBPs), beyond PWMEnrich::motifEnrichment(). Here are the most notable ones:
| Tool / Package | Enrichment | Custom Motifs | CLI or R? | RNA-specific? | Notes |
| ------------------------ | ----------------- | --------------- | --------- | -------------- | ----- |
| **PWMEnrich** | ✅ | ✅ | R | ✅ | |
| **RBPmap** | ✅ | ❌ (uses own db) | Web/CLI | ✅ | Try RBPmap results + enrichment testing |
| **Biostrings/TFBSTools** | ❌ (only scanning) | ✅ | R | ❌ | Combine with ATtRACT motifs |
| **rmap** | ✅ (CLIP-based) | ❌ | R | ✅ | |
| **Homer** | ✅ | ✅ | CLI | ⚠ RNA optional | |
| **MEME (AME, FIMO)** | ✅ | ✅ | Web/CLI | ⚠ Generic | |
-
Get 3UTR.fasta, 5UTR.fasta, CDS.fasta and transcripts.fasta
    mRNA Transcript
    ┌────────────┬────────────┬────────────┐
    │   5′ UTR   │    CDS     │   3′ UTR   │
    └────────────┴────────────┴────────────┘
    ↑            ↑            ↑            ↑
    Start of     Start        Stop         End of
    Transcript   Codon        Codon        Transcript

✅ Option 1: Use GENCODE and Python scripts (CHOSEN!)

Input DE gene lists:

    ~/DATA/Data_Ute/Data_RNA-Seq_MKL-1+WaGa/results_2025_1/degenes/MKL-1_wt.EV_vs_parental-up.txt #20086
    ~/DATA/Data_Ute/Data_RNA-Seq_MKL-1+WaGa/results_2025_1/degenes/MKL-1_wt.EV_vs_parental-down.txt #634
    ~/DATA/Data_Ute/Data_RNA-Seq_MKL-1+WaGa/results_2025_1/degenes/WaGa_wt.EV_vs_parental-up.txt #23832
    ~/DATA/Data_Ute/Data_RNA-Seq_MKL-1+WaGa/results_2025_1/degenes/WaGa_wt.EV_vs_parental-down.txt #375

Filter the up- and down-regulated gene lists to protein_coding genes before extracting 3′ UTRs, because:
1. Only protein_coding genes have well-annotated 3′ UTRs. A 3′ UTR is the region after the CDS (coding sequence) and before the poly-A tail. Non-coding RNAs (e.g., lncRNA, snoRNA, miRNA precursors) have no CDS and therefore no canonical 3′ UTR.
2. In GENCODE, most UTR annotations are provided only for transcripts with gene_type = "protein_coding".
Keep only the protein-coding genes:

    grep ",\"protein_coding\"," ~/DATA/Data_Ute/Data_RNA-Seq_MKL-1+WaGa/results_2025_1/degenes/MKL-1_wt.EV_vs_parental-up.txt > MKL-1_wt.EV_vs_parental-up_protein_coding.txt
    grep ",\"protein_coding\"," ~/DATA/Data_Ute/Data_RNA-Seq_MKL-1+WaGa/results_2025_1/degenes/MKL-1_wt.EV_vs_parental-down.txt > MKL-1_wt.EV_vs_parental-down_protein_coding.txt
    grep ",\"protein_coding\"," ~/DATA/Data_Ute/Data_RNA-Seq_MKL-1+WaGa/results_2025_1/degenes/WaGa_wt.EV_vs_parental-up.txt > WaGa_wt.EV_vs_parental-up_protein_coding.txt
    grep ",\"protein_coding\"," ~/DATA/Data_Ute/Data_RNA-Seq_MKL-1+WaGa/results_2025_1/degenes/WaGa_wt.EV_vs_parental-down.txt > WaGa_wt.EV_vs_parental-down_protein_coding.txt

Download from the GENCODE site (https://www.gencodegenes.org/human/):
* GTF annotation file (e.g., gencode.v48.annotation.gtf.gz)
* Corresponding genome FASTA (e.g., GRCh38.primary_assembly.genome.fa.gz)

    wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_48/gencode.v48.annotation.gtf.gz
    wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_48/GRCh38.primary_assembly.genome.fa.gz
    gunzip gencode.v48.annotation.gtf.gz
    gunzip GRCh38.primary_assembly.genome.fa.gz

Extract the transcript parts for each gene set:

    python extract_transcript_parts.py MKL-1_wt.EV_vs_parental-down_protein_coding.txt ~/REFs/gencode.v48.annotation.gtf ~/REFs/GRCh38.primary_assembly.genome.fa MKL-1_down
    python extract_transcript_parts.py MKL-1_wt.EV_vs_parental-up_protein_coding.txt ~/REFs/gencode.v48.annotation.gtf ~/REFs/GRCh38.primary_assembly.genome.fa MKL-1_up #5988
    python extract_transcript_parts.py WaGa_wt.EV_vs_parental-down_protein_coding.txt ~/REFs/gencode.v48.annotation.gtf ~/REFs/GRCh38.primary_assembly.genome.fa WaGa_down #93
    python extract_transcript_parts.py WaGa_wt.EV_vs_parental-up_protein_coding.txt ~/REFs/gencode.v48.annotation.gtf ~/REFs/GRCh38.primary_assembly.genome.fa WaGa_up #6538

Options 2-5 are described at the end.
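The script extract_transcript_parts.py is not shown in these notes. As a rough illustration of the transcript layout in the diagram above, here is a minimal sketch of how one transcript could be split into 5′ UTR, CDS and 3′ UTR. The function name `split_transcript`, its arguments and the toy sequence are hypothetical, and the sketch assumes 1-based inclusive coordinates with `cds_end` covering the stop codon; it is not the actual script.

```python
# Hypothetical sketch: split one transcript into 5'UTR / CDS / 3'UTR.
# Coordinates are 1-based and inclusive, as in GTF files.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    """Reverse-complement a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def split_transcript(chrom_seq, exons, cds_start, cds_end, strand):
    """Return the three transcript parts in 5'->3' transcript orientation.

    exons: list of (start, end) genomic intervals of the transcript.
    cds_start, cds_end: leftmost and rightmost genomic CDS positions
    (assumed here to include the stop codon).
    """
    exons = sorted(exons)
    # Spliced transcript in genomic (forward-strand) order.
    spliced = "".join(chrom_seq[s - 1:e] for s, e in exons)
    # Exonic bases located left of the CDS, and up to the end of the CDS.
    before_cds = sum(max(0, min(e, cds_start - 1) - s + 1) for s, e in exons)
    through_cds = sum(max(0, min(e, cds_end) - s + 1) for s, e in exons)
    left = spliced[:before_cds]
    cds = spliced[before_cds:through_cds]
    right = spliced[through_cds:]
    if strand == "+":
        return {"5UTR": left, "CDS": cds, "3UTR": right}
    # On the minus strand the genomic right side is the 5' end.
    return {"5UTR": revcomp(right), "CDS": revcomp(cds), "3UTR": revcomp(left)}


# Toy example: two exons separated by a 3-base intron (TTT).
toy_chrom = "GGATGTTTAAATAGCC"
parts = split_transcript(toy_chrom, [(1, 5), (9, 16)], 3, 14, "+")
print(parts)
```

In GENCODE GTF files the stop codon is annotated separately from the CDS features, so a real implementation would have to decide whether the stop codon belongs to the CDS or to the 3′ UTR.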
-
Why 3′ UTR?
🧬 miRNA, RBP, or translation/post-transcriptional regulation ➡️ use 3′ UTR sequences, because:
* Most miRNA binding sites and many RBP motifs are located in the 3′ UTR.
* It is the primary region for regulating mRNA stability, localization, and translation.

🧠 Example: you are looking for binding enrichment of miRNAs or RNA-binding proteins (PUM, HuR, etc.) ✅ Input = 3UTR.fasta

🧪 If you are testing RBPs related to:
- Translation initiation, upstream ORFs, or 5′ cap interaction ➡️ use 5′ UTR
- Coding mutations, protein-level motifs, or translational efficiency ➡️ use CDS
- General transcriptome-wide motif search (no preference) ➡️ use transcripts, or test all regions separately to localize the signal
-
Recommended Workflow with RBPmap https://rbpmap.technion.ac.il (Too slow!)
RBPmap itself does not compute enrichment p-values or FDR; it is a motif scanning tool. To get statistically meaningful RBP enrichments, combine RBPmap with custom permutation testing or with Fisher’s exact test plus multiple testing correction.

1. Prepare foreground (target) and background sequences

Extract the 3′ UTRs of:
* 📉 Downregulated mRNAs (foreground), likely targeted by upregulated miRNAs
* ⚪ A control set of 3′ UTRs, e.g., non-differentially expressed protein-coding genes

    grep ",\"protein_coding\"," ~/DATA/Data_Ute/Data_RNA-Seq_MKL-1+WaGa/results_2025_1/degenes/MKL-1_wt.EV_vs_parental-all.txt > MKL-1_wt.EV_vs_parental-all_protein_coding.txt
    grep ",\"protein_coding\"," ~/DATA/Data_Ute/Data_RNA-Seq_MKL-1+WaGa/results_2025_1/degenes/WaGa_wt.EV_vs_parental-all.txt > WaGa_wt.EV_vs_parental-all_protein_coding.txt

    # MKL-1
    cut -d',' -f1 MKL-1_wt.EV_vs_parental-all_protein_coding.txt | sort > all_genes.txt #19239
    cut -d',' -f1 MKL-1_wt.EV_vs_parental-up_protein_coding.txt | sort > up_genes.txt #5988
    cut -d',' -f1 MKL-1_wt.EV_vs_parental-down_protein_coding.txt | sort > down_genes.txt #112
    cat up_genes.txt down_genes.txt | sort | uniq > regulated_genes.txt
    comm -23 all_genes.txt regulated_genes.txt > background_genes.txt
    grep -Ff background_genes.txt MKL-1_wt.EV_vs_parental-all_protein_coding.txt > MKL-1_wt.EV_vs_parental-background_protein_coding.txt #13139

    # WaGa
    cut -d',' -f1 WaGa_wt.EV_vs_parental-all_protein_coding.txt | sort > all_genes.txt #19239
    cut -d',' -f1 WaGa_wt.EV_vs_parental-up_protein_coding.txt | sort > up_genes.txt #6538
    cut -d',' -f1 WaGa_wt.EV_vs_parental-down_protein_coding.txt | sort > down_genes.txt #93
    cat up_genes.txt down_genes.txt | sort | uniq > regulated_genes.txt
    comm -23 all_genes.txt regulated_genes.txt > background_genes.txt
    grep -Ff background_genes.txt WaGa_wt.EV_vs_parental-all_protein_coding.txt > WaGa_wt.EV_vs_parental-background_protein_coding.txt #12608

    python extract_transcript_parts.py MKL-1_wt.EV_vs_parental-background_protein_coding.txt ~/REFs/gencode.v48.annotation.gtf ~/REFs/GRCh38.primary_assembly.genome.fa MKL-1_background
    python extract_transcript_parts.py WaGa_wt.EV_vs_parental-background_protein_coding.txt ~/REFs/gencode.v48.annotation.gtf ~/REFs/GRCh38.primary_assembly.genome.fa WaGa_background

* foreground.fasta: your target (foreground) sequences, e.g., the 3′ UTRs of downregulated genes.
* background.fasta: your background control sequences, e.g., the 3′ UTRs of genes that are not significantly differentially expressed.

2. Run RBPmap separately on each set (six runs in total)
* Submit each set of UTRs to RBPmap.
* Use the same settings for all runs (e.g., “human genome”, “high stringency”, “Apply conservation filter”).
* Choose all RBPs.
* Download the motif match outputs for every set.

3. Count motif hits per RBP in each set

For each RBP you now have:
* a: number of target 3′ UTRs with a motif match
* b: number of background UTRs with a motif match
* c: total number of target UTRs
* d: total number of background UTRs

4. Perform Fisher’s exact test per RBP

For each RBP, construct a 2x2 table:

| | Motif Present | Motif Absent |
| --- | --- | --- |
| Foreground (targets) | a | c - a |
| Background | b | d - b |

5. Adjust p-values for multiple testing

Use Benjamini-Hochberg (FDR) correction (e.g., in Python or R) across all RBPs tested.

6. ✅ Summary

| Step | Tool |
| --- | --- |
| Prepare database of RNA-binding motifs | ATtRACT |
| 3′ UTR extraction | extract_transcript_parts.py |
| Motif scan | RBPmap or FIMO |
| Count motif hits | Your own parser (Python or R) |
| Fisher’s exact test | scipy.stats or fisher.test() |
| FDR correction | multipletests() or p.adjust() |

    python rbp_enrichment.py rbpmap_downregulated.tsv rbpmap_background.tsv rbp_enrichment_results.csv
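Steps 3 to 5 above can be sketched in a few lines. This is not the rbp_enrichment.py script referenced in these notes, only a minimal standard-library illustration of the same statistics (a one-sided Fisher's exact test via the hypergeometric distribution, followed by Benjamini-Hochberg). The function names and the counts in the example are hypothetical; in practice scipy.stats.fisher_exact and statsmodels' multipletests, as listed in the summary, would do the same job.

```python
from math import comb


def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact test for enrichment in the foreground.

    a: foreground UTRs with a motif match, b: background UTRs with a match,
    c: total foreground UTRs, d: total background UTRs (as defined above).
    Returns P(X >= a) under the hypergeometric null.
    """
    total = c + d
    with_motif = a + b
    denominator = comb(total, c)
    upper = min(with_motif, c)
    numerator = sum(
        comb(with_motif, k) * comb(total - with_motif, c - k)
        for k in range(a, upper + 1)
    )
    return numerator / denominator


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def rbp_enrichment(hit_counts, n_foreground, n_background):
    """hit_counts maps RBP name -> (a, b); returns rows sorted by padj."""
    names = list(hit_counts)
    pvals = [
        fisher_greater(hit_counts[n][0], hit_counts[n][1], n_foreground, n_background)
        for n in names
    ]
    padj = benjamini_hochberg(pvals)
    rows = [
        {"RBP": n, "FG_hits": hit_counts[n][0], "BG_hits": hit_counts[n][1],
         "pval": p, "padj": q}
        for n, p, q in zip(names, pvals, padj)
    ]
    return sorted(rows, key=lambda r: r["padj"])


# Hypothetical counts, for illustration only.
example = {"RBP_A": (40, 2000), "RBP_B": (10, 1500)}
for row in rbp_enrichment(example, n_foreground=93, n_background=12608):
    print(row)
```

The one-sided alternative only tests for over-representation in the foreground; a two-sided test would be needed to also detect depletion.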
-
Quick Drop-In Plan (RBPmap Alternative with FIMO for motif scan)
1. [ATtRACT + FIMO (MEME suite)]

* ATtRACT: database of RNA-binding motifs.
* FIMO: fast and scriptable motif scanning tool.

Plan: download the RBP motifs (PWMs) from the ATtRACT DB, convert them to MEME format (if needed), and use FIMO to scan the UTR sequences.

    grep "Homo_sapiens" ATtRACT_db.txt > attract_human.txt
    #cut -f12 attract_human.txt | sort | uniq > valid_ids.txt
    python convert_attract_pwm_to_meme.py

    fimo --thresh 1e-4 --oc fimo_foreground_MKL-1_down attract_human.meme ../Data_RNA-Seq_MKL-1+WaGa/motif_analysis/MKL-1_down.3UTR.fasta
    fimo --thresh 1e-4 --oc fimo_foreground_MKL-1_up attract_human.meme ../Data_RNA-Seq_MKL-1+WaGa/motif_analysis/MKL-1_up.3UTR.fasta
    fimo --thresh 1e-4 --oc fimo_background_MKL-1_background attract_human.meme ../Data_RNA-Seq_MKL-1+WaGa/motif_analysis/MKL-1_background.3UTR.fasta
    fimo --thresh 1e-4 --oc fimo_foreground_WaGa_down attract_human.meme ../Data_RNA-Seq_MKL-1+WaGa/motif_analysis/WaGa_down.3UTR.fasta
    fimo --thresh 1e-4 --oc fimo_foreground_WaGa_up attract_human.meme ../Data_RNA-Seq_MKL-1+WaGa/motif_analysis/WaGa_up.3UTR.fasta
    fimo --thresh 1e-4 --oc fimo_background_WaGa_background attract_human.meme ../Data_RNA-Seq_MKL-1+WaGa/motif_analysis/WaGa_background.3UTR.fasta

Note: FIMO scans both strands by default; for RNA targets it may be worth adding --norc so that only the given strand is scanned.

    #TODO_TOMORROW: mv PBS_analysis RBP_analysis

    #Test
    python run_enrichment.py \
        --attract ATtRACT_db.txt \
        --fimo_fg fimo_foreground_WaGa_down/fimo.tsv \
        --fimo_bg fimo_foreground2/fimo.tsv \
        --output rbp_enrichment_test.csv

    python run_enrichment.py \
        --attract ATtRACT_db.txt \
        --fimo_fg fimo_foreground_MKL-1_up/fimo.tsv \
        --fimo_bg fimo_background_MKL-1_background/fimo.tsv \
        --output rbp_enrichment_MKL-1_up.csv

    python run_enrichment.py \
        --attract ATtRACT_db.txt \
        --fimo_fg fimo_foreground_MKL-1_down/fimo.tsv \
        --fimo_bg fimo_background_MKL-1_background/fimo.tsv \
        --output rbp_enrichment_MKL-1_down.csv

    python run_enrichment.py \
        --attract ATtRACT_db.txt \
        --fimo_fg fimo_foreground_WaGa_up/fimo.tsv \
        --fimo_bg fimo_background_WaGa_background/fimo.tsv \
        --output rbp_enrichment_WaGa_up.csv
    python run_enrichment.py \
        --attract ATtRACT_db.txt \
        --fimo_fg fimo_foreground_WaGa_down/fimo.tsv \
        --fimo_bg fimo_background_WaGa_background/fimo.tsv \
        --output rbp_enrichment_WaGa_down.csv

FIMO vs. AME:

| Tool | Function | Focus | Use case |
| --- | --- | --- | --- |
| FIMO | Precisely locates motif occurrences | Where a motif occurs | Identify specific binding sites |
| AME | Tests motif enrichment statistically | Which motifs are more enriched in a set of sequences | Compare whether a motif occurs significantly more often |

If you are doing RBP enrichment analysis after differential expression, consider scanning with FIMO first and then using your own code plus Fisher’s exact test to do what AME does, or simply run AME directly.

    # Generate attract_human.meme incl. Gene_name!
    #python generate_named_meme.py pwm.txt attract_human.txt
    python generate_attract_human_meme.py pwm.txt ATtRACT_db.txt

    #ERROR while running ame --> DEBUG!
    #--control ../Data_RNA-Seq_MKL-1+WaGa/motif_analysis/WaGa_all.3UTR.fasta \
    ame --control --shuffle-- \
        --oc ame_out \
        --scoring avg \
        --method fisher --verbose 5 ../Data_RNA-Seq_MKL-1+WaGa/motif_analysis/WaGa_down.3UTR.fasta attract_human.meme

2. GraphProt2 (ALTERNATIVE_TODO)

* ML-based tool using sequence + structure
* Pre-trained models for many RBPs

✅ Advantages:
* Local, GPU/CPU supported
* More biologically realistic (includes structure)
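For the FIMO route in item 1 of this section, the counting step (how many sequences carry at least one hit per motif) could look like the sketch below. It is not the run_enrichment.py script used above; the function name `sequences_hit_per_motif` and the toy rows are hypothetical. It assumes the tab-separated fimo.tsv layout written by recent MEME Suite versions, with the columns motif_id, sequence_name and p-value, and trailing comment lines starting with `#`.

```python
import csv
import io
from collections import defaultdict


def sequences_hit_per_motif(fimo_tsv_handle, max_pvalue=1e-4):
    """Count distinct sequences with at least one hit, per motif ID.

    Assumes a fimo.tsv-style table with a header row containing
    'motif_id', 'sequence_name' and 'p-value'.
    """
    hits = defaultdict(set)
    # Skip blank lines and the '#' comment lines FIMO appends at the end.
    rows = (
        line for line in fimo_tsv_handle
        if line.strip() and not line.startswith("#")
    )
    for row in csv.DictReader(rows, delimiter="\t"):
        if float(row["p-value"]) <= max_pvalue:
            hits[row["motif_id"]].add(row["sequence_name"])
    return {motif: len(seqs) for motif, seqs in hits.items()}


# Toy input with made-up motif IDs and sequence names.
HEADER = "\t".join([
    "motif_id", "motif_alt_id", "sequence_name", "start", "stop",
    "strand", "score", "p-value", "q-value", "matched_sequence",
])
toy_tsv = "\n".join([
    HEADER,
    "M1\tRBP_A\tutr_1\t5\t11\t+\t12.1\t2e-05\t0.01\tTTTATTT",
    "M1\tRBP_A\tutr_1\t40\t46\t+\t11.0\t5e-05\t0.02\tTTTGTTT",
    "M1\tRBP_A\tutr_2\t8\t14\t+\t11.5\t4e-05\t0.02\tTTTATTT",
    "M2\tRBP_B\tutr_1\t20\t26\t+\t6.0\t1e-03\t0.50\tGCAUGAA",
    "",
    "# FIMO (Find Individual Motif Occurrences): toy footer line",
])
print(sequences_hit_per_motif(io.StringIO(toy_tsv)))
```

Counting sequences rather than individual hits keeps long UTRs with many matches from dominating the 2x2 table used for Fisher's exact test.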
-
miRNA motif analysis using ATtRACT + FIMO
✅ Goal
* Extract the sequences of the differentially expressed miRNAs
* Generate a background set
* Run RBP enrichment (e.g., with RBPmap or FIMO)
* Get adjusted enrichment statistics (e.g., Fisher + BH)

5.1 (Optional)

Input_1: DE results (differential expression file from smallRNA-seq). Example file: smallRNA_upregulated.txt. Format: first column = miRNA ID (e.g., hsa-miR-21-5p), optionally followed by other statistics.

Input_2: Reference FASTA (reference sequences from miRBase or GENCODE). From miRBase:
* mature.fa.gz → contains mature miRNA sequences
* hairpin.fa.gz → for pre-miRNAs

    python extract_miRNA_fasta.py smallRNA_upregulated.txt mature.fa up_mature_miRNAs.fa
    python extract_miRNA_fasta.py smallRNA_downregulated.txt hairpin.fa down_precursor_miRNAs.fa

5.2 (Advanced) Extract Sequences + Background Set

Inputs:
* up_miRNA.txt and down_miRNA.txt: DE results (first column = miRNA name, e.g., hsa-miR-21-5p)
* mature.fa or hairpin.fa from miRBase

Outputs:
* mirna_up.fa
* mirna_down.fa
* mirna_background.fa

    python prepare_miRNA_sets.py up_miRNA.txt down_miRNA.txt mature.fa mirna

🔬 What You Can Do Next

| Goal | Tool | Input |
| --- | --- | --- |
| RBP motif enrichment in pre-miRNAs | RBPmap, FIMO, AME | up_precursor_miRNAs.fa |
| Motif comparison (up vs down miRNAs) | DREME, MEME, HOMER | Up/down mature miRNAs |
| Build background for enrichment | Random subset of other miRNAs | Filtered from hairpin.fa |

✅ RBP Enrichment from RBPmap Results
* Use the RBPmap output (typically CSV or TSV)
* Compare hit counts in input vs background
* Perform Fisher's exact test + Benjamini-Hochberg correction
* Plot significantly enriched RBPs

📁 Requirements

| File | Description |
| --- | --- |
| rbpmap_up.tsv | RBPmap result file for the upregulated set |
| rbpmap_background.tsv | RBPmap result file for the background set |

📝 These files should have columns such as Motif Name (or Protein) and Sequence Name (or Sequence ID). If the column names differ, the script has to be adjusted accordingly.

    python analyze_rbpmap_enrichment.py rbpmap_up.tsv rbpmap_background.tsv enriched_up.csv enriched_up_plot.png

✅ Output

enriched_up.csv (example layout):

| RBP | FG_hits | BG_hits | pval | padj | enriched |
| --- | --- | --- | --- | --- | --- |
| ELAVL1 | 24 | 2 | 0.0001 | 0.003 | ✅ |
| HNRNPA1 | 15 | 10 | 0.04 | 0.06 | ❌ |

enriched_up_plot.png: barplot showing the top significant RBPs (lowest FDR).

🧰 Possible extensions
* Support for multiple RBPmap files at once
* Matching by RBP family
* A full report (PDF/HTML) of top hits
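The set preparation described in 5.2 can be illustrated with a short sketch. This is not the prepare_miRNA_sets.py script itself; the function names, the `hsa-` species filter for the background, and the U-to-T conversion (useful only if the motif file uses a DNA alphabet) are assumptions made for the example. It relies on miRBase-style FASTA headers, where the miRNA name is the first whitespace-separated token.

```python
import io


def read_fasta(handle):
    """Parse FASTA into {name: sequence}, using the first header token as name."""
    records = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def split_mirna_sets(records, up_ids, down_ids, species_prefix="hsa-"):
    """Split records into up, down and background (same species, not DE)."""
    up = {n: s for n, s in records.items() if n in up_ids}
    down = {n: s for n, s in records.items() if n in down_ids}
    background = {
        n: s for n, s in records.items()
        if n.startswith(species_prefix) and n not in up_ids and n not in down_ids
    }
    return up, down, background


def to_fasta(records, rna_to_dna=True):
    """Format records as FASTA text, optionally converting U to T."""
    lines = []
    for name, seq in records.items():
        lines.append(f">{name}")
        lines.append(seq.replace("U", "T") if rna_to_dna else seq)
    return "\n".join(lines) + "\n"


# Toy miRBase-style input; the sequences are placeholders, not real miRNAs.
toy_fasta = io.StringIO(
    ">hsa-miR-21-5p MIMAT_placeholder_1 Homo sapiens\nUAGCUUAUCA\n"
    ">hsa-let-7a-5p MIMAT_placeholder_2 Homo sapiens\nUGAGGUAGUA\n"
    ">hsa-miR-16-5p MIMAT_placeholder_3 Homo sapiens\nUAGCAGCACG\n"
    ">mmu-miR-1a-3p MIMAT_placeholder_4 Mus musculus\nUGGAAUGUAA\n"
)
records = read_fasta(toy_fasta)
up, down, background = split_mirna_sets(
    records, up_ids={"hsa-miR-21-5p"}, down_ids={"hsa-let-7a-5p"}
)
print(to_fasta(background))
```

Using all remaining same-species miRNAs as background is only one option; the table above suggests a random subset of other miRNAs instead.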
-
RBP enrichment via FIMO (same as the workflow in 4)
1. Collect the 3′ UTR sequences: use the 3UTR.fasta file generated earlier, filtered to protein-coding and downregulated genes.

2. Prepare the motif database (MEME format)
* ATtRACT: https://attract.cnic.es
* RBPDB: http://rbpdb.ccbr.utoronto.ca
* Ray2013 (CISBP-RNA motifs), available via MEME Suite
* [RBPmap motifs (if downloadable)]

Example file name: rbp_motifs.meme

3. Run FIMO to scan for RBP motifs (similar to RBPmap)

    fimo --oc fimo_up rbp_motifs.meme mirna_up.fa
    fimo --oc fimo_down rbp_motifs.meme mirna_down.fa
    fimo --oc fimo_background rbp_motifs.meme mirna_background.fa

This produces fimo.tsv in each output folder.

4. Run RBP motif enrichment with AME (Analysis of Motif Enrichment) from the MEME Suite:

    ame \
        --control control_3UTRs.fasta \
        --oc ame_out \
        --scoring avg \
        --method fisher \
        3UTR.fasta \
        rbp_motifs.meme

Where:
* 3UTR.fasta = your downregulated genes’ 3′ UTRs
* control_3UTRs.fasta = background UTRs (e.g., random protein-coding genes not downregulated)
* rbp_motifs.meme = motif file from RBPDB or Ray2013

5. Interpret results: the output lists RBP motifs enriched in the 3′ UTRs of your downregulated mRNAs. You can then link enriched RBPs to known interactions with your upregulated miRNAs, or explore their regulatory roles.

6. ✅ Bonus: predict which mRNAs are targets of your miRNAs

Use tools like miRanda, TargetScan, or miRDB, then intersect the predicted targets with your downregulated genes to identify likely functional interactions.

7. Summary

| Goal | Input | Tool / Approach |
| --- | --- | --- |
| RBP enrichment | 3UTR.fasta of downregulated genes | AME with RBP motifs |
| Background/control | 3′ UTRs from non-differential or upregulated genes | |
| Link miRNA to targets | TargetScan / miRanda predictions | Intersect with down genes |

Possible next steps:
* A ready-to-use RBP motif .meme file
* A script to generate background sequences
* Visualization options for the enrichment results
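The motif database step in this section needs the motifs in MEME format. The conversion scripts used in these notes are not shown, so the following is only a minimal sketch of writing position probability matrices in the minimal MEME motif format. The function name `to_meme`, the motif ID, the RBP name, and the toy matrix are hypothetical, and `nsites= 20 E= 0` are placeholder values.

```python
def to_meme(motifs, background=(0.25, 0.25, 0.25, 0.25)):
    """Render motifs as minimal MEME-format text.

    motifs: {motif_id: (name, rows)} where each row holds the
    probabilities of A, C, G, T (T standing in for U) at one position.
    """
    lines = [
        "MEME version 4",
        "",
        "ALPHABET= ACGT",
        "",
        "strands: +",
        "",
        "Background letter frequencies",
        "A {:.3f} C {:.3f} G {:.3f} T {:.3f}".format(*background),
        "",
    ]
    for motif_id, (name, rows) in motifs.items():
        for row in rows:
            # Each position must be a probability distribution over 4 letters.
            if len(row) != 4 or abs(sum(row) - 1.0) > 1e-2:
                raise ValueError(f"bad matrix row in {motif_id}: {row}")
        lines.append(f"MOTIF {motif_id} {name}")
        lines.append(
            f"letter-probability matrix: alength= 4 w= {len(rows)} nsites= 20 E= 0"
        )
        lines.extend(" ".join(f"{p:.6f}" for p in row) for row in rows)
        lines.append("")
    return "\n".join(lines)


# Toy motif with a made-up ID and made-up probabilities.
toy_motifs = {
    "M001": ("EXAMPLE_RBP", [
        [0.10, 0.10, 0.10, 0.70],
        [0.05, 0.05, 0.05, 0.85],
        [0.70, 0.10, 0.10, 0.10],
    ]),
}
print(to_meme(toy_motifs))
```

Putting the gene name in the second field of the MOTIF line makes it appear as motif_alt_id in FIMO output, which simplifies mapping hits back to RBPs.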
-
Other options to get sequences of 3UTR, 5UTR, CDS and mRNA transcripts
✅ Option 2: Use Ensembl BioMart (web-based, no coding) --> Takes too long!

1. Go to Ensembl BioMart: https://www.ensembl.org/biomart/martview/7b826bcbd0cec79021977f8dc12a8f61
2. Select Database: Ensembl Genes; Dataset: Homo sapiens genes (GRCh38 or latest).
3. Click on “Filters” and expand Region or Gene to limit your selection (optional).
4. Click on “Attributes”. Under Sequences, check: Sequences → 3' UTR sequences. Optionally add gene IDs, transcript IDs, etc.
5. Click “Results” to view/download the FASTA of 3' UTRs.

✅ Option 3: Use GENCODE (precompiled annotations) and gffread

Use a tool like gffread (from the Cufflinks or gffread package) together with bedtools to extract 3' UTRs:

    #gffread gencode.v44.annotation.gtf -g GRCh38.primary_assembly.genome.fa -w all_utrs.fa -U
    #gffread -w three_prime_utrs.fa -g GRCh38.fa -x cds.fa -y proteins.fa -U -F gencode.gtf
    grep -P "\tthree_prime_utr\t" gencode.v48.annotation.gtf > three_prime_utrs.gtf
    gtf2bed < three_prime_utrs.gtf > three_prime_utrs.bed
    bedtools getfasta -fi GRCh38.primary_assembly.genome.fa -bed three_prime_utrs.bed -name -s > three_prime_utrs.fa
    gffread gencode.v48.annotation.gtf -g GRCh38.primary_assembly.genome.fa -U -w all_with_utrs.fa

Notes:
* Check the feature names in column 3 of your GTF first. GENCODE GTF files typically label both UTR types with the generic feature UTR, whereas Ensembl GTF files use three_prime_utr, so the grep pattern may need adjusting.
* In gffread, -w writes the spliced exon sequence of each transcript (UTRs included), while -U discards single-exon transcripts; it does not extract UTRs by itself. To keep only 3' UTRs, use the grep/bedtools route or filter post hoc.

✅ Option 4: Use Bioconductor in R (UCSC-ID, not suitable!)
    # Install if not already installed
    if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
    BiocManager::install("GenomicFeatures")
    BiocManager::install("txdbmaker")
    #sudo apt-get update
    #sudo apt-get install libmariadb-dev
    #(optional)sudo apt-get install libmysqlclient-dev
    install.packages("RMariaDB")

    # Load library
    library(GenomicFeatures)

    # Create TxDb object for human genome
    txdb <- txdbmaker::makeTxDbFromUCSC(genome="hg38", tablename="refGene")

    # Extract 3' UTRs by transcript
    utr3 <- threeUTRsByTranscript(txdb, use.names=TRUE)

    # View or export as needed

✅ Option 5: Extract 3′ UTRs Using UCSC Table Browser (GUI method)

🔗 Website: UCSC Table Browser

🔹 Step-by-Step Instructions

1. Set the basic parameters:
   * Clade: Mammal
   * Genome: Human
   * Assembly: GRCh38/hg38
   * Group: Genes and Gene Predictions
   * Track: GENCODE v44 (or latest)
   * Table: knownGene or wgEncodeGencodeBasicV44 (choose knownGene for RefSeq-like models or wgEncodeGencodeBasicV44 for GENCODE)
2. Region: select genome (default)
3. Output format: select sequence
4. Click "get output"

🔹 Sequence Retrieval Options

On the next page (after clicking "get output"), you’ll see sequence options. Configure as follows:
* ✅ Output format: FASTA
* ✅ Which part of the gene: select only → UTRs → 3' UTR only
* ✅ Header options: choose if you want gene name,
-
⚡️ Bonus: Combine with miRNA-mRNA predictions
Once you have RBPs enriched in downregulated mRNAs, you can:
* Intersect which RBPs overlap miRNA binding regions (e.g., via CLIPdb or POSTAR)
* Check if miRNAs and RBPs compete or co-bind

This can lead to identifying miRNA-RBP regulatory modules.
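The intersection described above boils down to an interval-overlap check per transcript. A minimal sketch follows, assuming both RBP motif hits and predicted miRNA sites are available as (transcript ID, start, stop, label) tuples with 1-based inclusive coordinates. The function name and all example records are hypothetical and do not come from CLIPdb, POSTAR, or any real prediction.

```python
from collections import defaultdict


def overlapping_pairs(rbp_hits, mirna_sites):
    """Return (transcript, rbp, mirna) for every overlapping pair of sites.

    Each input item is (transcript_id, start, stop, label) with
    1-based inclusive coordinates on the same sequence.
    """
    sites_by_transcript = defaultdict(list)
    for transcript, start, stop, mirna in mirna_sites:
        sites_by_transcript[transcript].append((start, stop, mirna))

    pairs = []
    for transcript, start, stop, rbp in rbp_hits:
        for m_start, m_stop, mirna in sites_by_transcript.get(transcript, []):
            # Two closed intervals overlap if each starts before the other ends.
            if start <= m_stop and m_start <= stop:
                pairs.append((transcript, rbp, mirna))
    return pairs


# Made-up coordinates, for illustration only.
toy_rbp_hits = [
    ("utr_1", 100, 107, "RBP_A"),
    ("utr_1", 300, 306, "RBP_B"),
    ("utr_2", 50, 56, "RBP_A"),
]
toy_mirna_sites = [
    ("utr_1", 104, 110, "miR_X"),
    ("utr_2", 200, 206, "miR_X"),
]
print(overlapping_pairs(toy_rbp_hits, toy_mirna_sites))
```

Overlapping sites are candidates for competition, while nearby but non-overlapping sites might point to co-binding; a distance window could be added to the comparison to capture the latter.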