gene_x 0 like s 208 view s
Tags: pipeline
https://www.biobam.com/functional-analysis/
Assign KEGG and GO Terms (see diagram above)
Since your organism is non-model, standard R databases (org.Hs.eg.db, etc.) won’t work. You’ll need to manually retrieve KEGG and GO annotations.
Option 1 (KEGG Terms): EggNog based on orthology and phylogenies
EggNOG-mapper assigns both KEGG Orthology (KO) IDs and GO terms.
Install EggNOG-mapper:
mamba create -n eggnog_env python=3.8 eggnog-mapper -c conda-forge -c bioconda #eggnog-mapper_2.1.12
mamba activate eggnog_env
Run annotation:
#diamond makedb --in eggnog6.prots.faa -d eggnog_proteins.dmnd
mkdir /home/jhuang/mambaforge/envs/eggnog_env/lib/python3.8/site-packages/data/
download_eggnog_data.py --dbname eggnog.db -y --data_dir /home/jhuang/mambaforge/envs/eggnog_env/lib/python3.8/site-packages/data/
#NOT_WORKING: emapper.py -i CP059040_gene.fasta -o eggnog_dmnd_out --cpu 60 -m diamond[hmmer,mmseqs] --dmnd_db /home/jhuang/REFs/eggnog_data/data/eggnog_proteins.dmnd
python ~/Scripts/update_fasta_header.py CP059040_protein_.fasta CP059040_protein.fasta
emapper.py -i CP059040_protein.fasta -o eggnog_out --cpu 60 --resume
#----> result annotations.tsv: Contains KEGG, GO, and other functional annotations.
#----> 470.IX87_14445:
* 470 likely refers to the organism or strain (e.g., Acinetobacter baumannii ATCC 19606 or another related strain).
* IX87_14445 would refer to a specific gene or protein within that genome.
Extract KEGG KO IDs from annotations.emapper.annotations.
Option 2 (GO Terms from 'Blast2GO 5 Basic', saved in blast2go_annot.annot): Using Blast/Diamond + Blast2GO_GUI based on sequence alignment + GO mapping
or blast2go_cli_v1.5.1 (NOT_USED)
#https://help.biobam.com/space/BCD/2250407989/Installation
#see ~/Scripts/blast2go_pipeline.sh
Option 3 (GO Terms from 'Blast2GO 5 Basic', saved in blast2go_annot.annot2): Interpro based protein families / domains --> Button interpro * Button 'interpro' (Tags: INTERPRO, generated columns: InterPro IDs, InterPro GO IDs, InterPro GO Names) --> "InterProScan Finished - You can now merge the obtained GO Annotations."
MERGE the results of InterPro GO IDs (Option 3) to GO IDs (Option 2) and generate final GO IDs * Button 'interpro'/'Merge InterProScan GOs to Annotation' --> "Merge (add and validate) all GO terms retrieved via InterProScan to the already existing GO annotation." --> "Finished merging GO terms from InterPro with annotations. Maybe you want to run ANNEX (Annotation Augmentation)." #* Button 'annot'/'ANNEX' --> "ANNEX finished. Maybe you want to do the next step: Enzyme Code Mapping."
#-- before merging (blast2go_annot.annot) --
#H0N29_18790 GO:0004842 ankyrin repeat domain-containing protein
#H0N29_18790 GO:0085020
#-- after merging (blast2go_annot.annot2) -->
#H0N29_18790 GO:0031436 ankyrin repeat domain-containing protein
#H0N29_18790 GO:0070531
#H0N29_18790 GO:0004842
#H0N29_18790 GO:0005515
#H0N29_18790 GO:0085020
Option 4 (NOT_USED): RFAM for non-colding RNA
Option 5 (NOT_USED): PSORTb for subcellular localizations
Option 6 (NOT_USED): KAAS (KEGG Automatic Annotation Server)
Find the Closest KEGG Organism Code (NOT_USED)
Since your species isn't directly in KEGG, use a closely related organism.
library(clusterProfiler)
library(KEGGREST)
kegg_organisms <- keggList("organism")
Pick the closest relative (e.g., zebrafish "dre" for fish, Arabidopsis "ath" for plants).
# Search for Acinetobacter in the list
grep("Acinetobacter", kegg_organisms, ignore.case = TRUE, value = TRUE)
# Gammaproteobacteria
#Extract KO IDs from the eggnog results for "Acinetobacter baumannii strain ATCC 19606"
Find the Closest KEGG Organism for a Non-Model Species
If your organism is not in KEGG, search for the closest relative:
grep("fish", kegg_organisms, ignore.case = TRUE, value = TRUE) # Example search
For KEGG pathway enrichment in non-model species, use "ko" instead of a species code (the code has been intergrated in the point 4):
kegg_enrich <- enrichKEGG(gene = gene_list, organism = "ko") # "ko" = KEGG Orthology
Perform KEGG and GO Enrichment in R (under dir ~/DATA/Data_Tam_RNAseq_2024/results/star_salmon/degenes)
#BiocManager::install("GO.db")
#BiocManager::install("AnnotationDbi")
# Load required libraries
library(openxlsx) # For Excel file handling
library(dplyr) # For data manipulation
library(tidyr)
library(stringr)
library(clusterProfiler) # For KEGG and GO enrichment analysis
#library(org.Hs.eg.db) # Replace with appropriate organism database
library(GO.db)
library(AnnotationDbi)
# PREPARING go_terms and ec_terms: annot_* file: cut -f1-2 -d$'\t' blast2go_annot.annot2 > blast2go_annot.annot2_
# Step 1: Load the blast2go annotation file with a check for missing columns
annot_df <- read.table("/home/jhuang/b2gWorkspace_Tam_RNAseq_2024/blast2go_annot.annot2_",
header = FALSE, sep = "\t", stringsAsFactors = FALSE, fill = TRUE)
# If the structure is inconsistent, we can make sure there are exactly 3 columns:
colnames(annot_df) <- c("GeneID", "Term")
# Step 2: Filter and aggregate GO and EC terms as before
go_terms <- annot_df %>%
filter(grepl("^GO:", Term)) %>%
group_by(GeneID) %>%
summarize(GOs = paste(Term, collapse = ","), .groups = "drop")
ec_terms <- annot_df %>%
filter(grepl("^EC:", Term)) %>%
group_by(GeneID) %>%
summarize(EC = paste(Term, collapse = ","), .groups = "drop")
# Load the results
#res <- read.csv("AUM_vs_MHB-all.csv") #up149, down65
#res <- read.csv("Urine_vs_AUM-all.csv") #up155, down105
res <- read.csv("Urine_vs_MHB-all.csv") #up259, down138
# Replace empty GeneName with modified GeneID
res$GeneName <- ifelse(
res$GeneName == "" | is.na(res$GeneName),
gsub("gene-", "", res$GeneID),
res$GeneName
)
# Remove duplicated genes by selecting the gene with the smallest padj
duplicated_genes <- res[duplicated(res$GeneName), "GeneName"]
res <- res %>%
group_by(GeneName) %>%
slice_min(padj, with_ties = FALSE) %>%
ungroup()
res <- as.data.frame(res)
# Sort res first by padj (ascending) and then by log2FoldChange (descending)
res <- res[order(res$padj, -res$log2FoldChange), ]
# Read eggnog annotations
eggnog_data <- read.delim("~/DATA/Data_Tam_RNAseq_2024/eggnog_out.emapper.annotations.txt", header = TRUE, sep = "\t")
# Remove the "gene-" prefix from GeneID in res to match eggnog 'query' format
res$GeneID <- gsub("gene-", "", res$GeneID)
# Merge eggnog data with res based on GeneID
res <- res %>%
left_join(eggnog_data, by = c("GeneID" = "query"))
# DEBUG: NOT_NECESSARY, since res has already GeneName
##Convert row names to a new column 'GeneName' in res
#res_with_geneName <- res %>%
#mutate(GeneName = rownames(res)) %>%
#as.data.frame() # Ensure that it's a regular data frame without row names
## View the result
#head(res_with_geneName)
# Merge with the res dataframe
# Perform the left joins and rename columns
res_updated <- res %>%
left_join(go_terms, by = "GeneID") %>%
left_join(ec_terms, by = "GeneID") %>% dplyr::select(-EC.x, -GOs.x) %>% dplyr::rename(EC = EC.y, GOs = GOs.y)
# DEBUG: NOT_NECESSARY, since 'GeneName' is already the first column.
## Reorder columns to move 'GeneName' as the first column in res_updated
#res_updated <- res_updated %>%
#select(GeneName, everything())
## Count the number of rows in the KEGG_ko, GOs, EC columns that have non-missing values
#num_non_missing_KEGG_ko <- sum(res_updated$KEGG_ko != "-" & !is.na(res_updated$KEGG_ko))
#print(num_non_missing_KEGG_ko)
##[1] 2030
#num_non_missing_GOs <- sum(res_updated$GOs != "-" & !is.na(res_updated$GOs))
#print(num_non_missing_GOs)
##[1] 2865 --> 2875
#num_non_missing_EC <- sum(res_updated$EC != "-" & !is.na(res_updated$EC))
#print(num_non_missing_EC)
##[1] 1701
# Filter up-regulated genes
up_regulated <- res_updated[res_updated$log2FoldChange > 2 & res_updated$padj < 0.01, ]
# Filter down-regulated genes
down_regulated <- res_updated[res_updated$log2FoldChange < -2 & res_updated$padj < 0.01, ]
# Create a new workbook
wb <- createWorkbook()
# Add the complete dataset as the first sheet (with annotations)
addWorksheet(wb, "Complete_Data")
writeData(wb, "Complete_Data", res_updated)
# Add the up-regulated genes as the second sheet (with annotations)
addWorksheet(wb, "Up_Regulated")
writeData(wb, "Up_Regulated", up_regulated)
# Add the down-regulated genes as the third sheet (with annotations)
addWorksheet(wb, "Down_Regulated")
writeData(wb, "Down_Regulated", down_regulated)
# Save the workbook to a file
saveWorkbook(wb, "Gene_Expression_with_Annotations_Urine_vs_MHB.xlsx", overwrite = TRUE)
# Set GeneName as row names after the join
rownames(res_updated) <- res_updated$GeneName
res_updated <- res_updated %>% dplyr::select(-GeneName)
## Set the 'GeneName' column as row.names
#rownames(res_updated) <- res_updated$GeneName
## Drop the 'GeneName' column since it's now the row names
#res_updated$GeneName <- NULL
# ---- Perform KEGG enrichment analysis (up_regulated) ----
gene_list_kegg_up <- up_regulated$KEGG_ko
gene_list_kegg_up <- gsub("ko:", "", gene_list_kegg_up)
kegg_enrichment_up <- enrichKEGG(gene = gene_list_kegg_up, organism = 'ko')
# -- convert the GeneID (Kxxxxxx) to the true GeneID --
# Step 0: Create KEGG to GeneID mapping
kegg_to_geneid_up <- up_regulated %>%
dplyr::select(KEGG_ko, GeneID) %>%
filter(!is.na(KEGG_ko)) %>% # Remove missing KEGG KO entries
mutate(KEGG_ko = str_remove(KEGG_ko, "ko:")) # Remove 'ko:' prefix if present
# Step 1: Clean KEGG_ko values (separate multiple KEGG IDs)
kegg_to_geneid_clean <- kegg_to_geneid_up %>%
mutate(KEGG_ko = str_remove_all(KEGG_ko, "ko:")) %>% # Remove 'ko:' prefixes
separate_rows(KEGG_ko, sep = ",") %>% # Ensure each KEGG ID is on its own row
filter(KEGG_ko != "-") %>% # Remove invalid KEGG IDs ("-")
distinct() # Remove any duplicate mappings
# Step 2.1: Expand geneID column in kegg_enrichment_up
expanded_kegg <- kegg_enrichment_up %>%
as.data.frame() %>%
separate_rows(geneID, sep = "/") %>% # Split multiple KEGG IDs (Kxxxxx)
left_join(kegg_to_geneid_clean, by = c("geneID" = "KEGG_ko"), relationship = "many-to-many") %>% # Explicitly handle many-to-many
distinct() %>% # Remove duplicate matches
group_by(ID) %>%
summarise(across(everything(), ~ paste(unique(na.omit(.)), collapse = "/")), .groups = "drop") # Re-collapse results
#dplyr::glimpse(expanded_kegg)
# Step 3.1: Replace geneID column in the original dataframe
kegg_enrichment_up_df <- as.data.frame(kegg_enrichment_up)
# Remove old geneID column and merge new one
kegg_enrichment_up_df <- kegg_enrichment_up_df %>%
dplyr::select(-geneID) %>% # Remove old geneID column
left_join(expanded_kegg %>% dplyr::select(ID, GeneID), by = "ID") %>% # Merge new GeneID column
dplyr::rename(geneID = GeneID) # Rename column back to geneID
# ---- Perform KEGG enrichment analysis (down_regulated) ----
# Step 1: Extract KEGG KO terms from down-regulated genes
gene_list_kegg_down <- down_regulated$KEGG_ko
gene_list_kegg_down <- gsub("ko:", "", gene_list_kegg_down)
# Step 2: Perform KEGG enrichment analysis
kegg_enrichment_down <- enrichKEGG(gene = gene_list_kegg_down, organism = 'ko')
# --- Convert KEGG gene IDs (Kxxxxxx) to actual GeneIDs ---
# Step 3: Create KEGG to GeneID mapping from down_regulated dataset
kegg_to_geneid_down <- down_regulated %>%
dplyr::select(KEGG_ko, GeneID) %>%
filter(!is.na(KEGG_ko)) %>% # Remove missing KEGG KO entries
mutate(KEGG_ko = str_remove(KEGG_ko, "ko:")) # Remove 'ko:' prefix if present
# Step 4: Clean KEGG_ko values (handle multiple KEGG IDs)
kegg_to_geneid_down_clean <- kegg_to_geneid_down %>%
mutate(KEGG_ko = str_remove_all(KEGG_ko, "ko:")) %>% # Remove 'ko:' prefixes
separate_rows(KEGG_ko, sep = ",") %>% # Ensure each KEGG ID is on its own row
filter(KEGG_ko != "-") %>% # Remove invalid KEGG IDs ("-")
distinct() # Remove duplicate mappings
# Step 5: Expand geneID column in kegg_enrichment_down
expanded_kegg_down <- kegg_enrichment_down %>%
as.data.frame() %>%
separate_rows(geneID, sep = "/") %>% # Split multiple KEGG IDs (Kxxxxx)
left_join(kegg_to_geneid_down_clean, by = c("geneID" = "KEGG_ko"), relationship = "many-to-many") %>% # Handle many-to-many mappings
distinct() %>% # Remove duplicate matches
group_by(ID) %>%
summarise(across(everything(), ~ paste(unique(na.omit(.)), collapse = "/")), .groups = "drop") # Re-collapse results
# Step 6: Replace geneID column in the original kegg_enrichment_down dataframe
kegg_enrichment_down_df <- as.data.frame(kegg_enrichment_down) %>%
dplyr::select(-geneID) %>% # Remove old geneID column
left_join(expanded_kegg_down %>% dplyr::select(ID, GeneID), by = "ID") %>% # Merge new GeneID column
dplyr::rename(geneID = GeneID) # Rename column back to geneID
# View the updated dataframe
head(kegg_enrichment_down_df)
# Create a new workbook
wb <- createWorkbook()
# Save enrichment results to the workbook
addWorksheet(wb, "KEGG_Enrichment_Up")
writeData(wb, "KEGG_Enrichment_Up", as.data.frame(kegg_enrichment_up_df))
# Save enrichment results to the workbook
addWorksheet(wb, "KEGG_Enrichment_Down")
writeData(wb, "KEGG_Enrichment_Down", as.data.frame(kegg_enrichment_down_df))
# ---- Perform GO enrichment analysis (TODO: extract the merged GO IDs from 'Blast2GO 5 Basic' and adapt the code below!)----
#http://yulab-smu.top/biomedical-knowledge-mining-book/clusterprofiler-go.html
#If a user has GO annotation data (in a data.frame format with the first column as gene ID and the second column as GO ID), they can use the enricher() and GSEA() functions to perform an over-representation test and gene set enrichment analysis.
## Perform GO enrichment analysis (Using clusterProfiler does not work, use enricher instead.)
#go_enrichment_up <- enrichGO(gene = gene_list_go_up, OrgDb, ont = "BP") #keyType = "ENSEMBL", organism = 'abn',
#To adapt your code for GO enrichment analysis using the up-regulated genes (149 genes) as the test set and all genes in res (3646 genes) as the background, we need to:
# * Extract the GeneIDs from up_regulated and use them as gene_list (the test set).
# * Extract the GeneIDs from res and use them as background_genes (the universe).
# * Create a TERM2GENE dataframe by extracting GO terms from res.
# * Run the enricher() function using these updated inputs.
# Define gene list (up-regulated genes)
gene_list_go_up <- up_regulated$GeneID # Extract the 149 up-regulated genes
gene_list_go_down <- down_regulated$GeneID # Extract the 65 down-regulated genes
# Define background gene set (all genes in res)
background_genes <- res_updated$GeneID # Extract the 3646 background genes
# Prepare GO annotation data from res
go_annotation <- res_updated[, c("GOs","GeneID")] # Extract relevant columns
go_annotation <- go_annotation %>%
tidyr::separate_rows(GOs, sep = ",") # Split multiple GO terms into separate rows
# Perform GO enrichment analysis, where pAdjustMethod is one of "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none"
#Here’s a ranking from least restrictive to most restrictive:
# * none – No correction at all.
# * fdr (same as BH, Benjamini-Hochberg) – Controls the false discovery rate rather than the family-wise error rate (FWER), allowing more significant results.
# * BY (Benjamini-Yekutieli) – Similar to BH but more conservative, accounting for dependence in test statistics.
# * Holm – A step-down version of Bonferroni, less strict than Bonferroni but still controls FWER.
# * Hochberg – Slightly more powerful than Holm, but still controlling FWER.
# * Hommel – A step-up approach controlling FWER, similar to Hochberg but slightly more powerful in some cases.
# * Bonferroni – The most conservative method, dividing the p-value threshold by the number of comparisons, leading to very strict significance criteria.
#So, if you want the least restrictive correction while still controlling for multiple testing, use "fdr" (or "BH"). If you need a more stringent control over Type I errors, methods like Bonferroni or Holm are preferable.
go_enrichment_up <- enricher(
gene = gene_list_go_up, # Up-regulated genes
TERM2GENE = go_annotation, # Custom GO annotation
pvalueCutoff = 1.0, # Significance threshold
pAdjustMethod = "BH",
universe = background_genes # Define the background gene set
)
go_enrichment_up <- as.data.frame(go_enrichment_up)
go_enrichment_down <- enricher(
gene = gene_list_go_down, # Up-regulated genes
TERM2GENE = go_annotation, # Custom GO annotation
pvalueCutoff = 1.0, # Significance threshold
pAdjustMethod = "BH",
universe = background_genes # Define the background gene set
)
go_enrichment_down <- as.data.frame(go_enrichment_down)
## Remove the 'p.adjust' column since no adjusted methods have been applied!
#go_enrichment_up <- go_enrichment_up[, !names(go_enrichment_up) %in% "p.adjust"]
# Update the Description column with the term descriptions
go_enrichment_up$Description <- sapply(go_enrichment_up$ID, function(go_id) {
# Using select to get the term description
term <- tryCatch({
AnnotationDbi::select(GO.db, keys = go_id, columns = "TERM", keytype = "GOID")
}, error = function(e) {
message(paste("Error for GO term:", go_id)) # Print which GO ID caused the error
return(data.frame(TERM = NA)) # In case of error, return NA
})
if (nrow(term) > 0) {
return(term$TERM)
} else {
return(NA) # If no description found, return NA
}
})
## Print the updated data frame
#print(go_enrichment_up)
## Remove the 'p.adjust' column since no adjusted methods have been applied!
#go_enrichment_down <- go_enrichment_down[, !names(go_enrichment_down) %in% "p.adjust"]
# Update the Description column with the term descriptions
go_enrichment_down$Description <- sapply(go_enrichment_down$ID, function(go_id) {
# Using select to get the term description
term <- tryCatch({
AnnotationDbi::select(GO.db, keys = go_id, columns = "TERM", keytype = "GOID")
}, error = function(e) {
message(paste("Error for GO term:", go_id)) # Print which GO ID caused the error
return(data.frame(TERM = NA)) # In case of error, return NA
})
if (nrow(term) > 0) {
return(term$TERM)
} else {
return(NA) # If no description found, return NA
}
})
addWorksheet(wb, "GO_Enrichment_Up")
writeData(wb, "GO_Enrichment_Up", as.data.frame(go_enrichment_up))
addWorksheet(wb, "GO_Enrichment_Down")
writeData(wb, "GO_Enrichment_Down", as.data.frame(go_enrichment_down))
# Save the workbook with enrichment results
saveWorkbook(wb, "KEGG_and_GO_Enrichments_Urine_vs_MHB.xlsx", overwrite = TRUE)
#Error for GO term: GO:0006807: replace GO:0006807 obsolete nitrogen compound metabolic process
#TODO: marked the color as yellow if the p.adjusted <= 0.05 in GO_enrichment!
Draw the Venn diagram to compare the total DEGs across AUM, Urine, and MHB, irrespective of up- or down-regulation.
# 安装 & 加载必要的库
if (!requireNamespace("ggvenn", quietly = TRUE)) install.packages("ggvenn")
if (!requireNamespace("openxlsx", quietly = TRUE)) install.packages("openxlsx")
#install.packages("ggvenn")
#install.packages("VennDiagram")
#install.packages("ggVennDiagram")
library(ggvenn)
library(openxlsx)
library(VennDiagram)
res <- read.csv("AUM_vs_MHB-all.csv") #up149, down65
res$GeneName <- ifelse(
res$GeneName == "" | is.na(res$GeneName),
gsub("gene-", "", res$GeneID),
res$GeneName
)
duplicated_genes <- res[duplicated(res$GeneName), "GeneName"]
res <- res %>%
group_by(GeneName) %>%
slice_min(padj, with_ties = FALSE) %>%
ungroup()
res <- as.data.frame(res)
res <- res[order(res$padj, -res$log2FoldChange), ]
eggnog_data <- read.delim("~/DATA/Data_Tam_RNAseq_2024/eggnog_out.emapper.annotations.txt", header = TRUE, sep = "\t")
res$GeneID <- gsub("gene-", "", res$GeneID)
res <- res %>%
left_join(eggnog_data, by = c("GeneID" = "query"))
res_updated <- res %>%
left_join(go_terms, by = "GeneID") %>%
left_join(ec_terms, by = "GeneID") %>% dplyr::select(-EC.x, -GOs.x) %>% dplyr::rename(EC = EC.y, GOs = GOs.y)
up_regulated_AUM_vs_MHB <- res_updated[res_updated$log2FoldChange > 2 & res_updated$padj < 0.01, ]
down_regulated_AUM_vs_MHB <- res_updated[res_updated$log2FoldChange < -2 & res_updated$padj < 0.01, ]
res <- read.csv("Urine_vs_AUM-all.csv") #up155, down105
res$GeneName <- ifelse(
res$GeneName == "" | is.na(res$GeneName),
gsub("gene-", "", res$GeneID),
res$GeneName
)
duplicated_genes <- res[duplicated(res$GeneName), "GeneName"]
res <- res %>%
group_by(GeneName) %>%
slice_min(padj, with_ties = FALSE) %>%
ungroup()
res <- as.data.frame(res)
res <- res[order(res$padj, -res$log2FoldChange), ]
eggnog_data <- read.delim("~/DATA/Data_Tam_RNAseq_2024/eggnog_out.emapper.annotations.txt", header = TRUE, sep = "\t")
res$GeneID <- gsub("gene-", "", res$GeneID)
res <- res %>%
left_join(eggnog_data, by = c("GeneID" = "query"))
res_updated <- res %>%
left_join(go_terms, by = "GeneID") %>%
left_join(ec_terms, by = "GeneID") %>% dplyr::select(-EC.x, -GOs.x) %>% dplyr::rename(EC = EC.y, GOs = GOs.y)
up_regulated_Urine_vs_AUM <- res_updated[res_updated$log2FoldChange > 2 & res_updated$padj < 0.01, ]
down_regulated_Urine_vs_AUM <- res_updated[res_updated$log2FoldChange < -2 & res_updated$padj < 0.01, ]
res <- read.csv("Urine_vs_MHB-all.csv") #up259, down138
res$GeneName <- ifelse(
res$GeneName == "" | is.na(res$GeneName),
gsub("gene-", "", res$GeneID),
res$GeneName
)
duplicated_genes <- res[duplicated(res$GeneName), "GeneName"]
res <- res %>%
group_by(GeneName) %>%
slice_min(padj, with_ties = FALSE) %>%
ungroup()
res <- as.data.frame(res)
res <- res[order(res$padj, -res$log2FoldChange), ]
eggnog_data <- read.delim("~/DATA/Data_Tam_RNAseq_2024/eggnog_out.emapper.annotations.txt", header = TRUE, sep = "\t")
res$GeneID <- gsub("gene-", "", res$GeneID)
res <- res %>%
left_join(eggnog_data, by = c("GeneID" = "query"))
res_updated <- res %>%
left_join(go_terms, by = "GeneID") %>%
left_join(ec_terms, by = "GeneID") %>% dplyr::select(-EC.x, -GOs.x) %>% dplyr::rename(EC = EC.y, GOs = GOs.y)
up_regulated_Urine_vs_MHB <- res_updated[res_updated$log2FoldChange > 2 & res_updated$padj < 0.01, ]
down_regulated_Urine_vs_MHB <- res_updated[res_updated$log2FoldChange < -2 & res_updated$padj < 0.01, ]
# 1. 生成基因集
AUM_DEGs <- unique(c(up_regulated_AUM_vs_MHB$GeneID, down_regulated_AUM_vs_MHB$GeneID))
Urine_DEGs <- unique(c(up_regulated_Urine_vs_AUM$GeneID, down_regulated_Urine_vs_AUM$GeneID))
MHB_DEGs <- unique(c(up_regulated_Urine_vs_MHB$GeneID, down_regulated_Urine_vs_MHB$GeneID))
# 2. 生成 Venn 图
venn_plot <- ggvenn(
list(AUM = AUM_DEGs, Urine = Urine_DEGs, MHB = MHB_DEGs),
fill_color = c("blue", "red", "green"), # 颜色设定
stroke_size = 0.5, # 线条粗细
set_name_size = 5 # 组名字体大小
)
# 3. 保存 Venn 图为 PNG
ggsave("Venn_Diagram.png", plot = venn_plot, width = 6, height = 6, dpi = 300)
# 4. 计算各个集合的交集
AUM_Urine <- intersect(AUM_DEGs, Urine_DEGs)
AUM_MHB <- intersect(AUM_DEGs, MHB_DEGs)
Urine_MHB <- intersect(Urine_DEGs, MHB_DEGs)
AUM_Urine_MHB <- Reduce(intersect, list(AUM_DEGs, Urine_DEGs, MHB_DEGs))
# 5. 保存基因列表到 Excel 文件
#venn_gene_list <- list(
# "AUM_DEGs" = AUM_DEGs,
# "Urine_DEGs" = Urine_DEGs,
# "MHB_DEGs" = MHB_DEGs,
# "AUM_Urine" = AUM_Urine,
# "AUM_MHB" = AUM_MHB,
# "Urine_MHB" = Urine_MHB,
# "AUM_Urine_MHB" = AUM_Urine_MHB
#)
#write.xlsx(venn_gene_list, file = "Venn_GeneLists.xlsx", rowNames = FALSE)
# 创建 Excel 文件,存储各个 Venn 区域的基因
venn_list <- list(
"AUM_only" = setdiff(AUM_DEGs, c(Urine_DEGs, MHB_DEGs)),
"Urine_only" = setdiff(Urine_DEGs, c(AUM_DEGs, MHB_DEGs)),
"MHB_only" = setdiff(MHB_DEGs, c(AUM_DEGs, Urine_DEGs)),
"AUM_Urine" = intersect(AUM_DEGs, Urine_DEGs),
"AUM_MHB" = intersect(AUM_DEGs, MHB_DEGs),
"Urine_MHB" = intersect(Urine_DEGs, MHB_DEGs),
"All_three" = Reduce(intersect, list(AUM_DEGs, Urine_DEGs, MHB_DEGs))
)
# 写入 Excel
write.xlsx(venn_list, file = "Venn_Diagram_Genes.xlsx")
# 打印完成信息
print("Venn diagram saved as 'Venn_Diagram.png' and gene lists saved in 'Venn_Diagram_Genes.xlsx'.")
点赞本文的读者
还没有人对此文章表态
没有评论
DNAseq 2025_AYE and DNAseq 2025_adeABadeIJ_adeIJK_CM1_CM2
Processing for Data_Tam_DNAseq_2025
Presence-Absence Table and Graphics for Selected Genes in Data_Patricia_Sepi_7samples
© 2023 XGenes.com Impressum