gene_x 0 like s 583 view s
Tags: R, human, genomics
The code provided is designed to annotate a list of genes by fetching associated gene information from the Ensembl database. Starting with the biomaRt library, it connects to the Ensembl dataset specifically tailored for human genes. Then, gene expression data is imported from a CSV file named "DEGs_heatmap_data.csv". The primary loop in the script iterates over each gene, retrieves detailed attributes such as gene name, gene biotype, chromosome position, and more, and then appends this to the initial data. To manage instances where the database returns multiple entries for a single gene, or none at all, there are conditions to handle such cases by either selecting the first entry or populating with 'NA' values respectively. As the loop progresses, for every 100 records, a progress message is printed to give users an indication of how much work has been completed. Once all genes are annotated, the enriched dataset is saved to a new CSV file named "annotated_output_file.csv".
# Load the necessary library
library(biomaRt)
# Connect to the Ensembl dataset
ensembl <- useEnsembl(biomart = "ensembl", dataset = "hsapiens_gene_ensembl", version="104")
# Read the data from your CSV file
data <- read.csv("DEGs_heatmap_data.csv", header=TRUE, row.names=1)
# Initialize an empty data frame for the annotated data
annotated_data <- data.frame()
# Determine total number of genes
total_genes <- length(rownames(data))
# Loop through each gene to annotate
for (i in 1:total_genes) {
gene <- rownames(data)[i]
result <- getBM(attributes = c('ensembl_gene_id', 'external_gene_name', 'gene_biotype', 'entrezgene_id', 'chromosome_name', 'start_position', 'end_position', 'strand', 'description'),
filters = 'ensembl_gene_id',
values = gene,
mart = ensembl)
# If multiple rows are returned, take the first one
if (nrow(result) > 1) {
result <- result[1, ]
}
# Check if the result is empty
if (nrow(result) == 0) {
result <- data.frame(ensembl_gene_id = gene,
external_gene_name = NA,
gene_biotype = NA,
entrezgene_id = NA,
chromosome_name = NA,
start_position = NA,
end_position = NA,
strand = NA,
description = NA)
}
# Transpose expression values
expression_values <- t(data.frame(t(data[gene, ])))
colnames(expression_values) <- colnames(data)
# Combine gene information and expression data
combined_result <- cbind(result, expression_values)
# Append to the final dataframe
annotated_data <- rbind(annotated_data, combined_result)
# Print progress every 100 genes
if (i %% 100 == 0) {
cat(sprintf("Processed gene %d out of %d\n", i, total_genes))
}
}
# Save the annotated data to a new CSV file
write.csv(annotated_data, "annotated_output_file.csv", row.names=FALSE)
点赞本文的读者
还没有人对此文章表态
没有评论
Display viral transcripts found in mRNA-seq MKL-1, WaGa EVs compared to cells
Phyloseq for GPA vs RA vs control
© 2023 XGenes.com Impressum