Author Archives: gene_x

Gene Set Variation Analysis (GSVA) and Visualization of Gene Sets from Excel Signatures

#install.packages("readxl")
#BiocManager::install("GSVA")
library(readxl)
library(GSVA)     # gsva()
library(ggplot2)  # violin plots

# Path to the Excel file
file_path <- "Signatures.xls"

#example of a signature:
#geneSymbol geneEntrezID    ENSEMBL GeneSet
#CD160  11126   ENSG00000117281 Anergic or act. T cells
#CD244  51744   ENSG00000122223 Anergic or act. T cells
#CTLA4  1493    ENSG00000163599 Anergic or act. T cells
#HAVCR2 84868   ENSG00000135077 Anergic or act. T cells
#ICOS   29851   ENSG00000163600 Anergic or act. T cells
#KLRG1  10219   ENSG00000139187 Anergic or act. T cells
#LAG3   3902    ENSG00000089692 Anergic or act. T cells
#PDCD1  5133    ENSG00000188389 Anergic or act. T cells
#PDCD1  5133    ENSG00000276977 Anergic or act. T cells

# Get the names of the sheets
sheet_names <- excel_sheets(file_path)

# Initialize an empty list to hold gene sets
geneSets <- list()

# Loop over each sheet, extract the ENSEMBL IDs, and add to the list
for (sheet in sheet_names) {
  # Read the sheet
  data <- read_excel(file_path, sheet = sheet)

  # Process the GeneSet names (replacing spaces with underscores, for example)
  gene_set_name <- gsub(" ", "_", unique(data$GeneSet)[1])

  # Add ENSEMBL IDs to the list
  geneSets[[gene_set_name]] <- as.character(data$ENSEMBL)
}

# Print the result to check
print(geneSets)

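# 1. Compute GSVA scores:
# `exprs` is assumed to be a numeric expression matrix (genes x samples) whose
# row names are ENSEMBL IDs, and `dds` (used below) the DESeq2 dataset with the
# samples in the same order; neither object is created in this script.
# Newer GSVA releases (1.50 and later) take a parameter object instead:
#   gsva_scores <- gsva(gsvaParam(exprs, geneSets))
gsva_scores <- gsva(exprs, geneSets, method="gsva")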

# 2. Convert to data.frame for ggplot:
gsva_df <- as.data.frame(t(gsva_scores))

# 3. Add conditions to gsva_df:
gsva_df$Condition <- dds$condition

# 4. Filter the gsva_df to retain only the desired conditions:
gsva_df_filtered <- gsva_df[gsva_df$Condition %in% c("Ace2_mock_2h", "Ace2_inf_24h"), ]

# 5. Relabel the conditions and fix their order on the x-axis:
gsva_df_filtered$Condition <- factor(gsva_df_filtered$Condition,
                                     levels = c("Ace2_mock_2h", "Ace2_inf_24h"),
                                     labels = c("mock", "infection"))

# Define a function that draws one violin plot per gene set:
plot_violin <- function(data, gene_name) {
  # Calculate the t-test p-value for the two conditions
  condition1_data <- data[data$Condition == "mock", gene_name]
  condition2_data <- data[data$Condition == "infection", gene_name]
  p_value <- t.test(condition1_data, condition2_data)$p.value

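  # Convert p-value to annotation
  p_annotation <- ifelse(p_value < 0.01, "**", ifelse(p_value < 0.05, "*", ""))
  # Very small p-values would round to "p = 0", so report them as a bound
  rounded_p_value <- if (p_value < 0.001) "p < 0.001" else paste0("p = ", signif(p_value, 2))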

  plot_title <- gsub("_", " ", gene_name)
  p <- ggplot(data, aes(x=Condition, y=!!sym(gene_name))) +
    geom_violin(linewidth=1.2) + 
    labs(title=plot_title, y="GSVA Score") +
    ylim(-1, 1) +
    theme_light() +
    theme(
      axis.title.x = element_text(size=12),
      axis.title.y = element_text(size=12),
      axis.text.x  = element_text(size=10),
      axis.text.y  = element_text(size=10),
      plot.title   = element_text(size=12, hjust=0.5)
    )

  # Add p-value annotation to the plot
  p <- p + annotate("text", x=1.5, y=0.9, label=paste0(p_annotation, " ", rounded_p_value), size=5, hjust=0.5)

  return(p)
}

# 6. Generate the list of plots in a predefined order:
desired_order <- c(
  "Platelets", "Granulocytes", "LDG", "pDC", "Anti-inflammation",
  "Pro-inflam._IL-1", "Dendritic_cells", "MHC_II", "Alt._complement", "TNF",
  "NLRP3_inflammasome", "Unfolded_protein", "B_cells", "Monocyte_cell_surface", "Inflammasome",
  "Monocyte_secreted", "IL-1_cytokines", "SNOR_low_UP", "CD40_activated", "Lectin_complement",
  "Classical_complement", "Cell_cycle", "Plasma_cells", "IG_chains", "Erythrocytes",
  "IL-6R_complex", "IFN", "TCRB", "TCRA", "Cyt._act._T_cells",
  "TCRG", "T_cells", "CD8T-NK-NKT", "Anergic_or_act._T_cells", "T_activated",
  "NK_cells", "TCRD", "T_regs", "SNOR_low_DOWN", "Monocytes",
  "Myeloid_cells", "Neutrophils"
)
genes <- colnames(gsva_df_filtered)[!colnames(gsva_df_filtered) %in% "Condition"]
# intersect() keeps the order of desired_order and drops names that have no
# GSVA score column, so no NA reaches plot_violin()
genes <- intersect(desired_order, genes)
plots_list <- lapply(genes, function(gene) plot_violin(gsva_df_filtered, gene))

# 7. Arrange the plots in a 6-column grid and save them to a PNG.
# grid.arrange() fills the grid row by row in list order and leaves unused
# cells in the last row empty, so the list needs no padding.
library(gridExtra)
png("All_Violin_Plots.png", width=1000, height=1000)
grid.arrange(grobs = plots_list, ncol = 6)
dev.off()

Understanding xrandr Output and Connection Types

xrandr
Screen 0: minimum 8 x 8, current 3840 x 1080, maximum 16384 x 16384
VGA-0 connected primary 1920x1080+0+0 (normal left inverted right x axis y axis) 521mm x 293mm
  1920x1080     60.00*+
  1680x1050     59.95  
  1600x900      60.00  
  1440x900      59.89  
  1280x1024     75.02    60.02  
  1280x800      59.81  
  1280x720      60.00  
  1152x864      75.00  
  1024x768      75.03    70.07    60.00  
  800x600       75.00    72.19    60.32    56.25  
  640x480       75.00    72.81    59.94  
DVI-D-0 disconnected (normal left inverted right x axis y axis)
HDMI-0 connected 1920x1080+1920+0 (normal left inverted right x axis y axis) 521mm x 293mm
  1920x1080     60.00*+  50.00  
  1680x1050     59.95  
  1600x900      60.00  
  1440x900      59.89  
  1280x1024     75.02    60.02  
  1280x800      59.81  
  1280x720      60.00    50.00  
  1152x864      75.00  
  1024x768      75.03    70.07    60.00  
  800x600       75.00    72.19    60.32    56.25  
  720x576       50.00  
  720x480       59.94  

From the provided output:

  1. Screen 0:

    • The smallest X screen the system supports is 8×8 pixels.
    • The current screen size is 3840×1080, the combined area of all active outputs (here a dual monitor setup, each monitor at 1920×1080).
    • The largest screen size supported is 16384×16384; this limits the combined area, not a single monitor.
  2. VGA-0:

    • This is the first display connected via a VGA port.
    • It is currently set as the primary display.
    • Its current mode is 1920×1080 at 60 Hz (the * marks the mode in use and the + marks the preferred mode).
    • It supports various other resolutions as listed.
  3. DVI-D-0:

    • This display is currently disconnected.
    • No other information about this display is provided since it’s not connected.
  4. HDMI-0:

    • This is the second display connected via an HDMI port.
    • Its current resolution is 1920×1080.
    • It supports various other resolutions, similar to VGA-0.
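The per-output reading above can also be done mechanically. The following is a minimal sketch, assuming the xrandr text format shown in this post; `list_connected` is a hypothetical helper name, and `sample` is a shortened copy of the output above with the mode lists left out.

```shell
# Shortened copy of the xrandr output above (mode lists omitted).
sample='Screen 0: minimum 8 x 8, current 3840 x 1080, maximum 16384 x 16384
VGA-0 connected primary 1920x1080+0+0 (normal left inverted right x axis y axis) 521mm x 293mm
DVI-D-0 disconnected (normal left inverted right x axis y axis)
HDMI-0 connected 1920x1080+1920+0 (normal left inverted right x axis y axis) 521mm x 293mm'

# list_connected (hypothetical helper): reads xrandr output on stdin and
# prints "<output> <geometry>" for every connected output.
list_connected() {
  awk '$2 == "connected" {
    # The geometry follows the optional word "primary".
    geom = ($3 == "primary") ? $4 : $3
    # A connected output that is switched off has no WxH+X+Y field.
    if (geom !~ /^[0-9]+x[0-9]+[+][0-9]+[+][0-9]+$/) geom = "(inactive)"
    print $1, geom
  }'
}

printf '%s\n' "$sample" | list_connected
# prints:
# VGA-0 1920x1080+0+0
# HDMI-0 1920x1080+1920+0
```

On a live system the same helper can read the real output with `xrandr | list_connected`.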

VGA-0 sits at offset +0+0 and HDMI-0 at offset +1920+0, so our two screens are side by side with HDMI-0 to the right of VGA-0, creating a total screen resolution of 3840×1080.
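The total size follows from the geometry strings (WxH+X+Y) of the two outputs: it is the smallest rectangle that contains both. A small sketch of that arithmetic, where `bounding_box` is a hypothetical helper name and the two geometries are taken from the output above:

```shell
# bounding_box (hypothetical helper): reads one WxH+X+Y geometry per line and
# prints the width and height of the smallest rectangle containing all of them.
bounding_box() {
  awk -F'[x+]' '{
    right  = $1 + $3   # width  + horizontal offset
    bottom = $2 + $4   # height + vertical offset
    if (right  > w) w = right
    if (bottom > h) h = bottom
  } END { print w "x" h }'
}

printf '%s\n' 1920x1080+0+0 1920x1080+1920+0 | bounding_box
# prints: 3840x1080
```

Stacking the monitors vertically instead (the second one at +0+1080) would give 1920x2160 by the same rule.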

The xrandr output does not show that we are using an HDMI-to-DVI cable. It is, however, consistent with one, given the connection types and their states:

  1. HDMI-0 connected: This indicates that a device is connected to the HDMI port of the computer.
  2. DVI-D-0 disconnected: This indicates that nothing is connected to the DVI port of the computer.

If we know that our second monitor only has a DVI input and we’re using an HDMI-to-DVI cable to connect it, then the “HDMI-0 connected” state in the xrandr output is a result of the HDMI end of our converter cable being connected to our computer’s HDMI port.

Note that xrandr only shows the connection state from the perspective of the computer’s output ports. It does not report cable types or conversion methods; we know a converter cable is in use only because we know our hardware and how it is connected.