Once again, working out where to begin was very time-consuming. I struggled to create gene vectors of up- and down-regulated DEGs. As a temporary workaround, I substituted a column of a matrix I had created for the vector. The gene vectors themselves were eventually generated after consulting my team's technical lead.
The workaround for the previous challenge caused an error when I ran the GSEA function with the matrix column. After contacting a technical lead, I replaced the matrix column with an actual gene vector, and once the p-value cutoffs were adjusted, enriched terms were identified. This allowed gseaplot2 to run and produce a graph with visible peaks.
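The difference between a bare matrix column and a proper gene vector can be sketched in a few lines. The project itself was done in R, so the Python below is only a language-neutral illustration, not the code that was actually run. The gene names, fold changes, p-values, and both cutoffs are made-up assumptions, and the exact cause of the original error is not recorded here. The point it shows is that each score stays attached to its gene identifier, and that the ranked list is sorted from highest to lowest, which is the form GSEA-style functions generally expect.

```python
# Toy DEG table: (gene symbol, log2 fold change, adjusted p-value).
# All names and values are invented for illustration only.
deg_table = [
    ("GENE_A", 2.4, 0.001),
    ("GENE_B", -1.8, 0.004),
    ("GENE_C", 0.3, 0.600),
    ("GENE_D", -2.9, 0.0002),
    ("GENE_E", 1.2, 0.030),
]

PADJ_CUTOFF = 0.05  # assumed significance cutoff
LFC_CUTOFF = 1.0    # assumed absolute log2 fold change cutoff


def split_degs(table, padj_cutoff=PADJ_CUTOFF, lfc_cutoff=LFC_CUTOFF):
    """Return (up, down) lists of gene symbols that pass both cutoffs."""
    up = [g for g, lfc, padj in table
          if padj < padj_cutoff and lfc >= lfc_cutoff]
    down = [g for g, lfc, padj in table
            if padj < padj_cutoff and lfc <= -lfc_cutoff]
    return up, down


def ranked_gene_list(table):
    """Return {gene: log2FC}, ordered from highest to lowest log2FC.

    Keeping the gene symbol as the key is what a bare numeric
    column lacks: every score remains tied to its gene.
    """
    ordered = sorted(table, key=lambda row: row[1], reverse=True)
    return {gene: lfc for gene, lfc, _ in ordered}


up_genes, down_genes = split_degs(deg_table)
ranked = ranked_gene_list(deg_table)

print("Up-regulated:", up_genes)
print("Down-regulated:", down_genes)
print("Ranked order:", list(ranked))
```

Loosening or tightening `PADJ_CUTOFF` changes how many genes land in each vector, which mirrors how adjusting the p-value cutoffs affected the number of enriched terms.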
Summary of Work:
Stored significant DEGs in a vector and converted gene symbols to their Entrez IDs using the org.Hs.eg.db database
Visualized DEGs after enrichGO analysis using bar plots
Visualized DEGs after enrichKEGG pathway analysis using dot plots
Observed complex associations between genes using gene-concept networks
Conducted global gene set enrichment analysis using hallmark gene sets
- Although my graphs were generated, I still think I could have done a better job with this section of the project. My refined data had very few enriched terms, even though I was told my graphs looked correct. Still, even with a handful of enriched terms, I was able to draw conclusions about the differences in gene expression between colorectal cancer cells and normal cells.