Final Project Progress Summary
- At first, I had to switch data sets a few times because of problems at the quality control step. After some trial and error this was resolved, and the rest of the process went smoothly.
- I spent additional time studying the biology behind the data so that I could give an in-depth explanation during the presentation.
Summary of work:
- Conducted quality control using packages such as simpleaffy and arrayQualityMetrics to identify outliers
- Removed the outliers from the downloaded data set and plotted PCAs to confirm the results
- Retrieved the metadata following the instructions on Google Drive
- Performed annotation and gene filtering, which removed the NA and low-expression genes
- Conducted differential expression analysis with limma, plotted volcano plots and heatmaps, and wrote the top 100 DEGs to a file
- Defined a significant DEG vector and converted the gene symbols into gene IDs
- Performed enrichGO, enrichKEGG, and enrichDGN analyses and visualized the results as a barplot, cnet plot, and dotplot
- Performed GSEA to analyze the hallmark gene sets
- Created a presentation using some of the results
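
To make the DEG selection and enrichment steps above more concrete, here is a minimal sketch in Python. It is not the R code used in the project. The gene names, the fold-change and adjusted p-value cutoffs, and the gene-set sizes are all hypothetical, chosen only for illustration. The second function is a plain hypergeometric over-representation test, which is the general idea behind enrichment functions such as enrichGO; the real functions also apply multiple-testing correction and other steps not shown here.

```python
from math import comb

# Hypothetical limma-style results: (gene symbol, log2 fold change, adjusted p-value).
# These values are made up for illustration and are not from the project data.
results = [
    ("GENE_A", 2.4, 0.001),
    ("GENE_B", -1.8, 0.020),
    ("GENE_C", 0.3, 0.004),
    ("GENE_D", 1.5, 0.300),
    ("GENE_E", -2.1, 0.010),
]


def significant_degs(rows, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Keep genes that pass both cutoffs (the cutoff values are assumptions)."""
    return [
        gene
        for gene, lfc, padj in rows
        if abs(lfc) >= lfc_cutoff and padj < padj_cutoff
    ]


def over_representation_p(universe_size, set_size, deg_count, overlap):
    """Probability of seeing at least `overlap` gene-set members among
    `deg_count` genes drawn at random from a universe of `universe_size`
    genes, of which `set_size` belong to the gene set (hypergeometric tail)."""
    upper = min(set_size, deg_count)
    total = comb(universe_size, deg_count)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, deg_count - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


degs = significant_degs(results)

# Hypothetical pathway; only members present in the tested universe are counted.
pathway = {"GENE_A", "GENE_E", "GENE_X"}
universe = {gene for gene, _, _ in results}
pathway_in_universe = pathway & universe
overlap = len(pathway_in_universe & set(degs))

p_value = over_representation_p(
    len(universe), len(pathway_in_universe), len(degs), overlap
)
print(degs, p_value)
```

With a universe this small the p-value is not meaningful. The point is only to show how a DEG vector feeds into an enrichment test.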
For the final presentation, I identified which plots would be most effective to show and explain. I researched those plots and how they relate to colorectal cancer and to the data set I used. Since the presentation was a little longer than the normal weekly presentations, I was able to go in depth on certain biological signaling pathways related to colorectal cancer.