DGE Analysis - Kevin Lin

Presentation (PowerPoint):

**All relevant code is found in the document**

Challenges Faced:

  • At first, I was unable to run the collapseRows function. After consulting the deliverables troubleshooting thread, I traced the problem to my expression set data: adding a character column (gene symbols) had converted all the other columns to the character data type. Once I converted those columns back to numeric, the function ran.
  • I was also unable to run the lmFit function because of problems with my expression set data. After researching possible causes, I found that using the tidyverse package to rename the row names of the input data allowed lmFit to run.
  • Contacting my other group members continued to be an issue, so I was switched to another team after the second deliverables meeting.
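
The type problem described in the first challenge can be sketched outside of R. The example below is a minimal illustration in Python, not the R code used in the analysis: the table, the column names, and the restore_numeric helper are all hypothetical. It shows measurement values that ended up stored as text next to a gene symbol column, and one way to convert only those values back to numbers.

```python
# Hypothetical expression table: after a gene-symbol column was added,
# every measurement ended up stored as text (the situation described above).
rows = [
    {"probe_id": "probe_1", "symbol": "GENE_A", "sample_1": "7.21", "sample_2": "6.98"},
    {"probe_id": "probe_2", "symbol": "GENE_B", "sample_1": "3.40", "sample_2": "3.55"},
]

# Columns that are meant to stay as text (assumed names, for illustration only).
ID_COLUMNS = {"probe_id", "symbol"}


def restore_numeric(rows, id_columns=ID_COLUMNS):
    """Return a copy of the table with every non-identifier value parsed as a float."""
    restored = []
    for row in rows:
        restored.append(
            {key: (value if key in id_columns else float(value))
             for key, value in row.items()}
        )
    return restored


numeric_rows = restore_numeric(rows)
```

Keeping the identifier columns separate from the measurement columns is what avoids the problem in the first place; a purely numeric table with probe IDs or gene symbols as row labels never needs this conversion.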

Summary of Work:

  • Annotated the data using hgu133plus2.db to map probe IDs to gene symbols
  • Refined the expression set data by removing rows with duplicates (gene symbols and probe IDs) and rows with missing data, using the collapseRows and na.omit functions
  • Researched the tidyverse package and used it to edit the refined expression set data
  • Filtered out genes below the 2nd percentile of the dataset's expression distribution
  • Analyzed the data with the limma package and sorted the top differentially expressed genes (DEGs) using the topTable function
  • Generated and edited a heatmap and a volcano plot of the DEGs
  • Created and delivered a presentation of all work completed
  • Explained the annotation process used to convert probe IDs into gene symbols
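
The refinement and filtering steps listed above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the R code from the analysis: the probe data are made up, the helper names are hypothetical, duplicates are resolved by keeping the probe with the highest mean expression (one common rule; the summary does not say which rule was used), and the 2nd-percentile cutoff is applied to each gene's mean expression.

```python
from statistics import mean

# Hypothetical probe-level data: probe ID -> (gene symbol or None, expression values).
probes = {
    "probe_1": ("GENE_A", [8.0, 8.4, 7.9]),
    "probe_2": ("GENE_A", [5.1, 5.0, 5.3]),  # second probe for the same gene
    "probe_3": ("GENE_B", [2.0, 2.2, 1.9]),
    "probe_4": (None, [6.0, 6.1, 5.8]),      # probe with no gene symbol
    "probe_5": ("GENE_C", [9.5, 9.1, 9.3]),
}


def collapse_to_genes(probes):
    """Keep one probe per gene symbol: the one with the highest mean expression."""
    best = {}
    for symbol, values in probes.values():
        if symbol is None:
            continue  # drop probes that could not be annotated
        if symbol not in best or mean(values) > mean(best[symbol]):
            best[symbol] = values
    return best


def filter_low_expression(genes, percentile=2.0):
    """Drop genes whose mean expression is below the given percentile."""
    if not genes:
        return {}
    means = sorted(mean(values) for values in genes.values())
    # Percentile cutoff by linear interpolation between neighbouring sorted means.
    rank = (percentile / 100) * (len(means) - 1)
    lower = int(rank)
    upper = min(lower + 1, len(means) - 1)
    cutoff = means[lower] + (rank - lower) * (means[upper] - means[lower])
    return {gene: values for gene, values in genes.items() if mean(values) >= cutoff}


genes = collapse_to_genes(probes)
kept = filter_low_expression(genes)
```

With only three genes the cutoff removes the lowest one, which overstates the effect; on a full array the same rule would remove roughly the bottom 2% of genes.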

Further Notes:

  • Although I think the heatmap of my top DEGs could have looked a lot better, I was still able to make observations from it. I was especially proud of my ability to explain my process during my deliverable presentation. Independent research played a crucial role in overcoming errors in my code.