Quality Control and Data Visualization - Daniel Drucker

Progress Summary

Presentation
Link: https://drive.google.com/file/d/1OHI7m3JDC1B_i1BIi-JRSALTtvT32wAL/view?usp=sharing

Contribution: Completed the code to clean the data set (quality control, normalization, and batch correction).

Challenges Faced:

  • Before I came to this program, I was completely unfamiliar with R. Even though I have prior coding experience, I found it hard to internalize some details of the syntax, particularly keeping track of the data types of objects when data was moved between matrices and data frames. This took some trial and error to understand.
  • The finer details of the syntax that controls the aesthetic characteristics of graph outputs are not something that can be approached with raw intuition. My group mate, who paid much closer attention to how this part of the code works, was extremely helpful in this step.

Summary of Work:

  • Performed quality control using affyPLM
  • Generated RLE (Relative Log Expression) and NUSE (Normalized Unscaled Standard Error) plots
  • Removed batch effects from the data
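
To give a rough sense of what an RLE plot summarizes, the sketch below computes relative log expression values on a small made-up matrix. It is written in Python with the standard library only, purely to show the arithmetic; it is not the affyPLM implementation, which derives these values from a fitted probe-level model. The function names and the toy numbers are hypothetical.

```python
from statistics import median

def relative_log_expression(log_expr):
    """Subtract each probeset's median (across arrays) from its log2 values.

    log_expr: list of rows, one per probeset; each row holds one
    log2 expression value per array (sample).
    """
    rle = []
    for row in log_expr:
        row_median = median(row)
        rle.append([value - row_median for value in row])
    return rle

def array_medians(rle):
    """Median RLE value for each array (column); ideally close to zero."""
    n_arrays = len(rle[0])
    return [median(row[j] for row in rle) for j in range(n_arrays)]

# Toy data (made up): 4 probesets x 3 arrays, with the third array
# shifted upward to mimic a problematic sample.
log_expr = [
    [7.0, 7.2, 8.1],
    [5.0, 5.1, 6.0],
    [9.0, 8.9, 10.0],
    [6.0, 6.2, 7.1],
]

rle = relative_log_expression(log_expr)
medians = array_medians(rle)
print(medians)
```

In an RLE boxplot, each array gets one box built from its column of these values. An array whose box sits well away from zero, like the third column here, is a candidate for closer inspection.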

Further Notes:

  • I found it difficult to internalize exactly what I was doing to the data during some of these processing steps. As long as I knew which function to feed a data set to in order to get a processed data set back, I could produce the work adequately. But I was curious about what was going on mathematically inside those functions.
  • Prior to joining this program, I had no idea that gene expression could be conveniently quantified, which I find intriguing.