Quality Control and Data Visualization - Kevin Lin

Presentation (Google Doc):
https://docs.google.com/document/d/1uZUE9wlJAG7Byg33ALmKBycy4x7xPoTEqiKIIWvv7QE/edit?usp=sharing

**All relevant code is found in the linked document.**

Challenges Faced:

  • The main challenge I faced was the normalization section of the deliverables. I normalized the data after removing the outliers, when I should have normalized it before any outlier removal. I found this out after our first deliverable presentations and corrected it to avoid invalid data in future deliverables.
  • With no previous coding experience other than the training videos provided by STEM-Away, anything that deviated from the code shown in those videos slowed my progress more than I would have liked. I relied on the troubleshooting thread to overcome this challenge.
  • Another issue was that I removed all the outliers listed in the arrayQualityMetrics index file instead of removing only the major outliers. I resolved this as well before moving on to future deliverables.
  • Getting in touch with my team members was also difficult because of situations they were facing outside the internship. With the help of my team lead and technical leads, I was able to complete the first set of deliverables.
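
The two corrections described above (normalize first, then drop only the major outliers) can be sketched as follows. This is a minimal Python illustration, not the R code from the linked document. The sample names, intensities, and flag table are hypothetical. The rule that a sample is a "major" outlier when at least two checks flag it is an assumption made for the example, and the log2 step stands in for the full normalization.

```python
import math

# Hypothetical raw intensities per sample (not real data).
raw = {
    "sample_A": [120.0, 450.0, 980.0],
    "sample_B": [110.0, 430.0, 1010.0],
    "sample_C": [15.0, 9000.0, 40.0],
}

# Hypothetical QC flags per sample, one boolean per outlier check.
flags = {
    "sample_A": [False, False, False],
    "sample_B": [True, False, False],   # flagged once: kept
    "sample_C": [True, True, True],     # flagged by every check: removed
}

def log2_normalize(samples):
    """Log2-transform every intensity of every sample."""
    return {name: [math.log2(v) for v in values]
            for name, values in samples.items()}

def major_outliers(flag_table, min_flags=2):
    """Return samples flagged by at least `min_flags` checks (assumed rule)."""
    return sorted(name for name, checks in flag_table.items()
                  if sum(checks) >= min_flags)

# Step 1: normalize all samples, outliers included.
normalized = log2_normalize(raw)

# Step 2: only afterwards remove the major outliers.
to_drop = major_outliers(flags)
kept = {name: values for name, values in normalized.items()
        if name not in to_drop}

print("removed:", to_drop)
print("kept:", sorted(kept))
```

Here sample_B is flagged by one check and stays in the data set, while sample_C is flagged by all three and is removed after normalization.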

Summary of Work:

  • Made a schedule/guide for the team, highlighting when each deliverable was due
  • Scheduled Google meetings with team members
  • Data curation and pre-processing
  • Quality control: simpleaffy and arrayQualityMetrics
  • Normalization: mas5 and log2, with boxplots before and after normalization
  • Batch effect correction: ComBat
  • Visualization: heatmaps before and after batch correction
  • Created a deliverable overview document containing created code and visuals
  • Presented all data and work during the deliverables meeting
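
As a rough illustration of why the boxplots are compared before and after normalization, the sketch below computes the five values a boxplot draws (minimum, quartiles, maximum) for one array before and after a log2 transform. It is a Python sketch under assumed, made-up intensities. It covers only the log2 step, not mas5, and it is not the R code used for the deliverable.

```python
import math
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the values a boxplot draws."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))

# Hypothetical intensities for one array (not real data).
raw_intensities = [2.0, 4.0, 8.0, 16.0, 1024.0]
log2_intensities = [math.log2(v) for v in raw_intensities]

before = five_number_summary(raw_intensities)
after = five_number_summary(log2_intensities)

print("before:", before)
print("after: ", after)
```

On the raw scale the single large value stretches the box and whisker, while on the log2 scale the same data spans a much narrower range, which makes arrays easier to compare side by side.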

Further Notes:

  • I wish I had had more time to explore other forms of data normalization, but otherwise I enjoyed the quality control section of the project. The technical leads were very helpful in explaining what we were doing with our data.