Austin_Yang - Bioinformatics Pathway

Overview of things learned:
Week 2 (7/28/20 - 8/3/20)

  • Technical - Using R Studio and the databases provided in the deliverables, created limma matrices and volcano plots to compare cancer groups with normal groups
  • Tools - R Studio, Slack, Google Meet, GitHub, Bioconductor
  • Soft Skills - communication, problem-solving, cooperation, adaptability, critical thinking

Three achievement highlights:

  1. With group, figured out an efficient method to remove duplicates in the matrix for both gene symbols and probe IDs
  2. With group, determined a method to filter out genes below the 2nd percentile of expression, using the expression dataset created the previous week
  3. Met with group to discuss individual progress, plan a concise presentation, and get to know each other
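A minimal sketch of the first two highlights, written in Python only as a stand-in for the R code used in the internship. The post does not say which rule the group used to pick among duplicate probes, so keeping the probe with the highest mean expression is an assumption here, as are the function names and the toy probe IDs.

```python
from statistics import mean, quantiles

def collapse_to_gene_symbols(expr, probe_to_symbol):
    """Keep one probe per gene symbol: the probe with the highest mean expression.

    expr: dict of probe ID -> list of expression values (one per sample)
    probe_to_symbol: dict of probe ID -> gene symbol (hypothetical annotation map)
    """
    best = {}  # gene symbol -> (mean expression, probe ID)
    for probe, values in expr.items():
        symbol = probe_to_symbol.get(probe)
        if symbol is None:  # probe has no gene symbol: drop it
            continue
        m = mean(values)
        if symbol not in best or m > best[symbol][0]:
            best[symbol] = (m, probe)
    return {symbol: expr[probe] for symbol, (_, probe) in best.items()}

def filter_low_expression(gene_expr, percent=2):
    """Drop genes whose mean expression is below the given percentile of all gene means."""
    means = {gene: mean(values) for gene, values in gene_expr.items()}
    # quantiles(..., n=100) returns 99 cut points; index percent - 1 is that percentile
    cutoff = quantiles(means.values(), n=100, method="inclusive")[percent - 1]
    return {gene: values for gene, values in gene_expr.items() if means[gene] >= cutoff}

# Toy data: two probes map to the same symbol, one probe is unannotated
expr = {"probe_a": [5.0, 6.0], "probe_b": [7.0, 8.0],
        "probe_c": [1.0, 1.0], "probe_d": [9.0, 9.0]}
probe_to_symbol = {"probe_a": "TP53", "probe_b": "TP53", "probe_c": "EGFR"}
collapsed = collapse_to_gene_symbols(expr, probe_to_symbol)
filtered = filter_low_expression(collapsed)
```

Collapsing before filtering means the percentile cutoff is computed over genes rather than probes; the post does not state which order the group used.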

List of meetings attended including social team events:
Attended all meetings except for happy hour and office hours

  • 29/07 - GitHub webinar
  • 20/07 - Office Hours
  • 01/08 - Group 3 meeting to discuss deliverables
  • 03/08 - Team 1 meeting
  • 04/08 - Team 1 deliverables presentation

Goals for the upcoming week:

  • Complete deliverables at a quicker pace. Try to get more done in an individual day
  • Network with my group and others in STEMAway
  • Practice using R and familiarize myself with certain functions
  • Practice using GitHub and merging code
  • Communicate more effectively with team, and make sure all work is completed in a timely fashion

Detailed statement of tasks done:
Deliverables:

  • Using hgu133plus2.db, created expression matrix with probeset IDs and gene symbols
  • Filtered out certain genes based on expression and availability of data
  • Created and analyzed limma matrix
  • Transferred data from limma matrix into a volcano plot
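As a rough illustration of the last step: a volcano plot puts the log2 fold change on the x-axis and -log10 of the p-value on the y-axis. The sketch below (Python as a stand-in for R) turns result rows into plot coordinates and labels each gene. The cutoffs of 1.0 and 0.05 and the tuple layout are assumptions for illustration, not values taken from the deliverable.

```python
import math

def volcano_points(results, lfc_cutoff=1.0, p_cutoff=0.05):
    """Convert (gene, log2 fold change, adjusted p-value) rows into volcano plot points.

    Returns (gene, x, y, status) tuples, where x is the log2 fold change and
    y is -log10(adjusted p-value). Assumes every p-value is greater than 0.
    """
    points = []
    for gene, log_fc, p_adj in results:
        if p_adj < p_cutoff and log_fc >= lfc_cutoff:
            status = "up"
        elif p_adj < p_cutoff and log_fc <= -lfc_cutoff:
            status = "down"
        else:
            status = "not significant"
        points.append((gene, log_fc, -math.log10(p_adj), status))
    return points

# Hypothetical rows in the shape of a differential expression table
rows = [("GENE_A", 2.0, 0.001), ("GENE_B", -1.5, 0.01), ("GENE_C", 0.2, 0.5)]
points = volcano_points(rows)
```

Genes labeled "up" or "down" end up in the top-right and top-left corners of the plot.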

Other:

  • Practiced communication with team on Slack
  • Familiarized myself with GitHub and its functions
  • Toyed around with Asana (may be able to use it in the future)

Challenges and how those challenges were overcome:

  • Struggled with removing duplicated keys with collapseRows(). Worked with team to determine an effective method to remove all duplicates in gene symbols
  • Struggled with filtering out certain genes via expression matrix. Realized that it required data retrieved from the previous week

Overview of things learned:
Week 1 (7/21/20 - 7/27/20)

  • Technical - Using R Studio, as well as several Quality Control, Normalization, and Batch Correction Packages, processed data from GEO repositories to an easily analyzable form. Created PCA plots and heatmaps to analyze initial data.
  • Tools - R Studio, Slack, Bioconductor
  • Soft Skills - communication, problem-solving, cooperation, persistence, critical thinking

Three achievement highlights:

  1. Learned basics of R, as well as how to perform quality control, normalization, and batch correction with R
  2. Got to know my group (background, hobbies, interests, etc.) and discussed our initial progress and thoughts
  3. Created first PCA plots and heatmaps with R

List of meetings attended including social team events:
Attended all meetings except for happy hour and office hours

Goals for the upcoming week:

  • Complete deliverables at a quicker pace.
  • Network with my group and others in STEMAway, get to know my group better
  • Practice using R and familiarize myself with certain functions
  • Communicate more effectively with team, and make sure all work is completed in a timely fashion

Detailed statement of tasks done:
Deliverables:

  • Performed quality control with packages such as arrayQualityMetrics, affyPLM, and simpleaffy to analyze the raw data and remove outliers

  • Performed normalization with gcrma to standardize data and reduce technical variability that could skew results

  • Performed batch correction using ComBat (from the sva package) with provided metadata

  • Created first PCA plots and heatmaps, allowing us to visualize our processed data
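To show the idea behind batch correction, here is a deliberately simplified sketch in Python (a stand-in for R). It is not ComBat: ComBat fits an empirical Bayes model, while this version only shifts each batch so its mean matches the overall mean for one gene. The function name and toy values are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def center_batches(values, batches):
    """Remove per-batch mean shifts for one gene.

    values: expression of a single gene across samples
    batches: batch label for each sample, in the same order
    """
    overall = mean(values)
    by_batch = defaultdict(list)
    for value, batch in zip(values, batches):
        by_batch[batch].append(value)
    batch_mean = {batch: mean(vals) for batch, vals in by_batch.items()}
    # Subtract each sample's batch mean, then add back the overall mean
    return [value - batch_mean[batch] + overall
            for value, batch in zip(values, batches)]

# Toy example: batch "b" reads about 10 units higher than batch "a"
corrected = center_batches([1.0, 3.0, 11.0, 13.0], ["a", "a", "b", "b"])
```

A simple shift like this also removes real biological differences when they line up with the batches, which is one reason metadata-aware methods such as ComBat are used instead.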

Other:

  • Practiced communication with team on Slack

  • Learned the basics of R and its functions

Challenges and how those challenges were overcome:

  • Sometimes wrote Python syntax instead of R. Fixed this by carefully looking over my code

  • Struggled with downloading raw data. Fixed this by watching tutorial more carefully.

Overview of things learned:
Week 3 (8/4/20 - 8/11/20)

  • Technical - Created several plots to analyze correlations in data, such as GO and KEGG plots
  • Tools - R Studio, Slack, Bioconductor, Google, GitHub
  • Soft Skills - communication, problem-solving, cooperation, persistence, critical thinking, independence, patience

Three achievement highlights:

  1. Created first GO and KEGG plots using R. Able to see correlations and genes most responsible
  2. Gained better understanding of data, as well as the genes primarily responsible for it
  3. Worked with team to work out small discrepancies in plots
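GO and KEGG plots are usually built on an over-representation test: given a list of differentially expressed genes, is a term's gene set hit more often than chance would predict? A common choice is the hypergeometric test, sketched below in Python (a stand-in for R). The post does not say which package or test was used, so treat this as an assumption; the function and parameter names are hypothetical.

```python
from math import comb

def enrichment_p_value(universe, in_term, selected, overlap):
    """Hypergeometric upper-tail p-value for over-representation.

    universe: number of genes tested overall
    in_term:  number of those genes annotated to the GO term or KEGG pathway
    selected: number of differentially expressed genes
    overlap:  number of selected genes that are annotated to the term
    Returns P(X >= overlap) when drawing `selected` genes at random.
    """
    total = comb(universe, selected)
    upper = min(in_term, selected)
    tail = sum(comb(in_term, i) * comb(universe - in_term, selected - i)
               for i in range(overlap, upper + 1))
    return tail / total

# Toy numbers: 10 genes total, 3 in the term, 3 selected, all 3 overlap
p = enrichment_p_value(universe=10, in_term=3, selected=3, overlap=3)
```

In practice the p-values for all terms are also adjusted for multiple testing before plotting.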

List of meetings attended including social team events:
Attended all meetings except for happy hour and office hours

Goals for the upcoming week:

  • Begin working on final project
  • Network with my group and others in STEMAway, get to know my group better
  • Practice using R and familiarize myself with certain functions
  • Communicate more effectively with team, and make sure all work is completed in a timely fashion

Detailed statement of tasks done:
Deliverables:

  • Created GO plots using data. Shows upregulation and downregulation of certain genes that led to cancer
  • Created KEGG plots and performed KEGG analysis. Saw which diseases seem most similar to the cancer in terms of gene involvement
  • Created a gene concept network, associating genes with certain symptoms
  • Performed survival analysis, examining survival curves that relate a specific gene to the cancer
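Survival curves of this kind are commonly Kaplan-Meier estimates. The post does not name the estimator, so the sketch below (Python as a stand-in for R) is an assumption about the general technique, with hypothetical names and toy data. In a gene-based analysis, one curve would typically be computed per patient group, for example high versus low expression of the gene.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times:  follow-up time for each patient
    events: True if the event (death) was observed, False if censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share the same follow-up time
        while i < len(data) and data[i][0] == t:
            deaths += int(data[i][1])
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set
        at_risk -= removed
    return curve

# Toy cohort: the patient at time 2 is censored
curve = kaplan_meier([1, 2, 3, 4], [True, False, True, True])
```

Censored patients never lower the curve directly, but they shrink the at-risk group, so later deaths cause larger drops.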

Other:

  • Learned more advanced R and its functions

Challenges and how those challenges were overcome:

  • Arguably most difficult deliverable. Required lots of time and patience
  • Some plots were not perfect. Tried making them as correct as possible, but still room for improvement

Overview of things learned:
Week 4 (8/12/20 - 8/18/20)

  • Technical - Started Final project
  • Tools - R Studio, Slack, Bioconductor, Google
  • Soft Skills - problem-solving, persistence, critical thinking, independence, patience

Three achievement highlights:

  1. Began working on final project. Got to understand lung cancer and its symptoms
  2. Developed further understanding of the genes associated with lung cancer
  3. Learned certain connections between lung cancer and other diseases

List of meetings attended including social team events:
Attended all meetings except for happy hour and office hours

Goals for the upcoming week:

  • Finish working on final project
  • Network with my group and others in STEMAway, get to know my group better
  • Practice using R and familiarize myself with certain functions

Detailed statement of tasks done:
Deliverables:

  • Performed Quality Control, Normalization, and Batch Correction on raw data
  • Created own metadata from GEO database
  • Created heatmaps, GO plots, and KEGG plots
  • Made conclusions on cancer

Other:

  • Learned more advanced R and its functions

Challenges and how those challenges were overcome:

  • Lack of communication with the team was difficult. Worked this out with lots of googling and self-studying
  • Some struggles with quality control. Worked the issue out with help from Anca

Overview of things learned:
Week 5 (8/12/20 - 8/17/20)

  • Technical - Finalized and presented final project
  • Tools - R Studio, Google Meet, Slack
  • Soft Skills - communication, problem-solving, cooperation, persistence, critical thinking, independence, patience, presentation skills, public speaking

Three achievement highlights:

  1. Finished project and prepared presentation
  2. Practiced presentation with Sarah. Received feedback
  3. Gave the presentation to Yves and Sarah. Received important constructive criticism.

List of meetings attended including social team events:
Attended all meetings except for happy hour and office hours

Goals for the upcoming week:

  • Learn more R
  • Consider other bioinformatics internships

Detailed statement of tasks done:
Deliverables:

  • Using a bioinformatics pipeline similar to the one learned in the internship, performed preprocessing, processing, and analysis on new raw data for lung cancer.
  • Made conclusions and compared conclusions with hypothesis
  • Set future goals for R

Other:

  • Learned more advanced R and its functions

Challenges and how those challenges were overcome:

  • Performed extremely poorly on presentation. Will work on public speaking skills and R skills
  • Time management. Finding a time to present and practice was difficult. Will work on time management in future.

Link: Final Project - Google Slides