Overview of things learned:
Week 2 (7/28/20 - 8/3/20)
- Technical - Using RStudio and the databases provided in the deliverables, created limma matrices and volcano plots to compare cancer groups with normal groups
- Tools - RStudio, Slack, Google Meet, GitHub, Bioconductor
- Soft Skills - communication, problem-solving, cooperation, adaptability, critical thinking
Three achievement highlights:
- With group, figured out an efficient method to remove duplicates in the matrix for both gene symbols and probe IDs
- With group, determined a method to filter out genes below the 2nd centile through the expression dataset created the previous week
- Met with group to discuss individual progress, plan a concise presentation, and get to know each other
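The duplicate-removal step mentioned above can be illustrated with a small sketch. The original work was done in R; this is a plain Python illustration of one common rule (keep the probe with the highest mean expression per gene symbol), which is an assumption and not necessarily the method the group settled on. All probe and gene names below are hypothetical.

```python
from statistics import mean


def collapse_duplicates(rows):
    """Keep one probe per gene symbol: the probe with the highest mean expression.

    rows: list of (probe_id, gene_symbol, [expression values]) tuples.
    Probes with no gene symbol (None or "") are dropped.
    """
    best = {}
    for probe_id, symbol, values in rows:
        if not symbol:
            continue  # probe has no gene symbol mapping, skip it
        score = mean(values)
        if symbol not in best or score > best[symbol][0]:
            best[symbol] = (score, probe_id, values)
    # Return gene symbol -> (chosen probe ID, its expression values)
    return {sym: (pid, vals) for sym, (_, pid, vals) in best.items()}


# Hypothetical example data: GENE_B is measured by two probes
rows = [
    ("probe_1", "GENE_A", [8.1, 7.9, 8.3]),
    ("probe_2", "GENE_B", [5.0, 5.2, 4.9]),
    ("probe_3", "GENE_B", [6.1, 6.0, 6.3]),
    ("probe_4", "", [3.0, 3.1, 2.9]),
]
collapsed = collapse_duplicates(rows)
```

After this step every gene symbol and every probe ID appears at most once, which is the property the later filtering and limma steps rely on.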
List of meetings attended including social team events:
Attended all meetings except for happy hour and office hours
- 29/07 - GitHub webinar
- 20/07 - Office Hours
- 01/08 - Group 3 meeting to discuss deliverables
- 03/08 - Team 1 meeting
- 04/08 - Team 1 deliverables presentation
Goals for the upcoming week:
- Complete deliverables at a quicker pace. Try to get more done in an individual day
- Network with my group and others in STEMAway
- Practice using R and familiarize myself with certain functions
- Practice using GitHub and merging code
- Communicate more effectively with team, and make sure all work is completed in a timely fashion
Detailed statement of tasks done:
Deliverables:
- Using hgu133plus2.db, created expression matrix with probeset IDs and gene symbols
- Filtered out certain genes based on expression and availability of data
- Created and analyzed limma matrix
- Transferred data from limma matrix into a volcano plot
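As a rough illustration of the last deliverable, a volcano plot places each gene at (log2 fold change, -log10 p-value) and highlights genes that pass both cutoffs. The sketch below is in Python rather than the R used in the internship, and the cutoffs (|log2FC| >= 1, adjusted p < 0.05), gene names, and values are assumptions chosen for illustration.

```python
import math


def volcano_points(results, lfc_cutoff=1.0, p_cutoff=0.05):
    """Turn (gene, log2 fold change, adjusted p-value) rows into volcano plot points.

    Returns (gene, x, y, status) tuples where x is the log2 fold change,
    y is -log10(adjusted p-value), and status labels the gene.
    Adjusted p-values must be greater than zero.
    """
    points = []
    for gene, log_fc, adj_p in results:
        significant = adj_p < p_cutoff
        if significant and log_fc >= lfc_cutoff:
            status = "up"
        elif significant and log_fc <= -lfc_cutoff:
            status = "down"
        else:
            status = "not significant"
        points.append((gene, log_fc, -math.log10(adj_p), status))
    return points


# Hypothetical differential expression results
results = [
    ("GENE_A", 2.3, 0.001),
    ("GENE_B", -1.8, 0.01),
    ("GENE_C", 0.4, 0.0005),
    ("GENE_D", 3.0, 0.2),
]
points = volcano_points(results)
```

GENE_C and GENE_D show why both cutoffs matter: one is statistically significant but barely changes, the other changes a lot but is not significant.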
Other:
- Practiced communication with team on Slack
- Familiarized myself with GitHub and its functions
- Toyed around with Asana (may be able to use it in the future)
Challenges and how those challenges were overcome:
- Struggled with removing duplicated keys with collapseRows(). Worked with team to determine an effective method to remove all duplicates in gene symbols
- Struggled with filtering out certain genes via expression matrix. Realized that it required data retrieved from the previous week
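The expression-based filter can be sketched as follows. This is a Python illustration, not the R code used, and it assumes the filter works on each gene's mean expression across samples and uses a simple rank-based cutoff; the gene names and values are made up.

```python
from statistics import mean


def filter_low_expression(expr, percentile=2.0):
    """Drop genes whose mean expression is below the given percentile of all gene means.

    expr: dict mapping gene -> list of expression values across samples.
    Returns (kept genes dict, cutoff value).
    """
    means = {gene: mean(values) for gene, values in expr.items()}
    ordered = sorted(means.values())
    # Rank-based cutoff: the value below which `percentile` percent of genes fall
    idx = min(int(len(ordered) * percentile / 100), len(ordered) - 1)
    cutoff = ordered[idx]
    kept = {gene: values for gene, values in expr.items() if means[gene] >= cutoff}
    return kept, cutoff


# Hypothetical data: 100 genes with steadily increasing expression
expr = {f"gene_{i}": [float(i), float(i) + 1.0] for i in range(100)}
kept, cutoff = filter_low_expression(expr, percentile=2.0)
```

The filter needs the full expression matrix to compute the cutoff, which is why the data from the previous week was required.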
Overview of things learned:
Week 1 (7/21/20 - 7/27/20)
- Technical - Using RStudio and several quality control, normalization, and batch correction packages, processed data from GEO repositories into an easily analyzable form. Created PCA plots and heatmaps to analyze initial data.
- Tools - RStudio, Slack, Bioconductor
- Soft Skills - communication, problem-solving, cooperation, persistence, critical thinking
Three achievement highlights:
- Learned basics of R, as well as how to perform quality control, normalization, and batch correction with R
- Got to know my group (background, hobbies, interests, etc.) and discussed our initial progress and thoughts
- Created first PCA plots and heatmaps with R
List of meetings attended including social team events:
Attended all meetings except for happy hour and office hours
Goals for the upcoming week:
- Complete deliverables at a quicker pace.
- Network with my group and others in STEMAway, get to know my group better
- Practice using R and familiarize myself with certain functions
- Communicate more effectively with team, and make sure all work is completed in a timely fashion
Detailed statement of tasks done:
Deliverables:
- Performed quality control with packages such as arrayQualityMetrics, affyPLM, and simpleaffy to analyze the raw data and remove outliers
- Performed normalization with gcrma to standardize data and reduce technical variability that could distort results
- Performed batch correction using ComBat (a function in the sva package) with provided metadata
- Created first PCA plots and heatmaps, allowing us to visualize our processed data
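To give an intuition for the batch correction step, the sketch below removes a per-batch shift from one gene's expression values. This is a deliberately simplified stand-in written in Python: it only re-centers each batch on the overall mean, whereas ComBat also adjusts batch variance and uses an empirical Bayes model. The values and batch labels are hypothetical.

```python
from statistics import mean


def center_batches(values, batches):
    """Shift each batch so its mean equals the overall mean (one gene at a time).

    values: expression values for one gene, one per sample.
    batches: batch label for each sample, in the same order.
    """
    overall = mean(values)
    batch_means = {
        b: mean(v for v, vb in zip(values, batches) if vb == b)
        for b in set(batches)
    }
    # Subtract the batch-specific mean, then add back the overall mean
    return [v - batch_means[b] + overall for v, b in zip(values, batches)]


# Hypothetical gene measured in two batches, batch B reading systematically higher
values = [5.0, 6.0, 9.0, 10.0]
batches = ["A", "A", "B", "B"]
corrected = center_batches(values, batches)
```

After correction the two batches have the same mean, while the differences between samples within each batch are preserved.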
Other:
- Practiced communication with team on Slack
- Learned the basics of R and its functions
Challenges and how those challenges were overcome:
- Sometimes wrote Python syntax instead of R. Fixed this by carefully looking over my code
- Struggled with downloading raw data. Fixed this by following the tutorial more carefully
Overview of things learned:
Week 3 (8/4/20 - 8/11/20)
- Technical - Created several plots to analyze correlations in data, such as GO and KEGG plots
- Tools - RStudio, Slack, Bioconductor, Google, GitHub
- Soft Skills - communication, problem-solving, cooperation, persistence, critical thinking, independence, patience
Three achievement highlights:
- Created first GO and KEGG plots using R. Able to see correlations and genes most responsible
- Gained better understanding of data, as well as the genes primarily responsible for it
- Worked with team to work out small discrepancies in plots
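GO and KEGG plots are usually built on an over-representation test: given a list of differentially expressed genes, how surprising is its overlap with a given term or pathway? A minimal Python sketch of that test using the hypergeometric distribution follows. It is an illustration of the general idea, not the code used in the internship (which was in R), and the example numbers are invented.

```python
from math import comb


def enrichment_p_value(universe, annotated, selected, overlap):
    """One-sided hypergeometric p-value for over-representation.

    universe:  total number of genes considered
    annotated: genes in the universe that belong to the term or pathway
    selected:  number of differentially expressed genes
    overlap:   selected genes that belong to the term or pathway
    Returns P(X >= overlap) when `selected` genes are drawn at random.
    """
    total = comb(universe, selected)
    upper = min(annotated, selected)
    tail = sum(
        comb(annotated, k) * comb(universe - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical example: 10 genes in total, 3 in the pathway, 3 selected, all 3 overlap
p_all = enrichment_p_value(universe=10, annotated=3, selected=3, overlap=3)
p_none = enrichment_p_value(universe=10, annotated=3, selected=3, overlap=0)
```

A small p-value means the pathway contains more of the selected genes than chance alone would explain. In practice the p-values are also corrected for testing many terms at once.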
List of meetings attended including social team events:
Attended all meetings except for happy hour and office hours
Goals for the upcoming week:
- Begin working on final project
- Network with my group and others in STEMAway, get to know my group better
- Practice using R and familiarize myself with certain functions
- Communicate more effectively with team, and make sure all work is completed in a timely fashion
Detailed statement of tasks done:
Deliverables:
- Created GO plots using data. Shows upregulation and downregulation of certain genes that led to cancer
- Created KEGG plots and performed KEGG analysis. Saw which diseases seem most similar to the cancer in terms of gene involvement
- Created a gene concept network, attributing genes with certain symptoms
- Performed survival analysis, examining survival curves for a particular gene in relation to the cancer
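The survival curves in the last deliverable are typically Kaplan-Meier estimates. The sketch below shows the estimator in plain Python under that assumption; the deliverable itself was done in R, and the follow-up times here are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times:  follow-up time for each patient
    events: True if the event (e.g. death) was observed at that time,
            False if the patient was censored
    Returns a list of (time, survival probability) at each time with an event.
    """
    order = sorted(zip(times, events))
    at_risk = len(order)
    survival = 1.0
    curve = []
    i = 0
    while i < len(order):
        t = order[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share this follow-up time
        while i < len(order) and order[i][0] == t:
            deaths += 1 if order[i][1] else 0
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Everyone at this time, dead or censored, leaves the risk set
        at_risk -= removed
    return curve


# Hypothetical follow-up data for five patients
times = [1, 2, 2, 3, 4]
events = [True, True, False, True, False]
curve = kaplan_meier(times, events)
```

To relate survival to one gene, the patients can be split into high-expression and low-expression groups and the function run on each group separately, giving two curves to compare.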
Other:
- Learned more advanced R and its functions
Challenges and how those challenges were overcome:
- Arguably most difficult deliverable. Required lots of time and patience
- Some plots were not perfect. Tried making them as correct as possible, but still room for improvement
Overview of things learned:
Week 4 (8/12/20 - 8/18/20)
- Technical - Started final project
- Tools - RStudio, Slack, Bioconductor, Google
- Soft Skills - problem-solving, persistence, critical thinking, independence, patience
Three achievement highlights:
- Began working on final project. Got to understand lung cancer and its symptoms
- Developed further understanding of the genes associated with lung cancer
- Learned certain connections between lung cancer and other diseases
List of meetings attended including social team events:
Attended all meetings except for happy hour and office hours
Goals for the upcoming week:
- Finish working on final project
- Network with my group and others in STEMAway, get to know my group better
- Practice using R and familiarize myself with certain functions
Detailed statement of tasks done:
Deliverables:
- Performed Quality Control, Normalization, and Batch Correction on raw data
- Created own metadata from GEO database
- Created heatmaps, GO plots, and KEGG plots
- Made conclusions on cancer
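For the heatmaps in the list above, expression values are commonly scaled per gene so that colors show each gene's pattern across samples rather than its absolute level. Whether row scaling was used in this project is an assumption. The Python sketch below shows that scaling, with made-up gene names and values.

```python
from statistics import mean, stdev


def zscore_rows(expr):
    """Scale each gene (row) to mean 0 and sample standard deviation 1.

    expr: dict mapping gene -> list of expression values (at least two per gene).
    Genes with no variation are set to all zeros.
    """
    scaled = {}
    for gene, values in expr.items():
        mu = mean(values)
        sd = stdev(values)
        scaled[gene] = [0.0 if sd == 0 else (v - mu) / sd for v in values]
    return scaled


# Hypothetical expression values across three samples
expr = {
    "GENE_A": [2.0, 4.0, 6.0],
    "GENE_B": [7.0, 7.0, 7.0],
}
scaled = zscore_rows(expr)
```

After scaling, a highly expressed gene and a weakly expressed gene with the same relative pattern get the same colors in the heatmap.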
Other:
- Learned more advanced R and its functions
Challenges and how those challenges were overcome:
- Lack of communication with a team was difficult. Worked this out with lots of googling and self-studying
- Some struggles with quality control. Worked the issue out with the help of Anca
Overview of things learned:
Week 5 (8/12/20 - 8/17/20)
- Technical - Finalized and presented final project
- Tools - RStudio, Google Meet, Slack
- Soft Skills - communication, problem-solving, cooperation, persistence, critical thinking, independence, patience, presentation skills, public speaking
Three achievement highlights:
- Finished project and prepared presentation
- Practiced presentation with Sarah. Received feedback
- Gave the presentation to Yves and Sarah. Received important constructive criticism
List of meetings attended including social team events:
Attended all meetings except for happy hour and office hours
Goals for the upcoming week:
- Learn more R
- Consider other bioinformatics internships
Detailed statement of tasks done:
Deliverables:
- Using the bioinformatics pipeline learned in the internship, performed similar preprocessing, processing, and analysis on new raw data for lung cancer
- Made conclusions and compared conclusions with hypothesis
- Set future goals for R
Other:
- Learned more advanced R and its functions
Challenges and how those challenges were overcome:
- Performed extremely poorly on presentation. Will work on public speaking skills and R skills
- Time management. Finding a time to present and practice was difficult. Will work on time management in future.
Link: Final Project - Google Slides