Kevin Lin- BI Pathway- Self Assessment

Self Assessment for Week of 7/20

Overview of things learned:

  • Technical- Quality control, batch correction, visualization of data
  • Tools- R Script/RStudio, Slack, Google Meet, Google Docs, Excel
  • Soft Skills- Leadership, communication, critical thinking

Three achievement highlights:

  • Took on a task lead role with little experience with the technical aspect of the project
  • Was not hesitant about using threads to ask questions
  • Able to improve and understand information regarding R programming as well as the project’s scientific background through the use of training videos and webinars

List of meetings attended including social team events:

  • Introductory webinar (live)
  • 7/20 Deliverables Meeting (live)
  • Biology Webinar (recording)

Goals for the upcoming week:

  • Attend this week’s office hours and ask more questions
  • Connect with team members more and coordinate schedule plans with them
  • Improve on using R and GitHub to complete and share tasks correctly and efficiently

Detailed statement of tasks done:

  • Made a schedule/guide for the team, highlighting when deliverables should be due
  • Scheduled Google Meet sessions with team members
  • Data curation and pre-processing
  • Quality control- simpleaffy and arrayQualityMetrics
  • Normalization- mas5 and log2 transformation: boxplots before and after normalization
  • Batch effect correction- ComBat
  • Visualization- heatmaps before and after batch correction
  • Created a deliverable overview document containing created code and visuals
  • Presented all data and work during Tuesday’s meeting
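
The normalization step above can be illustrated with a small sketch. The actual work was done in R; this is only a minimal Python illustration, assuming positive raw intensities (the values below are made up) and showing how a log2 transformation changes the numbers that a boxplot summarizes.

```python
import math
import statistics


def log2_transform(values):
    """Return the log2 of each raw intensity (all values must be positive)."""
    return [math.log2(v) for v in values]


def five_number_summary(values):
    """Minimum, quartiles, and maximum: the numbers a boxplot is built from."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


# Made-up intensities for one array, chosen only for illustration.
raw = [16.0, 64.0, 256.0, 1024.0, 4096.0]
normalized = log2_transform(raw)

print(five_number_summary(raw))
print(five_number_summary(normalized))
```

Before the transformation the values span several orders of magnitude; afterwards they are evenly spaced, which is why the boxplots are easier to compare after normalization.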

Challenges:

  • Communicating and connecting with team members
  • Small details related to visualizing data such as labeling
  • Deciding which outliers should be removed and when to remove them (after normalization)

Self Assessment for Week of 7/27

Overview of things learned:

  • Technical - Annotations using hgu133plus2.db, gene filtering, analysis with limma, visualization of data using volcano plots
  • Tools - R Studio, Slack, Google Meet, GitHub
  • Soft Skills - problem solving (doing own research), communicating concerns

Three achievement highlights:

  • Doing additional research to understand the biological background of the project
  • Presented deliverables step-by-step and edited visuals for better presentation comprehension
  • Utilized many resources provided to me (Slack, team lead emails, and STEM-Away threads) when I had issues and inquiries.

Presentation (7/27): https://docs.google.com/presentation/d/1UCxnTbNjXiRrsx-uyrLl9IauR03akoJjsbQRrin-ofQ/edit?usp=sharing

List of meetings attended including social team events:

  • 7/27 Deliverables Meeting (live)
  • Office Hours (live)

Goals for the upcoming week:

  • Attend happy hour and connect with newly assigned team as well as other STEM-Away participants
  • Collaborate and communicate with team members regarding deliverable progress and team deadlines
  • Attend office hours and continue asking questions in order to work more efficiently
  • Do as much background research as needed to comprehend the project

Detailed statement of tasks done:
Deliverables:

  • Annotations using hgu133plus2.db to map probe IDs with gene symbols
  • Refined expression set data by omitting rows with duplicates (gene symbols and probe IDs) and missing data using the collapseRows and na.omit functions
  • Researched tidyverse package and used it to edit refined expression set data
  • Filtered out genes below 2nd centile of expression distribution of dataset
  • Analyzed data with limma package and sorted top DEGs using topTable function
  • Generated and edited heatmap and volcano plot of DEGs
  • Created and delivered a presentation of all work completed
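
The duplicate removal and low-expression filtering steps above can be sketched as follows. This is a minimal Python illustration, not the R code used for the deliverable. It assumes that duplicates are resolved by keeping the probe with the highest mean expression (one common choice; the method actually used is not stated here), and the function names and data layout are hypothetical.

```python
import statistics


def collapse_duplicates(rows):
    """Keep one row per gene symbol: the probe with the highest mean expression.

    rows is a list of (gene_symbol, values) pairs. Rows without a symbol or
    with missing values (None) are dropped, similar in spirit to na.omit.
    """
    best = {}
    for symbol, values in rows:
        if symbol is None or any(v is None for v in values):
            continue
        if symbol not in best or statistics.mean(values) > statistics.mean(best[symbol]):
            best[symbol] = values
    return best


def filter_low_expression(expr, percentile=2):
    """Drop genes whose mean expression falls below the given percentile."""
    means = {gene: statistics.mean(values) for gene, values in expr.items()}
    cutoffs = statistics.quantiles(means.values(), n=100, method="inclusive")
    threshold = cutoffs[percentile - 1]
    return {gene: values for gene, values in expr.items() if means[gene] >= threshold}


# Made-up rows: two probes for TP53, one unannotated probe, one with missing data.
rows = [
    ("TP53", [5.0, 7.0]),
    ("TP53", [8.0, 9.0]),
    (None, [1.0, 2.0]),
    ("EGFR", [3.0, None]),
]
collapsed = collapse_duplicates(rows)

# Made-up expression set with 101 genes whose mean expression is 0, 1, ..., 100.
expr = {f"G{i}": [float(i), float(i)] for i in range(101)}
filtered = filter_low_expression(expr, percentile=2)
print(len(expr), "genes before filtering,", len(filtered), "after")
```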

Challenges:
Technical challenges:

  • Adjusting expression set data in order to run the collapseRows function. Due to the addition of a character column (gene symbols) to the expression set, all other columns were converted into a character data type, which had to be converted back into a numeric data type. This was resolved after viewing STEM-Away’s troubleshooting thread.
  • Not being able to run the lmFit function due to issues with expression set data. This was resolved after researching potential issues and discovering that by using the tidyverse package and renaming the rownames of the input data, the lmFit function could be run.
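
The first issue comes from the fact that an R matrix holds a single data type, so adding a character column turns every value into text. A minimal Python sketch of the general fix, assuming a hypothetical table whose first column is the gene symbol and whose remaining columns are expression values stored as text:

```python
def split_symbols_and_values(table):
    """Separate the gene-symbol column from the expression values.

    The symbols are returned as row names and the values, stored as text,
    are converted back to numbers.
    """
    symbols = [row[0] for row in table]
    values = [[float(cell) for cell in row[1:]] for row in table]
    return symbols, values


# Made-up table in which every cell has been coerced to text.
table = [
    ["TP53", "7.25", "8.5"],
    ["EGFR", "6.0", "5.75"],
]
symbols, values = split_symbols_and_values(table)
print(symbols)
print(values)
```

Keeping the symbols as row names rather than as a data column leaves the expression values purely numeric.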

Workflow challenges:

  • Not being able to reach other group members to discuss progress and results.

Self Assessment for Week of 8/3

Overview of things learned:

  • Technical - Gene ontology analysis and visualization, Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis, visualizing gene-concept networks, gene set enrichment analysis with hallmark gene sets
  • Tools - R Studio, Slack, Google Meet
  • Soft Skills - team communication, research

Three achievement highlights:

  • Communicated and collaborated with team members and project leads to overcome group technical issues
  • Conducted research on the side to comprehend the biological background of assigned tasks
  • Improved on identifying and understanding issues with own code

List of meetings attended including social team events:

  • 8/3 Deliverables Meeting (live)

Goals for the upcoming week:

  • Attend happy hour and connect with other STEM-Away participants
  • Attend office hours and continue asking questions in order to work more efficiently
  • Develop a strategy to complete the final project and to create a thorough presentation of accomplished work
  • Do as much background research as required to understand the assigned individual project

Detailed statement of tasks done:

  • Defined significant DEGs as a vector and converted gene symbols to their Entrez IDs using the org.Hs.eg.db database
  • Visualized DEGs after enrichGO analysis using barplots
  • Visualized DEGs after enrichKEGG pathway analysis using dotplots
  • Observed complex associations between genes using gene-concept networks
  • Conducted global gene set enrichment analysis using hallmark gene sets
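
Over-representation analyses such as enrichGO and enrichKEGG are based on a hypergeometric test: given how many genes in the background are annotated to a term, how surprising is the overlap with the DEG list? The sketch below is a minimal Python illustration of that test with made-up counts. The function name and arguments are hypothetical, and multiple-testing correction is left out.

```python
from math import comb


def enrichment_p_value(universe, annotated, selected, overlap):
    """Probability of seeing at least `overlap` annotated genes by chance.

    universe:  number of background genes
    annotated: background genes annotated to the term
    selected:  number of DEGs tested
    overlap:   DEGs annotated to the term
    """
    total = comb(universe, selected)
    upper = min(annotated, selected)
    tail = sum(
        comb(annotated, k) * comb(universe - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Made-up example: 10 background genes, 3 annotated to the term,
# 3 DEGs, all of which are annotated to the term.
p_value = enrichment_p_value(universe=10, annotated=3, selected=3, overlap=3)
print(p_value)
```

A small p-value means the term appears among the DEGs more often than random selection would explain.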

Challenges:
Technical challenges:

  • Knowing where to begin in order to create gene vectors of upregulated and downregulated DEGs. This was initially worked around by substituting the vector with a column of a created matrix. The gene vectors were eventually generated after consulting our team technical lead.
  • Errors were generated after running the GSEA function with the matrix column. By substituting the column with an actual gene vector and adjusting p-value cutoffs, enriched terms were identified. This allowed gseaplot2 to run and create a graph with visible peaks.
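
The two kinds of gene vectors described above can be sketched in a few lines. This is a minimal Python illustration, not the R code used. It assumes a hypothetical results table mapping each gene to a log fold change and an adjusted p-value, and the cutoffs are placeholder assumptions. GSEA expects one score per gene, sorted in decreasing order, whereas the up and down vectors are plain lists of significant genes.

```python
def split_by_direction(results, lfc_cutoff=1.0, p_cutoff=0.05):
    """Return (upregulated, downregulated) gene lists from a results table.

    results maps gene -> (log fold change, adjusted p-value).
    The cutoffs are illustrative assumptions.
    """
    up = [g for g, (lfc, p) in results.items() if p < p_cutoff and lfc >= lfc_cutoff]
    down = [g for g, (lfc, p) in results.items() if p < p_cutoff and lfc <= -lfc_cutoff]
    return up, down


def build_ranked_gene_list(results):
    """Return (gene, log fold change) pairs sorted from highest to lowest."""
    scores = [(gene, lfc) for gene, (lfc, _) in results.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Made-up results for four genes.
results = {
    "A": (2.0, 0.01),
    "B": (-1.5, 0.02),
    "C": (0.2, 0.5),
    "D": (3.0, 0.2),
}
up, down = split_by_direction(results)
ranked = build_ranked_gene_list(results)
print(up, down)
print(ranked)
```

Note that the ranked list keeps every gene, including non-significant ones, because GSEA works on the whole ranking rather than on a cutoff.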

Workflow challenges:

  • Although the team did not break up the work, it was rewarding to figure out certain parts of code on my own. Team members were very helpful and contributed to a constructive workflow.

Self Assessment for Week of 8/10 and 8/17

Overview of things learned:

  • Technical - Reviewed all technical skills used throughout the internship, analysis of new sample data
  • Tools - R Studio, Slack, Google Meet, Excel
  • Soft Skills - presenting, visual communication, independent research, time management, decision-making

Three achievement highlights:

  • Conducted background research on data used for functional analysis
  • Compared gene expression between interstitial lung disease cells and normal cells and drew conclusions based on the molecular functions of the differentially expressed genes
  • Created and delivered a 15-minute final presentation which included an overview of all steps taken to obtain and discuss results.

Presentation (8/14): https://docs.google.com/presentation/d/11XEzHrbZvzKZ7Xh_lLg_ovgyyutSQ2j7LBIzG-uuR2c/edit?usp=sharing

Final Presentation (8/21): https://docs.google.com/presentation/d/11XEzHrbZvzKZ7Xh_lLg_ovgyyutSQ2j7LBIzG-uuR2c/edit?usp=sharing

List of meetings attended including social team events:

  • 8/10 Team Meeting (live)
  • 8/11 Office Hours: Functional Analysis
  • 8/12 Functional Analysis Webinar
  • 8/14 Happy Hour
  • 8/14 Group Presentations
  • Webinar: How to Make a Professional Presentation
  • 8/17 Team Meeting (live)
  • 8/19 Office Hours- Final Presentations
  • 8/21 Final Presentations

Goals for the upcoming week:

  • Prepare and begin my fall undergraduate semester
  • Look into STEM-Away’s fall session and other opportunities related to data analysis
  • Connect with people in the bioinformatics pathway

Detailed statement of tasks done:
Before Final Presentation:

  • Helped team members with issues regarding their code and presented the STRING db section of the group presentation on 8/14.

For Final Presentation:

  • After conducting background research on what interstitial lung diseases are, microarray data was downloaded from the GEO website. The data’s series matrix was also downloaded and some of its sections were converted into metadata.
  • Quality control, background correction, and normalization were performed on the downloaded data in R. A QC stat report, an arrayQualityMetrics report, two boxplots, and a heatmap were generated.
  • Annotations, gene filtering, and limma analysis were used to find the top DEGs, which were later visualized with a heatmap and volcano plot.
  • Functional analysis was conducted on the DEGs, generating an upregulated gene enrichGO bar graph with MF ontology, an upregulated groupGO bar graph with MF ontology, a downregulated groupGO bar graph with MF ontology, a gene-concept network (cnet plot), and a gene set enrichment analysis graph.
  • All generated reports and graphs were presented and discussed in a 15-minute presentation.

Challenges:
Technical challenges:

  • When it came to the functional analysis section of the final project, there were very few enriched terms to work with. Furthermore, differences in gene expression between ILD cells and normal cells almost seemed insignificant. However, this was overcome by doing extensive research on the molecular functions where upregulated and downregulated genes differed.

Workflow challenges:

  • Other than needing better time management, there were no other workflow challenges. The independent final project was delivered in a timely manner and well presented.