Took on a task lead role despite having little experience with the technical aspects of the project
Did not hesitate to use threads to ask questions
Improved my understanding of R programming and the project’s scientific background through training videos and webinars
List of meetings attended including social team events:
Introductory webinar (live)
7/20 Deliverables Meeting (live)
Biology Webinar (recording)
Goals for the upcoming week:
Attend this week’s office hours and ask more questions
Connect with team members more and coordinate schedule plans with them
Improve at using R and GitHub to complete and share tasks correctly and efficiently
Detailed statement of tasks done:
Made a schedule/guide for the team, highlighting when deliverables should be due
Scheduled Google Meet meetings with team members
Data curation and pre-processing
Quality control: simpleaffy and arrayQualityMetrics
Normalization: MAS5 and log2 transformation, with boxplots before and after normalization
Batch effect correction: ComBat
Visualization: heatmaps before and after batch correction
Created a deliverable overview document containing created code and visuals
Presented all data and work during Tuesday’s meeting
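As an illustration of the log2 step in the normalization task above, here is a minimal sketch. The project itself was done in R; this is a Python stand-in, and the intensity values and the function name `log2_transform` are hypothetical, chosen only to show how the transformation compresses the range that the before and after boxplots compare.

```python
import math
import statistics


def log2_transform(values):
    """Return log2 of each strictly positive expression value."""
    return [math.log2(v) for v in values]


# Hypothetical raw intensities for one array (not taken from the project's data).
raw = [64.0, 128.0, 256.0, 1024.0, 16384.0]
logged = log2_transform(raw)

# A boxplot summarizes exactly these kinds of statistics per array.
print("raw   median:", statistics.median(raw), "max:", max(raw))
print("log2  median:", statistics.median(logged), "max:", max(logged))
```

After the transformation the largest value sits a few units above the median instead of orders of magnitude above it, which is why the post-normalization boxplots are easier to compare across arrays.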
Challenges:
Communicating and connecting with team members
Small details related to visualizing data such as labeling
Deciding which outliers should be removed and when to remove them (after normalization)
List of meetings attended including social team events:
7/27 Deliverables Meeting (live)
Office Hours (live)
Goals for the upcoming week:
Attend happy hour and connect with my newly assigned team as well as other STEM-Away participants
Collaborate and communicate with team members regarding deliverable progress and team deadlines
Attend office hours and continue asking questions in order to work more efficiently
Do as much background research as needed to comprehend the project
Detailed statement of tasks done:
Deliverables:
Annotations using hgu133plus2.db to map probe IDs with gene symbols
Refined expression set data by omitting rows with duplicates (gene symbols and probe IDs) and missing data using the collapseRows and na.omit functions
Researched tidyverse package and used it to edit refined expression set data
Filtered out genes below the 2nd percentile of the dataset’s expression distribution
Analyzed data with limma package and sorted top DEGs using topTable function
Generated and edited heatmap and volcano plot of DEGs
Created and delivered a presentation of all work completed
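The low-expression filter listed above can be sketched as follows. This is a Python illustration rather than the R code used in the project, and it assumes the filter compares each gene's mean expression against the 2nd percentile of all gene means; the helper names `percentile` and `filter_low_expression` and the toy data are hypothetical.

```python
import statistics


def percentile(values, p):
    """Percentile by linear interpolation between the two nearest ranks."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * p / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def filter_low_expression(expr, p=2):
    """Keep genes whose mean expression is at or above the p-th percentile."""
    means = {gene: statistics.mean(vals) for gene, vals in expr.items()}
    cutoff = percentile(list(means.values()), p)
    return {gene: vals for gene, vals in expr.items() if means[gene] >= cutoff}


# Toy data: 50 hypothetical genes, two samples each, mean expression 1..50.
expr = {f"GENE{i}": [i - 0.5, i + 0.5] for i in range(1, 51)}
kept = filter_low_expression(expr)
print(len(expr), "genes before filtering,", len(kept), "after")
```

With only 50 genes the cutoff removes a single gene; on a full microarray the same rule drops a few hundred of the lowest-expressed probes before the limma analysis.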
Challenges:
Technical challenges:
Adjusting expression set data in order to run the collapseRows function. Adding a character column (gene symbols) to the expression set converted all other columns into a character data type, and they had to be converted back into a numeric data type. This was resolved after viewing STEM-Away’s troubleshooting thread.
Not being able to run the lmFit function due to issues with expression set data. This was resolved after researching potential causes and discovering that lmFit could be run once the tidyverse package was used to rename the rownames of the input data.
Workflow challenges:
Not being able to reach other group members to discuss progress and results.
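The type-conversion problem described under the technical challenges above arises because an R matrix holds a single data type, so binding a character column onto numeric data turns every value into a character string. The sketch below mimics that situation in Python with rows of strings and shows the conversion back to numbers; it is not the R fix from the troubleshooting thread, and the function name `to_numeric` and the sample rows are hypothetical.

```python
def to_numeric(rows, label_index=0):
    """Split each row into its label and a list of float expression values."""
    labels, values = [], []
    for row in rows:
        labels.append(row[label_index])
        values.append([float(x) for i, x in enumerate(row) if i != label_index])
    return labels, values


# Hypothetical rows after a gene-symbol column was attached: everything is text.
rows = [
    ["TP53", "7.25", "6.5"],
    ["EGFR", "9.0", "8.75"],
]

symbols, matrix = to_numeric(rows)
print(symbols)  # gene symbols kept separately as labels
print(matrix)   # expression values restored to numbers
```

Keeping the labels separate from the numeric values, rather than in the same table, avoids the coercion in the first place.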
Technical - Gene ontology analysis and visualization, Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis, visualizing gene-concept networks, gene set enrichment analysis with hallmark gene sets
Tools - RStudio, Slack, Google Meet
Soft Skills - team communication, research
Three achievement highlights:
Communicated and collaborated with team members and project leads to overcome group technical issues
Conducted research on the side to comprehend the biological background of assigned tasks
Improved on identifying and understanding issues with own code
List of meetings attended including social team events:
8/3 Deliverables Meeting (live)
Goals for the upcoming week:
Attend happy hour and connect with other STEM-Away participants
Attend office hours and continue asking questions in order to work more efficiently
Develop a strategy to complete the final project and to create a thorough presentation of accomplished work
Do as much background research as required to understand the assigned individual project
Detailed statement of tasks done:
Stored significant DEGs in a vector and converted gene symbols to their Entrez IDs using the org.Hs.eg.db database
Visualized DEGs after enrichGO analysis using barplots
Visualized DEGs after enrichKEGG pathway analysis using dotplots
Observed complex associations between genes using gene-concept networks
Conducted global gene set enrichment analysis using hallmark gene sets
Challenges:
Technical challenges:
Knowing where to begin in order to create gene vectors of up- and down-regulated DEGs. As a first workaround, the vector was substituted with a column of a created matrix. The gene vectors were eventually generated after consulting our team technical lead.
Running the GSEA function with the matrix column generated errors. After substituting the column with an actual gene vector and adjusting the p-value cutoffs, enriched terms were identified. This allowed gseaplot2 to run and create a graph with visible peaks.
Workflow challenges:
Although the team did not break up the work, it was rewarding to figure out certain parts of code on my own. Team members were very helpful and contributed to a constructive workflow.
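The "actual gene vector" mentioned in the technical challenges above refers to the input that clusterProfiler's GSEA function expects: a named numeric vector sorted in decreasing order. A minimal sketch of building such a ranked list is shown below. It is written in Python instead of R, ranks by log fold change as an assumption, and uses hypothetical gene labels, values, and the function name `build_ranked_gene_list`.

```python
def build_ranked_gene_list(de_table):
    """Return gene -> score pairs ordered from highest to lowest score."""
    ranked = sorted(de_table.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked)  # dicts preserve insertion order


# Hypothetical log fold changes keyed by gene identifier.
de_table = {
    "geneA": -1.8,
    "geneB": 2.4,
    "geneC": 0.3,
    "geneD": -0.2,
}

ranked = build_ranked_gene_list(de_table)
for gene, score in ranked.items():
    print(gene, score)
```

A single matrix column without gene names loses the identifiers that the enrichment step matches against gene sets, which is one plausible reason the column alone did not work.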
Conducted background research on data used for functional analysis
Compared gene expression between interstitial lung disease cells and normal cells and drew conclusions based on the molecular functions of the differentially expressed genes
Created and delivered a 15-minute final presentation which included an overview of all steps taken to obtain and discuss the results.
List of meetings attended including social team events:
8/10 Team Meeting (live)
8/11 Office Hours: Functional Analysis
8/12 Functional Analysis Webinar
8/14 Happy Hour
8/14 Group Presentations
Webinar: How to Make a Professional Presentation
8/17 Team Meeting (live)
8/19 Office Hours: Final Presentations
8/21 Final Presentations
Goals for the upcoming week:
Prepare and begin my fall undergraduate semester
Look into STEM-Away’s fall session and other opportunities related to data analysis
Connect with people in the bioinformatics pathway
Detailed statement of tasks done:
Before Final Presentation:
Helped team members with issues regarding their code and presented the STRING db section of the group presentation on 8/14.
For Final Presentation:
After conducting background research on what interstitial lung diseases are, microarray data were downloaded from the GEO website. The dataset’s series matrix was also downloaded, and some of its sections were converted into metadata.
Quality control, background correction, and normalization were applied to the downloaded data in R. A QC stats report, an arrayQualityMetrics report, two boxplots, and a heatmap were generated.
Annotation, gene filtering, and limma analysis were used to find the top DEGs, which were then visualized with a heatmap and a volcano plot.
Functional analysis was conducted on the DEGs, generating an upregulated gene enrichGO bar graph with MF ontology, an upregulated groupGO bar graph with MF ontology, a downregulated groupGO bar graph with MF ontology, a gene-concept network (cnet plot), and a gene set enrichment analysis graph.
All generated reports and graphs were presented and discussed in a 15-minute presentation.
Challenges:
Technical challenges:
In the functional analysis section of the final project, there were very few enriched terms to work with. Furthermore, the differences in gene expression between ILD cells and normal cells appeared almost insignificant. This was overcome by extensively researching the molecular functions in which upregulated and downregulated genes differed.
Workflow challenges:
Aside from needing better time management, there were no workflow challenges. The independent final project was delivered on time and presented well.