My self-assessments are posted as replies to this thread.
BIOINFORMATICS: LEVEL 1 MODULE 1
Overview:
Technical Area
- Learning how to install packages in R, and learning R syntax.
Tools
- RStudio, R, STEM-Away
Soft Skills
- Problem-Solving, Independence, Time Management
Achievement Highlights
- I installed the correct version of R: v.4.0.0
- I was able to install the correct packages that were listed in the assignment: ggplot2 and Bioconductor
- I was able to understand R syntax from R Basics (host: Yves Gaetan, Resident R Expert at STEM-Away).
Details of Tasks and Hurdles
- I had difficulty installing R v.4.0.0 because the default download was the latest version of R, v.4.1.0.
- At times I found navigating the STEM-Away site confusing, and I had trouble pinpointing the deliverables for Level 1: Module 1 - Self Assessment and Preparation.
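A minimal sketch of the version check and package installs described in this module. The helper name `install_if_missing` is hypothetical, and the install calls are commented out because they download from the network; `BiocManager` is the CRAN package normally used to install Bioconductor.

```r
# Report the running R version; this module asked for R 4.0.0.
print(R.version.string)

# Hypothetical helper: install a CRAN package only if it is not already present.
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# Uncomment to run; these calls need network access.
# install_if_missing("ggplot2")
# install_if_missing("BiocManager")
# BiocManager::install()  # installs the Bioconductor core packages for the running R version
```

Guarding each install with `requireNamespace()` avoids reinstalling packages that are already available.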
BIOINFORMATICS: LEVEL 1 MODULE 2
1. Overview
- Technical Area
- Accessed metadata from the GEO database
- Imported metadata into R
- Tools
- R
- GEO database
- Google Slides and Slack as resources for help.
- Soft Skills
- Time Management
- Troubleshooting
- Communication
2. Achievement Highlights
- Successfully browsed the GEO database for chip metadata.
- Organized my local file system according to R customs.
- Successfully imported metadata into R.
3. Details of Tasks and Hurdles
- In the beginning, I found it difficult to determine the deliverables for this module. However, after consulting with @ivanlam27 and @veyssi, I finished my code and prepared my deliverable.
4. Goals for the Upcoming Week
- Participate and contribute in daily check-in meetings.
- Work with @veyssi on our portion of Level 2: Module 3.
- Contribute to Groups B2 and D in the combined Bioinformatics project.
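The metadata import in this module can be sketched roughly as below. This assumes the metadata was saved locally as a CSV file; the file name, column names, and sample values are made-up placeholders, not taken from GEO.

```r
# Placeholder metadata written to a temporary file so the sketch runs on its own.
# The sample IDs and group labels are invented for illustration.
meta_file <- file.path(tempdir(), "metadata.csv")
writeLines(c("sample_id,group",
             "sample_1,control",
             "sample_2,treated"), meta_file)

# Read the file into a data frame and inspect its structure.
metadata <- read.csv(meta_file, stringsAsFactors = FALSE)
str(metadata)

# Count samples per group as a quick sanity check.
print(table(metadata$group))
```

Building paths with `file.path()` keeps the script portable across operating systems.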
BIOINFORMATICS: LEVEL 1 MODULE 3
1. Overview
- Technical Area
- Created a boxplot of normalized, background-corrected data from the GSE19084 dataset.
- Explored differences between MAS5 and RMA.
- Tools
- R: affy, simpleaffy
- GEO database
- GitHub
- Google Slides, Slack, Zoom as resources for help.
- Soft Skills
- Teamwork
- Team Management
- Communication
2. Achievement Highlights
- Used information learned from @anya’s meeting to normalize and background-correct a dataset and to create a boxplot.
- Worked with @veyssi to normalize and background-correct a dataset and to create a boxplot in R.
- Identified outliers from our modified dataset.
- Successfully committed a branch to the team GitHub repository.
3. Details of Tasks and Hurdles
- In the beginning, I struggled to understand what normalization, principal-component analysis, and background correction were and why they mattered for the GSE19084 dataset. However, after attending @anya’s meeting and my group’s Tuesday check-in, I was able to better understand these concepts.
4. Goals for the Upcoming Week
- Explore the leading ideas to prepare for meetings in Groups B2 and D.
- Work with @ivanlam27 on our portion of Module 4.
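To make the normalization idea from this module concrete, here is a minimal base R sketch of quantile normalization, which is the normalization step RMA uses. It runs on simulated intensities rather than GSE19084, skips background correction, and is not the affy implementation; the function name `quantile_normalize` is my own.

```r
set.seed(42)

# Simulated log2 intensities: 100 probes on 4 arrays, each array shifted differently.
expr <- sapply(c(0, 0.5, 1, 1.5), function(shift) rnorm(100, mean = 8 + shift))
colnames(expr) <- paste0("array_", 1:4)

# Quantile normalization: give every array the same distribution by replacing
# each value with the mean of the values that share its rank across arrays.
quantile_normalize <- function(m) {
  ranks  <- apply(m, 2, rank, ties.method = "first")
  sorted <- apply(m, 2, sort)
  target <- rowMeans(sorted)
  out <- apply(ranks, 2, function(r) target[r])
  dimnames(out) <- dimnames(m)
  out
}

norm_expr <- quantile_normalize(expr)

# Side-by-side boxplots: the arrays line up after normalization.
op <- par(mfrow = c(1, 2))
boxplot(expr, main = "Before", ylab = "log2 intensity")
boxplot(norm_expr, main = "After quantile normalization")
par(op)
```

In the boxplot on the right, all four arrays share the same median and spread, which is the pattern to look for when checking a normalized dataset for outlier arrays.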
BIOINFORMATICS: LEVEL 1 MODULE 4
1. Overview
- Technical Area
- Manipulated data to be used to generate plots.
- Created a volcano plot from the GSE19084 dataset.
- Analyzed the top 10 differentially expressed genes (DEGs) in the data.
- Tools
- R: EnhancedVolcano, ggplot2, simpleaffy, tidyverse, limma, Biobase, dplyr
- GEO database and GEO2R
- GitHub
- Google Slides, Slack, Zoom as resources for help.
- Soft Skills
- Problem-Solving
- Teamwork
- Team Management
- Communication
2. Achievement Highlights
- Used information learned from @anya’s meeting and from my module partner, @ivanlam27, to generate the code for the volcano plot.
- Compared our results with those in the paper and from GEO2R.
- Identified outliers from our modified dataset.
3. Details of Tasks and Hurdles
- In the beginning, I struggled to understand R syntax and the data preprocessing needed to generate the volcano plot. However, after working with @ivanlam27, I was able to better understand these topics.
4. Goals for the Upcoming Week
- Work with @veyssi as Group B2 Project Managers to plan our first group meeting.
- Provide a weekly timeline and expectations for the group.
- Work with @KellyZhang on our portion of Module 5.
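The preprocessing behind the volcano plot in this module can be sketched in base R as follows. The data are simulated, the cutoffs (`lfc_cutoff`, `p_cutoff`) are assumed values rather than the ones we used, and the column names `logFC` and `adj.P.Val` follow the output of limma's `topTable()`. The module itself used EnhancedVolcano; this only shows the underlying idea.

```r
set.seed(7)
n <- 500

# Simulated differential-expression results (not real GSE19084 values).
results <- data.frame(
  gene      = paste0("gene_", seq_len(n)),
  logFC     = rnorm(n, sd = 1.5),
  adj.P.Val = runif(n),
  stringsAsFactors = FALSE
)

lfc_cutoff <- 1     # assumed log2 fold-change threshold
p_cutoff   <- 0.05  # assumed adjusted p-value threshold

# Label each gene as up, down, or not significant.
results$status <- "not significant"
results$status[results$adj.P.Val < p_cutoff & results$logFC >=  lfc_cutoff] <- "up"
results$status[results$adj.P.Val < p_cutoff & results$logFC <= -lfc_cutoff] <- "down"

# Volcano plot: effect size on x, significance on y.
colors <- c("down" = "blue", "not significant" = "grey", "up" = "red")
plot(results$logFC, -log10(results$adj.P.Val),
     col = colors[results$status], pch = 16,
     xlab = "log2 fold change", ylab = "-log10 adjusted p-value",
     main = "Volcano plot (simulated data)")
abline(v = c(-lfc_cutoff, lfc_cutoff), h = -log10(p_cutoff), lty = 2)

# Top 10 genes ranked by adjusted p-value.
top10 <- head(results[order(results$adj.P.Val), ], 10)
print(top10)
```

Ranking by adjusted p-value is one common way to pick the "top 10"; ranking by absolute fold change among significant genes is another.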
BIOINFORMATICS: LEVEL 1 MODULE 5
1. Overview
- Technical Area
- Manipulated data to be used to generate plots.
- Created a GSEA plot from the GSE19084 dataset.
- Analyzed the GSEA plot.
- Tools
- R: clusterProfiler, GSEA, AnnotationDbi
- GEO database and GEO2R
- GitHub
- Google Slides, Slack, Zoom as resources for help.
- Soft Skills
- Problem-Solving
- Teamwork
- Team Management
- Communication
2. Achievement Highlights
- Used information learned from @anya’s meeting and from my module partner, @KellyZhang, to generate the code for our portion of the functional analysis.
- Compared our results with those in the paper and from @anya’s Guided Module.
- Identified significant gene sets from the generated GSEA plots.
3. Details of Tasks and Hurdles
- In the beginning, I found it difficult to organize the data from the limma analysis into a proper data frame and vector. However, I was able to reshape the data and perform the expected GSEA.
- We noticed that, depending on the p-value cutoff passed to the GSEA function, some categories were omitted from the generated tables. After trying several cutoffs, we found that a p-value cutoff of 0.15 was sufficient.
4. Goals for the Upcoming Week
- Work with @veyssi as Group B2 Project Managers to assign coding tasks to the rest of the group.
- Work with @sanisetti on our portion of Module 6.
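The data reshaping mentioned in this module, turning limma output into the vector GSEA expects, can be sketched as below. The data are simulated, ranking by `logFC` is an assumption, and `term2gene` in the commented call is a hypothetical gene-set table. clusterProfiler's `GSEA()` expects a named numeric vector sorted in decreasing order.

```r
set.seed(3)

# Simulated limma-style results (gene names and values are placeholders).
limma_results <- data.frame(
  gene  = paste0("gene_", 1:20),
  logFC = rnorm(20),
  stringsAsFactors = FALSE
)

# Build a named numeric vector: values are the ranking statistic, names are gene IDs.
gene_list <- limma_results$logFC
names(gene_list) <- limma_results$gene

# Drop missing values, then sort from highest to lowest.
gene_list <- gene_list[!is.na(gene_list)]
gene_list <- sort(gene_list, decreasing = TRUE)
print(head(gene_list))

# With clusterProfiler installed and a gene-set table available, the call would
# look roughly like this (not run here; term2gene is hypothetical):
# gsea_res <- clusterProfiler::GSEA(gene_list, TERM2GENE = term2gene,
#                                   pvalueCutoff = 0.15)
```

A looser `pvalueCutoff` keeps more categories in the result table, which matches what we saw when we raised the cutoff to 0.15.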
BIOINFORMATICS: LEVEL 1 MODULE 6
1. Overview
- Technical Area
- Uploaded gene sets to web-based functional analysis tools.
- Analyzed the generated plots.
- Tools
- R: AnnotationDbi
- EnrichR, DAVID, Metascape
- STRING and PPI
- GEPIA
- Soft Skills
- Research
- Teamwork
- Team Management
- Communication and Organization
2. Achievement Highlights
- Used differentially expressed gene data from Module 5 to generate a protein-protein interaction (PPI) network and a STRING.db table as part of the web-based functional analysis.
- Identified significant protein-protein interactions from the PPI network.
- Identified the most highly co-expressed genes in the network.
3. Details of Tasks and Hurdles
- Uploading the gene vector generated from Module 5 to the STRING.db website initially failed. However, after we changed the format of the output, the site generated a plot.
- My module partner, @sanisetti, and I had different gene vectors from our previous work.
4. Goals for the Upcoming Week
- Work with @veyssi as Group B2 Project Managers to code the layout of the R Shiny app with @ivanlam27.
- Work with Team-1 members on functional analysis for the Capstone project.
BIOINFORMATICS: LEVEL 1 MODULE 7
1. Overview
- Technical Area
- Used a transcriptomics pipeline to analyze a new GEO dataset, GSE4107 (colorectal cancer).
- Conducted quality control and categorized the dataset.
- Identified differentially expressed genes (DEGs) and explored the underlying pathways, categories, and protein-protein interactions associated with these DEGs.
- Tools
- GEO and GEO2R
- R: affy, AnnotationDbi, enrichplot, limma, clusterProfiler
- QC and PCA
- Heatmap and VolcanoPlot
- KEGG, Gene-Ontology Network, GSEA, Survival analysis
- Soft Skills
- Presentation Skills
- Teamwork
- Team Management
- Communication and Organization
2. Achievement Highlights
- Identified the KEGG pathways containing the most differentially expressed genes.
- Created a gene-concept network for further analysis, and a GSEA plot to target hallmark categories.
- Created, practiced, and presented a capstone presentation for GSE4107 with teammates: @Ananya_Kaushik, @veyssi, @ivanlam27, @KellyZhang, @Leila.
3. Details of Tasks and Hurdles
- Our team had to coordinate meetings across multiple time zones.
4. Goals for the Upcoming Week
- Present the R-Shiny product with the R-Shiny team.
- Work with my capstone team on further transcriptomics analysis.
- Possibly meet with @anya’s research group about STEM-Away internship experiences.