Aliciarepka - Bioinformatics Pathway

aliciarepka · June 15, 2020, 6:54pm

Things Learned:

Technical Skills: Creating functions in Python, Data analysis and plotting in R, Strategies for analyzing scientific papers, Role of genetics and the environment in CRC and other cancers
Tools: Python, R, STEM-Away Forum, Google Suites
Soft Skills: online collaboration and team building, communication with new people, etiquette for online meetings.

Achievement Highlights

Learned how to edit plots in R to analyze specific data sets
Learned how to create functions in Python
Analyzed figures in the CRC paper to better understand the implications of the data

Meetings/Trainings Attended

6/2 R Training, 6/3 Technical Training Webinar, 6/4 Team 3 Meeting, 6/9 R Training, 6/9 Python Training, 6/10 Logistics Webinar, 6/10 Technical Training, 6/10 Welcome Session, 6/11 Gene Team Meeting, 6/12 Welcome Meeting, 6/12 R Training, 6/12 Gene Team Happy Hour, 6/15 Gene Team Meeting

Goals for the Upcoming Week

Continue to work with and become more comfortable with R and Python so that I can better understand how to accomplish a given task
Attend all meetings and events
Discuss the CRC paper and data with my team to gain new insight on the findings

Tasks Completed:

Attended or watched the R and Python training sessions and completed related exercises. I found some of the exercises to be a little challenging, but after going back and watching the training for a second time, I was able to understand how to adapt the code to accomplish the different tasks.
Read and annotated the scientific paper about prognostic markers for CRC. The paper was lengthy and difficult to digest, but after discussing with my team and receiving training on a recommended order to read the paper and the areas to focus on, the task seemed much more manageable and I was able to extract more meaning from the paper.

Current Role: Observer, but I would like to request to change to a participant now that our project duration has been extended to 8 weeks.

aliciarepka · June 30, 2020, 1:23am

Self Assessment 6/22:
Things Learned:

Technical Skills: Creating functions in Python, Data analysis and plotting in R, Strategies for analyzing scientific papers, Role of genetics and the environment in CRC and other cancers, Data visualization packages in Python, Bioconductor software for high-throughput genomic data analysis in R
Tools: Python, R, Stem-Away Forum, Google Suites, Bioconductor, Slack
Soft Skills: online collaboration and team building, communication with new people, etiquette for online meetings, leading small groups, organizing online meetings

Achievement Highlights:

Identified and taught peers a simple and efficient method to read in microarray data for analysis project
Collaborated with peers to analyze data on GSE8671 expression and coordinate a presentation
Troubleshot in R to develop a method to fix the probe ID labels in the normalized GSE8671 data

Meetings/Trainings Attended:

6/2 R Training, 6/3 Technical Training Webinar, 6/4 Team 3 Meeting, 6/9 R Training, 6/9 Python Training, 6/10 Logistics Webinar, 6/10 Technical Training, 6/10 Welcome Session, 6/11 Gene Team Meeting, 6/12 Welcome Meeting, 6/12 R Training, 6/12 Gene Team Happy Hour, 6/15 Gene Team Meeting, 6/15 Python Office Hours, 6/16 Asana Training, 6/16 Python and Pandas Webinar, 6/17 Technical Training, 6/18 Gene Team Meeting, 6/18 Gene Team Office Hours, 6/19 Gene Team Happy Hour, 6/22 Gene Team Meeting, 6/23 Python Training, 6/24 Bioinformatics Office Hours, 6/24 Fireside Chat Webinar

Goals for the Upcoming Week:

Ask clarifying questions using the STEM-Away Forum
Finish tasks and deliverables ahead of time
Collaborate with my small group to better understand the deliverables and what the code is accomplishing

Tasks Completed:

Attended or watched the R and Python training sessions and completed related exercises. I found some of the exercises to be a little challenging, but after going back and watching the training for a second time, I was able to understand how to adapt the code to accomplish the different tasks.
Read and annotated the scientific paper about prognostic markers for CRC. The paper was lengthy and difficult to digest, but after discussing with my team and receiving training on a recommended order to read the paper and the areas to focus on, the task seemed much more manageable and I was able to extract more meaning from the paper.
Completed assigned Python and R exercises
Normalized GSE8671 microarray data to be used for quality control analysis (mas5 function)
Performed quality control analysis on GSE8671 microarray data using the arrayQualityMetrics package in Bioconductor for both raw and normalized data
Used the ggplot function in R to create PCA plots and heatmaps to compare clustering of normal and tumor samples before and after normalization
Investigated and used the hgu133plus2.db package to annotate the GSE8671 data set by mapping probe ID to gene symbol
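
The probe-to-gene-symbol annotation step can be illustrated with a small sketch. This is not the code from the project, which was written in R with hgu133plus2.db; it is a Python illustration of the idea only. The probe names, gene names, expression values, and the `annotate` helper are all made up for the example.

```python
# Hypothetical probe-to-symbol lookup. In the original workflow this
# information came from the hgu133plus2.db annotation package in R.
PROBE_TO_SYMBOL = {
    "probe_a": "GENE1",
    "probe_b": "GENE2",
    "probe_c": None,  # probe with no gene symbol (an NA in R)
}


def annotate(expression, lookup):
    """Attach a gene symbol to each probe row.

    Returns (annotated, unmapped): annotated maps probe -> (symbol, values),
    and unmapped lists the probes that have no symbol in the lookup.
    """
    annotated = {}
    unmapped = []
    for probe, values in expression.items():
        symbol = lookup.get(probe)
        if symbol is None:
            unmapped.append(probe)
        else:
            annotated[probe] = (symbol, values)
    return annotated, unmapped


# Made-up expression values, one list per probe.
expression = {
    "probe_a": [7.1, 7.4],
    "probe_b": [5.0, 5.2],
    "probe_c": [3.3, 3.1],
    "probe_d": [9.9, 9.7],  # not present in the lookup at all
}

annotated, unmapped = annotate(expression, PROBE_TO_SYMBOL)
print(annotated)
print(unmapped)
```

Keeping the unmapped probes in a separate list makes it easy to check how much of the array is lost at the annotation step.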

aliciarepka · June 30, 2020, 1:23am

Self Assessment 6/29
Things Learned:

Technical Skills: transposing matrices and data frames in R, working with large data sets in R, changing object types in R, adapting existing code for new purposes
Tools: hgu133plus2.db package in R, STEM-Away Forums, GitHub, limma package in R, GEO2R database, Asana
Soft Skills: flexibility with last-minute meetings, communicating with small groups when not everyone can meet at the same time, the importance of creativity in STEM and creativity-building exercises

Achievement Highlights:

Successfully and independently completed my code not only to finish all of this week’s deliverables on time, but more importantly, to gain a better understanding of how the different code elements work.
Discovered a method to help prevent my computer from crashing when performing calculations of large data sets in R
Mentored my peers in The Gene Team subteam 3 to help them complete their code to accomplish and understand this week’s deliverables as well

Meetings/Trainings Attended:

6/22 Gene Team Meeting, 6/23 Python Training, 6/24 Bioinformatics Office Hours, 6/24 Fireside Chat Webinar, 6/25 GitHub Training Webinar, 6/25 Bioinformatics Webinar, 6/25 Gene Team Meeting, 6/26 R Training, 6/26 Gene Team Happy Hour, 6/29 Gene Team meeting

Goals for the Upcoming Week:

Communicate more with small groups about deliverables
Set expectations and duties with small group before beginning on tasks
Clearer delegation of tasks for subteam 3

Tasks Completed:

Used the hgu133plus2.db package to annotate the GSE8671 data set by mapping probe ID to gene symbol and eliminating duplicate values and NAs
Used the quantile() function to identify and filter out genes expressed below the 4th quantile from the GSE8671 data set
Created a new matrix of normalized and filtered GSE8671 differential expression data
Used the limma package in R to calculate statistics for GSE8671 (gene symbol, log2 fold change, p-value, adjusted p-value)
Investigated different thresholds for determining significance of differentially expressed genes and determined appropriate cutoffs
Illustrated results from GSE8671 differential analysis and significance cutoffs in a Volcano Plot
Completed the second set of Python exercises
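
The quantile-based filtering of low-expression genes listed above can be sketched as follows. This is a Python illustration under stated assumptions, not the R code from the project: the `quantile` and `filter_low_expression` helpers, the gene names, the mean expression values, and the cutoff probability of 0.25 are all hypothetical choices for the example. The quantile is computed by linear interpolation between sorted values, which is the same rule R's quantile() uses by default.

```python
import math


def quantile(values, prob):
    """Quantile of values at probability prob (0..1), by linear interpolation."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * prob
    lower = math.floor(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def filter_low_expression(mean_expression, prob):
    """Keep genes whose mean expression is at or above the chosen quantile."""
    cutoff = quantile(mean_expression.values(), prob)
    kept = {gene: mean for gene, mean in mean_expression.items() if mean >= cutoff}
    return cutoff, kept


# Made-up mean expression per gene.
mean_expression = {
    "GENE1": 2.0,
    "GENE2": 4.0,
    "GENE3": 6.0,
    "GENE4": 8.0,
    "GENE5": 10.0,
}

# The probability 0.25 is an assumption made for this sketch.
cutoff, kept = filter_low_expression(mean_expression, prob=0.25)
print(cutoff)
print(sorted(kept))
```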

aliciarepka · July 7, 2020, 1:57pm

Self Assessment 7/6
Things Learned:

Technical Skills: installing new packages in RStudio, reading R documentation
Tools: EnrichR, org.Hs.eg.db, topGO, clusterProfiler, pathview, magrittr, tidyr, STRING database
Soft Skills: communication, leadership, planning meetings

Meetings/Trainings Attended:

6/29 Python Training, 6/30 Python Office Hours, 7/1 GitHub Webinar, 7/2 Gene Team Meeting, 7/2 Gene Team Happy Hour, 7/6 Gene Team Meeting

Achievement Highlights:

Took the lead in cleaning the phenotypic data for GSE8671 and transferring it into a GSE8671 ExpressionSet object containing the gene expression data for Gene Team group 3
Asked to host office hours for interns in the July Bioinformatics pathway
Prepared, finalized, and submitted the week 4 results for group 3

Goals for the Upcoming Week:

Thoroughly prepare for and lead a helpful office hours session
Work more closely with team 3 to complete week 5 deliverables

Tasks Completed:

Cleaned the phenotypic data for GSE8671 and transferred it into a GSE8671 ExpressionSet object containing the gene expression data for Gene Team group 3
Created a vector containing the top differentially expressed genes for GSE8671 and mapped them to their Entrez IDs
Performed GO analysis to identify the cellular components and molecular functions associated with the most differentially expressed genes
Analyzed the KEGG pathways of the most differentially expressed genes
Used the enrichR package to further analyze the involved pathways and gene ontologies for the top differentially expressed genes in GSE8671
Used the STRING database to identify protein interactions associated with the top differentially expressed genes and to create a PPI network map
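
For context on the GO and KEGG steps above, over-representation analysis is commonly based on a hypergeometric test: given a background of genes, how surprising is the overlap between the selected genes and the genes annotated to a term? The sketch below is a Python illustration of that general idea only. It is not the code from the project, and the packages used there may apply different statistics, corrections, or options. The function name and all of the counts are hypothetical.

```python
from math import comb


def enrichment_p_value(background, annotated, selected, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    background: number of genes in the background set
    annotated:  background genes annotated to the term of interest
    selected:   number of genes in the list being tested
    overlap:    selected genes that are annotated to the term
    """
    total = comb(background, selected)
    largest_overlap = min(annotated, selected)
    tail = sum(
        comb(annotated, k) * comb(background - annotated, selected - k)
        for k in range(overlap, largest_overlap + 1)
    )
    return tail / total


# Made-up counts: 1000 background genes, 50 annotated to a term,
# 20 genes in the tested list, 5 of which carry the annotation.
p_value = enrichment_p_value(background=1000, annotated=50, selected=20, overlap=5)
print(p_value)
```

In practice one p-value is computed per term and the results are then corrected for multiple testing.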

aliciarepka · July 13, 2020, 11:18pm

Self Assessment 7/13
Things Learned:

Technical Skills: troubleshooting in R, uploading and organizing files in GitHub
Tools: EnrichR, STRING database
Soft Skills: resume building and formatting, elevator pitches, divergent thinking

Meetings Attended:

7/08 Gene Team meeting, 7/08 New and Old Leads Meeting, 7/09 Python Training, 7/10 Gene Team Happy Hour, 7/13 Gene Team Meeting

Achievement Highlights:

Selected as a lead for the July 2020 BI pathway
Complimented on the helpfulness of my office hours sessions
Identified and solved a discrepancy problem in my team’s GO and KEGG analysis data

Goals for the Upcoming Week:

Stay on top of tasks while away on vacation
Get ready to serve as a lead for the July session

Tasks Completed:

Performed GSEA (gene set enrichment analysis) on the GSE8671 data
Hosted two sessions of office hours (7/08 and 7/13)
Identified an LFC threshold for differential expression and separated upregulated and downregulated genes in the GSE8671 dataset into two vectors
Ran GO, KEGG, WikiPathways, and STRING analysis on both gene vectors to identify and compare trends between upregulated and downregulated genes
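
Separating upregulated and downregulated genes with a log fold change (LFC) threshold, as described above, can be sketched like this. It is a Python illustration rather than the R code from the project. The `split_by_direction` helper, the gene names, the statistics, and both cutoffs (an absolute LFC of 1.0 and an adjusted p-value of 0.05) are assumptions made for the example, not the thresholds chosen in the project.

```python
def split_by_direction(results, lfc_cutoff, padj_cutoff):
    """Split significant genes into upregulated and downregulated lists.

    results maps gene -> (log2 fold change, adjusted p-value).
    """
    up = [
        gene
        for gene, (lfc, padj) in results.items()
        if padj < padj_cutoff and lfc >= lfc_cutoff
    ]
    down = [
        gene
        for gene, (lfc, padj) in results.items()
        if padj < padj_cutoff and lfc <= -lfc_cutoff
    ]
    return up, down


# Made-up differential expression statistics.
results = {
    "GENE1": (2.5, 0.001),
    "GENE2": (-3.0, 0.0005),
    "GENE3": (0.4, 0.2),
    "GENE4": (1.8, 0.3),
    "GENE5": (-1.2, 0.01),
}

up, down = split_by_direction(results, lfc_cutoff=1.0, padj_cutoff=0.05)
print(up)
print(down)
```

The two lists can then be passed separately to enrichment tools so that trends in each direction can be compared.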

aliciarepka · July 21, 2020, 10:32pm

Self Assessment 7/20
Things Learned:

Technical Skills: troubleshooting in R, installing R packages, project management
Tools: GSE21510 dataset, STEM-Away forum
Soft Skills: Presentation strategies, leading meetings, networking

Meetings Attended:

7/15 Gene Team Meeting, 7/15 BI Intro Webinar, 7/16 Leads Meeting, 7/17 July Leads Meeting, 7/20 July BI Team 3 Meeting, 7/21 BI Leads Technical Training, 7/21 Gene Team Meeting

Achievement Highlights:

Identified and solved a complex error loading packages in RStudio
Complimented on my presentation skills hosting a webinar about the first set of deliverables for incoming July interns
Hosted a successful meeting for July BI team 3 after finding out on the same day that our team lacked a PM lead.

Goals for the Upcoming Week:

Stay on top of tasks while away on vacation
Stay organized with July project lead role as I finish my final project for June
Prepare a great final presentation

Tasks Completed:

Hosted a webinar covering the deliverables for week 3 of the July session
Organized my first team meeting as a lead for the July session
Completed all deliverables for July week 3
- Merged GSE32323 and GSE8671
- Quality control
- Normalization
- Batch correction
Analyzed differential expression of the GSE21510 data set
- Quality control and outlier removal
- normalization
- Gene annotation and gene filtering
- Limma analysis and data visualization
Compared results from GSE21510 to results from GSE8671 to identify common dysregulated genes and associated pathways
Compared results from my combined data to the results published in the Guo paper and confirmed differential expression of hub genes
Created a PowerPoint presentation showcasing what I have learned and accomplished during my internship at STEM-Away
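
The comparison of GSE21510 and GSE8671 results listed above comes down to finding genes that are dysregulated in both data sets. A minimal Python sketch of that comparison follows. It is not the code from the project: the `common_dysregulated` helper, the gene names, and the directions are made up, and requiring the same direction in both data sets is an assumption of this example.

```python
def common_dysregulated(first, second):
    """Find genes dysregulated in both result sets.

    Each argument maps gene -> "up" or "down". Returns (concordant,
    discordant): genes changing in the same direction in both sets,
    and genes present in both sets but changing in opposite directions.
    """
    shared = sorted(set(first) & set(second))
    concordant = {gene: first[gene] for gene in shared if first[gene] == second[gene]}
    discordant = [gene for gene in shared if first[gene] != second[gene]]
    return concordant, discordant


# Made-up results for two data sets.
dataset_one = {"GENE1": "up", "GENE2": "down", "GENE3": "up"}
dataset_two = {"GENE1": "up", "GENE2": "up", "GENE4": "down"}

concordant, discordant = common_dysregulated(dataset_one, dataset_two)
print(concordant)
print(discordant)
```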

aliciarepka · July 25, 2020, 7:55pm

Final Self-Assessment

Things Learned:

Technical Skills: Creating functions in Python, Data analysis and plotting in R, Strategies for analyzing scientific papers, Role of genetics and the environment in CRC and other cancers, Data visualization packages in Python, Bioconductor software for high-throughput genomic data analysis in R, transposing matrices and data frames in R, working with large data sets in R, changing object types in R, adapting existing code for new purposes, installing new packages in RStudio, reading R documentation, uploading and organizing files in GitHub, troubleshooting in R, project management
Tools: Python, R, STEM-Away Forum, Google Suites, Bioconductor, Slack, hgu133plus2.db package in R, GitHub, limma package in R, GEO2R database, Asana, EnrichR, org.Hs.eg.db, topGO, clusterProfiler, pathview, magrittr, tidyr, STRING database, GSE21510 dataset
Soft Skills: online collaboration and team building, communication with new people, etiquette for online meetings, leading small groups, organizing online meetings, flexibility with last-minute meetings, communicating with small groups when not everyone can meet at the same time, the importance of creativity in STEM and creativity-building exercises, communication, leadership, planning meetings, resume building and formatting, elevator pitches, divergent thinking, presentation strategies, leading meetings, networking

Meetings/Trainings Attended:

6/2 R Training, 6/3 Technical Training Webinar, 6/4 Team 3 Meeting, 6/9 R Training, 6/9 Python Training, 6/10 Logistics Webinar, 6/10 Technical Training, 6/10 Welcome Session, 6/11 Gene Team Meeting, 6/12 Welcome Meeting, 6/12 R Training, 6/12 Gene Team Happy Hour, 6/15 Gene Team Meeting, 6/15 Python Office Hours, 6/16 Asana Training, 6/16 Python and Pandas Webinar, 6/17 Technical Training, 6/18 Gene Team Meeting, 6/18 Gene Team Office Hours, 6/19 Gene Team Happy Hour, 6/22 Gene Team Meeting, 6/23 Python Training, 6/24 Bioinformatics Office Hours, 6/24 Fireside Chat Webinar, 6/25 GitHub Training Webinar, 6/25 Bioinformatics Webinar, 6/25 Gene Team Meeting, 6/26 R Training, 6/26 Gene Team Happy Hour, 6/29 Gene Team Meeting, 6/29 Python Training, 6/30 Python Office Hours, 7/1 GitHub Webinar, 7/2 Gene Team Meeting, 7/2 Gene Team Happy Hour, 7/6 Gene Team Meeting, 7/08 Gene Team Meeting, 7/08 New and Old Leads Meeting, 7/09 Python Training, 7/10 Gene Team Happy Hour, 7/13 Gene Team Meeting, 7/15 Gene Team Meeting, 7/15 BI Intro Webinar, 7/16 Leads Meeting, 7/17 July Leads Meeting, 7/20 July BI Team 3 Meeting, 7/21 BI Leads Technical Training, 7/21 Gene Team Meeting, 7/24 BI final presentations

Tasks Completed:

Attended or watched the R and Python training sessions and completed related exercises. I found some of the exercises to be a little challenging, but after going back and watching the training for a second time, I was able to understand how to adapt the code to accomplish the different tasks.
Read and annotated the scientific paper about prognostic markers for CRC. The paper was lengthy and difficult to digest, but after discussing with my team and receiving training on a recommended order to read the paper and the areas to focus on, the task seemed much more manageable and I was able to extract more meaning from the paper.
Completed assigned Python and R exercises
Normalized GSE8671 microarray data to be used for quality control analysis (mas5 function)
Performed quality control analysis on GSE8671 microarray data using the arrayQualityMetrics package in Bioconductor for both raw and normalized data
Used the ggplot function in R to create PCA plots and heatmaps to compare clustering of normal and tumor samples before and after normalization
Used the hgu133plus2.db package to annotate the GSE8671 data set by mapping probe ID to gene symbol and eliminating duplicate values and NAs
Used the quantile() function to identify and filter out genes expressed below the 4th quantile from the GSE8671 data set
Created a new matrix of normalized and filtered GSE8671 differential expression data
Used the limma package in R to calculate statistics for GSE8671 (gene symbol, log2 fold change, p-value, adjusted p-value)
Investigated different thresholds for determining significance of differentially expressed genes and determined appropriate cutoffs
Illustrated results from GSE8671 differential analysis and significance cutoffs in a Volcano Plot
Completed the second set of Python exercises
Cleaned the phenotypic data for GSE8671 and transferred it into a GSE8671 ExpressionSet object containing the gene expression data for Gene Team group 3
Created a vector containing the top differentially expressed genes for GSE8671 and mapped them to their Entrez IDs
Performed GO analysis to identify the cellular components and molecular functions associated with the most differentially expressed genes
Analyzed the KEGG pathways of the most differentially expressed genes
Used the enrichR package to further analyze the involved pathways and gene ontologies for the top differentially expressed genes in GSE8671
Used the STRING database to identify protein interactions associated with the top differentially expressed genes and to create a PPI network map
Performed GSEA (gene set enrichment analysis) on the GSE8671 data
Hosted two sessions of office hours (7/08 and 7/13)
Identified an LFC threshold for differential expression and separated upregulated and downregulated genes in the GSE8671 dataset into two vectors
Ran GO, KEGG, WikiPathways, and STRING analysis on both gene vectors to identify and compare trends between upregulated and downregulated genes
Analyzed differential expression of the GSE21510 data set
- Quality control and outlier removal
- normalization
- Gene annotation and gene filtering
- Limma analysis and data visualization
Compared results from GSE21510 to results from GSE8671 to identify common dysregulated genes and associated pathways
Compared results from my combined data to the results published in the Guo paper and confirmed differential expression of hub genes
Created a PowerPoint presentation showcasing what I have learned and accomplished during my internship at STEM-Away

Achievement Highlights:

Learned how to edit plots in R to analyze specific data sets
Learned how to create functions in Python
Identified and taught peers a simple and efficient method to read in microarray data for analysis project
Troubleshot in R to develop a method to fix the probe ID labels in the normalized GSE8671 data
Successfully and independently completed my code not only to finish all of this week’s deliverables on time, but more importantly, to gain a better understanding of how the different code elements work.
Took the lead in cleaning the phenotypic data for GSE8671 and transferring it into a GSE8671 expressionSet object containing the gene expression data for Gene Team group 3
Selected as a lead for the July 2020 BI pathway
Complimented on the helpfulness of my office hours sessions (7/08 and 7/13)
Identified and solved a discrepancy problem in my team’s GO and KEGG analysis data
Complimented on my presentation skills hosting a webinar about the first set of deliverables for incoming July interns
Compared the dataset I analyzed for my final project to the dataset we analyzed during the June session of BI, and was complimented on the originality of my presentation

Link to presentation slides: