Cesar_Samuel_Juarez - Bioinformatics Pathway

Week of 6/17-6/22

Things Learned:

Technical Area:
I learned how to use Asana and how to stay updated when new tasks are assigned. I also worked on the week 3 deliverables, which was difficult since I don’t have experience in R, but I was able to look things up online and ask questions.

Tools:
The tools that I used during the internship are the STEM-Away platform (forums, gene team forum), Google, Slack, R scripts, RStudio, Asana, and the GEO database. More specifically, in R and RStudio I used .cel files, affyQCReport, RMA (normalization), and PCA plots.

Soft Skills:
The main soft skills that I feel I have enhanced during my time at STEM-Away are communication, time management, and working in teams. I have enhanced my communication skills by talking with the members of the project in team meetings and on the forums. I have also enhanced my time management skills: since I am working while doing the internship, I have had to find time to attend the trainings, team meetings, and webinars. Everyone lives in a different time zone, so communicating with my sub-team was critical for everyone to understand what we needed to do.

Achievement Highlights:

  1. Worked on week 3 deliverables and completed the tasks
  2. Looked online to solve problems related to R code and, when that did not work, reached out for help.
  3. Saw how Asana works

List of meetings / training attended including social team events:
• Technical Training Webinar (6/17/20)
• Gene Team meeting (6/18/20)
• Gene Team: Group 6 meeting (6/18/20)

Goals for upcoming week:
• Go over our presentation for week 3 deliverables by comparing our results with Team 7
• Read over week 4 deliverables and start on them
• Communicate more with my team if I have questions about the deliverables

Tasks Completed/Challenges Faced
• Normalized microarray data
• Created a CSV file using the exprs() and write.csv() functions in R
• Visualized the analysis report from the QC method I used (affyQCReport). Challenge faced: I had trouble getting the PDF file and reached out to Annie, who helped me solve the problem
• Made a PCA plot. Challenge faced: I had trouble getting the percent variability and reached out to Annie to solve the problem
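
The percent variability shown on a PCA axis is that component's variance divided by the total variance of all components. The work above was done in R, where prcomp() reports the standard deviation of each component as sdev. The following is only a Python sketch of the same arithmetic, and the standard deviations in it are made up for illustration:

```python
def percent_variance(sdev):
    """Return the percent of total variance explained by each principal component.

    `sdev` holds the standard deviation of each component.
    """
    variances = [s ** 2 for s in sdev]   # variance is the squared standard deviation
    total = sum(variances)
    return [100.0 * v / total for v in variances]


# Made-up standard deviations for three components (not from the real data).
example_sdev = [3.0, 2.0, 1.0]
percents = percent_variance(example_sdev)

# Axis labels in the style often used on PCA plots.
axis_labels = [f"PC{i + 1} ({p:.1f}%)" for i, p in enumerate(percents)]
print(axis_labels)
```

The percentages always add up to 100, which is a quick way to check the calculation.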

Week of 6/24-6/30 (Week 4)
Things Learned:

Technical Area:
I learned how to work in a team to do the week 4 deliverable. I had a lot of questions about some of the steps in R, but by reaching out to the leads I was able to get a better understanding. I learned how to annotate the data, filter the genes, and analyze the data with limma. I also learned how to make a volcano plot and how to analyze the results that we got from it. I also started to work on getting the phenodata.

Tools:
The tools that I used during the internship are the STEM-Away platform (forums, gene team forum), Google, Slack, R scripts, RStudio, Asana, and the GEO database. More specifically, in R and RStudio I used library(limma) to analyze our data.

Soft Skills:
The main soft skills that I feel I have enhanced during my time at STEM-Away are communication, time management, working in teams, and organization. I have enhanced my communication skills by talking with the members of the project in team meetings and on the forums. I have also enhanced my time management skills: since I am working while doing the internship, I have had to find time to attend the trainings, team meetings, and webinars. I enhanced my organization skills by making a personal folder where I keep all of the STEM-Away related documents.

Achievement Highlights:

  1. Worked on week 4 deliverables and completed the tasks
  2. Looked online to solve problems related to R code and, when that did not work, reached out for help.
  3. Saw how Asana works, and learned how to mark my deliverable to be checked by the leads.

List of meetings / training attended including social team events:
• Fireside Chat (6/24/20)
• GitHub Webinar, Webinar to introduce students to Bioinformatics, Gene Team Meeting (6/25/20)
• Gene Team Meeting (6/29/20)

Goals for upcoming week:
• Go over our presentation for week 4 deliverables by comparing our results with Team 4 on limma: how it works and why it is used
• Read over week 5 deliverables and start on them
• Communicate more with my team if I have questions about the deliverables

Tasks Completed/Challenges Faced
• I was able to complete the tasks for the week 4 deliverables
• Annotation: added gene symbols to the probe set IDs along with the samples. Challenge: this was really confusing to me because we started with about 54k probe IDs, and after removing the duplicated probe IDs and gene symbols we had fewer probe IDs, and I was not sure why that was the case.
• Gene Filtering
• Analysis with limma
o Convert affy data into a matrix
o Write out differential expression results to files sorted by adjusted p-value
o Create a volcano plot
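
The drop in the number of probe IDs after annotation makes sense once you see that several probes on the array can map to the same gene symbol, and some probes map to no gene symbol at all (NA in R). The following is a small Python sketch of that idea, not the R code used for the deliverable. The probe IDs and gene symbols are hypothetical placeholders, and keeping the first probe per symbol is just one simple choice:

```python
def annotate_and_deduplicate(probe_to_symbol):
    """Keep one probe per gene symbol and drop probes without a symbol.

    `probe_to_symbol` maps probe IDs to gene symbols (None when unmapped).
    Keeping the first probe seen for each symbol is an assumption made for
    this sketch; other pipelines use a different rule to pick the probe.
    """
    kept = {}
    seen_symbols = set()
    for probe_id, symbol in probe_to_symbol.items():
        if symbol is None:          # unmapped probe, comparable to NA in R
            continue
        if symbol in seen_symbols:  # another probe for a gene already kept
            continue
        seen_symbols.add(symbol)
        kept[probe_id] = symbol
    return kept


# Hypothetical annotation table: six probes, three distinct gene symbols.
toy_annotation = {
    "probe_1": "GENE_A",
    "probe_2": "GENE_A",
    "probe_3": None,
    "probe_4": "GENE_B",
    "probe_5": "GENE_C",
    "probe_6": "GENE_C",
}
deduplicated = annotate_and_deduplicate(toy_annotation)
print(len(toy_annotation), "probes ->", len(deduplicated), "genes")
```

In this toy table, six probes shrink to three rows, which is the same effect seen on the real data at a much larger scale.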

Week of 7/1-7/8
Things Learned:

Technical Area:
I learned how to create a gene vector out of the CSV results that we got from the limma analysis. I now know that Gene Ontology analysis is used to describe gene functions and the relationships between them. I used different visualization methods such as barplot, dotplot, and plotGOgraph. I used two pathway analyses, KEGG and WikiPathways, to look at the different biological systems. Then I learned that DAVID is a gene set enrichment analysis tool that is used on the differentially expressed genes.

Tools:
The tools that I used during the internship are the STEM-Away platform (forums, gene team forum), Google, Slack, R scripts, RStudio, Asana, and the GEO database. More specifically, in R and RStudio I used groupGO, barplot, dotplot, enrichKEGG, and WikiPathways. I also used DAVID.

Soft Skills:
The main soft skills that I feel I have enhanced during my time at STEM-Away are communication, time management, working in teams, and organization. I have enhanced my communication skills by talking with the members of the project in team meetings and on the forums. I have also enhanced my time management skills: since I am working while doing the internship, I have had to find time to attend the trainings, team meetings, and webinars. I enhanced my organization skills by making a personal folder where I keep all of the STEM-Away related documents.

Achievement Highlights:

  1. Worked on week 5 deliverables and completed the tasks
  2. Looked online to solve problems related to R code and, when that did not work, reached out to Yves for help.
  3. Made a folder on GitHub in order to keep our team documents organized.

List of meetings / training attended including social team events:
• Github Webinar #2, Office Hours (7/1/20)
• Gene Team Meeting (7/2/20)
• Gene Team Meeting (7/8/20)

Goals for upcoming week:
• Finalize Team 6’s deliverables by looking them over and fixing anything that needs to be fixed.
• Reach out to my team and Team 4 in order to work on our presentation and be ready to present on Monday.
• Reach out to Yves if I have any questions on the code for week 5 deliverables

Tasks Completed/Challenges Faced
• I was able to complete the tasks for the week 5 deliverables
• Created a vector from our results from week 4
• Gene Ontology analysis to see where the differentially expressed genes are located in the cell
• KEGG analysis in order to visualize the high-level functions and utilities of the biological system
• WikiPathways, a database of biological pathways, which allowed us to see which pathways our genes are in
• DAVID
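
Enrichment steps like the GO and KEGG analyses above ask whether a pathway or GO term contains more of our differentially expressed genes than we would expect by chance. Over-representation analysis commonly answers this with a hypergeometric test. The following Python sketch shows that calculation on made-up counts. It is an illustration of the statistic, not the code run for the deliverable, and the function name is hypothetical:

```python
from math import comb


def hypergeom_enrichment_p(universe, in_set, selected, overlap):
    """Probability of seeing at least `overlap` pathway genes by chance.

    universe: number of genes considered in total
    in_set:   number of those genes that belong to the pathway or GO term
    selected: number of differentially expressed genes in our vector
    overlap:  number of our genes that fall in the pathway or GO term
    """
    total = comb(universe, selected)
    upper = min(in_set, selected)
    tail = sum(
        comb(in_set, k) * comb(universe - in_set, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Made-up counts: 20 genes in total, 5 in the pathway, 4 selected, 2 overlapping.
p_value = hypergeom_enrichment_p(universe=20, in_set=5, selected=4, overlap=2)
print(round(p_value, 4))
```

A small p-value means the overlap is larger than chance alone would usually produce. Real tools also correct these p-values for the many terms tested.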

Week of 7/9-7/15
Things Learned:

Technical Area:
I started working on the presentation using Google Slides. I attended the team meetings so that we all know what we have to say.

Tools:
The tools that I used this week are Google Drive/Slides and the STEM-Away forums.

Soft Skills:
The main soft skills that I feel I have enhanced during my time at STEM-Away are communication, time management, working in teams, and organization. I have enhanced my communication skills by talking with the members of the project in team meetings and on the forums. I have also enhanced my time management skills: since I am working while doing the internship, I have had to find time to attend the trainings, team meetings, and webinars. I enhanced my organization skills by making a personal folder where I keep all of the STEM-Away related documents.

Achievement Highlights:

  1. Turned in week 5 deliverable on time
  2. Started working on the presentation

List of meetings / training attended including social team events:
• Gene Team Meeting (7/13/20)
• Gene Team Meeting (7/15/20)

Goals for upcoming week:
• Work on Week 5 deliverables presentation
• Give an in-depth overview of KEGG analysis

Tasks Completed/Challenges Faced
• I was able to complete the tasks for the week 5 deliverables
• Coding was difficult for me, but I was able to work through it

Week of 7/16-7/22
Things Learned:

Technical Area:
I learned how to work on the final deliverables on my own, without reaching out to the mentors or leads. I was able to get the hang of R and RStudio and feel more comfortable using the programs.

Tools:
The tools that I used during the internship are the STEM-Away platform (forums, gene team forum), Google, Slack, R scripts, RStudio, Asana, and the GEO database. In R and RStudio I used all of the tools from previous weeks to make the tables, graphs, and plots for the final deliverables.

Soft Skills:
The main soft skills that I feel I have enhanced during my time at STEM-Away are communication, time management, working in teams, and organization. I have enhanced my communication skills by talking with the members of the project in team meetings and on the forums. I have also enhanced my time management skills: since I am working while doing the internship, I have had to find time to attend the trainings, team meetings, and webinars. I enhanced my organization skills by making a personal folder where I keep all of the STEM-Away related documents.

Achievement Highlights:

  1. Worked on the final deliverables on my own without reaching out
  2. Made my presentation and turned it in on GitHub

List of meetings / training attended including social team events:
• Gene Team Meeting (7/21/20)

Goals for upcoming week:
• Go over my presentation and make sure that everything is finalized before presenting
• Practice presentation
• Turn in final deliverables by Sunday

Tasks Completed/Challenges Faced
• I was able to complete the final deliverables on my own
• Created the final deliverables presentation
• I had trouble making the pheatmap: the graph was showing, but I wasn’t able to read the text on the side
• I also had trouble figuring out which dataset to use, since GSE21510 was listed as having 104 cancer and 44 normal samples, but some were homogenized and some weren’t

Final Assessment (Cesar Juarez)
Overview of Things Learned:

Technical Area:
• How to use R, RStudio, and Jupyter Notebook. I used this software to create tables, graphs, and plots, and to load packages.
• How to read a scientific paper efficiently and with better understanding
• GEO website: how to use the GEO2R analysis tool, how to download the .cel files, and how to load them into R.
• Quality control using the QCReport() function and knowing how to analyze the different types of graphs and plots.
• Normalization and background correction using RMA
• Visualization: Making a PCA plot using the ggplot function
• Writing a csv file in R
• Using hgu133plus2.db in order to get the gene symbols of the gene IDs
• Filtered out the genes under the 4% using the quantile function
• Analyzed data with limma
• Creating a volcano plot and analyzing the data
• Creating a gene vector
• Doing Gene Ontology (GO) analysis with the enrichGO function, and making dotplots and barplots in order to visualize where the differentially expressed genes are located in the cell
• Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis
• Wikipathways analysis
• DAVID
• STRING
• Phenotypic Data analysis
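
As a rough illustration of the gene filtering step in the list above, the sketch below drops genes whose average expression falls below a chosen quantile. It is written in Python rather than R, the gene names and values are made up, and the cutoff of 0.25 is only a placeholder and not the value used in the deliverables. The quantile() helper uses linear interpolation between sorted values, which is also the default behavior of quantile() in R:

```python
def quantile(values, prob):
    """Quantile by linear interpolation between the sorted values."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * prob
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def filter_low_expression(mean_expression, prob):
    """Keep genes whose mean expression is at or above the chosen quantile."""
    cutoff = quantile(list(mean_expression.values()), prob)
    return {gene: value for gene, value in mean_expression.items() if value >= cutoff}


# Made-up mean expression values for five hypothetical genes.
toy_means = {"GENE_A": 2.0, "GENE_B": 4.0, "GENE_C": 6.0, "GENE_D": 8.0, "GENE_E": 10.0}
filtered = filter_low_expression(toy_means, prob=0.25)  # placeholder cutoff
print(sorted(filtered))
```

Filtering this way removes the genes with the lowest expression before the limma analysis, so fewer genes are tested.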

Tools:
The tools that I used during the internship were the STEM-Away platform (forums, individual messaging), Google Drive/Calendar, Asana, GitHub, R, RStudio, Python, the GEO database, and the paper (Guo, Li, et al. “Construction and Analysis of a ceRNA Network Reveals Potential Prognostic Markers in Colorectal Cancer.” Frontiers in Genetics, vol. 11, 418, 8 May 2020, doi:10.3389/fgene.2020.00418)

Soft Skills:
• Communication with teams and subteams
• Time Management
• Organization
• Teamwork

Achievement Highlights:

  1. Coded in R, such as doing the deliverables and understanding what each step is for
  2. Finished the deliverables on time on my own, then got together with my group to talk about the overall results
  3. Completed the R exercises and went over the trainings on my own to understand the steps
  4. Doing self-assessments

List of meetings / training attended including social team events:
• Technical Training Webinars (6/1, 6/3, 6/10, 6/17)
• R Trainings (6/2, 6/5, 6/9, 6/12, 6/26)
• Python Trainings (6/15, 6/16)
• Gene Team Meetings (6/1, 6/8, 6/11, 6/15, 6/18, 6/25, 6/29, 7/2, 7/8, 7/13, 7/15, 7/21)
• Logistical Webinar (6/10)
• Welcome Session on Leadership and Program Management by Stephanie and Katie (6/10)
• Asana Training (6/16)
• Fireside Chat “Your First Day Isn’t Your First Day – Starting on your first Data Science Team” by Jennifer (6/24)
• Fireside Chat with Alex Liang (6/30)
• Github Webinar (6/25, 7/1)
• Presentation (7/24)

Tasks Completed/Challenges Faced
• Attended and watched the R trainings. It was difficult to be in the meeting and write the code at the same time because I sometimes fell behind and couldn’t keep up, so it was easier for me to watch the recording and pause it while I worked through the trainings.
• Completed the R exercises.
• Read and annotated the scientific paper about prognostic markers for CRC. The paper was somewhat difficult to understand, but by reading it a few times, going to the technical training, and debriefing the images, I got a better understanding of the paper’s pipeline.
• Completed assigned Python and R exercises
• Loaded the GSE8671 dataset into R using the ReadAffy() function.
• Did the QC of the data by using the function QCReport()
• Normalized GSE8671 microarray data to be used for quality control analysis using RMA
• Used the ggplot function in R to create PCA plots to compare clustering of normal and tumor samples before and after normalization
• Used the hgu133plus2.db package to annotate the GSE8671 data set by mapping probe ID to gene symbol and eliminating duplicate values and NAs
• Used quantile() function to identify and filter out genes expressed below the 4th quantile from the GSE8671 data set
• Created a new matrix of normalized and filtered GSE8671 differential expression data
• Used the limma package in R to calculate statistics for GSE8671 (gene symbol, log2 fold change, p-value, adjusted p-value)
• Investigated different thresholds for determining significance of differentially expressed genes and determined appropriate cutoffs
• Illustrated results from GSE8671 differential analysis and significance cutoffs in a Volcano Plot
• Cleaned the phenotypic data for GSE8671 and transferred into a GSE8671 expressionSet object containing the gene expression data for Gene Team group 6
• Created a vector containing the top differentially expressed genes for GSE8671 and mapped to their entrez ids
• Performed GO analysis to identify the cellular components and molecular functions associated with the most differentially expressed genes
• Analyzed the KEGG pathways of the most differentially expressed genes
• Used enrichR() to further analyze the involved pathways and gene ontologies for the top differentially expressed genes in GSE8671
• Used the STRING database to identify protein interactions associated with the top differentially expressed genes and to create a PPI network map
• GSEA enrichment analysis on the GSE8671 data
• Identified an LFC threshold for differential expression and separated upregulated and downregulated genes in the GSE8671 dataset into two vectors
• Ran GO, KEGG, Wiki Pathway, and STRING analysis on both gene vectors to identify and compare trends between upregulated and downregulated genes
• Analyzed differential expression of the GSE21510 data set using the same method
o Quality control and outlier removal
o normalization
o Gene annotation and gene filtering
o Limma analysis and data visualization
• Compared results from my combined data to the results published in the Guo Paper and confirmed differential expression of hub genes
• Created a Powerpoint presentation showcasing what I have learned and accomplished during my internship at STEM-Away
• Challenges: The biggest challenge for me was learning how to code, since I had zero experience with R, RStudio, or Python. I had to look up a lot of the code and try to understand it. I also had to reach out to the leads and Yves.
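
To make the significance cutoffs and the upregulated/downregulated split in the list above more concrete, here is a small Python sketch. It is not the R code from the deliverables. It computes Benjamini-Hochberg adjusted p-values (the default adjustment in limma's topTable) and then separates genes by the sign of the log fold change. The gene names, numbers, and cutoff values are made up for illustration:

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so each value is capped by the ones above it.
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * n / rank)
        adjusted[index] = running_min
    return adjusted


def split_by_direction(results, lfc_cutoff, adj_p_cutoff):
    """Split significant genes into upregulated and downregulated lists."""
    adjusted = benjamini_hochberg([row["p"] for row in results])
    up, down = [], []
    for row, adj_p in zip(results, adjusted):
        if adj_p >= adj_p_cutoff:
            continue  # not significant after adjustment
        if row["logFC"] >= lfc_cutoff:
            up.append(row["gene"])
        elif row["logFC"] <= -lfc_cutoff:
            down.append(row["gene"])
    return up, down


# Made-up differential expression results for four hypothetical genes.
toy_results = [
    {"gene": "GENE_A", "logFC": 2.5, "p": 0.001},
    {"gene": "GENE_B", "logFC": -3.0, "p": 0.004},
    {"gene": "GENE_C", "logFC": 0.2, "p": 0.03},
    {"gene": "GENE_D", "logFC": 1.8, "p": 0.5},
]
adjusted_p = benjamini_hochberg([row["p"] for row in toy_results])
up_genes, down_genes = split_by_direction(toy_results, lfc_cutoff=1.0, adj_p_cutoff=0.05)
print(up_genes, down_genes)
```

The same two cutoffs define the colored regions of a volcano plot: genes that pass both land in the upper left (downregulated) or upper right (upregulated) corners.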

Presentation:
Cesar_Juarez_Final_Deliverables_Presentation.pdf (821.4 KB)