Final Assessment (Cesar Juarez)
Overview of Things Learned:
Technical Area:
• How to use R, RStudio, and Jupyter Notebook to create tables, graphs, and plots and to load packages.
• How to read a scientific paper efficiently and with better understanding
• GEO website: how to use the GEO2R analysis tool, download the .CEL files, and load them into R.
• Quality control using the QCReport() function and knowing how to interpret the different types of graphs and plots it produces.
• Normalization and background correction using RMA
• Visualization: Making a PCA plot using the ggplot function
• Writing a csv file in R
• Using hgu133plus2.db to map probe IDs to gene symbols
• Filtering out genes expressed below the 4% quantile using the quantile() function
• Analyzing data with limma
• Creating a volcano plot and analyzing the results
• Creating a gene vector
• Doing Gene Ontology (GO) analysis with the enrichGO() function and making dot plots and bar plots to visualize where the differentially expressed genes are located in the cell
• Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis
• WikiPathways analysis
• DAVID
• STRING
• Phenotypic Data analysis
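The quantile-based gene filter listed above can be illustrated with a small sketch. The internship work was done in R with quantile(); this is a plain Python version of the same idea, not the code from the deliverable. It assumes the filter is applied to each gene's mean expression across samples (the list does not say which statistic was used), and the gene names and values are made-up toy data.

```python
from statistics import mean


def quantile(values, q):
    """Quantile by linear interpolation between order statistics
    (the same rule as the default 'type 7' in R's quantile())."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


def filter_low_expression(expression, q=0.04):
    """Keep genes whose mean expression is at or above the q quantile
    of all gene means. Filtering on the mean is an assumption."""
    gene_means = {gene: mean(vals) for gene, vals in expression.items()}
    cutoff = quantile(list(gene_means.values()), q)
    return {g: v for g, v in expression.items() if gene_means[g] >= cutoff}


# Hypothetical toy data: gene symbol -> normalized log2 intensities per sample
toy_expression = {
    "GENE_A": [2.0, 2.0, 2.0],
    "GENE_B": [5.0, 4.0, 6.0],
    "GENE_C": [7.0, 7.5, 6.5],
    "GENE_D": [9.0, 8.0, 10.0],
    "GENE_E": [11.0, 12.0, 10.0],
}

filtered = filter_low_expression(toy_expression, q=0.04)
print(sorted(filtered))
```

With only five toy genes, the 4% cutoff falls just above the lowest gene mean, so only the weakest gene is dropped; on a full microarray the same cutoff removes roughly the lowest 4% of genes.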
Tools:
The tools that I used during the internship were the STEM-Away platform (forums, individual messaging), Google Drive/Calendar, Asana, GitHub, R, RStudio, Python, the GEO database, and the paper (Guo, Li et al. “Construction and Analysis of a ceRNA Network Reveals Potential Prognostic Markers in Colorectal Cancer.” Frontiers in genetics vol. 11 418. 8 May. 2020, doi:10.3389/fgene.2020.00418)
Soft Skills:
• Communication with teams and subteams
• Time management
• Organization
• Teamwork
Achievement Highlights:
- Coded in R to complete the deliverables while understanding what each step is for
- Finished the deliverables on time on my own, then got together with the group to talk about overall results
- Completed the R exercises and went over the trainings on my own to understand the steps
- Completed self-assessments
List of meetings / training attended including social team events:
• Technical Training Webinars (6/1, 6/3, 6/10, 6/17)
• R Trainings (6/2, 6/5, 6/9, 6/12, 6/26)
• Python Trainings (6/15, 6/16)
• Gene Team Meetings (6/1, 6/8, 6/11, 6/15, 6/18, 6/25, 6/29, 7/2, 7/8, 7/13, 7/15, 7/21)
• Logistical Webinar (6/10)
• Welcome Session on Leadership and Program Management by Stephanie and Katie (6/10)
• Asana Training (6/16)
• Fireside Chat “Your First Day Isn’t Your First Day – Starting on your first Data Science Team” by Jennifer (6/24)
• Fireside Chat with Alex Liang (6/30)
• GitHub Webinar (6/25, 7/1)
• Presentation (7/24)
Tasks Completed/Challenges Faced:
• Attended and watched the R trainings. It was difficult to follow the meeting and write the code at the same time because I sometimes fell behind, so it was easier for me to watch the recording and pause it while working through the trainings.
• Completed the R exercises.
• Read and annotated the scientific paper about prognostic markers for CRC. The paper was somewhat difficult to understand, but after reading it a few times, going to the technical training, and debriefing the figures, I got a better understanding of the paper's pipeline.
• Completed assigned Python and R exercises
• Loaded the GSE8671 dataset into R and made an AffyBatch object using the ReadAffy() function.
• Performed quality control on the data using the QCReport() function
• Normalized the GSE8671 microarray data using RMA for use in quality control analysis
• Used the ggplot function in R to create PCA plots to compare clustering of normal and tumor samples before and after normalization
• Used the hgu133plus2.db package to annotate the GSE8671 data set by mapping probe IDs to gene symbols and eliminating duplicate values and NAs
• Used the quantile() function to identify and filter out genes expressed below the 4% quantile from the GSE8671 data set
• Created a new matrix of normalized and filtered GSE8671 differential expression data
• Used the limma package in R to calculate statistics for GSE8671 (gene symbol, log2 fold change, p-value, adjusted p-value)
• Investigated different thresholds for determining significance of differentially expressed genes and determined appropriate cutoffs
• Illustrated results from GSE8671 differential analysis and significance cutoffs in a Volcano Plot
• Cleaned the phenotypic data for GSE8671 and transferred it into a GSE8671 ExpressionSet object containing the gene expression data for Gene Team group 6
• Created a vector containing the top differentially expressed genes for GSE8671 and mapped them to their Entrez IDs
• Performed GO analysis to identify the cellular components and molecular functions associated with the most differentially expressed genes
• Analyzed the KEGG pathways of the most differentially expressed genes
• Used enrichr() from the enrichR package to further analyze the involved pathways and gene ontologies for the top differentially expressed genes in GSE8671
• Used the STRING database to identify protein interactions associated with the top differentially expressed genes and to create a PPI network map
• Performed gene set enrichment analysis (GSEA) on the GSE8671 data
• Identified an LFC threshold for differential expression and separated upregulated and downregulated genes in the GSE8671 dataset into two vectors
• Ran GO, KEGG, WikiPathways, and STRING analysis on both gene vectors to identify and compare trends between upregulated and downregulated genes
• Analyzed differential expression of the GSE21510 data set using the same method
o Quality control and outlier removal
o Normalization
o Gene annotation and gene filtering
o Limma analysis and data visualization
• Compared results from my combined data to the results published in the Guo et al. paper and confirmed differential expression of hub genes
• Created a PowerPoint presentation showcasing what I have learned and accomplished during my internship at STEM-Away
• Challenges: The biggest challenge for me was learning how to code, since I had zero experience with R, RStudio, or Python. I had to look up a lot of the code and work to understand it. I also had to reach out to the leads and Yves.
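The step of choosing an LFC threshold and separating upregulated from downregulated genes, described in the task list above, can be sketched as follows. This is a plain Python illustration, not the R code used in the deliverable. The cutoffs (absolute log2 fold change of at least 1, adjusted p-value below 0.05) are assumed example values, since the actual thresholds are not stated here, and the rows are made-up stand-ins for a limma results table.

```python
def split_by_regulation(results, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Split significant genes into upregulated and downregulated lists.

    `results` is a list of dicts with keys 'symbol', 'logFC', and 'adj_p',
    mimicking the columns of a differential expression results table.
    Both cutoffs are assumed example values.
    """
    up, down = [], []
    for row in results:
        if row["adj_p"] >= padj_cutoff:
            continue  # not statistically significant
        if row["logFC"] >= lfc_cutoff:
            up.append(row["symbol"])
        elif row["logFC"] <= -lfc_cutoff:
            down.append(row["symbol"])
    return up, down


# Hypothetical toy results (not real GSE8671 values)
toy_results = [
    {"symbol": "GENE_1", "logFC": 2.4, "adj_p": 0.001},   # up
    {"symbol": "GENE_2", "logFC": -3.1, "adj_p": 0.0004}, # down
    {"symbol": "GENE_3", "logFC": 0.3, "adj_p": 0.01},    # change too small
    {"symbol": "GENE_4", "logFC": 1.8, "adj_p": 0.2},     # not significant
    {"symbol": "GENE_5", "logFC": -1.2, "adj_p": 0.03},   # down
]

up_genes, down_genes = split_by_regulation(toy_results)
print("Upregulated:", up_genes)
print("Downregulated:", down_genes)
```

The same two conditions are what the dashed cutoff lines on a volcano plot represent: genes in the upper-left and upper-right corners pass both the significance and fold-change thresholds.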
Presentation:
Cesar_Juarez_Final_Deliverables_Presentation.pdf (821.4 KB)