Self-Assessment Week 2
-
Technical Area
- Learned about DNA microarrays and RNA sequencing
- Learned about R packages and the many other tools available for data analysis
- Learned about differential gene expression analysis and data normalization in RStudio, and about generating plots such as volcano plots and PCA plots
- Learned key terms from the technical paper, including ceRNA networks, lncRNA-miRNA-mRNA interactions, and protein-protein interaction (PPI) networks
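For reference, a volcano plot places each gene at its log2 fold change (x-axis) against the negative log10 of its p value (y-axis), so genes that change strongly and significantly end up in the upper corners. The short Python sketch below computes those two coordinates for one gene. It assumes the expression values are already on a log2 scale (as RMA output is); the function name `volcano_coordinates` and the toy numbers are hypothetical, not taken from the project data.

```python
import math
from statistics import mean


def volcano_coordinates(normal, tumor, p_value):
    """Return (log2 fold change, -log10 p) for one gene.

    Assumes `normal` and `tumor` hold log2-scale expression values, so the
    log2 fold change is simply the difference of the two group means.
    """
    log2_fc = mean(tumor) - mean(normal)
    neg_log10_p = -math.log10(p_value)
    return log2_fc, neg_log10_p


# Hypothetical toy values for a single gene (not from GSE21510).
x, y = volcano_coordinates([5.0, 5.2, 4.8], [7.1, 6.9, 7.0], p_value=0.001)
print(round(x, 2), round(y, 2))  # 2.0 3.0
```

In the actual analysis the fold changes and p values come from a package such as limma; this sketch only shows how those numbers map onto the plot.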
-
Tools
- Explored R through RStudio to perform data analysis and create graphs
- Explored pandas in Jupyter Notebook
- Learned how to use Asana to track projects and tasks
-
Soft Skills
- Reached out to the team leads for guidance and the weekly schedule
- Discussed the contents of the technical paper in discussion groups, which made me more comfortable with the project and the team
-
Achievement Highlights
- Finished the R and Python training exercises, which gave me a decent grasp of R given that this is my first time using the language
- Learned more about reading technical papers and figures, which has greatly helped me understand the internship
- Refreshed my Python skills, as I hadn’t used Python for a while
-
Meetings Attended:
- 6/1 Mentorchains/Gsuite setup
- 6/2 R Training
- 6/3 Bioinformatics Webinar 2
- 6/5 R Training 2
- 6/6 Python Training
- 6/9 Python Training
- 6/10 Logistical Webinar
- 6/10 Technical Training: Debriefing the Paper
- 6/15 Gene Team Meeting
- 6/16 Asana Training
-
Goals
- Read more about the technical paper, as there are a few areas I am still a little unclear about
- Become more familiar with R and learn more about data visualization and interpretation
- Complete next week’s assignments and attend more team meetings
-
Task Completion
Overall, I have mixed feelings about the project. I had to deal with an unusual scheduling issue that put my progress a bit behind, but I have for the most part caught up. I especially appreciated the R training videos, which helped me review and complete the R exercise, and the webinar recordings, which helped me work through the technical paper.
Final Self-Assessment (Victor Jian)
Overview of things I learned:
Technical Skills:
- Learned how to use the pandas library in Python to create DataFrames and extract data, and how to use the Matplotlib and Seaborn libraries to create graphs and charts from that data
- Learned how to read a scientific paper properly, along with strategies for approaching papers to better understand their content
- Learned more about the role of genetics in cancer, as well as the general components of a ceRNA network
- Learned how to set up and perform a differentially expressed gene (DEG) analysis on microarray data using RStudio and Bioconductor packages, as well as GEO2R
- Learned how to perform quality control on datasets in RStudio to identify outliers and assess overall data quality
- Learned how to work with large datasets in RStudio: sorting, extracting, and modifying data, and creating matrices and data frames
- Learned about data normalization and how to create and analyze PCA plots, heatmaps, and volcano plots with ggplot2 and other R packages
- Learned how to perform pathway analysis on a dataset, as well as the many online tools and software available for it
- Learned how to use the STRING database and Cytoscape to create PPI networks and analyze them
- Learned how to use GitHub to upload and organize code
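As an illustration of the normalization idea mentioned above: RMA uses quantile normalization as one of its steps, which forces every array to share the same distribution of intensities. The Python sketch below shows that idea on plain lists. It is not the Bioconductor implementation, it breaks tied values by position (a simplification), and the function name `quantile_normalize` and the toy data are hypothetical.

```python
def quantile_normalize(samples):
    """Quantile-normalize equal-length samples (one list of values per array).

    Each value is replaced by the mean, across all samples, of the values that
    share its rank, so every sample ends up with an identical distribution.
    Ties are broken by position, which is a simplification.
    """
    n_values = len(samples[0])
    sorted_samples = [sorted(sample) for sample in samples]
    # Mean of the k-th smallest value across all samples.
    rank_means = [
        sum(ranked[k] for ranked in sorted_samples) / len(samples)
        for k in range(n_values)
    ]
    normalized = []
    for sample in samples:
        # Indices of this sample's values, ordered from smallest to largest.
        order = sorted(range(n_values), key=lambda i: sample[i])
        result = [0.0] * n_values
        for rank, index in enumerate(order):
            result[index] = rank_means[rank]
        normalized.append(result)
    return normalized


# Hypothetical toy data: three arrays, four probes each.
arrays = [[10.0, 1.0, 7.0, 4.0], [2.0, 11.0, 5.0, 8.0], [9.0, 3.0, 12.0, 6.0]]
for row in quantile_normalize(arrays):
    print(row)
```

After normalization every array contains the same sorted values ([2.0, 5.0, 8.0, 11.0] here); only each probe's rank within its own array is preserved.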
Tools:
- Python, RStudio, Jupyter Notebook, STEM-Away forum, Asana, G Suite, Bioconductor, Slack, GitHub, GEO2R, GEO database, GSE21510 dataset, EnrichR, Cytoscape, STRING database, and the paper (Guo, Li et al. “Construction and Analysis of a ceRNA Network Reveals Potential Prognostic Markers in Colorectal Cancer.” Frontiers in Genetics, vol. 11, 418, 8 May 2020, doi:10.3389/fgene.2020.00418)
Soft Skills:
- Worked with teammates virtually to perform weekly tasks and coordinate meetings
- Communicated with mentors and team leads over email and meetings for clarification or help
- Worked with other teams to coordinate presentations and meetings
- Learned about the importance of creativity in STEM, elevator pitches, networking, presentation strategies, divergent thinking, and resume building
Tasks Completed
- Attended weekly Python and RStudio training meetings and completed the related exercises.
- Read the research paper and followed its pipeline from DEG analysis to construction of the ceRNA network. The paper was rather complex, with many new terms and concepts; however, after attending the weekly meetings and webinars and working with teammates, I have a decent understanding of its overall process and concepts.
- Normalized the GSE8671 and GSE21510 microarray data using the MAS5, RMA, and GCRMA methods
- Performed quality control on the raw and normalized datasets using the arrayQualityMetrics package
- Used the ggplot2, pheatmap, and EnhancedVolcano packages to create PCA plots, heatmaps, and volcano plots of the raw and normalized data
- Used the hgu133plus2.db package to annotate the datasets by mapping probe IDs to gene symbols
- Used the quantile() function to filter out genes expressed below the 4th quantile in each dataset
- Used the limma package to perform DEG analysis and calculate log fold changes (logFC), p values, and adjusted p values for the filtered datasets
- GEO2R was also used for this
- Investigated different thresholds for determining the significance of differentially expressed genes and chose appropriate cutoffs
- Created volcano plots of the DEG analysis results using different significance cutoffs and thresholds
- Extracted phenotypic data from the datasets and transferred it to each dataset's ExpressionSet object
- Created a vector of the top differentially expressed genes and mapped them to their Entrez IDs
- Performed GO, KEGG, and enrichGO analyses to determine the pathways, Gene Ontology terms, and molecular functions of the upregulated and downregulated DEGs
- EnrichR was also used for this
- Used the STRING database and Cytoscape to construct and analyze a PPI network of the top DEGs
- Created a PowerPoint presentation highlighting my work and achievements throughout the STEM-Away internship
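The DEG filtering steps above (adjusted p values combined with fold-change cutoffs) can be sketched in plain Python. limma's topTable() reports Benjamini-Hochberg adjusted p values by default, the same correction R's p.adjust(..., method = "BH") applies; the sketch below reimplements that correction and then labels each gene as up, down, or not significant. The cutoffs (|log2FC| >= 1, adjusted p < 0.05) are common defaults assumed for illustration, not the thresholds chosen in the project, and the function names, gene names, and values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending p values
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


def classify(log2_fc, adj_p, fc_cutoff=1.0, p_cutoff=0.05):
    """Label a gene using assumed fold-change and adjusted-p cutoffs."""
    if adj_p < p_cutoff and log2_fc >= fc_cutoff:
        return "up"
    if adj_p < p_cutoff and log2_fc <= -fc_cutoff:
        return "down"
    return "not significant"


# Hypothetical results for four genes: (log2 fold change, raw p value).
genes = {"GENE_A": (2.1, 0.011), "GENE_B": (-1.4, 0.01),
         "GENE_C": (0.3, 0.002), "GENE_D": (1.8, 0.5)}
adj = benjamini_hochberg([p for _, p in genes.values()])
for (name, (fc, _)), q in zip(genes.items(), adj):
    print(name, round(q, 3), classify(fc, q))
```

GENE_C is significant but changes too little to pass the fold-change cutoff, while GENE_D changes a lot but is not significant; moving `fc_cutoff` or `p_cutoff` shifts which genes land in the highlighted regions of a volcano plot.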
Achievement Highlights
- This was my first time using RStudio, but I now have a decent grasp of using it to create plots, run data through functions and packages, extract data, and import and export data
- Completed all weekly deliverables and training exercises
- Learned how to use Cytoscape to perform PPI network analysis and applied it to the GSE21510 dataset
- Learned about the process of identifying prognostic biomarkers for cancer, as described in the research paper.
Presentation:
Part1 of StemAway_Final_Deliverables_Victor_Jian (1).pdf (1.5 MB)
Part2 of Copy of StemAway_Final_Deliverables_Victor_Jian.pdf (3.4 MB)
Sorry for the file separation; the original file was too large to upload in one part.
The link below contains the file in one part:
https://drive.google.com/file/d/17mpxEI_LKhLA50lCTSa5qc7gaWT9MNXX/view?usp=sharing