Victor_Jian - Bioinformatics Pathway

Self Assessment Week 2

  • Technical Area

    • Learned about DNA microarrays and RNA sequencing
    • Learned about R packages and the many tools available for data analysis
    • Learned about differential gene expression analysis and data normalization in RStudio, and about generating plots such as volcano plots and PCA plots
    • Learned key terms from the technical paper, including ceRNA networks, lncRNA-miRNA-mRNA interactions, and PPI networks
  • Tools

    • Explored R through RStudio to perform data analysis and create graphs
    • Explored pandas through Jupyter Notebook
    • Learned how to use Asana for tracking projects and tasks
  • Soft Skills

    • Reached out to the leads for guidance and the weekly schedule
    • Discussed the contents of the technical paper in discussion groups, which improved my comfort with the project and team
  • Achievement Highlights

    • Finished the R and Python training exercises, which gave me a decent grasp of R given that this was my first time using the language
    • Learned more about reading technical papers and figures, which has greatly helped me understand the internship
    • Refreshed my Python skills, as I hadn’t used Python for a while
  • Meetings Attended:

6/1 Mentorchains/G Suite setup, 6/2 R Training, 6/3 Bioinformatics Webinar 2, 6/5 R Training 2, 6/6 Python Training, 6/9 Python Training, 6/10 Logistical Webinar, 6/10 Technical Training: Debriefing the Paper, 6/15 Gene Team Meeting, 6/16 Asana Training

  • Goals
    • Read more of the technical paper, as there are a few areas I am still unclear about
    • Become more familiar with R and learn more about data visualization and interpretation
    • Complete next week’s assignments and attend more team meetings
  • Task Completion

Overall, I have mixed feelings about the project. I had to deal with a scheduling issue that put my progress a bit behind, but I have for the most part caught up. The R training videos were very helpful for reviewing and completing the R exercise, as were the webinar recordings for working through the technical paper.

Self Assessment Week 3

  • Technical Area:
    • Learned about quality control as well as the necessity of data normalization and a few data normalization techniques
    • Learned about data interpretation and interpreted results obtained from a raw and normalized data set
  • Tools:
    • Learned how to use RStudio to perform data preprocessing and quality control on a data set obtained from GEO
    • Learned about the different tools to assess quality of RNA data samples
    • Learned how to use Asana to track and complete tasks
    • Learned more about Pandas for data plotting
  • Soft Skills
    • Organized team meetings with team members through Slack and asked for help when needed
  • Three Achievement Highlights
    • Completed the week three deliverables
    • Completed the DESeq2 exercises
    • Performed both RMA and MAS5 normalization on the data and generated relevant data-quality plots using several different packages
  • List of meetings attended:
    • 6/15 Team meeting, 6/16 Asana training, 6/16 Python training, 6/17 Technical training webinar, 6/18 Team meeting, 6/18 Office Hours
  • Goals for upcoming week
    • Complete week four deliverables
    • Explore more of the quality control packages
    • Read more about the technical paper
    • Update LinkedIn page
  • Detailed statement:
    • Figuring out how to use each affy package to plot and analyze the data was rather difficult, given the specific requirements and functionality of each package. Combined with the long time RStudio took to load and execute code, this made the week 3 items tough to complete. I had to do quite a bit of testing on my own, look up documentation for commands like prcomp, and ask for help until I generally figured out what I had to do. After reviewing past recordings of the technical training webinars and getting some help from my teammates, I was finally able to perform most of the commands properly.
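
To make the normalization idea from this week more concrete: RMA includes a quantile normalization step, which forces every sample to share the same distribution of values. The following is a minimal sketch of that one step in plain Python, not the affy implementation. The function name and the toy data are hypothetical, and ties are broken by position, which is a simplification.

```python
from statistics import mean


def quantile_normalize(columns):
    """Quantile-normalize a list of equally long sample columns.

    Each column holds one sample's expression values in the same gene order.
    Ties are broken by original position, which is a simplification.
    """
    n_genes = len(columns[0])
    if any(len(col) != n_genes for col in columns):
        raise ValueError("all samples must cover the same genes")

    # For each sample, the gene indices ordered from lowest to highest value.
    orders = [sorted(range(n_genes), key=col.__getitem__) for col in columns]

    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [
        mean(col[order[k]] for col, order in zip(columns, orders))
        for k in range(n_genes)
    ]

    # Give every gene the reference value that matches its rank in its sample.
    normalized = []
    for order in orders:
        new_col = [0.0] * n_genes
        for rank, gene_index in enumerate(order):
            new_col[gene_index] = reference[rank]
        normalized.append(new_col)
    return normalized


# Hypothetical toy data: three samples, four genes each.
samples = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 4.5, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]
normalized = quantile_normalize(samples)
```

After this step, sorting any normalized sample gives the same list of values, which is why box plots of quantile-normalized samples line up.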

Self Assessment Week 4

  • Technical Area
    • Learned more about GEO2R functionality
    • Learned about GitHub functionality
    • Learned more about microarrays and bioinformatics
  • Tools
    • Learned how to use GEO2R to perform differential gene expression analysis on the dataset
    • Learned how to perform gene annotation in RStudio and how to use the limma package for differential gene expression analysis
    • Learned more about plotting data in Python and extracting data with pandas
  • Soft Skills
    • Reached out to teammates for help with RStudio and helped teammates solve problems on their end
    • Reached out to team leads for help and questions
  • Three Achievements
    • Completed the week 4 deliverables with teammates
    • Worked with teammates to create a presentation on results from Week 3
    • Gained a better understanding of bioinformatics as well as the data analysis and RNA interactions
  • List of Meetings
    • 6/22 Team meeting, 6/23 Python training, 6/24 Office Hours, 6/24 Fireside chat, 6/25 Github training, 6/25 Webinar, 6/25 Team meeting, 6/26 R training, 6/26 Office hours
  • Goals
    • Complete week 5 deliverables and other assignments required
    • Practice with the limma package a bit more
    • Begin actively working on LinkedIn profile
  • Detailed Statement:
    • Overall, the most difficult item this week was working with the limma package to perform the differential gene expression analysis. Annotating the data was relatively easy once I had figured out the general code in RStudio, but figuring out how to use limma was especially difficult given the lack of direction on how to use the package. While I was able to mostly work it out by following the script provided by GEO2R and with help from my teammates, I am still a bit shaky on using the package properly.
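
For intuition about what a differential expression table reports, here is a minimal sketch in plain Python of two per-gene quantities: the log fold change and a t-statistic. This is not what limma does internally: limma fits a linear model per gene and moderates the variances with an empirical Bayes step, while the sketch uses an ordinary Welch t-statistic. The probe names, group labels, and values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def log_fold_change(case, control):
    """Difference of group means; on log2 data this is the log2 fold change."""
    return mean(case) - mean(control)


def welch_t(case, control):
    """Ordinary Welch t-statistic (not limma's moderated t)."""
    se = sqrt(variance(case) / len(case) + variance(control) / len(control))
    return (mean(case) - mean(control)) / se


# Hypothetical log2 expression values for two probes.
expression = {
    "probe_A": {"tumor": [9.1, 9.4, 8.9], "normal": [7.0, 7.2, 6.9]},
    "probe_B": {"tumor": [5.0, 5.2, 4.9], "normal": [5.1, 5.0, 5.1]},
}

results = {
    probe: {
        "logFC": log_fold_change(groups["tumor"], groups["normal"]),
        "t": welch_t(groups["tumor"], groups["normal"]),
    }
    for probe, groups in expression.items()
}
```

In this toy data, probe_A has a large positive logFC and a large t-statistic, while probe_B barely changes between the groups.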

Week 5 Self Assessment

  • Technical Area
    • Learned about Gene Ontology, KEGG, and WikiPathways analysis of genes
    • Learned more about interpreting and analyzing graphs and figures
    • Learned more about limma analysis
    • Learned more about reading R/Python documentation to use packages properly
  • Tools
    • Learned how to use Gene Ontology, Kyoto Encyclopedia of Genes and Genomes (KEGG), and WikiPathways packages in RStudio to create visuals for interpreting gene expression
    • Learned how to use the limma package to perform differential gene expression analysis
    • Learned how to plot data in Python using pandas, Matplotlib, and seaborn
    • Learned how to use GitHub to upload/download code
  • Soft Skills
    • Communicated with teammates, helping each other fix coding issues
    • Worked with other groups to coordinate meetings and create a presentation
    • Reached out to leads through office hours to ask questions
  • Three Achievements
    • Completed and submitted the second set of Python exercises through GitHub
    • Worked with group 1 on a presentation of the Week 4 results
    • Completed the week 5 deliverables for GO, KEGG, and WikiPathways
  • List of Meetings
    • 6/28 Team meeting, 6/30 Fireside chat, 6/30 Office Hours, 7/1 Webinar, 7/1 Office Hours, 7/2 Team meeting, 7/3 Office Hours
  • Goals
    • Complete week 6 deliverables and other assignments
    • Update LinkedIn profile
  • Detailed Statements
    • Exported the limma analysis data as a CSV file and imported it into a dataframe for sorting to create a gene vector
    • Created two dataframes, one mapping Entrez IDs and one mapping Ensembl IDs to gene symbols
    • Performed GO and KEGG analysis using their respective commands by looking up documentation and following the Week 5 instructions. Created CSV files and plots for each analysis
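
The export, sort, and map steps above can be sketched in plain Python as follows. This is an illustration under assumptions rather than the script used for the deliverables: the `SYMBOL` column name, the 0.05 cutoff, the choice to sort by absolute logFC, and the small lookup table are all hypothetical stand-ins (`logFC` and `adj.P.Val` follow the column names limma's `topTable` produces).

```python
import csv
import io

# Hypothetical excerpt of an exported limma results table.
LIMMA_CSV = """SYMBOL,logFC,adj.P.Val
MYC,2.4,0.0001
CA1,-4.1,0.00002
GAPDH,0.1,0.8
CDK4,1.6,0.003
"""

# Hypothetical symbol -> Entrez ID lookup; in practice this would come
# from an annotation package rather than being typed by hand.
SYMBOL_TO_ENTREZ = {"MYC": "4609", "CA1": "759", "CDK4": "1019", "GAPDH": "2597"}


def load_deg_table(text):
    """Parse the CSV text and convert the numeric columns to floats."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row["logFC"] = float(row["logFC"])
        row["adj.P.Val"] = float(row["adj.P.Val"])
    return rows


def gene_vector(rows, max_adj_p=0.05):
    """Significant genes ordered by decreasing absolute logFC."""
    significant = [r for r in rows if r["adj.P.Val"] < max_adj_p]
    significant.sort(key=lambda r: abs(r["logFC"]), reverse=True)
    return [r["SYMBOL"] for r in significant]


rows = load_deg_table(LIMMA_CSV)
genes = gene_vector(rows)
# Genes without a known Entrez ID are skipped instead of raising an error.
entrez_ids = [SYMBOL_TO_ENTREZ[g] for g in genes if g in SYMBOL_TO_ENTREZ]
```

The resulting list of Entrez IDs is the kind of gene vector that enrichment functions typically take as input.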

Week 6 Self Assessment

  • Technical Area
    • Learned more about KEGG, Gene Ontology, and WikiPathways analysis
  • Tools
    • Learned how to use the STRING database and EnrichR tools
  • Soft Skills
    • Learned more about Resume building, elevator pitches, and divergent thinking
  • Meetings Attended
    • 7/6 Meeting, 7/8 Team Meeting, 7/9 Python training, 7/10 Office Hours
  • Achievement Highlights
    • Completed Week 6 deliverables
    • Worked with teammates to resolve inconsistencies in the analysis
    • Updated my resume
  • Goals
    • Complete final deliverables
  • Task Completed
    • Completed the analysis of the GSE8671 data set as guided by the week 6 directions, separating the genes into downregulated and upregulated gene vectors. However, due to some inconsistencies in my limma analysis, my results were a bit off from what my teammates found. This was resolved by downloading my teammates’ limma data and running my script on it.
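
As a rough illustration of the two steps described above, the sketch below splits a results table into upregulated and downregulated gene vectors and then compares one vector against a teammate's version to see where they differ. The gene names, the |logFC| >= 1 and adjusted p < 0.05 cutoffs, and both function names are hypothetical, not the values used in the deliverables.

```python
def split_by_direction(deg_rows, min_abs_logfc=1.0, max_adj_p=0.05):
    """Return (up, down) gene lists from (symbol, logFC, adj_p) tuples."""
    up, down = [], []
    for symbol, logfc, adj_p in deg_rows:
        # Skip genes that are not significant or change too little.
        if adj_p >= max_adj_p or abs(logfc) < min_abs_logfc:
            continue
        (up if logfc > 0 else down).append(symbol)
    return up, down


def compare_gene_sets(mine, theirs):
    """Show which genes two analyses agree and disagree on."""
    mine, theirs = set(mine), set(theirs)
    return {
        "shared": sorted(mine & theirs),
        "only_mine": sorted(mine - theirs),
        "only_theirs": sorted(theirs - mine),
    }


# Hypothetical results: (gene symbol, logFC, adjusted p-value).
deg_rows = [
    ("GENE_A", 2.3, 0.001),
    ("GENE_B", -1.8, 0.01),
    ("GENE_C", 0.4, 0.001),   # significant but small change
    ("GENE_D", -3.0, 0.2),    # large change but not significant
]
up_genes, down_genes = split_by_direction(deg_rows)

# Hypothetical teammate result for the upregulated genes.
teammate_up = ["GENE_A", "GENE_E"]
comparison = compare_gene_sets(up_genes, teammate_up)
```

A comparison like this makes it easier to tell whether two analyses disagree on a handful of borderline genes or on most of the list.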

Week 7 Self Assessment

  • Technical Area
    • Learned how to analyze PPI networks
  • Tools
    • Continued using RStudio for data normalization, analysis, and visualization
    • Continued using GEO2R, EnrichR, and the STRING database to analyze DEGs and proteins
    • Learned how to use Cytoscape for PPI network creation
  • Soft Skills
    • Learned more about presentation strategies and networking
  • Three Achievement Highlights
    • Completed final deliverables and presentation
    • Discovered how to use Cytoscape
    • Analyzed the GSE21510 data set using previous code/tools without many issues
  • List of Meetings
    • 7/13 Team meeting, 7/15 Team meeting, 7/21 Team meeting, 7/22 Happy Hour
  • Goals
    • Present/Practice for final presentation
  • Detailed Statement
    • Completing the tasks for the final deliverables was relatively easy, since they were a culmination of the previous weeks of work. Overall, I didn’t face many issues, as the code I used worked fine.
    • I decided to try to continue with the paper’s pipeline, given that we had stopped at PPI analysis of the dataset. However, I was only able to get as far as analyzing the PPI network with Cytoscape, since this was a new tool for me to learn and use. I did some research, found a general tutorial for constructing and analyzing PPI networks, and figured out how to perform a limited analysis.
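
One common way to analyze a PPI network is to rank proteins by their number of interaction partners (their degree) and treat the highest-ranked ones as candidate hubs. The sketch below shows that idea in plain Python. It is not the Cytoscape workflow: the protein names, scores, and the 0.4 score cutoff are assumptions for illustration.

```python
from collections import defaultdict


def build_network(edges, min_score=0.4):
    """Build an undirected adjacency map from (protein_a, protein_b, score)."""
    neighbors = defaultdict(set)
    for a, b, score in edges:
        # Keep only interactions at or above the cutoff; ignore self-loops.
        if score >= min_score and a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def hub_genes(neighbors, top_n=3):
    """Proteins with the most partners, ties broken alphabetically."""
    ranked = sorted(neighbors, key=lambda g: (-len(neighbors[g]), g))
    return [(g, len(neighbors[g])) for g in ranked[:top_n]]


# Hypothetical interaction list with confidence scores between 0 and 1.
edges = [
    ("P1", "P2", 0.9),
    ("P1", "P3", 0.7),
    ("P1", "P4", 0.5),
    ("P2", "P3", 0.8),
    ("P4", "P5", 0.2),  # below the cutoff, so it is dropped
]
network = build_network(edges)
hubs = hub_genes(network)
```

Here P1 comes out as the top hub with three partners, and P5 does not appear in the network at all because its only interaction falls below the cutoff.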

Final Self Assessment (Victor Jian)

Overview of things I learned:

Technical Skills:

  • Learned how to use the pandas library to create dataframes and extract data in Python, and how to use the Matplotlib and seaborn libraries to create graphs and charts from that data
  • Learned how to read a scientific paper properly, as well as strategies for approaching papers to better understand their content
  • Learned more about the role of genetics in cancer and the general components of a ceRNA network
  • Learned how to set up and perform a DEG analysis on microarray data using RStudio and Bioconductor packages as well as GEO2R
  • Learned how to perform quality control on a dataset in RStudio to identify outliers and assess the overall quality of the data
  • Learned how to work with large datasets in RStudio to sort, extract, and modify data, and how to create matrices and dataframes
  • Learned about data normalization and how to create and analyze PCA plots, heatmaps, and volcano plots with ggplot2 and other R packages
  • Learned how to perform pathway analysis on a dataset and about the many online tools and software available
  • Learned how to use the STRING database and Cytoscape to create and analyze PPI networks
  • Learned how to use GitHub to upload and organize code

Tools:

  • Python, Rstudio, Jupyter Notebook, Stemaway Forum, Asana, Google Suites, Bioconductor, Slack, Github, GEO2R, GEO database, GSE 21510 dataset, EnrichR, Cytoscape, STRING database, and the Paper (Guo, Li et al. “Construction and Analysis of a ceRNA Network Reveals Potential Prognostic Markers in Colorectal Cancer.” Frontiers in genetics vol. 11 418. 8 May. 2020, doi:10.3389/fgene.2020.00418)

Soft Skills:

  • Worked with teammates virtually to perform weekly tasks and coordinate meetings
  • Communicated with mentors and team leads over email and meetings for clarification or help
  • Worked with other teams to coordinate presentations and meetings
  • Learned about the importance of creativity in STEM, elevator pitches, networking, presentation strategies, divergent thinking, and resume building

Tasks Completed

  • Attended weekly meetings for Python and RStudio training and completed the related exercises.
  • Read and followed the research paper through its research pipeline, from DEG analysis to construction of the ceRNA network. The paper was rather complex, with many new terms and concepts; however, after attending the weekly meetings and webinars and working with teammates, I have a decent understanding of the overall process and its concepts.
  • Normalized the GSE8671 and GSE21510 microarray data using the MAS5, RMA, and GCRMA methods
  • Performed quality control on the raw and normalized datasets using the arrayQualityMetrics package
  • Used the ggplot2, pheatmap, and EnhancedVolcano packages to create PCA plots, heatmaps, and volcano plots of the raw and normalized data
  • Used the hgu133plus2.db package to annotate the dataset by mapping probe IDs to gene symbols
  • Used quantile() function to filter out genes expressed below the 4th quantile for datasets
  • Used the limma package to perform DEG analysis and calculate logFC, p-values, and adjusted p-values for the filtered datasets
    • GEO2R was also used for this
  • Investigated different thresholds for calling differentially expressed genes significant and determined appropriate cutoffs
  • Created volcano plots of the DEG analysis results using different significance cutoffs and thresholds
  • Extracted phenotypic data from the datasets and added it to the ExpressionSet object of each dataset
  • Created a vector of the top differentially expressed genes and mapped them to their Entrez IDs
  • Performed GO, KEGG, and enrichGO analysis to determine the pathways, gene ontology terms, and molecular functions of the upregulated and downregulated DEGs
    • EnrichR was also used for this
  • Used the STRING database and Cytoscape to construct and analyze a PPI network of the top DEGs
  • Created a PowerPoint presentation highlighting achievements and work throughout the internship at STEM-Away
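
The adjusted p-values and significance cutoffs mentioned in the task list can be illustrated with the Benjamini-Hochberg procedure, which is the default adjustment method in limma's `topTable`. The sketch below is a plain Python version for intuition, not the code used in the analysis; the raw p-values and the three cutoffs are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-values get smaller.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five genes.
raw_p = [0.001, 0.008, 0.039, 0.041, 0.6]
adj_p = benjamini_hochberg(raw_p)

# How many genes pass at a few candidate cutoffs.
counts = {cutoff: sum(p < cutoff for p in adj_p) for cutoff in (0.01, 0.05, 0.1)}
```

Counting how many genes pass at each cutoff is one simple way to see how sensitive a DEG list is to the chosen threshold.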

Achievement Highlights

  • This was my first time using RStudio, but I now have a decent grasp of using it to create plots, run data through functions and packages, extract data, and import/export data
  • Completed all weekly deliverables and training exercises
  • Learned how to use Cytoscape to perform PPI network analysis and applied it to the GSE21510 dataset
  • Learned about the process of determining prognostic biomarkers for cancer as described by the research paper.

Presentation: Part1 of StemAway_Final_Deliverables_Victor_Jian (1).pdf (1.5 MB) Part2 of Copy of StemAway_Final_Deliverables_Victor_Jian.pdf (3.4 MB)

Sorry for the file separation; the original file was too large to be uploaded in one part. The link below contains the file in one part: https://drive.google.com/file/d/17mpxEI_LKhLA50lCTSa5qc7gaWT9MNXX/view?usp=sharing