Bioinformatics - Level 1 - Aditi Verma

Overview:

  • Technical Area: R programming (coding and creating visualizations in RStudio), debugging, reading research papers, and bioinformatics (its fundamentals and the steps of doing research).
  • Tools: RStudio
  • Soft Skills: Learning to navigate the STEM-AWAY website and to ask for further guidance through the messaging feature.

Achievement Highlights:

  • Successfully installed RStudio and got it running.
  • Developed a basic understanding of how to create visualizations and write code in R.
  • Gained a deeper understanding of bioinformatics and how it combines statistics, programming, and biology.

Details of Tasks and Hurdles:

  • At first, it was challenging to navigate the module and understand how Pathway Hub worked, but after I contacted Anya, all my questions were resolved.
  • Installing R and RStudio was also confusing, but after asking for assistance, I was able to install both successfully.
  • The visualizations and the various function arguments in R got complex and I kept getting errors, but I managed to solve some of the issues.

Overview:

  • Technical Area: GEO database (downloading and gathering sample data), installing packages, importing and organizing metadata, and batch correction.
  • Tools: RStudio, Excel, GitHub
  • Soft Skills: Applying fundamental knowledge to real work, such as collecting data and using functions in R.

Achievement Highlights:

  • Successfully collected sample data from GEO and imported it into Excel.
  • Applied what I learned in Module 1 about the install functions in R to install packages.
  • Saw how sample data are stored in a long text file and learned how to read and understand its organization.

Details of Tasks and Hurdles:

  • Use a dataset to record each sample, its batch, and a feature in an Excel sheet. Then import the Excel file into R and use the merge function as part of the batch correction.
  • Downloading the GEO text file gave me some difficulty, but I soon discovered that the link does not work in Chrome, and once I knew that, the problem was fixed.
  • Trying to read and understand the lengthy data file was difficult at first.
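In R, merge(x, y, by = ...) joins two tables on a shared column and, by default, keeps only the rows whose key appears in both. It lines each sample up with its batch and feature, but it does not by itself remove batch effects. As a rough illustration of that join (written in Python rather than R, with made-up sample IDs and columns), a minimal sketch:

```python
def merge_by_key(left, right, key):
    """Inner-join two lists of row dicts on `key`, like R's merge(x, y, by = key)."""
    # Index the right-hand table by its key for quick lookup.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    merged = []
    for row in left:
        # Keys with no match are dropped (inner join, R's default all = FALSE).
        for match in index.get(row[key], []):
            combined = dict(row)
            # Non-key columns sharing a name are overwritten here
            # (R would keep both with .x/.y suffixes).
            combined.update(match)
            merged.append(combined)
    return merged


# Hypothetical metadata sheet: one row per sample with its batch and a feature.
metadata = [
    {"sample": "GSM_A", "batch": 1, "feature": "tumor"},
    {"sample": "GSM_B", "batch": 2, "feature": "normal"},
    {"sample": "GSM_C", "batch": 1, "feature": "normal"},
]
# Hypothetical per-sample files to attach the metadata to.
files = [
    {"sample": "GSM_A", "file": "GSM_A.CEL"},
    {"sample": "GSM_C", "file": "GSM_C.CEL"},
]

merged = merge_by_key(files, metadata, "sample")
for row in merged:
    print(row)
```

A sample missing from either table (GSM_B here) silently disappears, so it is worth checking the row count after any merge.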

Overview:

  • Technical Area: Background correction and normalization of data, quality control and data visualization, and analyzing the plots to identify outliers.
  • Tools: R program, GitHub, Excel.
  • Soft Skills: Using my fundamental understanding of R to manipulate datasets. Practicing different visualization tools such as ggplot2, PCA plots, and base R plots. Learning to work neatly and effectively, for example by labeling and organizing files and code.

Achievement Highlights:

  • Discovering mistakes I made in Module 2 before it was too late, such as needing to import the CEL files with ReadAffy rather than read.excel().
  • Getting a better understanding of plotting-function arguments, such as those for ggplot2, pheatmap, and boxplot.
  • Successfully plotting the PCA graph and understanding more about what PC1 and PC2 represent.
  • Adding labels to plots to reduce confusion and make them easier to read and interpret.
  • Submitting my Module 3 deliverables to GitHub, my first time turning in work there.

Detailed Tasks:

  • Explore different plotting functions such as QCReport and the basic QC plot function.
  • Perform background correction and normalization using different methods such as rma() and gcrma().
  • Plot a boxplot, a PCA graph, and a heatmap using both the raw and the background-corrected/normalized data.
  • Submit all the visuals and the code to GitHub and post the outliers in the forum.
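The rma() method combines three steps: background correction, quantile normalization across arrays, and median-polish summarization of probes into probe sets. Quantile normalization is the part that makes the normalized boxplots line up, because it forces every sample onto the same distribution. Below is a minimal Python sketch of that one step on a tiny made-up matrix (rows are probes, columns are samples); it breaks ties by order of appearance and is not the affy package's implementation.

```python
def quantile_normalize(matrix):
    """Quantile-normalize the columns of a probes x samples matrix (list of rows).

    Each column is ranked, the mean value at each rank is taken across columns,
    and every value is replaced by the mean for its rank within its column.
    Ties are broken by order of appearance (a simplification).
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_cols)]
    # Row indices of each column, ordered from smallest to largest value.
    orders = [sorted(range(n_rows), key=col.__getitem__) for col in columns]
    # Mean across samples of the values sitting at each rank.
    rank_means = [
        sum(columns[j][orders[j][r]] for j in range(n_cols)) / n_cols
        for r in range(n_rows)
    ]
    result = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        for r, i in enumerate(orders[j]):
            result[i][j] = rank_means[r]
    return result


# Made-up log2 intensities: 4 probes (rows) x 3 samples (columns).
raw = [
    [5.0, 4.0, 3.0],
    [2.0, 1.0, 4.0],
    [3.0, 4.5, 6.0],
    [4.0, 2.0, 8.0],
]
for row in quantile_normalize(raw):
    print(row)
```

After this step every column contains exactly the same set of values, only in a different order, which is why a boxplot of normalized data shows identical boxes.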

Hurdles and Memorable Experiences:

  • I had a lot of trouble understanding what PCA was and how PC1 and PC2 worked. I also kept getting error messages when I tried to plot, which dampened my enthusiasm. After I asked questions and cleared up my confusion one point at a time, it felt as though a miracle had happened: I actually understood how to code in R.
  • Learning to pay attention to details was key in this module. I realized that no matter how many times I read the step-by-step guides, I had never fully grasped how to use the functions because I was focusing on the bigger picture. For example, I tried to learn every argument at once and ended up confused and unable to plot anything. In this module, I found that focusing on simple arguments, such as how to add titles to graphs and label the axes, was more important than trying to learn more than I could handle.
  • I uncovered many mistakes I had made in Module 2 because I was unfamiliar with R and with data analysis in general. Now that I have gone through three modules, I feel more confident about R programming, especially about using plotting functions and analyzing graphs.
  • Module 3 was definitely where I had a major epiphany.
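Since PC1 and PC2 were the sticking point here, a small sketch may help. PCA centres each gene, then finds the direction of greatest variance across samples (PC1) and the next-greatest direction perpendicular to it (PC2); each sample's coordinates along those two directions are what a PCA plot draws. In R this is usually done with prcomp(), which uses a singular value decomposition; the Python sketch below instead uses power iteration on the covariance matrix, purely for illustration, on a made-up matrix of 4 samples x 3 genes.

```python
import math


def pca_2d(data, n_iter=500):
    """Return (scores, variances) for PC1 and PC2 of a samples x genes matrix.

    Power iteration with deflation on the covariance matrix: a teaching
    sketch, not a replacement for prcomp() or a proper SVD routine.
    """
    n, p = len(data), len(data[0])
    # Centre each gene (column) at zero mean, as prcomp() does by default.
    means = [sum(row[j] for row in data) / n for j in range(p)]
    x = [[row[j] - means[j] for j in range(p)] for row in data]
    # Sample covariance matrix between genes (p x p).
    cov = [[sum(x[i][a] * x[i][b] for i in range(n)) / (n - 1) for b in range(p)]
           for a in range(p)]
    components, variances = [], []
    for _ in range(2):
        v = [1.0] * p  # starting direction
        for _ in range(n_iter):
            w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
            norm = math.sqrt(sum(c * c for c in w))
            if norm == 0.0:
                break  # no variance left along this direction
            v = [c / norm for c in w]
        # Variance captured along v (its eigenvalue).
        lam = sum(v[a] * cov[a][b] * v[b] for a in range(p) for b in range(p))
        components.append(v)
        variances.append(lam)
        # Deflate: remove this component so the next pass finds PC2.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(p)] for a in range(p)]
    # Each sample's coordinates on PC1 and PC2 (what a PCA plot draws).
    scores = [[sum(row[j] * comp[j] for j in range(p)) for comp in components]
              for row in x]
    return scores, variances


# Made-up data: 4 samples (rows) x 3 genes (columns); genes 1 and 2 move together.
data = [
    [12.0, 12.0, 6.0],
    [8.0, 8.0, 6.0],
    [12.0, 12.0, 4.0],
    [8.0, 8.0, 4.0],
]
scores, variances = pca_2d(data)
print("variance along PC1, PC2:", variances)
for sample, (pc1, pc2) in zip(["S1", "S2", "S3", "S4"], scores):
    print(sample, round(pc1, 3), round(pc2, 3))
```

Here PC1 captures the two genes that rise and fall together, and PC2 captures the third gene. A sample sitting far from the others along PC1 or PC2 is the kind of point one would flag as a possible outlier.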

Goal/Tips for Module 4:

  • I looked through some example self-assessments and am working to improve mine, so I will try to make the next one even better.
  • When stuck, ask questions often and read the module guide carefully.
  • Successfully and confidently finish the module.

Overview:

  • Technical Area: Annotating and filtering genes, limma analysis of differentially expressed genes (DEGs), and volcano and heat map plots.
  • Tools: R program and GitHub.
  • Soft Skill: Continuing to build on the fundamentals of R programming. Creating plots with different packages and getting used to working with their arguments. Working diligently without giving up.

Achievement Highlights:

  • Successfully plotting both the volcano plot and the heat map (with color coding), and getting more practice with plots in R.
  • Learning more skills, such as changing row names and manipulating data frames.
  • Getting a better understanding of what differentially expressed genes are and how they appear in the output of the topTable function.
  • Gaining a deeper understanding of GitHub repositories.

Detailed Tasks:

  • Install several required packages and remove outliers from the raw data
  • Annotate the data set and remove the PROBEID column, duplicate SYMBOLs, and NA values
  • Use the limma package for further analysis and derive a topTable of the top DEGs
  • Create a volcano plot of the top DEGs and a heat map of the top 50 DEGs
  • Submit all module 4 deliverables to GitHub
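limma's topTable() reports, for each gene, a log fold change (logFC), a raw p-value (P.Value) and an adjusted p-value (adj.P.Val, Benjamini-Hochberg by default). A volcano plot draws logFC on the x-axis against -log10 of the p-value on the y-axis. The Python sketch below reproduces the BH adjustment and a typical up/down/not-significant colouring on made-up numbers; the gene names and the cutoffs (|logFC| > 1, adjusted p < 0.05) are illustrative assumptions, not values from the module guide.

```python
import math


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, like R's p.adjust(p, method = "BH")."""
    m = len(pvalues)
    # Walk from the largest p-value down, keeping a running minimum so the
    # adjusted values never increase as the raw p-values get smaller.
    order = sorted(range(m), key=pvalues.__getitem__, reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank of pvalues[i] in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def volcano_points(genes, logfc, pvalues, fc_cutoff=1.0, adj_cutoff=0.05):
    """Return (gene, x, y, label) tuples for a volcano plot.

    x = log2 fold change, y = -log10(raw p-value). Genes passing both the
    fold-change and adjusted p-value cutoffs are labeled "up" or "down";
    all others are "ns" (not significant). The cutoffs are illustrative.
    """
    adjusted = bh_adjust(pvalues)
    points = []
    for gene, fc, p, q in zip(genes, logfc, pvalues, adjusted):
        if q < adj_cutoff and fc > fc_cutoff:
            label = "up"
        elif q < adj_cutoff and fc < -fc_cutoff:
            label = "down"
        else:
            label = "ns"
        points.append((gene, fc, -math.log10(p), label))
    return points


# Made-up results for five hypothetical genes.
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"]
logfc = [2.0, -1.5, 1.4, 3.0, -0.2]
pvals = [0.0001, 0.002, 0.02, 0.3, 0.6]
for point in volcano_points(genes, logfc, pvals):
    print(point)
```

GENE_D has the largest fold change but is not called significant once its p-value is adjusted. On a volcano plot it would sit far to the right but low on the y-axis.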

Hurdles and Memorable Experiences:

  • Module 4 was quite a bit more challenging than the earlier modules because I had missed a few steps in my annotation process, which resulted in odd-looking plots. I scheduled a one-on-one meeting and, thankfully, solved the problem with Anya’s help. I also learned more than I expected.
  • Color coding the heat map was also a challenge, but using online resources and reaching out for help made the process much easier. I can now (hopefully) plot without running into many issues.
  • I realized that I had submitted my Module 3 deliverables incorrectly (structure-wise), so I had to fix them. This taught me more about GitHub, at least about how to organize files in a repository.

Goal/Tips for Module 5:

  • I will try my best to finish Module 5 a bit faster than this one (Module 4 took longer than I expected).
  • Keep improving on self-assessments.
  • Continue to ask questions when needed.

Overview:

  • Technical Area: Enrichment, over-representation, and network analysis
  • Tools: R program and GitHub.
  • Soft Skill: Using new packages. Working in an organized manner. Doing research to deepen my understanding. Keeping up with the module despite my busy schedule.

Achievement Highlights:

  • Generating my DEG vector successfully
  • Efficiently and correctly solving an issue I ran into while conducting gene ontology analysis.
  • Practicing data frame and vector manipulation to change row names and to remove and merge data.
  • Running into fewer issues with my code than usual

Detailed Tasks:

  • Install several required packages
  • Generate a DEG vector and conduct gene ontology analysis across the three ontologies (biological process, molecular function, and cellular component)
  • Conduct a KEGG pathway enrichment analysis and plot the results
  • Use the Hallmark gene sets to conduct a GSEA and plot the results
  • Conduct a transcription analysis using regulatory target gene sets and plot the results
  • Submit all module 5 deliverables to GitHub
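Over-representation analysis, the idea behind the GO and KEGG enrichment steps, asks: given N background genes of which K belong to a term, and n DEGs of which k fall in that term, how surprising is k? A common answer is a one-sided hypergeometric p-value, P(X >= k). The Python sketch below computes it exactly with made-up counts; it does not reproduce any particular package's options (such as its choice of background or multiple-testing correction), and GSEA with the Hallmark sets works differently, on a ranked list of all genes.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """One-sided over-representation p-value, P(X >= k).

    N: background genes, K: background genes annotated to the term,
    n: DEGs tested, k: DEGs annotated to the term.
    X is hypergeometric (n genes drawn without replacement from N).
    """
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)  # exact integer arithmetic until this final division


# Made-up counts: 10,000 background genes, a term with 200 genes,
# 300 DEGs of which 15 fall in the term.
N, K, n, k = 10_000, 200, 300, 15
expected = n * K / N  # overlap expected by chance alone (6.0 here)
p = hypergeom_upper_tail(k, N, K, n)
print(f"expected {expected:.1f}, observed {k}, p = {p:.2e}")
```

Comparing the observed overlap with the expected one (15 versus 6 here) is a quick sanity check before looking at the p-value.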

Hurdles and Memorable Experiences:

  • Module 5 was one of the smoothest modules yet. I could feel my coding improving, especially in solving problems on my own.
  • I did run into some issues with the Hallmark gene set because I had downloaded the gene SYMBOL version rather than the ENTREZ ID version.
  • I got stuck on the transcription analysis, specifically on one of the functions, because there happened to be a typo in the guide PDF. I am glad that I found the typo and that I didn’t dwell on the issue too long before reaching out for help (I had started installing many packages to figure out which one the function belonged to, without success).
  • I ran into some time constraints. My schoolwork was getting out of control and I barely had time to work on the module, but I am glad that I stayed diligent. I made sure to set aside at least 30 minutes every day for the module (though, quite honestly, I did skip some days).

Goal/Tips for Module 6:

  • Have my code run as smoothly as it did in Module 5
  • Keep improving on self-assessments.
  • Possibly pick up my pace on this module?
  • Continue to ask questions when needed.

Overview:

  • Technical Area: Performing and interpreting enrichment, network, and survival analyses
  • Tools: GitHub, Web-based tools (GEPIA, Metascape, STRING DB)
  • Soft Skill: Navigating new environments, problem solving, and organizing files to keep track of progress

Achievement Highlights:

  • Successfully navigated different analysis websites
  • The list of gene symbols worked in the analyses, showing that I had done Module 5 correctly
  • Gained a better understanding of STRING network graphs and the interactions between protein nodes

Detailed Tasks:

  • Perform several analyses in Group A, such as enrichment analysis
  • Perform more analysis in Group B, such as network and survival analysis
  • Analyze and try to understand the different diagrams
  • Submit all module 6 deliverables to GitHub
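Survival plots of the kind produced in GEPIA are Kaplan-Meier curves, typically comparing a high-expression and a low-expression group of patients. A minimal Python sketch of the Kaplan-Meier estimator for a single group, using made-up follow-up times; the statistical comparison of two curves (for example a log-rank test) is not shown.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times: follow-up time per patient; events: 1 if the event (e.g. death)
    was observed, 0 if the patient was censored at that time.
    Returns a list of (time, survival probability) steps at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    steps = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # Group all patients sharing this time point.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            # Multiply by the fraction of at-risk patients surviving time t.
            survival *= 1 - deaths / at_risk
            steps.append((t, survival))
        # Everyone observed at time t leaves the risk set afterwards.
        at_risk -= deaths + censored
    return steps


# Made-up follow-up times (months) for one hypothetical patient group;
# 1 = event observed, 0 = censored (e.g. still alive at last follow-up).
times = [5, 8, 8, 12, 15, 20]
events = [1, 1, 0, 1, 0, 1]
for t, s in kaplan_meier(times, events):
    print(f"t = {t:>2}: S(t) = {s:.3f}")
```

Censored patients never cause a drop in the curve; they only shrink the number at risk for later time points.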

Hurdles and Memorable Experiences:

  • Because some of the analyses produced several charts and graphs, I was confused about which one to focus on. Later, I realized that each one served a different purpose and that it was important to understand all of them to some degree.
  • Network plots (STRING plots) were the most challenging to understand. After I asked one of the mentors for help, it became clear that the nodes represent proteins and the lines (edges) show their connections to other proteins.
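In a STRING network, each node is a protein and each edge is an association between two proteins; heavily connected "hub" nodes are often the first thing worth looking at. A small Python sketch that counts each node's distinct partners from a hypothetical edge list (the protein names are placeholders, not STRING output):

```python
from collections import Counter


def node_degrees(edges):
    """Count how many distinct partners each protein (node) has in an undirected edge list."""
    partners = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        partners.setdefault(a, set()).add(b)
        partners.setdefault(b, set()).add(a)
    return Counter({node: len(p) for node, p in partners.items()})


# Hypothetical interactions between made-up proteins (edges of a network plot).
edges = [
    ("PROT_A", "PROT_B"),
    ("PROT_A", "PROT_C"),
    ("PROT_A", "PROT_D"),
    ("PROT_B", "PROT_C"),
    ("PROT_D", "PROT_E"),
    ("PROT_B", "PROT_A"),  # same pair as the first edge, counted once
]
degrees = node_degrees(edges)
print(degrees.most_common(2))
```

Using sets of partners rather than raw edge counts keeps a pair listed twice from inflating a node's degree.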

Goal/Tips for Module 7:

  • Get as much of Module 7 done as possible
  • I also want to understand the different analyses from Module 6 a little better
  • Continue to ask questions when needed.