Vmax - Bioinformatics Pathway

Concise overview of things learned

Technical Area

  • Learning more R skills specific to Bioinformatics
  • Getting more comfortable coding in Python
  • Refreshing my knowledge of biology and diving deeper into molecular biology

Tools

  • Google Calendar has been a life-saver! It’s easier to attend more meetings because all the times and links are in one place
  • Getting more comfortable with the STEM-AWAY platform, though the user experience is still a bit unintuitive to me
  • Slack and Google Drive have been working perfectly for me
  • Excited to learn how to use ASANA, a new platform I have no experience with

Soft Skills

  • Developing my ability to be patient and compassionate
  • Recognizing that things are super hectic and, rather than complaining about what isn’t going well, offering solutions to the problems I point out
  • Being mindful of other people’s time
  • Being grateful for the amount of behind-the-scenes effort that we participants may not see, but which is clearly happening

Three achievement highlights

  1. Attended almost all mandatory meetings in real time (I missed one R training session on Tues, July 9)
  2. Completed assignments on time
  3. Being proactive about asking questions and clarifying things I don’t understand

List of meetings/ training attended including social team events

Week 1

  • Fri 6.29: Internship Kick-off
  • Mon 7.1: Bioinformatics Webinar
  • Mon 7.1: Team Meeting
  • Tues 7.2: R Training
  • Wed 7.3: Bioinformatics Webinar
  • Fri 7.5: R Training (only attended first hour)
  • Fri 7.5: Happy Hour

Week 2

  • Mon 7.8: Team Meeting
  • Tues 7.9: Python Training
  • Wed 7.10: Logistical Training (only attended 1.5 hours)
  • Wed 7.10: Technical Training
  • Thurs 7.11: Office Hours
  • Thurs 7.11: Gene Team Meeting
  • Fri 7.12: Welcome Session
  • Fri 7.12: R Training
  • Fri 7.12: Happy Hour (only attended first 15min)

Goals for the upcoming week

  1. Complete and self-correct all assignments
  2. Meet with my smaller team and go over what our assigned tasks are

Detailed statement of tasks done

  1. Assigned R Tasks from Book (3.2.4, 3.3.1, 3.7.1)
  • Had questions about the difference between geom_bar and geom_col. We went over the entire assignment during our team meeting on Thurs 7.11, which cleared this up
  • Couldn’t install certain packages at first. After some research of my own, I realized I needed to update my version of R
  2. R Exercises from June 5 and June 9 Training
  • I’m still confused about the metadata and will raise this question during our next team meeting
  3. R Exercises from June 12 Training
  • I’m still confused about the relationship between differentially expressed genes and upregulated and downregulated genes, so I had trouble answering exercise 2. I will ask Yves during our next training session
  • Unsure how to change the data for the volcano plot. I will clarify this during our next training session
  4. Read the paper Construction and Analysis of a ceRNA Network Reveals Potential Prognostic Markers in Colorectal Cancer
  • Unclear about figures. We went over the first four during our team meeting on Mon 7.15, which was incredibly helpful
  5. Python Exercises
  • Problem 2: Write a function that, given an integer n, performs the following conditional actions
    – During office hours on Mon 7.15, Goral helped me understand that I couldn’t put n in square brackets
  • Problem 6: Write conditional statements to check the following statements
    – During office hours, Goral helped me with the notation (4 in x) for checking whether an integer is in a list
  • Problem 10: String Repetition and String Length
    – During office hours, Goral helped me understand the notation print(len("Hey!" * 10)) for printing the length of the string "Hey!" after it has been repeated 10 times.
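The three office-hours notes from the Python exercises can be tried in a short session. This is only a sketch: the function name parity, its even/odd condition, and the list x are made up for illustration, since the actual conditions from Problem 2 are not listed in these notes.

```python
def parity(n):
    # Hypothetical condition, not the one from the problem set.
    # Use n directly: writing [n] would build a one-element list,
    # and a list cannot be used in arithmetic such as n % 2.
    if n % 2 == 0:
        return "even"
    return "odd"


# Membership test: "4 in x" is True when 4 is one of the items in the list x.
x = [1, 4, 9]
has_four = 4 in x

# "Hey!" * 10 repeats the 4-character string 10 times, so len() gives 40.
repeated_length = len("Hey!" * 10)

print(parity(7), has_four, repeated_length)  # prints: odd True 40
```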

Concise overview of things learned

Technical Area

  • Converting data into different data types
  • Normalization techniques
  • PCA plots
  • Heatmaps
  • Volcano Plots
  • Understanding how to work with microarray data
  • Brushing up on statistics and understanding the math behind the plots

Tools

  • ASANA: It’s up and running and very useful

Soft Skills

  • Team work: Collaborating on a project is great. We divide up tasks and then come together to discuss what went well and what didn’t work
  • Presenting to a larger audience: Forces me to better understand the work I’ve done because I have to explain it to a new audience

Three achievement highlights

  1. Have a pretty good grasp on the data and concepts we are learning
  2. Efficient, effective and successful team work
  3. Continuing to ask questions and go to office hours, not being afraid to admit I don’t know things

List of meetings/ training attended including social team events

Week 3

  • Mon Jun. 15: Gene Team Meeting
  • Mon Jun. 15: Python Office Hours
  • Tues Jun. 16: ASANA Training
  • Tues Jun. 16: Python + Pandas Training
  • Thurs Jun. 18: Gene Team Meeting
  • Fri Jun. 19: Team 5 Meeting to work on deliverables

Goals for the upcoming week

  1. Meet with my Team to go over deliverables
  2. Submit Week 4 Deliverables
  3. Submit Python Problem Set

Tasks Completed

Week 3 Deliverables (Team 5)

  • Used the RMA normalization technique on microarray data
  • Learned how to convert data into a matrix using exprs() and into a data frame using as.data.frame()
  • Created histograms of median RLE scores and median NUSE scores to assess quality of data
  • Created Principal Component Analysis Plot of unnormalized and normalized data
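The arithmetic behind a median RLE (Relative Log Expression) score can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the team's R code: the expression values are made up, and in R the affyPLM package derives these scores from a fitted probe-level model rather than from a raw matrix like this one.

```python
from statistics import median

# Made-up log2 expression values: rows are probes, columns are arrays.
log_expr = [
    [8.0, 8.2, 7.9],
    [5.1, 5.0, 5.6],
    [10.3, 10.1, 10.4],
]


def median_rle(matrix):
    """Return one median RLE value per array (column)."""
    deviations = []
    for row in matrix:
        # Compare each array's value with the probe's median across all arrays.
        row_median = median(row)
        deviations.append([value - row_median for value in row])
    n_arrays = len(matrix[0])
    # Summarize each array by the median of its deviations.
    return [median(row[j] for row in deviations) for j in range(n_arrays)]


rle_scores = median_rle(log_expr)
print(rle_scores)
```

An array whose median RLE sits far from 0 tends to be flagged for a closer look, which is why a histogram of these scores is a useful quality check.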

Concise Overview of Things Learned

Technical Area

  • Becoming more comfortable with the various BiocManager Packages in R
  • Model matrix
  • Volcano Plots with thresholds

Tools

  • GitHub

Soft Skills

  • Team work: Checking in with each other, recognizing when it’s time to take a break and come back to the problem with fresh eyes

Three achievement Highlights

  1. Collaborating with my team to solve problems and clear up confusion
  2. Able to understand my coding errors, and better understand what the code is doing
  3. Read more academic papers on similar topics to better contextualize our work

List of meetings / trainings attended

Week 4

  • Mon Jun. 22: Gene Team Meeting
  • Tues Jun. 23: Team 5 Meeting
  • Wed Jun. 24: Office Hours
  • Wed Jun. 24: Fireside Chat
  • Wed Jun. 24: Team 5 Group Meeting to go over presentation
  • Thurs Jun. 25: GitHub Webinar
  • Thurs Jun. 25: Webinar
  • Thurs Jun. 25: Gene Team Meeting
  • Fri Jun. 26: R Training

Tasks Completed

Week 4 Deliverables (Team 5)

  • Created a table containing the top 100 differentially expressed genes. The table included gene symbol, p-value, and log fold change

  • Defined thresholds to identify statistically significant differentially expressed genes

  • Created a model matrix

  • Created a volcano plot with results from the AffyData
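The way thresholds separate upregulated, downregulated, and non-significant genes can be sketched as below. The cutoffs (|logFC| of at least 1, p-value below 0.05) and the gene records are hypothetical; the values Team 5 actually used are not recorded in these notes.

```python
# Hypothetical cutoffs, chosen only for illustration.
LOGFC_CUTOFF = 1.0
P_CUTOFF = 0.05


def classify(log_fc, p_value):
    """Label a gene by the direction and significance of its change."""
    if p_value < P_CUTOFF and log_fc >= LOGFC_CUTOFF:
        return "up"
    if p_value < P_CUTOFF and log_fc <= -LOGFC_CUTOFF:
        return "down"
    return "not significant"


# Made-up rows shaped like the deliverable table: symbol, logFC, p-value.
genes = [
    {"symbol": "GENE_A", "log_fc": 2.3, "p_value": 0.001},
    {"symbol": "GENE_B", "log_fc": -1.8, "p_value": 0.01},
    {"symbol": "GENE_C", "log_fc": 0.4, "p_value": 0.0001},
    {"symbol": "GENE_D", "log_fc": 3.0, "p_value": 0.2},
]

labels = {g["symbol"]: classify(g["log_fc"], g["p_value"]) for g in genes}
print(labels)
```

Under these assumed cutoffs, a gene counts as differentially expressed only when it passes both tests: GENE_C has a tiny p-value but a small fold change, and GENE_D has a large fold change but a weak p-value, so neither is labeled up or down. On a volcano plot these are the points that fall outside the upper-left and upper-right corners.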

Concise overview of things learned

Technical Area

  • affy, AnnotationDbi, hgu133plus2.db, simpleaffy, arrayQualityMetrics, affyQCReport, genefilter, limma and GEOquery Packages in RStudio

Tools

  • GitHub: Continuing to get comfortable using it

Soft Skills

  • Team work
  • Asking for help when I’m confused

Three achievement highlights

  1. Went to office hours and got in touch with a mentor
  2. Submitted Python problem set despite technical difficulties
  3. Collaborating with my team on week 5 deliverables

List of meetings/ training attended including social team events

Week 5

  • Mon Jun. 29: Gene Team Meeting
  • Mon Jun. 29: Team 5 Meeting
  • Mon Jun. 29: Python Training
  • Tues Jun. 30: Fireside Chat
  • Wed July 1: GitHub Webinar
  • Thurs July 2: Gene Team Meeting

Goals for the upcoming week

  1. Meet with my Team to go over deliverables
  2. Submit Week 5 Deliverables

Tasks Completed

Week 5 Deliverables (Team 5)

  • Created a vector of logFC values named by gene symbol and sorted in decreasing order
  • Ran Gene Ontology analysis and created a barplot to visualize the results
  • Created dotplot, barplot and plotGOgraph visualizations of the top 20 terms from the setReadable output
  • KEGG analysis
  • WikiPathways analysis
  • DAVID analysis
  • Network analysis with STRING
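The first deliverable, a logFC vector named by gene and sorted in decreasing order, was done in R. The same idea can be sketched in Python with a dictionary; the gene symbols and values here are made up for illustration.

```python
# Made-up log fold changes keyed by gene symbol.
log_fc = {"GENE_A": 2.3, "GENE_B": -1.8, "GENE_C": 0.4, "GENE_D": 3.0}

# Sort the (symbol, logFC) pairs by logFC, largest first.
# Dictionaries keep insertion order in Python 3.7+, so the result stays ranked.
ranked = dict(sorted(log_fc.items(), key=lambda item: item[1], reverse=True))

print(list(ranked))  # prints: ['GENE_D', 'GENE_A', 'GENE_C', 'GENE_B']
```

A ranked list like this puts the most upregulated genes at the top and the most downregulated at the bottom, which is the ordering enrichment tools generally expect as input.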

Final Assessment

Concise Overview of Things Learned

Technical Area

  • Mine microarray data to extract biological meaning to understand underlying mechanisms relating to colorectal cancer
  • Read Affymetrix array data obtained from GEO using the affy package in RStudio
  • Check Quality Control using the simpleaffy and affyPLM packages
  • Normalize the raw dataset using RMA normalization technique
  • Gene expression analysis and identification of differentially expressed genes
  • Apply linear model to the normalized dataset using the limma package
  • Explore the relationship between threshold values, log fold change, and p-values
  • Analysis of genes and their biological functions using DAVID, WikiPathways, and Gene Ontology (GO) Analysis
  • Visualization of microarray data such as bar plots, dot plots, hierarchical clustering maps, heatmaps, Principal Component Analysis, and volcano plots

Tools

  • Asana
  • DAVID
  • GEO database
  • GitHub
  • Slack
  • STEM-Away
  • RStudio and BiocManager Packages

Soft skills

  • Teamwork: I worked with Group 5. We had an extremely supportive work environment. We tried the deliverables on our own and then met up multiple times during the week to share different ways of doing things and helping each other out with concepts and code.
  • Communication: I communicated well with my group members, reaching out with questions when I had them and helping them out when they needed my support. I was a very active participant during the technical training sessions and always asked questions to clarify any misunderstandings and confusing concepts. I proposed various organizational strategies to the Gene Team Leads, such as using GoogleCalendar, in order to help our entire team remember deadlines and find links to all of the various meetings.
  • Flexibility and adaptability: I was initially very uncomfortable going into this internship due to my lack of experience with Bioinformatics. Over the course of the internship I attended office hours and did some research on my own to better understand the biological mechanisms we were studying.
  • Curiosity: Overall, this internship sparked my interest in Bioinformatics and I am excited to delve deeper into this field.

Achievement Highlights

I have a solid foundation in Bioinformatics. I am comfortable performing quality control, normalization, DEG and functional analysis on raw files from the GEO database. Over the course of this internship I learned how to mine microarray data. This was done to extract biological meaning of the data in order to better understand the underlying mechanisms of colorectal cancer.


Future Goals
I’d like to combine my paleontological research with genome analysis and study human evolution during the Pleistocene epoch. I have plans to pursue a Master’s Degree in Archaeo- and Paleogenetics. I am excited to conduct genomic research on ancient DNA to better understand our ancestors.