Shafaqat.rahman - Bioinformatics Pathway

Overview of Things Learned

Technical Area

  • Stronger understanding of relevant biological terms (e.g., miRNA, mRNA, lncRNA, PPI networks) as they relate to the discussed paper on CRC prognostic markers
  • Stronger understanding of the R language for programming
  • Refreshing my dormant knowledge in Python programming
  • Learned about volcano plots, normalization techniques, principal component analysis

Tools

  • Stem-Away Platform
  • Asana
  • Slack
  • R Studio
  • Jupyter Notebook/Anaconda
  • KEGG Database, GEO

Soft Skills

  • Communicating comfortably with team members and team leads
  • Asking questions at technical webinars and gene team meetings
  • Working on being a better organizer

Meetings Attended

  • Welcome Session
  • Most of the Gene Team Meetings (I think I missed the first one)
  • Technical Webinars

Goals for the Upcoming Week

  • Dive into the Week 3 project with my group
  • Do more self-study on what’s been covered up until now

Updated 06/25/2020

Technical Area: I continue to learn more about R and Python. I have become highly familiar with the normalization functions rma() and mas5() as they relate to background-correcting and normalizing Affymetrix array data. I’ve learned that RLE (relative log expression) and NUSE (normalized unscaled standard error) plots are helpful for assessing the distribution of array intensities and can be used to judge data quality. Finally, I’ve developed a stronger conceptual understanding of what principal components are and what exactly we gain from studying the variance and covariance of variables.
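
As a rough illustration of the RLE idea, here is a minimal Python sketch (Python rather than R, and not the affyPLM implementation). It subtracts each probe's median across arrays from that probe's log2 intensity, so an array whose values sit far from zero stands out. The function name `relative_log_expression` and the toy intensities are hypothetical.

```python
from statistics import median

def relative_log_expression(log2_expr):
    """log2_expr maps an array name to its log2 intensities (same probe order)."""
    arrays = list(log2_expr)
    n_probes = len(log2_expr[arrays[0]])
    # The median of each probe across all arrays serves as the reference.
    probe_medians = [
        median(log2_expr[a][i] for a in arrays) for i in range(n_probes)
    ]
    # RLE = observed log2 intensity minus the probe's reference median.
    return {
        a: [log2_expr[a][i] - probe_medians[i] for i in range(n_probes)]
        for a in arrays
    }

# Hypothetical toy data: three arrays, three probes.
toy = {
    "array_1": [8.0, 10.0, 6.0],
    "array_2": [8.2, 10.1, 6.1],
    "array_3": [9.5, 11.4, 7.6],  # deliberately shifted upward
}
rle = relative_log_expression(toy)
rle_medians = {name: median(values) for name, values in rle.items()}
```

In this toy data, `array_3` ends up with a median RLE well above zero, which is the kind of shift a boxplot of RLE values is meant to expose.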

Tools: I have learned about the various libraries that one needs to analyze Affymetrix microarray data sets, and I’m becoming much more comfortable with R Studio and the documentation of these libraries.

Other tools I routinely use are…

  • Stem-Away Platform
  • Asana
  • Slack
  • R Studio
  • Jupyter Notebook/Anaconda
  • KEGG Database, GEO

Soft Skills: Communication with team 5 has been great and I enjoy working with them. We’re very good at accomplishing the weekly deliverable tasks in a timely manner. I also helped present at Team 2 and Team 5’s joint presentation on 06/25/2020. At webinars, I make an effort to ask questions and contribute to the discussion.

Three Achievements:

  1. I made the final PCA plot comparing the cancer and control groups, and carried out the final edits to the Week 3 deliverables.
  2. Did some extra self-study on MA (ratio-intensity) plots and how they can also be used to assess data quality. These plots were not heavily discussed in the webinar.
  3. Stronger understanding of how the biological interactions/expression of RNA sequences can be exploited in a chip/array system to detect macro-level trends in a data set.
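
To make the MA plot idea concrete, here is a small Python sketch of how the two coordinates are computed: M is the log2 ratio of two intensities and A is their average log2 intensity. For single-channel arrays the second set of intensities is often a reference such as a median array; the function name `ma_values` and the numbers below are hypothetical.

```python
from math import log2

def ma_values(intensities_a, intensities_b):
    """Return (M, A) pairs for two equally long lists of raw intensities."""
    pairs = []
    for x, y in zip(intensities_a, intensities_b):
        m = log2(x) - log2(y)        # M: log ratio (difference of logs)
        a = (log2(x) + log2(y)) / 2  # A: average log intensity
        pairs.append((m, a))
    return pairs

# Hypothetical raw intensities for one array and a reference.
sample = [256.0, 1024.0, 64.0]
reference = [256.0, 512.0, 128.0]
points = ma_values(sample, reference)
```

On a well-behaved array most points scatter around M = 0 across the whole range of A; a trend away from zero hints at an intensity-dependent bias.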

List of events attended: 6/17 Technical Training Webinar, 6/18 Gene Team Meeting, 6/22 Gene Team Meeting, 06/25 Gene Team Meeting

Goals For the Upcoming Week:

  • Better understand the limma package and its documentation
  • Better understand what hierarchical clustering maps can teach a person
  • Finish week 4 deliverables

Updated 06/30/2020

Technical Area: I finally understand what it means for a gene to be differentially expressed, and how this relates to volcano plots, whose coordinates are the log fold change (logFC) and the -log10 of the p-value. Since we’ve only done one exercise with limma, I can say I have a beginner-level understanding of the limma package. The logic and code required to filter out genes under (or over) a certain level of significance make sense to me, and I know what fold change, adjusted p-values, and B-statistics imply. Looking back, I had the most trouble understanding a seemingly trivial part of the code: the “model matrix”. Now I understand why model matrices are helpful in doing statistics and studying interactions of variables.
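
A minimal Python sketch of what a model (design) matrix encodes, assuming a two-group layout with one indicator column per group, similar in spirit to R's `model.matrix(~ 0 + group)`. With this coding, the fitted coefficient for each group is that group's mean, and the cancer-versus-control contrast is the difference of the two. The helper names and the expression values are hypothetical.

```python
def design_matrix(groups):
    """One indicator column per group level (no intercept column)."""
    levels = sorted(set(groups))
    rows = [[1 if g == level else 0 for level in levels] for g in groups]
    return levels, rows

def group_means(design, levels, values):
    """Mean expression per group, selected through the indicator columns."""
    means = {}
    for j, level in enumerate(levels):
        selected = [v for row, v in zip(design, values) if row[j] == 1]
        means[level] = sum(selected) / len(selected)
    return means

# Hypothetical samples: two cancer arrays, two control arrays.
levels, design = design_matrix(["cancer", "cancer", "control", "control"])

# Hypothetical log2 expression of one gene across the four arrays.
gene = [9.0, 11.0, 7.0, 8.0]
means = group_means(design, levels, gene)
log_fc = means["cancer"] - means["control"]  # contrast: cancer minus control
```

The matrix only records which sample belongs to which group; the statistics come from fitting each gene's values against those columns.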

Tools: Limma package, GEOquery library, genefilter library, pheatmap library

Other tools I routinely use are…

  • Stem-Away Platform
  • Asana
  • Slack
  • R Studio
  • GitHub

Soft Skills: Communicating with Team 5 is still great, and so is communicating with Gene Team. I keep adding friends from Gene Team to my LinkedIn network.

Three Achievements:

  1. Figured out how to incorporate the model matrix into Team 5’s code and handled the volcano plots for Team 5.
  2. I now know three ways to make volcano plots (volcanoplot(), EnhancedVolcano(), and a more tedious method that uses the plot() function after isolating the logFC values and adjusted p-values from the eBayes structure)
  3. Solid understanding of gene differential expression and relevant statistics.
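
The manual route to a volcano plot boils down to computing two coordinates per gene and labelling each point. The Python sketch below assumes a table of (gene, logFC, adjusted p-value) rows has already been extracted; the cutoffs of 1.0 and 0.05, the gene names, and the function name `volcano_points` are all hypothetical choices for illustration.

```python
from math import log10

def volcano_points(results, lfc_cutoff=1.0, p_cutoff=0.05):
    """Turn (gene, logFC, adjusted p) rows into labelled plot coordinates."""
    points = []
    for gene, log_fc, adj_p in results:
        if adj_p < p_cutoff and log_fc >= lfc_cutoff:
            label = "up"
        elif adj_p < p_cutoff and log_fc <= -lfc_cutoff:
            label = "down"
        else:
            label = "not significant"
        # x = logFC, y = -log10(adjusted p-value)
        points.append((gene, log_fc, -log10(adj_p), label))
    return points

# Hypothetical results table.
toy_results = [
    ("GENE_A", 2.5, 0.001),
    ("GENE_B", -1.8, 0.01),
    ("GENE_C", 0.3, 0.0001),  # significant p-value but small fold change
    ("GENE_D", 3.0, 0.2),     # large fold change but not significant
]
labelled = volcano_points(toy_results)
labels = [row[3] for row in labelled]
```

The resulting tuples can be handed to any plotting function, with the label deciding the point color.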

List of recent events attended: 06/29/2020 gene team meeting

Goals For the Upcoming Week:

  • Accomplish Week 5 Deliverables
  • Present Week 4 findings on Thursday

Updated 07/09/2020

Technical Area: I am currently learning about the Week 5 deliverables and what is expected. I haven’t been too involved this week due to other responsibilities.

Tools:

Other tools I routinely use are…

  • Stem-Away Platform
  • Asana
  • Slack
  • R Studio
  • KEGG Database, GEO

Soft Skills: Communicated my absence to Team 5 regarding the recent deliverables. Presented the volcano plot data during presentations on 07/02/2020.

List of recent events attended: 07/08/2020 Gene Team Meeting

Goals For the Upcoming Week:

  • Catch up on what I’ve missed

Updated 07/22/2020

Technical Area: I completed the final deliverables in R Studio and uploaded the assignment to GitHub. I analyzed the pipeline of the paper, normalized the dataset, removed outliers, removed null values, made a hierarchical clustering plot and a volcano plot, and used the enrichGO function to group the DEGs under the three GO categories: biological process (BP), cellular component (CC), and molecular function (MF).
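
GO over-representation analysis of the kind enrichGO performs rests on a hypergeometric test: given how many background genes carry a GO term, how surprising is the number of DEGs that carry it? The Python sketch below computes that upper-tail probability directly. It is an illustration of the statistic only, not clusterProfiler's code; the function name `enrichment_p_value` and the counts are hypothetical.

```python
from math import comb

def enrichment_p_value(n_background, n_annotated, n_degs, n_overlap):
    """P(overlap >= n_overlap) under the hypergeometric distribution.

    n_background: genes in the background set
    n_annotated:  background genes annotated with the GO term
    n_degs:       differentially expressed genes tested
    n_overlap:    DEGs annotated with the GO term
    """
    upper = min(n_annotated, n_degs)
    total = comb(n_background, n_degs)
    tail = sum(
        comb(n_annotated, i) * comb(n_background - n_annotated, n_degs - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total

# Hypothetical counts: 20000 background genes, 200 carry the term,
# 500 DEGs, 15 of which carry the term.
p_value = enrichment_p_value(20000, 200, 500, 15)
```

Because one such test is run per GO term, the resulting p-values are normally adjusted for multiple testing before a cutoff is applied.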

Tools: GSE21510 dataset, Guo Paper

Other tools I routinely use are…

  • Stem-Away Platform
  • GitHub
  • Slack
  • R Studio
  • KEGG Database, GEO

Achievements:

  • Completed June 1 Bioinformatics session as an observer
  • Developed a solid introductory understanding of Bioinformatics through this internship
  • Made many new friendships along the way

Goals For the Upcoming Week: None

Final Self Assessment:
Things I learned:

Technical Skills:

  • How to read raw Affymetrix Array data obtained from GEO
  • How to perform Quality Control (simpleaffy and affyPLM) and normalization on the raw dataset
  • How to analyze gene expression and determine differentially expressed genes: more particularly the application of a linear model to a normalized dataset, and setting threshold values as they relate to fold change and p-value
  • How to analyze genes based on their biological function and location through DAVID, STRING, WikiPathways, and the various GO functions one can use in R (with the appropriate libraries)
  • Many ways to visualize data, including but not limited to bar plots, dot plots, volcano plots, principal component plots, and hierarchical clustering maps
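
The p-value thresholds mentioned in the list above are usually applied to adjusted p-values. A minimal Python sketch of the Benjamini-Hochberg adjustment follows, assuming BH is the method in use (it is the default `adjust.method` in limma's topTable). The function name and the input p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for four genes.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
```

A gene is then called differentially expressed when its adjusted p-value falls below the chosen cutoff and its fold change exceeds the chosen threshold.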

Tools:

  • R Studio and the many R packages we worked with (e.g., limma), GEO database, Slack, STEMaway platform, Asana, GitHub, DAVID, STRING

Soft skills:

  • Even though I did not do the final presentation, I had the opportunity to help present Group 5’s findings at 3 presentations throughout the session. I do my part in asking questions at the technical webinars and Gene Team meetings that I’ve attended. I’ve also been introduced to various project management platforms like Slack, GitHub, and Asana, and have developed familiarity with these platforms. I will definitely use them in the future.

Achievements
Despite being an Observer, I’ve developed a strong introductory-level understanding of Bioinformatics through this internship, and I accomplished more than I initially intended to. I’ve developed a familiarity with the R language, and I’m confident in my ability to extract any raw zip file from the GEO database and run quality control, normalization, DEG analysis, and functional analysis on that dataset as long as it’s in R. For me, learning bioinformatics while we’re in the midst of a pandemic is a great accomplishment, and I thank StemAway for providing this platform to us all.

Future Goals:
Hopefully, I can expand on these tools that I’ve learned and apply them in graduate school. It would be awesome if I could shift my thesis project to incorporate a bit of bioinformatics, so I’ll look into ways that I can keep learning. Hopefully I can come back to StemAway and participate again in the near future.

Final Deliverables_ShafaqatRahman.pdf (554.8 KB)