Sanyamahesh - Bioinformatics (Level 1) Pathway

Module 1 Self Assessment

Concise overview of things learned. Break it up into Technical Area, Tools, Soft Skills

Technical area:

  • Learning how to code in R and debug errors more efficiently
  • Learning how to read and understand research papers, even on advanced topics
  • Exploring data visualization resources such as ggplot2

Tools:

  • RStudio
  • ggplot2
  • YouTube

Soft skills:

  • Problem solving: at first it was difficult to work out how to fix my code, but with practice I learned how to identify errors and research them quickly
  • Time management: it was easy to get caught up trying to fix a single bug, and I learned to solve problems more efficiently
  • Resource collection: I needed a few extra resources to understand the basics of R, and this project taught me where to look for them and how to internalize the information they provide

Three achievement highlights

  • Learning the basics of R, including syntax and how to identify and fix errors
  • Learning more about the field of bioinformatics
  • Learning how to read and comprehend research papers efficiently

Detailed statement of tasks completed. State each task, hurdles faced if any and how you solved the hurdle.

  • I initially had some difficulty navigating the StemAway website and finding the appropriate resources and instructions, but I found all the posts I needed after further searching and attending the modular internship demo meeting on Tuesday
  • I successfully installed RStudio and learned how to use it
  • I looked through the ggplot2 website to deepen my knowledge of data visualization and how it is implemented in code
  • I watched the linked videos on bioinformatics and significantly advanced my knowledge of the field
  • It was initially challenging to understand the jargon in the research paper, but I learned to skim it efficiently and pick out the important information

Module 2 Self Assessment

Concise overview of things learned. Break it up into Technical Area, Tools, Soft Skills

Technical area:

  • Understanding how to use the information in the GEO database most efficiently
  • Debugging errors that came up when storing data in variables
  • Storing files properly on my laptop so they can be accessed from the appropriate directory

Tools:

  • GEO database
  • Bioconductor
  • RStudio

Soft skills:

  • Resource collection: by reading the guides posted on the StemAway website and exploring the database on my own, I figured out how to find the necessary files and store them in the correct directory
  • Communication: I asked mentors for help on the platform and worked through the errors in my code with the help of the meeting recordings provided
  • Problem solving: with help from mentors and online code forums such as Stack Overflow, I could figure out the causes of error messages in my code more efficiently than before

Three achievement highlights:

  • Navigating the GEO database and downloading essential files from it
  • Learning how to extract metadata from txt files in RStudio using code
  • Determining how to use and store the results of the ReadAffy function

Detailed statement of tasks completed. State each task, hurdles faced if any and how you solved the hurdle.

  • I initially struggled to work out which metadata I needed from the txt file, but after watching the meeting video with the mentors I was able to figure it out
  • I learned how to navigate the GEO website and explore all that it has to offer
  • I learned how to debug my code when I ran into errors while creating new variables to hold the data
  • I was able to view the final result, with the data from the files stored in an organized RStudio layout
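For reference, here is a minimal base-R sketch of pulling sample metadata out of a GEO series-matrix style txt file. It is not the exact module code: it assumes each `!Sample_` line holds one field with tab-separated, quoted values (one per sample), and the file contents, accessions and the `tissue:` prefix below are made up for illustration. In practice, Bioconductor's GEOquery package (`getGEO`) can also parse these files directly.

```r
# Write a small, made-up series-matrix style file so the example is self-contained
meta_file <- tempfile(fileext = ".txt")
writeLines(c(
  '!Series_title\t"Example series"',
  '!Sample_title\t"Tumor 1"\t"Normal 1"',
  '!Sample_geo_accession\t"GSM0001"\t"GSM0002"',
  '!Sample_characteristics_ch1\t"tissue: tumor"\t"tissue: normal"'
), meta_file)

# Keep only the per-sample metadata lines
lines <- readLines(meta_file)
sample_lines <- grep("^!Sample_", lines, value = TRUE)

# Split each line on tabs: the first field is the key, the rest are per-sample values
fields <- strsplit(sample_lines, "\t", fixed = TRUE)
keys <- sub("^!Sample_", "", vapply(fields, `[`, character(1), 1))
values <- lapply(fields, function(f) gsub('"', "", f[-1], fixed = TRUE))

# One row per sample, one column per metadata field
meta <- as.data.frame(setNames(values, keys), stringsAsFactors = FALSE)

# Strip the hypothetical "tissue: " prefix so the column holds just the label
meta$tissue <- sub("^tissue: ", "", meta$characteristics_ch1)
print(meta)
```

Which `!Sample_` fields are worth keeping depends on the study, so checking `keys` first is a quick way to see what a given file offers.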

Module 3 Self Assessment

Concise overview of things learned. Break it up into Technical Area, Tools, Soft Skills

Technical area:

  • Downloading missing packages through the Bioconductor website
  • Becoming more familiar with plotting tools such as ggplot2 and pheatmap
  • Becoming more familiar with string formatting in R for plot labels and the group column

Tools:

  • RStudio
  • ggplot2
  • pheatmap

Soft skills:

  • Problem solving: it was initially difficult to figure out what each error in the console meant, but through trial and error, as well as help from mentors and online forums such as Stack Overflow, I gained a more solid understanding of what R functions expect as input.
  • Teamwork: I attended a few team meetings and watched the meeting recordings in order to coordinate my work with my teammates.
  • Communication: with each issue I could not work through on my own, I became more confident reaching out to Anya or another mentor for help. By visiting office hours and sending messages through the forum, I established solid communication that helped me fix my bugs.

Three achievement highlights

  • Determining how to properly install packages into the R environment and understanding what the error messages were trying to convey
  • Debugging errors that arose while producing data visualizations, especially with ggplot2
  • Becoming more familiar with how to submit work through GitHub

Detailed statement of tasks completed. State each task, hurdles faced if any and how you solved the hurdle.

  • While running my code to generate the quality report, I initially got errors when loading packages into my RStudio environment. After attending Anya’s office hours, I fixed this by deleting my stringi package files and reinstalling the package.
  • I successfully generated boxplots of the data before and after batch correction.
  • I also had trouble generating my heatmaps with pheatmap and using ggplot, as I kept getting errors, so I reached out to Anya for guidance on why those errors appeared and figured out the fix.
  • I learned how to submit my code and images through GitHub using the instructions on the StemAway website.
  • I also ran into memory allocation errors for the vectors used in the boxplot function, but I fixed this by raising R’s maximum memory limit with help found on Stack Overflow.

Module 4 Self Assessment

Concise overview of things learned. Break it up into Technical Area, Tools, Soft Skills

Technical area:

  • Downloading volcano plot and database packages
  • Determining how to remove outliers in data and when to use data frames rather than AffyBatch objects
  • Determining how to access probe IDs from an object

Tools:

  • RStudio
  • EnhancedVolcano
  • pheatmap

Soft skills:

  • Problem solving: I initially encountered a few errors when annotating my data because I was using the simplified batch rather than the entire gse AffyBatch data set, but with help from mentors I identified my mistakes and debugged the rest of my code based on those errors.
  • Critical thinking: I was able to identify why some errors were occurring and determine how to change my data inputs to fix the bugs.
  • Communication: I reached out to Anya with my technical questions, which greatly helped me identify errors and clarify broader questions I had about R in general.

Three achievement highlights

  • Learned the various inputs that different R functions require
  • Identified how to remove outliers in my data by adding a missing comma to the subsetting code
  • Generated volcano plots and heatmaps for the data

Detailed statement of tasks completed. State each task, hurdles faced if any and how you solved the hurdle.

  • When I first tried to remove my outliers in code, I got an error, which was fixed when I added a comma as Anya suggested.
  • I also had some trouble annotating my data because I was using the metadata rather than the gse object. Once I began using the batch-corrected data for the rest of the module, however, I did not encounter many other errors.
  • I encountered an error when generating my heatmap, but I fixed it easily because I had seen the same error in module 3 and knew what the mistake was.
  • Additionally, I generated a file of the significant genes with their values from the batch, ordered by p-value.
  • I also identified the top 50 DEGs and generated the necessary plots for this module from the data I had modified.
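The comma fix and the p-value ordering can be sketched in base R as follows. This is an illustration, not the module's code: the toy matrix, the outlier sample names and the `P.Value` column (named after limma's topTable output) are assumptions. The key point is that `x[, keep]` indexes the columns (samples) of a matrix, whereas `x[keep]` treats the whole matrix as one long vector.

```r
# Toy expression matrix: probes in rows, samples in columns (values are random)
set.seed(1)
expr <- matrix(rnorm(20), nrow = 5,
               dimnames = list(paste0("probe", 1:5), paste0("GSM", 1:4)))
outliers <- c("GSM2", "GSM4")  # hypothetical samples flagged during QC

keep <- !colnames(expr) %in% outliers

# Without the comma, the logical index is recycled over all 20 cells of the
# matrix, so this silently returns a plain vector of 10 values, not 2 samples
wrong <- expr[keep]

# With the comma, the index selects columns (samples) and keeps every row;
# drop = FALSE keeps a matrix even if only one sample survives the filter
expr_clean <- expr[, keep, drop = FALSE]
print(dim(expr_clean))

# Ordering a results table by p-value before saving it (toy values)
res <- data.frame(logFC = c(1.2, -2.1, 0.3),
                  P.Value = c(0.04, 0.001, 0.5),
                  row.names = c("probe1", "probe2", "probe3"))
res_sorted <- res[order(res$P.Value), ]  # the comma again: reorder rows, keep all columns
write.csv(res_sorted, file.path(tempdir(), "sig_genes.csv"))
```

The same row/column rule applies to data frames, which is why `res[order(res$P.Value), ]` needs the trailing comma as well.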

Module 5 Self Assessment

Concise overview of things learned. Break it up into Technical Area, Tools, Soft Skills

Technical area:

  • Learned how to properly use the enrichGO function and specify which GO category (ontology) to analyze
  • Effectively converted gene symbols into Entrez IDs
  • Created accurate visualizations of the data using cnetplots and bar plots
  • Determined how to set the row names of a data table from the gene symbols it contains

Tools:

  • RStudio
  • clusterProfiler
  • cnetplot
  • GitHub

Soft skills:

  • Time management: I made sure I didn’t spend too much time on one specific part of the gene analysis by consulting the guide and online forums when I was stuck on a problem
  • Conflict resolution: I was able to fix bugs that made my code non-functional. I did this by reading what the error messages said and fully understanding what each of my variables contained, so I knew which one to use and manipulate in each function
  • Clear communication: I communicated with teammates about the presentation we were making on the module and reached out to mentors such as Anya and Hale with questions about the work. I also made a collaborative presentation with another group member

Three achievement highlights:

  • Generated accurate plots and graphs based on data tables created through DEG analysis
  • Fixed bugs that arose when the select function reported a missing probe ID key, by correcting the syntax
  • Conducted transcription factor analysis using the enricher function and fixed errors regarding the input of the upregulated vector

Detailed statement of tasks completed. State each task, hurdles faced if any and how you solved the hurdle.

  • I generated the upregulated vector by accessing the logFC column of my DEG topTable. I also converted the gene symbols into Entrez IDs using the select function.
  • I was initially getting error messages when using the select function, but after attending office hours I corrected the syntax to make the function work.
  • I generated bar plots based on the enrichment analysis.
  • I conducted KEGG analysis and generated a dot plot based on the results.
  • I was initially encountering errors when creating my global DEG vector, but I spoke with mentors and figured out that I needed to use a different upregulated vector as the input to make the function work.
  • I also encountered an error saying that two columns were needed, but I used the tip in the guide to resolve it.
  • I was able to generate all the necessary cnetplots and save the names of the vectors in the last step for use in the next module.
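A minimal base-R sketch of building the upregulated vector, a ranked logFC vector and symbol-based row names from a topTable-style data frame is shown below. The gene symbols, values and the logFC/adjusted p-value cutoffs are hypothetical; only the column names `logFC` and `adj.P.Val` follow limma's topTable output. The symbol-to-Entrez conversion with `select` needs a Bioconductor annotation package, so it is left out here.

```r
# Toy stand-in for a limma topTable result; symbols and values are made up
deg <- data.frame(
  SYMBOL    = c("TP53", "MYC", "EGFR", "MYC", "GAPDH"),
  logFC     = c(2.5, 1.8, -1.6, 1.1, 0.2),
  adj.P.Val = c(0.001, 0.01, 0.02, 0.03, 0.6),
  stringsAsFactors = FALSE
)

# Row names must be unique, so a symbol measured by several probes
# gets a suffix (the second MYC row becomes "MYC.1")
rownames(deg) <- make.unique(deg$SYMBOL)

# Upregulated genes: these cutoffs are illustrative, not taken from the guide
up <- unique(deg$SYMBOL[deg$logFC > 1 & deg$adj.P.Val < 0.05])

# Ranked vector: logFC values named by gene, one entry per symbol
# (keeping the first probe per symbol is a simplifying assumption),
# sorted in decreasing order as ranked-list functions such as gseGO expect
ranked <- setNames(deg$logFC, deg$SYMBOL)
ranked <- ranked[!duplicated(names(ranked))]
ranked <- sort(ranked, decreasing = TRUE)

print(up)
print(ranked)
```

Keeping `up` as a plain character vector and `ranked` as a named numeric vector matters because over-representation functions take a gene list, while GSEA-style functions take a sorted, named score vector.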

Module 6 Self Assessment

Concise overview of things learned. Break it up into Technical Area, Tools, Soft Skills

Technical area:

  • Learned how to use different web tools to analyze DEG data
  • Determined how to interpret the data that each web tool provided
  • Learned what input specifications each tool needs to produce the desired output

Tools:

  • RStudio
  • Enrichr
  • StringDB
  • DAVID

Soft skills:

  • Problem solving: I worked through the difficulties of navigating some of the websites, which used bioinformatics jargon I was unfamiliar with
  • Creativity: I used the plots and altered the gene map to determine what the data meant, and used this information for the final analysis
  • Resource collection: I learned which information each web tool provides and made sure I would remember this in case I need to use them in the future

Three achievement highlights

  • Successfully analyzed my DEG table saved in the previous module with the use of web tools that gave me different kinds of information
  • Learned how to navigate through websites used in the professional field for data analysis
  • Generated and understood plots made by each tool

Detailed statement of tasks completed. State each task, hurdles faced if any and how you solved the hurdle.

  • I was able to input the table of gene symbols into each website in order to see what analysis they offered
  • At first, I was unsure whether we needed to upload gene symbols or probe IDs, but the examples on the websites helped me figure it out
  • I looked at the different categories of data produced by each website as well as their different visualizations
  • Some websites were hard to navigate, especially those that had graphs on a separate page, but I explored them to become more familiar
  • I interacted with the web plot to see how the data was connected

Module 7 Self Assessment

Concise overview of things learned. Break it up into Technical Area, Tools, Soft Skills

Technical area:

  • Learned how to apply previously learned topics in bioinformatics to another data set and solve issues that arose for this set
  • Normalized data and generated plots comparing raw and batch-corrected data
  • Generated DEG vector of genes in the data set

Tools:

  • RStudio
  • GEO database
  • Heatmaps, boxplot, and other plotting tools

Soft skills:

  • Critical thinking: I solved some of the errors by reading what each message said and removing the cause of the problem from my workspace, which required a lot of careful reasoning
  • Communication: I communicated with my team and mentor via Slack about team updates for the final presentation and watched the meeting and presentation recordings provided on the channel
  • Time management: since this project brought all the previous modules together, it took longer than the others, and I had to manage my time effectively by turning to outside sources such as Stack Overflow to solve difficult bugs

Three achievement highlights

  • Extracted data from GEO database and saved it in a variable
  • Normalized the data using the rma function and generated plots comparing the before and after data to show how the function worked on the set
  • Conducted DEG analysis by filtering genes and running a limma analysis

Detailed statement of tasks completed. State each task, hurdles faced if any and how you solved the hurdle.

  • I got an error when using the select function saying that the probe IDs were missing, but I solved it by specifying the keytype argument, since I had encountered this before and discussed the solution with Anya
  • I also initially got an error when fitting the linear model because the numbers of rows and columns did not match, but it worked once I removed the empty columns
  • I successfully generated a heatmap of the top 50 DEGs
  • I conducted GSEA of the genes using the msigdbr function
  • I saved the data set from the database and standardized each tissue name to either normal or cancer
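Here is a hedged base-R sketch of two of these steps: collapsing free-text tissue labels to normal or cancer, and dropping empty columns so the expression matrix has one column per row of the design matrix. The labels, the regular expression and the toy matrix are assumptions for illustration; a real dataset's wording would need its own patterns, and this is not the exact module code.

```r
# Hypothetical sample annotations with inconsistent tissue wording
pheno <- data.frame(
  sample = paste0("GSM", 1:5),
  tissue = c("Tumor tissue", "adjacent normal", "tumour", "Normal mucosa", "primary tumor"),
  stringsAsFactors = FALSE
)

# Collapse every label to "cancer" or "normal"; this pattern is an assumption
# and would need to match the wording used in the real dataset
is_cancer <- grepl("tumou?r|cancer|carcinoma", pheno$tissue, ignore.case = TRUE)
pheno$group <- factor(ifelse(is_cancer, "cancer", "normal"),
                      levels = c("normal", "cancer"))

# Toy expression matrix: probes in rows, samples in columns, plus one empty column
set.seed(2)
expr <- matrix(rnorm(30), nrow = 5,
               dimnames = list(paste0("probe", 1:5), c(pheno$sample, "empty")))
expr[, "empty"] <- NA

# Drop columns that contain no values at all
expr <- expr[, colSums(!is.na(expr)) > 0, drop = FALSE]

# One design row per sample; this assumes pheno rows follow the column order of expr
design <- model.matrix(~ group, data = pheno)

# A linear model fit such as limma's lmFit needs these two dimensions to agree
stopifnot(identical(colnames(expr), pheno$sample),
          ncol(expr) == nrow(design))
print(table(pheno$group))
```

Setting `levels = c("normal", "cancer")` makes normal the reference level, so the `groupcancer` coefficient in the design reads as cancer relative to normal.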