Medhini7 - Bioinformatics Pathway

Self-Assessment 6/16

Technical Area:

  • Learned how to code in R for the first time
  • Learned how to install and use various R packages to create different visuals
  • Learned the importance of the different packages in R and the plots that we can produce to visualize data (volcano plots, PCA, heat maps, etc.)
  • Understood the concepts highlighted in the research paper and the figures
  • Installed Jupyter Notebook and used it for the first time

Tools:

  • Learned how to code in R using RStudio
  • Learned how to use Jupyter Notebook to create and run Python code

Soft Skills:

  • Gained confidence asking questions and contributing at group meetings
  • Learned how to communicate through various channels (STEM-Away, Slack, etc.)

Three Achievements:

  • I have not been able to attend many of the training sessions, but I was able to understand the material well by watching the videos and practicing on my own.
  • Made my first post on the STEM-Away forum: the meeting notes from the logistical webinar on 6/10
  • I was able to work my way through the R exercises, and it made sense!

List of Meetings Attended: 6/2 First R training, 6/6 First Python training, 6/10 Logistical Webinar, 6/11 Gene Team Meeting, 6/15 Gene Team Meeting (I also watched the videos of the trainings and webinars that I could not attend)

Tasks Completed:

  • Watched all of the training videos for R and Python and completed the assigned exercises
  • Read the paper and took notes to understand it and its figures

Goals for the Upcoming Week:

  • Create the visuals from the R trainings on other data sets
  • Attend a team happy hour to get to know everyone more
  • Contact my group to get started on next week’s deliverables

Self-Assessment 6/23

Technical Area:

  • Created PCA plots and quality control reports in R
  • Learned how to troubleshoot my R programs by using Google and the package’s documentation
  • Learned to conduct preliminary analyses and normalize data

Tools:

  • Learned how to navigate the GEO Accession Database
  • Learned how to use more packages in R including affy, affyPLM, and affyQCReport
  • Learned how to use Asana

Soft skills:

  • Initiated communication within my group and learned how to work in a team
  • Learned how to communicate efficiently during group meetings to explain topics that other people did not understand

3 achievements:

  • I had a lot of trouble figuring out the PCA plots, but after posting on the forums for help and learning more about how the different methods work, I was able to get my plots!
  • Successfully communicated with my group and worked together to finish the week 3 deliverables. Everything made sense to us!
  • I was able to apply some of the R knowledge from this internship in research that I am doing at my college as well.

List of meetings/trainings attended:
6/16 Asana Training, 6/16 Python and Pandas Webinar, 6/17 Technical Training Webinar, 6/18 Gene Team Meeting, 6/19 Gene Team Happy Hour

Tasks Completed:

  • Successfully completed and understood all of the week 3 deliverables
  • Signed up for Asana and have been getting all the emails and notifications

Goals for upcoming week:

  1. Start on week 4 deliverables ahead of time, so I can go to office hours if I am confused.
  2. Help someone who is confused, either on the forums or in my group
  3. Learn the concepts behind the methods rather than blindly using them

Detailed Statement of Tasks Done:
I used the ReadAffy function from the affy library to read in the .CEL files, then used the affyPLM package to normalize the data and create histograms of the median RLE (Relative Log Expression) and NUSE (Normalized Unscaled Standard Error) scores to identify outliers. From these, I determined that there were two outliers (samples 13 and 43), and I created PCA plots and heat maps of the data before and after normalization.
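For illustration only, the median RLE check can be sketched in Python with NumPy. This is not the affyPLM code (affyPLM derives RLE from its fitted probe-level model, and NUSE also needs that model's standard errors, which are not covered here). The sketch assumes a genes-by-samples matrix of log2 expression values, and the 0.5 cutoff in the usage line is a hypothetical threshold, not the one used in this analysis.

```python
import numpy as np


def median_rle(log_expr: np.ndarray) -> np.ndarray:
    """Median Relative Log Expression (RLE) for each array.

    log_expr: rows = probe sets (genes), columns = arrays (samples),
    values on the log2 scale.
    """
    # Each probe set's median across all arrays is its reference level
    reference = np.median(log_expr, axis=1, keepdims=True)
    # RLE = how far each array deviates from that reference, per probe set
    rle = log_expr - reference
    # Summarise each array by the median of its RLE values (ideally near 0)
    return np.median(rle, axis=0)


def flag_outliers(scores: np.ndarray, threshold: float) -> list:
    """Return 1-based sample numbers whose |median RLE| exceeds threshold.

    The threshold is a hypothetical cutoff chosen by the analyst.
    """
    return [i + 1 for i, s in enumerate(scores) if abs(s) > threshold]


# Toy data: 4 genes x 3 arrays; array 3 is shifted up by about 1 log2 unit
toy = np.array([
    [5.0, 5.1, 6.0],
    [7.0, 6.9, 8.0],
    [3.0, 3.0, 4.0],
    [9.0, 9.1, 10.0],
])
print(flag_outliers(median_rle(toy), threshold=0.5))  # [3]
```

A well-behaved array has a median RLE close to 0; an array whose whole distribution is shifted, like the third toy array, stands out.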

Self-Assessment 6/30

Technical Area:

  • Learned how to annotate genes in R
  • Preliminary understanding of limma (an R package for differential expression analysis)
  • Using pandas and seaborn in Python
  • Marking tasks on Asana

Tools:

  • Annotation packages in R
  • Preliminary understanding of pandas, seaborn, and Matplotlib in Python
  • GitHub

Soft skills:

  • Communicated with group 4 on week 4 deliverables; we were able to discuss our different results and figure out the cause of our differences
  • Participated in last week’s fireside chat by asking the speaker questions

3 achievements:

  • We were really confused about how to perform the analysis with the limma package, but after searching on Google and talking to one another, we were able to figure it out.
  • I was able to go through most of the Python exercises and complete some of the challenges with very few lines of code.
  • I understood the annotation part of the deliverables really well and was able to explain the process to my group members.

List of meetings/trainings attended:
6/24 Office Hours, 6/24 Fireside Chat, 6/25 Bioinformatics Webinar, 6/25 Gene Team Meeting, 6/26 Office Hours, 6/29 Gene Team Meeting, 6/29 Python Training Session #5

Tasks Completed:

  • Finished week 4 deliverables successfully
  • Compiled phenotype data
  • Presented on week 3 deliverables

Goals for upcoming week:

  1. I cannot make the team meeting on Thursday for the presentations, so I want to be as helpful as possible during our meeting to create the presentation.
  2. Understand the pandas, Matplotlib, and seaborn packages more and finish applying them in the Python exercises
  3. Understand the limma package more

Detailed Statement of Tasks Done:
I worked on all of the deliverables for week 4. As a group, we decided to complete them individually and then ask questions and compare our results during our meetings. In those meetings, I explained how I did the annotation part of the deliverables. I was also in charge of creating our volcano plot and compiling our final data for submission. During the presentation of the week 3 deliverables, I discussed why normalization matters for PCA plots and the key differences between our PCA plots before and after normalization.
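A volcano plot places each gene at (log2 fold change, -log10 p-value). Below is a minimal sketch of how those coordinates and up/down labels could be computed, assuming the fold changes and p-values come from a limma results table. The cutoffs |log2FC| > 1 and p < 0.05 are common conventions assumed here, not values from our analysis, and in practice adjusted p-values are usually used for the threshold.

```python
import numpy as np


def volcano_coordinates(log2_fc, p_values, fc_cutoff=1.0, p_cutoff=0.05):
    """Return (-log10 p-values, labels) for a volcano plot.

    The default cutoffs are assumed conventions, not values from the analysis.
    """
    log2_fc = np.asarray(log2_fc, dtype=float)
    p_values = np.asarray(p_values, dtype=float)
    # y-axis: -log10(p), so smaller p-values are plotted higher
    neg_log10_p = -np.log10(p_values)
    significant = p_values < p_cutoff
    # Label genes that pass both the p-value and the fold-change cutoffs
    labels = np.where(
        significant & (log2_fc > fc_cutoff), "up",
        np.where(significant & (log2_fc < -fc_cutoff), "down", "ns"),
    )
    return neg_log10_p, labels


y, labels = volcano_coordinates([2.0, -1.5, 0.3, 3.0], [0.001, 0.01, 0.001, 0.2])
print(labels.tolist())  # ['up', 'down', 'ns', 'ns']
```

The returned values can be passed to any scatter plot function (for example Matplotlib's `plt.scatter`) with the fold changes on the x-axis, colored by label.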

Self-Assessment 7/7

Technical Area:

  • Learned how to perform gene set enrichment analyses in R
  • Sharing code through GitHub repositories

Tools:

  • Gene set enrichment analysis methods in R (clusterProfiler package: GSEA, enrichGO, etc.)
  • Using the Metascape online tool

Soft skills:

  • Practiced an elevator pitch for career fairs in a team meeting
  • Updated my resume based on resume tips given in a team meeting

Three achievements:

  • I had a really hard time getting the GSEA method to work in R, but after asking on the forums and looking through the clusterProfiler GitHub book, I was able to figure it out. I was also able to help teammates who were struggling.
  • I learned how to apply different aesthetics to the seaborn and Matplotlib visuals created for the second set of Python exercises.
  • I was really nervous about doing an elevator pitch on the spot, but it went pretty well. Now I know how to create and improve one.

List of meetings/trainings attended:
6/30 Fireside Chat, 7/1 GitHub Webinar #2, 7/6 GT Meeting with Mentors, 6/8 Gene Team Meeting

Tasks Completed:

  • Finished week 5 deliverables successfully
  • Finished second set of Python exercises
  • Created presentation for week 4 deliverables

Goals for upcoming week:

  1. I am hosting office hours on Friday, so I want to prepare a good PowerPoint for it and help others as much as possible by being prepared.
  2. Understand more about how the gene set analysis packages work
  3. Explore the enrichR and DAVID tools, which I did not get a chance to use this week

Detailed Statement of Tasks Done:
Although I could not attend last week's presentation of the week 4 deliverables, I was at the meeting where we prepared it. I created a visual describing how the database is used to convert between probe IDs and gene symbols. I worked on all of the deliverables for week 5. We used four different methods to understand the functions of the genes (enrichGO, enrichKEGG, enricher, and GSEA). From these methods, we found that many of the upregulated genes were common cancer genes involved in pathways such as the PI3K/mTOR pathway, which is commonly upregulated in cancers. We also used online tools (Metascape and STRING) to see whether they would lead to similar conclusions, and the Metascape results were consistent with what we saw from the different methods in R.
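The over-representation methods above (enrichGO, enrichKEGG, enricher) score each gene set with a hypergeometric test; GSEA instead works on a ranked list of all genes and is not shown. Below is a minimal sketch of the one-sided hypergeometric p-value using only the Python standard library. The function and argument names are hypothetical, and clusterProfiler also adjusts these p-values for multiple testing, which is omitted here.

```python
from math import comb


def hypergeom_enrichment_p(overlap, n_selected, set_size, n_background):
    """One-sided hypergeometric p-value, P(X >= overlap).

    n_background: genes in the background (universe)
    set_size:     background genes annotated to the gene set (e.g. a GO term)
    n_selected:   genes in the input list (e.g. significant DE genes)
    overlap:      input genes that fall in the gene set
    """
    total = comb(n_background, n_selected)
    # Sum the probabilities of seeing `overlap` or more set genes by chance
    upper = min(n_selected, set_size)
    hits = sum(
        comb(set_size, i) * comb(n_background - set_size, n_selected - i)
        for i in range(overlap, upper + 1)
    )
    return hits / total


# Toy example: 10 background genes, 3 in the set, 3 selected, all 3 in the set
print(hypergeom_enrichment_p(3, 3, 3, 10))  # about 0.0083 (exactly 1/120)
```

A small p-value means the input list contains more genes from the set than random sampling from the background would be expected to produce.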

Self-Assessment 7/22

Technical Area:

  • Continued to use R

Tools:

  • Continued using all of the packages from previous weeks

Soft skills:

  • Continued to contribute during team meetings

Three achievements:

  • Applied the pipeline from the previous dataset to a new dataset
  • I was able to go through all of the previous coding steps with the new dataset.
  • I held office hours last week, and they went pretty well!

List of meetings/trainings attended:
7/13 Gene Team Meeting, 7/15 Gene Team Meeting, 7/21 Gene Team Meeting

Tasks Completed:

  • Successfully applied all of the deliverables to a new dataset

Goals for upcoming week:

  1. Finish presentation on the new dataset
  2. Understand the gene annotation categories on the dot plots and bar plots
  3. Rehearse for my final presentation on Friday

Detailed Statement of Tasks Done:
I applied the pipeline that we used on the dataset from the Guo paper to another dataset from a similar paper. Doing so cemented my understanding of the code I had written for the previous deliverables, including how to remove outliers, perform an analysis with limma, and create volcano and PCA plots.
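The PCA step of the pipeline can be sketched with NumPy's SVD, treating samples as observations and centering each gene, which mirrors the defaults of R's `prcomp()` (centered, not scaled). The function name, the genes-by-samples input layout, and the random example data are assumptions for illustration, not the code used in the analysis.

```python
import numpy as np


def pca_scores(expr, n_components=2):
    """PCA of a genes-by-samples matrix.

    Returns (scores, explained): scores is samples x n_components, and
    explained is the fraction of total variance captured by each component.
    """
    # Samples are the observations, so transpose to samples x genes
    X = np.asarray(expr, dtype=float).T
    # Center each gene; like prcomp()'s default, genes are not scaled
    X = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    # Principal component scores = left singular vectors * singular values
    scores = U[:, :n_components] * S[:n_components]
    explained = S**2 / np.sum(S**2)
    return scores, explained[:n_components]


rng = np.random.default_rng(0)
expr = rng.normal(size=(50, 6))  # hypothetical data: 50 genes x 6 samples
scores, explained = pca_scores(expr)
print(scores.shape)  # (6, 2)
```

Plotting the two score columns against each other gives the usual PCA plot; running it on the data before and after normalization is one way to compare how samples cluster.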

Final Self-Assessment

What I Have Learned:

  • Understanding scientific writing (figures and methodology)
  • Analysis of microarray data in R
  • Downloading GEO datasets
  • Creating PCA plots, heat maps and volcano plots using R
  • Differential expression analysis using limma
  • Gene ontology (GO) enrichment analysis using R
  • Running quality control assessments in R
  • Creating plots in Python using seaborn and Matplotlib
  • Downloading data from websites using Python
  • Other packages in R (clusterProfiler, affy, affyPLM, etc.)

Tools:

  • Asana
  • G-Suite
  • Jupyter Notebook
  • RStudio
  • GitHub
  • STEM-Away forum
  • Slack

Soft Skills:

  • Communicating in a team setting to troubleshoot, set up meetings, and work together to accomplish deliverables
  • Presentation creation and delivery skills
  • Elevator pitches
  • Asking questions during team meetings

Achievements:

  • I have been able to learn a lot of R and packages in R despite entering with very little knowledge. I was able to apply a lot of this knowledge for personal use as well.
  • Using seaborn and matplotlib to create plots for data sets downloaded into a pandas data frame
  • Learning how to upload code to GitHub.
  • I learned a lot about the pipeline for analyzing microarray data, including the process for removing outliers and running a gene ontology analysis to find out which types of genes are upregulated and downregulated.

Medhini_Rachamallu_Final_Presentation.pdf (1.1 MB)