Yoga_K - Bioinformatics Pathway

A concise overview of things learned:
Technical Area: I learned how to code in R. Specifically, I now know how to create volcano plots and PCA plots. I also learned how the ggplot2 library works in R. Regarding the research paper, I learned about new types of visuals such as PPI networks and KEGG pathway maps.
Tools: I learned how to use Jupyter Notebook for Python. I had only used Python in IDEs before, so this was my first time using a Python notebook. I also finished the R exercises in RStudio, which taught me more about how RStudio works.
Soft Skills: I attended meetings frequently, which gave me experience speaking in both large and small groups. I’ve also become more comfortable asking questions during technical webinars and group meetings, and I have improved my communication through Slack, email, and the STEM Away website.

Three Achievements:

  1. I finished all the R exercises and the Python exercises.
  2. After reading the paper multiple times and attending the meetings, I understand the methods used to obtain the data in the research paper.
  3. I initiated communication within my group of four teammates.

List of meetings/trainings/social events attended: 6/1 Technical Training Webinar, 6/1 Team 4 First Official Meeting, 6/3 Technical Training Webinar, 6/5 R Training Webinar, 6/5 Team 4 Happy Hour, 6/8 Team 4 Meeting, 6/9 Python Training Session #2, 6/10 Logistical Webinar, 6/10 Technical Training Webinar, 6/10 Welcome Session on Leadership and Program Management, 6/11 Office Hours with BI Leads, 6/11 Gene Team Meeting, 6/12 R Training Workshop, 6/12 Gene Team Happy Hour, 6/15 Gene Team Meeting

Goals For the Upcoming Week:

  • Reread the results and conclusion portions of the research paper and better understand the significance of the results.
  • Become more proficient in R and learn how to further customize graphs to visualize data.
  • Understand the database better and learn about the different ways to categorize the data.

Tasks Completed:

  • I attended and watched all the R and Python trainings. I also finished the exercises provided. Going to team meetings and talking about the solutions to these exercises helped me further understand the logic behind the solutions.
  • I read and took notes on the research paper. At first, I was confused about how the data was obtained and didn’t understand some of the figures. However, asking questions during office hours and team meetings gave me a much clearer view of the data used in the research paper.

A concise overview of things learned:
Technical Area: I was able to use R to check the quality of our data, and I learned how to normalize the data. This week I discovered the importance of data cleaning and how starting from quality data keeps our results accurate.
Tools: I worked with RStudio and learned more about R libraries such as ggplot2 and functions such as QCReport from affyQCReport.
Soft Skills: I had much smaller team meetings this week, so I learned how to work with people of different experience levels and coordinate our tasks so that we could all finish our goal of normalizing the data.

Three Achievements:

  1. I reread and understood the results and conclusion portion of the research paper.
  2. I created two histograms: one of the median RLE scores of each of the samples (arrays) in our data, and another of the median NUSE scores of each sample.
  3. After some confusion that my team and I had regarding the PCA plots, I posted a question on the forum, and with help from the forum, my team and I were able to create a PCA plot.
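RLE (Relative Log Expression) is usually summarized per sample: each gene's expression is compared with that gene's median across all samples, and each sample is then summarized by the median of those deviations, which should sit near 0 for a well-behaved array. Our project computed this in R from a fitted probe-level model; the Python sketch below is only an illustration that assumes a plain log2 expression matrix (genes as rows, samples as columns). NUSE is not shown because it requires the standard errors from that fitted model.

```python
import numpy as np

def median_rle(log_expr):
    """Median RLE per sample for a genes x samples log2 expression matrix."""
    # Reference level for each gene: its median across all samples.
    gene_medians = np.median(log_expr, axis=1, keepdims=True)
    # RLE values: how far each sample deviates from the gene's reference.
    rle = log_expr - gene_medians
    # Summarize each sample (column) by the median of its RLE values.
    return np.median(rle, axis=0)

# Toy matrix: 3 genes x 3 samples (made-up values for illustration).
toy = np.array([[1.0, 2.0, 3.0],
                [2.0, 2.0, 2.0],
                [5.0, 4.0, 6.0]])
print(median_rle(toy))
```

Here the third sample's median RLE is 1 while the others are 0, so it would stand out on a histogram of the per-sample medians as a possible quality problem.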

List of meetings/trainings/social events attended: 6/16 Python and Pandas Workshop, 6/17 Technical Training Webinar, 6/18 Gene Team Meeting, 6/19 Gene Team Happy Hour, 6/22 Gene Team Meeting

Goals For the Upcoming Week:

  • Remove outliers from the data set to obtain more accurate results
  • Present normalized data to Gene Team
  • Create a heat map of the data

Tasks Completed:

  • I created two histograms of the median RLE and NUSE scores using R
  • I helped prepare the presentation for next week regarding my team’s normalized data
  • I attended all the Gene Team meetings

A concise overview of things learned:
Technical Area: This week I was able to filter out the genes below a specific percentile. I also used the metadata to get more accurate results, and I learned how to annotate our data set.
Tools: I worked with RStudio, Asana, and GEO.
Soft Skills: I met with my small group of three people and communicated efficiently through Slack and Zoom. I also attended meetings with the entire Gene Team throughout the week.

Three Achievements:

  1. I reached out to a mentor from office hours and was able to understand how to create a volcano plot for our data.
  2. I filtered out the genes that were below the fourth percentile.
  3. Since one of our deliverables required us to compare our results with another team’s, I reached out to another group in the Gene Team and came to understand the differences caused by our different normalization techniques.
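Filtering below a percentile removes genes whose expression is too low to be informative before differential expression analysis. The Python sketch below is a hypothetical illustration, not the R code we used: it assumes each gene is summarized by its median log2 expression across samples and that genes whose median falls below the chosen percentile (4 by default, matching the fourth percentile above) are dropped.

```python
import numpy as np

def filter_low_expression(log_expr, gene_ids, pct=4.0):
    """Drop genes whose median expression is below the pct-th percentile.

    log_expr: genes x samples array of log2 expression values.
    gene_ids: list of gene identifiers, one per row of log_expr.
    """
    # Assumption: summarize each gene by its median across samples.
    gene_medians = np.median(log_expr, axis=1)
    # Cutoff is the pct-th percentile of those per-gene medians.
    cutoff = np.percentile(gene_medians, pct)
    keep = gene_medians >= cutoff
    kept_ids = [g for g, k in zip(gene_ids, keep) if k]
    return log_expr[keep], kept_ids

# Toy data: 100 genes whose medians are 0, 1, ..., 99 (3 samples each).
expr = np.tile(np.arange(100.0).reshape(-1, 1), (1, 3))
ids = [f"gene{i}" for i in range(100)]
filtered, kept = filter_low_expression(expr, ids)
print(filtered.shape, kept[0])
```

With these made-up values the cutoff is 3.96, so the four lowest genes are removed and 96 remain.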

List of meetings/trainings/social events attended: 6/24 Office Hours, 6/25 Bioinformatics Importance Meeting, 6/25 Gene Team Meeting, 6/26 R Training, 6/26 Office Hours, 6/26 Gene Team Happy Hour, 6/29 Gene Team Meeting, 6/29 Small Group Meeting

Goals For the Upcoming Week:

  • Present the volcano plot to the Gene Team
  • Present the phenotypic data analysis
  • Create a bar plot of the data

Tasks Completed:

  • I filtered out the genes below the fourth percentile
  • I created the model matrix from the data
  • I attended office hours

A concise overview of things learned:
Technical Area: I learned how to use the pandas library in Python for data analysis. I also learned how to import a data set in Python. In R, I learned how to create a gene vector.
Tools: I developed my skills in Jupyter Notebook for Python. I also worked on the deliverables using RStudio.
Soft Skills: I attended the team meetings and learned how to effectively communicate in small and large groups. I also went to the happy hour (team bonding) meeting.

Three Achievements:

  1. I finished the Python exercises.
  2. I presented our phenotype metadata to my larger team.
  3. I went to office hours to receive help from the mentors regarding R.

List of meetings/trainings/social events attended: 6/1 Office Hours, 6/2 Gene Team Meeting, 6/2 Gene Team Happy Hour, 6/8 Gene Team Meeting

Goals For the Upcoming Week:

  • Use DAVID alongside R for data analysis
  • Create dot plots and bar plots for the data.

Tasks Completed:

  • I finished the Python exercises with the Jupyter notebook.
  • I presented the phenotype data and its relevance to the Gene Team.

A concise overview of things learned:
Technical Area: I used R to create bar plots, KEGG analysis dot plots, and enrichGO analysis dot plots. I also gained a deeper understanding of the research paper by comparing what our pipeline produced with the paper’s results.
Tools: I developed my skills in RStudio. I was also introduced to WikiPathways and DAVID 6.8.
Soft Skills: I participated in both large and small team meetings. I also presented to a large group about what my team’s findings were last week.

Three Achievements:

  1. I created KEGG analysis dot plots.
  2. I presented our findings from last week to the Gene Team.
  3. I compared our program’s results with the Guo research paper and identified the similarities.

List of meetings/trainings/social events attended: 7/8 Gene Team Meeting, 7/9 Small Group Meeting, 7/10 Presentation Group Meeting, 7/13 Gene Team Meeting

Goals For the Upcoming Week:

  • Analyze the pipeline of the research paper
  • Create a volcano plot for the differentially expressed genes
  • Propose projects for future sessions

Tasks Completed:

  • I created the dot plots for the KEGG pathway analysis
  • I created bar plots for the cellular components.
  • I presented the KEGG pathway analysis dot plots and the similarities with the research paper.

A concise overview of things learned:
Technical Area: I learned how to make a bar plot for the significant gene ontology terms and also fully understood the Guo research paper.
Tools: I did a thorough analysis of the Guo research paper and used RStudio to graph volcano plots, bar plots, and a heat map.
Soft Skills: I attended Gene Team meetings and created a final presentation using Google Slides.

Three Achievements:

  1. I summarized the Guo research paper’s pipeline in my final presentation.
  2. I created a volcano plot, heat map, and a bar plot with RStudio to perform quality control and understand the function of the genes.
  3. I proposed a potential project for a future bioinformatics session using the analytical tool PROMO.
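A volcano plot places each gene by its log2 fold change (x-axis) and its -log10 p-value (y-axis), so genes with both large changes and strong significance land in the upper corners. Our plots were made in R; the Python sketch below only shows how the plotting coordinates and up/down labels could be derived, assuming fold changes and p-values are already available from a differential expression step (in practice, adjusted p-values would usually be used). The cutoffs of 1.0 and 0.05 are hypothetical defaults, not the values from our deliverable.

```python
import numpy as np

def volcano_points(log2_fc, p_values, fc_cutoff=1.0, p_cutoff=0.05):
    """Return y-coordinates and up/down/ns labels for a volcano plot."""
    log2_fc = np.asarray(log2_fc, dtype=float)
    p_values = np.asarray(p_values, dtype=float)
    # y-axis: -log10(p), so smaller p-values plot higher.
    neg_log10_p = -np.log10(p_values)
    significant = p_values < p_cutoff
    # "up"/"down" need both significance and a large enough fold change.
    labels = np.where(significant & (log2_fc >= fc_cutoff), "up",
                      np.where(significant & (log2_fc <= -fc_cutoff),
                               "down", "ns"))
    return neg_log10_p, labels

# Made-up values for four genes.
y, labels = volcano_points([2.0, -1.5, 0.2, 3.0], [0.001, 0.01, 0.001, 0.2])
print(list(labels))
```

The third gene is highly significant but changes too little, and the fourth changes a lot but is not significant, so both are labeled "ns".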

List of meetings/trainings/social events attended: 7/15 Gene Team Meeting, 7/17 R Office Hours, 7/17 Happy Hour, 7/21 Gene Team Meeting, 7/22 Happy Hour

Goals For the Upcoming Week:

  • Present my final deliverables to a mentor and the team lead.
  • Complete my final self-assessment.
  • Keep networking on LinkedIn.

Tasks Completed:

  • I finished summarizing the pipeline of the Guo research paper.
  • I created a volcano plot, heat map, and a bar plot with RStudio to perform quality control and understand the function of the genes.

Final Self Assessment

A concise overview of things learned:

Technical Area:

  • Data visualization with R
  • Created PCA plots, volcano plots, heat maps, dot plots, and bar plots
  • Learned how to interpret KEGG pathway analysis
  • Found significant gene ontology terms from a gene expression dataset
  • Differential gene expression analysis
  • Read a research paper
  • Utilized metadata to interpret results
  • Quality control with affyQCReport
  • Found median RLE and NUSE scores
  • Data analysis with Python

Tools:

  • RStudio
  • Jupyter
  • GEO database
  • DAVID analytical tool
  • Bioconductor R packages
  • Asana
  • Slack
  • GitHub
  • G-suite

Soft Skills:

  • Completing deliverables with a group of three other people
  • Presenting findings individually and with my team in front of around 20 people
  • Networking on LinkedIn and on my team
  • Initiating communication with my team
  • Learning how to ask for help from mentors during troubleshooting
  • Attending bonding events with my team

Achievement Highlights:

  • Completed all the deliverables
  • Worked efficiently with teammates
  • Networking on LinkedIn
  • Learned the importance of quality control
  • Learned how to read heat maps, perform KEGG enrichment analysis, create PCA plots, find significant gene ontology terms, and create bar plots with R
  • Completed all the Python exercises
  • Held office hours for participants in the next session

Final Statement / Future Goals:

This internship has helped me explore the field of bioinformatics. I learned how much information data analysis can provide. I want to keep pursuing this field because the new information found by analyzing datasets can help countless people, and it is a perfect intersection between engineering and biology. I had an amazing time with the people in this internship and look forward to using the skills I have learned here.

Final Deliverable and Presentation: bioinformatics_deliverable.pdf (713.6 KB)