Sona - Bioinformatics Pathway

Self Assessment 2 - 28/07/2020

Overview of things learned:

  • Technical - RStudio for data pre-processing ready for DE analysis
  • Tools - RStudio, Slack, Google Meet
  • Soft Skills - communicating within a team virtually, scheduling meetings using Google Meet, creating and delivering a presentation effectively, networking and cold contacting on LinkedIn

Three achievement highlights:

  • Completed the tasks set out and produced the deliverables on time
  • Led my sub-group as Task Lead, organising and leading the sub-group meeting
  • Produced and delivered a presentation summarising the aims, deliverables, outcomes, challenges, and successes of the week

Presentation link: https://drive.google.com/file/d/1OHI7m3JDC1B_i1BIi-JRSALTtvT32wAL/view?usp=sharing

List of meetings attended including social team events: Attended all meetings (except for happy hour due to scheduling conflicts)

Goals for the upcoming week:

  • Put my networking skills learned from the soft skills seminar to use both within and outside of the STEMaway organisation
  • Complete the deliverables on time and to a high standard
  • Learn more about using R for bioinformatics and DE analysis
  • Learn more about and practise using GitHub

Detailed statement of tasks done:
Deliverables:

  • QC using affyPLM - produced RLE and NUSE boxplots and histograms
  • Normalisation using gcrma
  • Batch correction
  • Visualisation using PCA plots

Other:

  • Organised and led our sub-group meeting
  • Wrote minutes for the sub-group meeting and communicated these to members afterwards
  • Created and delivered a presentation summarising the week’s progress

Challenges and how those challenges were overcome:
Technical challenges:

  • R reported that not enough memory had been allocated for a function to run. I resolved this by searching the troubleshooting channel, where somebody else had hit the same error and a suggested solution had already been posted
  • Initially I thought the PCA plots produced did not look as expected due to an error in the batch correction step. This actually was due to the code used to colour the PCA plot points, which I initially based on the training materials. Only when I learned more about the functions and tried varying the code myself did I manage to successfully colour the PCA plot points - I did this by adding the sample type column from the batch metadata to the data frame being plotted instead of manually grouping the points into cancer or normal categories.
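
The colouring fix described above is essentially a join on sample ID: the sample type comes from the metadata instead of being assigned by hand. The following is a minimal sketch of that idea in Python rather than R (it is not the code used for the deliverable), and the sample IDs, scores, and colour choices are all made up for illustration.

```python
# Hypothetical PCA scores, one row per sample (values are made up).
pca_scores = [
    {"sample": "S1", "PC1": -2.1, "PC2": 0.4},
    {"sample": "S2", "PC1": 1.8, "PC2": -0.3},
    {"sample": "S3", "PC1": -1.7, "PC2": 0.9},
    {"sample": "S4", "PC1": 2.2, "PC2": 0.1},
]

# Hypothetical batch metadata: sample ID -> sample type.
sample_types = {"S1": "cancer", "S2": "normal", "S3": "cancer", "S4": "normal"}

# Assumed colour choice for each sample type.
PALETTE = {"cancer": "red", "normal": "blue"}


def add_sample_type(scores, types):
    """Return copies of the score rows with a 'type' column looked up by sample ID."""
    return [{**row, "type": types[row["sample"]]} for row in scores]


plot_rows = add_sample_type(pca_scores, sample_types)

# One colour per point, derived from the metadata column rather than from
# manually grouping points by their position in the table.
colours = [PALETTE[row["type"]] for row in plot_rows]
print(colours)
```

Because the colour is looked up per sample, it stays correct even if the samples are reordered or a sample is dropped during QC.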

Workflow challenges:

  • The sub-group did not have the opportunity to meet up until the day the deliverables were due, so some members did not complete all of the deliverables. Therefore, we decided to change our workflow from next week onwards by meeting earlier to effectively utilise office hours and ensure all members of the group are supported and successful at producing the deliverables!

Self Assessment 3 - 04/08/2020

Overview of things learned:

  • Technical - RStudio for DE analysis, using pull requests and branches to work collaboratively using GitHub
  • Tools - RStudio, GEO2R, GitHub, Google Meet, Slack, STEMaway, Zoom Breakout Rooms
  • Soft Skills - networking, communicating effectively to work within a team virtually, problem solving and troubleshooting within a team (and asking for help when needed!), thinking about diversity & inclusion within education and the workplace

Three achievement highlights:

  • Communicated effectively within the sub-group on Slack, with regular check-ins on how the code was progressing to help troubleshoot and complete the deliverables
  • Implemented networking skills learned last week by adding STEMaway peers on LinkedIn, as well as using the cold contacting template as a way of expanding my network with interesting, inspiring people who I don’t know personally yet!
  • Learned to use GitHub and uploaded deliverables using GitHub

List of meetings attended including social team events:
Attended all meetings (except for happy hour due to scheduling conflicts)

  • 29/07 - GitHub webinar
  • 20/07 - Office Hours
  • 01/08 - Group 3 meeting to discuss deliverables
  • 03/08 - Team 1 meeting
  • 04/08 - Team 1 deliverables presentation

Goals for the upcoming week:

  • Complete the deliverables on time
  • Learn more about functional analysis and the tools to use
  • Focus on the biological meaning behind the data, linking functional analysis output graphs to biological implications in colorectal cancer

Detailed statement of tasks done:
Deliverables:

  • Annotation using hgu133plus2.db
  • Gene filtering using collapseRows and !duplicated() (thinking about different methods and selecting a method with justification based on how these would affect the results)
  • DE analysis using limma (lmFit, eBayes, and topTable)
  • Visualisation using heatmaps and volcano plots
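
To illustrate why the choice of gene filtering method can change the results, here is a minimal Python sketch (not the R code used for the deliverable) comparing two ways of reducing several probes to one per gene. The first keeps the first probe seen for each gene, which is the effect of filtering with !duplicated() in R. The second keeps the probe with the highest mean expression, which is my understanding of the idea behind collapseRows' "MaxMean" method. The probe IDs and expression values are hypothetical.

```python
from statistics import mean

# Hypothetical probe-level rows: (probe ID, gene symbol, expression values).
probes = [
    ("p1", "TP53", [5.0, 5.2, 4.8]),
    ("p2", "TP53", [7.1, 6.9, 7.0]),
    ("p3", "KRAS", [6.0, 6.1, 5.9]),
    ("p4", "KRAS", [3.0, 3.2, 3.1]),
    ("p5", "APC", [4.4, 4.6, 4.5]),
]


def keep_first(rows):
    """Keep the first probe encountered for each gene (order-dependent)."""
    chosen = {}
    for probe, gene, _values in rows:
        chosen.setdefault(gene, probe)
    return chosen


def keep_max_mean(rows):
    """Keep, for each gene, the probe with the highest mean expression."""
    best = {}
    for probe, gene, values in rows:
        avg = mean(values)
        if gene not in best or avg > best[gene][1]:
            best[gene] = (probe, avg)
    return {gene: probe for gene, (probe, _avg) in best.items()}


print(keep_first(probes))
print(keep_max_mean(probes))
```

The two methods agree for KRAS and APC but pick different TP53 probes, so the expression values passed on to the DE analysis differ as well.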

Other:

  • Used GitHub to submit deliverables by creating a branch and opening a pull request to merge it into the master branch
  • Learned to use GEO2R for a quick overview of the analysis and to help guide the direction of the code I was writing in R

Challenges and how those challenges were overcome:

  • Initially struggled with using the collapseRows() function. After attempting to troubleshoot myself using the function help documents and online forums, I resolved this by asking for advice in the team troubleshooting channel
  • Heatmap initially did not appear as expected, with all of the boxes appearing dark blue instead of a range of colours. Used the sub-group chat and office hours to help resolve this!
  • Thought about how the different groups within the team all had different top DEGs in our results - the task leads reassured us that this was normal and likely due to slightly different steps/order of steps taken during normalisation or filtering. Will not dwell on this too much but will keep it in mind and maybe cross-reference the common genes between the different results to see if these are important when looking into the functions and biological relevance!
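
Cross-referencing the top DEGs between groups comes down to a set intersection. The following is a minimal Python sketch of that comparison, assuming each group's result is available as a list of gene symbols. The group names and gene labels are placeholders, not real results.

```python
from collections import Counter

# Hypothetical top-DEG lists from three sub-groups (placeholder gene labels).
top_degs = {
    "group_1": ["GENE_A", "GENE_B", "GENE_C", "GENE_D"],
    "group_2": ["GENE_B", "GENE_C", "GENE_E", "GENE_A"],
    "group_3": ["GENE_C", "GENE_F", "GENE_B", "GENE_G"],
}


def shared_by_all(results):
    """Genes that appear in every group's list."""
    return set.intersection(*(set(genes) for genes in results.values()))


def shared_by_at_least(results, n):
    """Genes that appear in at least n of the lists."""
    counts = Counter(gene for genes in results.values() for gene in set(genes))
    return {gene for gene, count in counts.items() if count >= n}


print(sorted(shared_by_all(top_degs)))
print(sorted(shared_by_at_least(top_degs, 2)))
```

Genes that survive the stricter intersection are the ones least sensitive to differences in normalisation or filtering order, which makes them reasonable candidates to look at first.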

Self Assessment 4 - 11/08/2020

Overview of things learned:

  • Technical - RStudio for functional analysis, STRING for functional analysis, relating genetic data to biological implications, reading scientific papers, troubleshooting and finding errors in code
  • Tools - RStudio, STRING, GSEA, GEPIA, StackOverflow for troubleshooting
  • Soft Skills - working virtually within a team, networking, problem-solving and troubleshooting independently but also knowing when to ask for help!

Three achievement highlights:

  • Completed the deliverables and for the most part was able to independently problem-solve when coming across technical issues
  • Re-read the scientific paper the DE analysis was based on in more detail and focused on how our data related to the biological implications
  • Continued to put networking skills to use within and outside of STEMaway

List of meetings attended including social team events:
Attended all meetings except for Wednesday’s deliverables Q&A due to scheduling conflicts, but I caught up on this by watching the recording!

  • 05/08 - caught up on Deliverables Q&A recording
  • 06/08 - Office Hours
  • 10/08 - Team 1 meeting

Goals for the upcoming week:

  • Time management - as there are a lot of steps in the overall pipeline to complete within the next two weeks, I will create a rough timeline (with contingency plans!) of when I am hoping to complete each step. This will help ensure I can keep on track when working independently.
  • Learn how to create metadata from the overall data file to help ensure the bioinformatics pipeline runs smoothly
  • Focus on the biological meaning of the data to interpret the results and what this means in terms of the disease
  • Complete some background reading into the disease I have chosen to investigate, so I will have a better idea of the biological implications of the results of my DE analysis.

Detailed statement of tasks done:
Deliverables:

  • Gene Ontology Analysis - using enrichGO(), setReadable(), and barplot()
  • KEGG Analysis - using enrichKEGG() and dotplot()
  • Gene-Concept Network - using enrichDGN(), setReadable(), and cnetplot()
  • STRING analysis
  • Transcription Factor Analysis - downloading data from MSigDB - GSEA, using cnetplot()
  • Survival Analysis - using GEPIA to produce survival plots, beginning to interpret survival plots
  • Focusing on the biological implications of the results of the functional analysis
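
Survival plots of this kind are typically Kaplan-Meier curves, so a small worked example may help with reading them. The sketch below is written in Python with made-up follow-up times. It is not how GEPIA is implemented, only the standard estimator: at each time where a death is observed, the survival probability is multiplied by (1 - deaths / number still at risk). Censored patients leave the at-risk group without causing a drop in the curve.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time with an observed death.

    times  - follow-up time for each patient
    events - 1 if death was observed at that time, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up times (months) and event flags for five patients.
example_curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
for time, prob in example_curve:
    print(f"t = {time}: S(t) = {prob:.2f}")
```

Each downward step on a survival plot corresponds to one of these multiplications. Comparing a high-expression and a low-expression group means computing one curve per group and asking whether they separate.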

Other:

  • Frequently communicated with my team on Slack to help troubleshoot and give each other guidance, as well as provide support and encouragement

Challenges and how those challenges were overcome:

  • Struggled slightly with plotting the GO barplots as I wasn’t sure how to show the top 20 terms. I troubleshot this myself using internet forums and the help documentation, and realised there were two different functions called barplot!
  • When I reached the cnetplot() step, the resulting plot was very messy because it was a web of thousands of genes so the visualisation was not at all useful! I realised this was because I hadn’t filtered the initial DEGs list based on fold change, so I went back and changed this number until the visualisation of the cnetplot was improved!
  • I wasn’t sure how to interpret the survival plots so I began researching this, but will continue to independently research this and utilise office hours to find out more. Understanding this will be important for completing my analysis for the final presentation!
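
The fold change filtering that tidied up the cnetplot can be sketched as follows. This is a minimal Python illustration rather than the R code used, and the gene labels, log fold change values, and cutoffs are all made up. It shows how raising the cutoff shrinks the gene list that is passed on to the network plot.

```python
# Hypothetical DE results: gene label -> log2 fold change (values are made up).
log_fold_changes = {
    "GENE_A": 3.4,
    "GENE_B": -2.6,
    "GENE_C": 1.2,
    "GENE_D": -0.8,
    "GENE_E": 0.3,
    "GENE_F": 2.1,
}


def filter_by_fold_change(log_fc, min_abs_log_fc):
    """Keep genes whose absolute log2 fold change meets the cutoff."""
    return sorted(g for g, fc in log_fc.items() if abs(fc) >= min_abs_log_fc)


# Try a few cutoffs and see how many genes remain at each.
for cutoff in (0.5, 1.0, 2.0):
    kept = filter_by_fold_change(log_fold_changes, cutoff)
    print(f"|logFC| >= {cutoff}: {len(kept)} genes")
```

Using the absolute value keeps both strongly up-regulated and strongly down-regulated genes, which matters when the downstream plot is meant to show both directions.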

Self Assessment 5 - 18/08/2020

Overview of things learned:

  • Technical - quality control and data visualisation (normalisation using rma, batch correction, PCA plots, identifying outliers), differential gene expression analysis (limma, volcano plots, heatmaps)
  • Tools - RStudio, STRING-DB, GSEA, Survival plots, NCBI, StackOverflow
  • Soft Skills - creating an effective presentation, time management, presenting virtually, public speaking

Three achievement highlights:

  • Working quickly and under pressure to create a presentation on functional analysis of the colorectal data analysis, then presenting this at the team deliverables meeting (last minute cover for the task lead who was unable to make it to the meeting)
  • Time management: created a rough timeline of when I wanted to complete each step of the pipeline for the new data and so far have followed this well
  • Created metadata for my selected dataset - this was a step we had not completed before so it was exciting to learn how to do this and know that I am now able to complete every step of the pipeline independently!

Presentation link:

List of meetings attended including social team events:

  • 12/08 - functional analysis webinar Q&A (catch up on recording)
  • 14/08 - Team 1 deliverables presentation meeting
  • 17/08 - Team 1 meeting
  • 18/08 - Presentation webinar

Goals for the upcoming week:

  • Complete the code for my final presentation early so that I have time to ask any further questions at office hours
  • Create an interesting and effective presentation based on my deliverables, linking it to the biological meanings behind breast cancer
  • Rehearse my presentation so I can deliver it in a polished, understandable, and engaging way

Detailed statement of tasks done:
Deliverables:

  • Downloaded the breast cancer data set I intend to analyse
  • Created a metadata csv file from the series matrix file
  • Completed the quality control and data visualisation steps of the pipeline
    • QC using affyPLM - produced RLE and NUSE boxplots and histograms
    • Normalisation using rma
    • Batch correction
    • Visualisation using PCA plots
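
Creating the metadata csv from the series matrix file can be sketched as below. This is a minimal Python illustration, not the method actually used, and it makes two assumptions: that the file follows the usual GEO layout, where sample annotations sit on tab-separated lines beginning with "!Sample_", and that each characteristics value has the form "key: value". The excerpt, accession numbers, and sample names are invented.

```python
import csv
import io

# Hypothetical excerpt of a series matrix file (accessions and titles are made up).
SERIES_MATRIX = (
    '!Series_title\t"Example series"\n'
    '!Sample_title\t"Tumour 1"\t"Normal 1"\t"Tumour 2"\n'
    '!Sample_geo_accession\t"GSM0001"\t"GSM0002"\t"GSM0003"\n'
    '!Sample_characteristics_ch1\t"tissue: tumour"\t"tissue: normal"\t"tissue: tumour"\n'
    '!series_matrix_table_begin\n'
)


def build_metadata(text):
    """Turn the '!Sample_' header lines into one dictionary per sample."""
    columns = {}
    for fields in csv.reader(io.StringIO(text), delimiter="\t"):
        if not fields or not fields[0].startswith("!Sample_"):
            continue
        name, values = fields[0][len("!Sample_"):], fields[1:]
        # Assumed 'key: value' layout: use the key as the column name.
        if name == "characteristics_ch1" and values and ": " in values[0]:
            name = values[0].split(": ", 1)[0]
            values = [v.split(": ", 1)[-1] for v in values]
        columns[name] = values
    names = list(columns)
    return [dict(zip(names, row)) for row in zip(*columns.values())]


def to_csv(rows):
    """Serialise the per-sample rows as CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


metadata_rows = build_metadata(SERIES_MATRIX)
metadata_csv = to_csv(metadata_rows)
print(metadata_csv)
```

The key step is the transpose: the series matrix stores one annotation per line with samples across columns, whereas the pipeline wants one sample per row.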

Other:

  • Created a presentation on colorectal cancer functional analysis and delivered this at the team meeting
  • Created a timetable of small achievable tasks to ensure I stay on-track to complete my final presentation deliverables

Challenges and how those challenges were overcome:

  • Clashing timetables meant our sub-group was not able to have a meeting this week, so our communication was not as strong as usual. We still made good use of the Slack group chat.
  • Last minute changes of plans meant I had to step in to create and deliver the deliverables presentation at the team meeting (though this turned into one of my achievement highlights as I was proud of how it turned out!)

Self Assessment 6 - 25/08/2020

Overview of things learned:

  • Technical - differential gene expression analysis (limma, volcano plots, heatmaps), functional analysis (GO, KEGG, STRING-DB, Survival plots)
  • Tools - RStudio, STRING-DB, GSEA, Survival plots, NCBI, StackOverflow, PowerPoint
  • Soft Skills - creating an effective presentation, time management, presenting virtually, public speaking

Three achievement highlights:

  • Kept to my intended timetable of small achievable goals to ensure my presentation was completed on time
  • Produced and delivered a presentation that covered the bioinformatics pipeline whilst maintaining a focus on the biological meanings behind the data - and why this data analysis is actually useful!
  • Ensured I understood how to interpret the biological meanings of functional analysis outputs, including the GO analysis, KEGG analysis, STRING-DB, and survival plots

Final presentation:

List of meetings attended including social team events:

  • 21/08 - Final Presentation
  • 24/08 - Team 1 meeting

Detailed statement of tasks done:

  • Differential gene expression analysis

    • Annotation using hgu133plus2.db
    • Gene filtering using collapseRows and !duplicated() (thinking about different methods and selecting a method with justification based on how these would affect the results)
    • DE analysis using limma (lmFit, eBayes, and topTable)
    • Visualisation using heatmaps and volcano plots
  • Functional analysis

    • Gene Ontology Analysis - using enrichGO(), setReadable(), and barplot()
    • KEGG Analysis - using enrichKEGG() and dotplot()
    • Gene-Concept Network - using enrichDGN(), setReadable(), and cnetplot()
    • STRING analysis
    • Transcription Factor Analysis - downloading data from MSigDB - GSEA, using cnetplot()
    • Survival Analysis - using GEPIA to produce survival plots, beginning to interpret survival plots
    • Focusing on the biological implications of the results of the functional analysis
  • Creating a presentation to showcase my final deliverables, explaining the steps in the bioinformatics pipeline whilst maintaining a focus on the biological importance of the data analysis

Goals for the future:

  • Develop further understanding of different bioinformatics pipeline designs so I can adapt my current knowledge to different datasets
  • Learn to use different tools such as oligo (instead of just affy) to also be able to analyse different datasets
  • Continue to practise using R! Develop further confidence coding and think about how I can apply computational biology to my undergraduate research and beyond
  • Carry forward the soft skills I have learned during STEMaway webinars, such as networking and personal branding