Technical - RStudio for pre-processing data ready for DE analysis
Tools - RStudio, Slack, Google Meet
Soft Skills - communicating within a team virtually, scheduling meetings using Google Meet, creating and delivering a presentation effectively, networking and cold contacting on LinkedIn
Three achievement highlights:
Completed the tasks set out and produced the deliverables on time
Led my sub-group as Task Lead, organising and leading the sub-group meeting
Produced and delivered a presentation summarising the aims, deliverables, outcomes, challenges, and successes of the week
List of meetings attended including social team events:
Attended all meetings (except for happy hour due to scheduling conflicts)
Goals for the upcoming week:
Put my networking skills learned from the soft skills seminar to use both within and outside of the STEMaway organisation
Complete the deliverables on time and to a high standard
Learn more about using R for bioinformatics and DE analysis
Learn more about and practise using GitHub
Detailed statement of tasks done:
Deliverables:
QC using affyPLM - produced RLE and NUSE boxplots and histograms
Normalisation using gcrma
Batch correction
Visualisation using PCA plots
Other:
Organised and led our sub-group meeting
Wrote minutes for the sub-group meeting and communicated these to members afterwards
Created and delivered a presentation summarising the week’s progress
Challenges and how those challenges were overcome:
Technical challenges:
R reported that not enough memory had been allocated for a function to run. I overcame this by searching the troubleshooting channel, where somebody else had reported the same error and a suggested solution had already been posted
Initially I thought the PCA plots did not look as expected because of an error in the batch correction step. The cause was actually the code used to colour the PCA plot points, which I had based on the training materials. Only when I learned more about the functions and tried varying the code myself did I manage to colour the points correctly. I did this by adding the sample type column from the batch metadata to the data frame being plotted, instead of manually grouping the points into cancer or normal categories.
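A minimal sketch of the colouring fix described above, using base R only. The toy expression matrix, the metadata table, and the column names sample and sample_type are assumptions for illustration, not the actual project data or the code from the training materials.

```r
# Toy expression matrix: 6 samples (rows) x 50 probes (columns).
set.seed(1)
expr <- matrix(rnorm(6 * 50), nrow = 6,
               dimnames = list(paste0("S", 1:6), paste0("probe", 1:50)))

# Hypothetical metadata table with one row per sample.
metadata <- data.frame(sample = paste0("S", 1:6),
                       sample_type = rep(c("cancer", "normal"), each = 3),
                       stringsAsFactors = FALSE)

# PCA on the samples; scaling puts every probe on the same footing.
pca <- prcomp(expr, center = TRUE, scale. = TRUE)

# Build the plotting data frame, then attach the sample type by sample
# name rather than assigning groups by hand from row positions.
pca_df <- data.frame(sample = rownames(pca$x),
                     PC1 = pca$x[, 1],
                     PC2 = pca$x[, 2],
                     stringsAsFactors = FALSE)
pca_df <- merge(pca_df, metadata, by = "sample")

# Colour each point by the sample type that came from the metadata.
groups <- factor(pca_df$sample_type)
plot(pca_df$PC1, pca_df$PC2, col = as.integer(groups), pch = 19,
     xlab = "PC1", ylab = "PC2")
legend("topright", legend = levels(groups),
       col = seq_along(levels(groups)), pch = 19)
```

Joining on the sample name means the colours stay correct even if the sample order changes, which is where manual grouping tends to go wrong.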
Workflow challenges:
The sub-group did not have the opportunity to meet until the day the deliverables were due, so some members did not complete all of the deliverables. We therefore decided to change our workflow from next week onwards by meeting earlier, so that we can make effective use of office hours and ensure all members of the group are supported and succeed in producing the deliverables!
Technical - RStudio for DE analysis, using pull requests and branches to work collaboratively on GitHub
Tools - RStudio, GEO2R, GitHub, Google Meet, Slack, STEMaway, Zoom breakout rooms
Soft Skills - networking, communicating effectively to work within a team virtually, problem solving and troubleshooting within a team (and asking for help when needed!), thinking about diversity & inclusion within education and the workplace
Three achievement highlights:
Effectively communicated within the sub-group on Slack with regular check-ins on how the code was going to help troubleshoot and complete the deliverables
Implemented networking skills learned last week by adding STEMaway peers on LinkedIn, as well as using the cold contacting template as a way of expanding my network with interesting, inspiring people who I don’t know personally yet!
Learned to use GitHub and uploaded deliverables using GitHub
List of meetings attended including social team events:
Attended all meetings (except for happy hour due to scheduling conflicts)
29/07 - GitHub webinar
20/07 - Office Hours
01/08 - Group 3 meeting to discuss deliverables
03/08 - Team 1 meeting
04/08 - Team 1 deliverables presentation
Goals for the upcoming week:
Complete the deliverables on time
Learn more about functional analysis and the tools to use
Focus on the biological meaning behind the data, linking functional analysis output graphs to biological implications in colorectal cancer
Detailed statement of tasks done:
Deliverables:
Annotation using hgu133plus2.db
Gene filtering using collapseRows() and !duplicated() (thinking about different methods and selecting a method with justification based on how these would affect the results)
DE analysis using limma (lmFit, eBayes, and topTable)
Visualisation using heatmaps and volcano plots
Other:
Used GitHub to submit deliverables by creating a branch and opening a pull request to merge it into the master branch
Learned to use GEO2R for a quick overview of the analysis and to help guide the direction of the code I was writing in R
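The gene filtering deliverable above can be illustrated with a small base R sketch of the !duplicated() approach. The toy table, its column names, and the choice to keep the probe with the highest mean expression are assumptions for illustration; the rule actually used in the project is not stated here.

```r
# Toy annotated table: several probes can map to the same gene symbol.
annotated <- data.frame(
  probe     = c("p1", "p2", "p3", "p4", "p5"),
  symbol    = c("TP53", "TP53", "KRAS", "APC", "APC"),
  mean_expr = c(7.2, 9.1, 6.5, 8.8, 8.0),
  stringsAsFactors = FALSE
)

# Sort so the most highly expressed probe of each gene comes first ...
ordered <- annotated[order(annotated$mean_expr, decreasing = TRUE), ]

# ... then keep only the first occurrence of each gene symbol.
filtered <- ordered[!duplicated(ordered$symbol), ]
print(filtered)
```

Which probe survives depends entirely on the sort order before !duplicated() is applied, so the ordering rule is the part that needs justifying.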
Challenges and how those challenges were overcome:
Initially struggled with using the collapseRows() function. After attempting to troubleshoot it myself using the function help documents and online forums, I resolved this by asking for advice in the team troubleshooting channel
The heatmap initially did not appear as expected, with all of the boxes appearing dark blue instead of a range of colours. Used the sub-group chat and office hours to help resolve this!
Noticed that the different groups within the team all had different top DEGs in their results. The task leads reassured us that this was normal and likely due to slightly different steps, or a different order of steps, taken during normalisation or filtering. I will not dwell on this too much, but will keep it in mind and may cross-reference the common genes between the different results to see if these are important when looking into the functions and biological relevance!
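One way the cross-referencing idea above could be done, sketched in base R. The three gene lists are hypothetical placeholders, not real results from any sub-group.

```r
# Hypothetical top DEG lists from three sub-groups (placeholder names).
top_degs <- list(
  group1 = c("GENE_A", "GENE_B", "GENE_C", "GENE_D"),
  group2 = c("GENE_B", "GENE_C", "GENE_E", "GENE_A"),
  group3 = c("GENE_C", "GENE_F", "GENE_B", "GENE_G")
)

# Genes that appear in every group's list.
common <- Reduce(intersect, top_degs)

# How many groups reported each gene, most frequently reported first.
counts <- sort(table(unlist(top_degs)), decreasing = TRUE)

print(common)
print(counts)
```

Genes found by every group are the ones least sensitive to differences in the normalisation and filtering steps.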
Technical - RStudio for functional analysis, STRING for functional analysis, relating genetic data to biological implications, reading scientific papers, troubleshooting and finding errors in code
Tools - RStudio, STRING, GSEA, GEPIA, StackOverflow for troubleshooting
Soft Skills - working virtually within a team, networking, problem-solving and troubleshooting independently but also knowing when to ask for help!
Three achievement highlights:
Completed the deliverables and for the most part was able to independently problem-solve when coming across technical issues
Re-read the scientific paper the DE analysis was based on in more detail and focused on how our data related to the biological implications
Continued to put networking skills to use within and outside of STEMaway
List of meetings attended including social team events:
Attended all meetings except for Wednesday’s deliverables Q&A due to scheduling conflicts, but I caught up on this by watching the recording!
05/08 - caught up on Deliverables Q&A recording
06/08 - Office Hours
10/08 - Team 1 meeting
Goals for the upcoming week:
Time management - as there are a lot of steps in the overall pipeline to complete within the next two weeks, I will create a rough timeline (with contingency plans!) of when I am hoping to complete each step. This will help ensure I can keep on track when working independently.
Learn how to create metadata from the overall data file to help ensure the bioinformatics pipeline runs smoothly
Focus on the biological meaning of the data to interpret the results and what this means in terms of the disease
Complete some background reading into the disease I have chosen to investigate, so I will have a better idea of the biological implications of the results of my DE analysis.
Detailed statement of tasks done:
Deliverables:
Gene Ontology Analysis - using enrichGO(), setReadable(), and barplot()
KEGG Analysis - using enrichKEGG() and dotplot()
Gene-Concept Network - using enrichDGN(), setReadable(), and cnetplot()
STRING analysis
Transcription Factor Analysis - downloading data from MSigDB (GSEA), using cnetplot()
Survival Analysis - using GEPIA to produce survival plots, beginning to interpret survival plots
Focusing on the biological implications of the results of the functional analysis
Other:
Frequently communicated with my team on Slack to help troubleshoot and give each other guidance, as well as provide support and encouragement
Challenges and how those challenges were overcome:
Struggled slightly with plotting the GO barplots as I wasn’t sure how to get them to show the top 20 terms. I troubleshot this myself using internet forums and the help guide, and realised there were two different functions called barplot()!
When I reached the cnetplot() step, the resulting plot was very messy because it was a web of thousands of genes, so the visualisation was not at all useful! I realised this was because I hadn’t filtered the initial DEGs list based on fold change, so I went back and adjusted the fold-change cut-off until the cnetplot visualisation improved!
I wasn’t sure how to interpret the survival plots, so I began researching this. I will continue to research it independently and use office hours to find out more. Understanding this will be important for completing my analysis for the final presentation!
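A base R sketch of the fold-change filtering described above, applied before passing genes to cnetplot(). The toy table reuses the logFC and adj.P.Val column names that limma's topTable() returns; the gene names, the cut-off values, and the helper filter_degs() are hypothetical.

```r
# Toy DE results using the column names that limma's topTable() returns.
deg_table <- data.frame(
  logFC     = c(3.1, -2.4, 0.6, 1.2, -0.3, -1.8),
  adj.P.Val = c(0.001, 0.004, 0.020, 0.030, 0.600, 0.200),
  row.names = c("GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E", "GENE_F")
)

# Hypothetical helper: keep genes that pass both the absolute log fold
# change cut-off and the adjusted p-value cut-off.
filter_degs <- function(tab, lfc_cutoff, p_cutoff = 0.05) {
  keep <- abs(tab$logFC) >= lfc_cutoff & tab$adj.P.Val < p_cutoff
  tab[keep, , drop = FALSE]
}

# Try several cut-offs to see how quickly the gene list shrinks.
for (cutoff in c(0.5, 1, 2)) {
  kept <- filter_degs(deg_table, cutoff)
  cat("logFC cut-off", cutoff, "keeps", nrow(kept), "genes\n")
}
```

Checking the list size at each cut-off before plotting avoids repeatedly redrawing a network that is still too dense to read.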
Soft Skills - creating an effective presentation, time management, presenting virtually, public speaking
Three achievement highlights:
Worked quickly and under pressure to create a presentation on the functional analysis of the colorectal cancer data, then presented this at the team deliverables meeting (last-minute cover for the task lead, who was unable to make it to the meeting)
Time management: created a rough timeline of when I wanted to complete each step of the pipeline for the new data and so far have followed this well
Created metadata for my selected dataset - this was a step we had not completed before so it was exciting to learn how to do this and know that I am now able to complete every step of the pipeline independently!
Presentation link:
List of meetings attended including social team events:
12/08 - functional analysis webinar Q&A (caught up on recording)
14/08 - Team 1 deliverables presentation meeting
17/08 - Team 1 meeting
18/08 - Presentation webinar
Goals for the upcoming week:
Complete the code for my final presentation early so that I have time to ask any further questions at office hours
Create an interesting and effective presentation based on my deliverables, linking it to the biological meanings behind breast cancer
Rehearse my presentation so I can deliver it in a polished, understandable, and engaging way
Detailed statement of tasks done:
Deliverables:
Downloaded the breast cancer data set I intend to analyse
Created a metadata csv file from the series matrix file
Completed the quality control and data visualisation steps of the pipeline
QC using affyPLM - produced RLE and NUSE boxplots and histograms
Normalisation using rma
Batch correction
Visualisation using PCA plots
Other:
Created a presentation on colorectal cancer functional analysis and delivered this at the team meeting
Created a timetable of small achievable tasks to ensure I stay on-track to complete my final presentation deliverables
Challenges and how those challenges were overcome:
Clashing timetables meant our sub-group was not able to meet this week, so our communication was not as strong as usual. We still made good use of the Slack group chat.
Last-minute changes of plan meant I had to step in to create and deliver the deliverables presentation at the team meeting (though this turned into one of my achievement highlights as I was proud of how it turned out!)
Soft Skills - creating an effective presentation, time management, presenting virtually, public speaking
Three achievement highlights:
Kept to my intended timetable of small achievable goals to ensure my presentation was completed on time
Produced and delivered a presentation that covered the bioinformatics pipeline whilst maintaining a focus on the biological meanings behind the data - and why this data analysis is actually useful!
Ensured I understood how to interpret the biological meanings of functional analysis outputs, including the GO analysis, KEGG analysis, STRING-DB, and survival plots
Final presentation:
List of meetings attended including social team events:
21/08 - Final Presentation
24/08 - Team 1 meeting
Detailed statement of tasks done:
Differential gene expression analysis
Annotation using hgu133plus2.db
Gene filtering using collapseRows() and !duplicated() (thinking about different methods and selecting a method with justification based on how these would affect the results)
DE analysis using limma (lmFit, eBayes, and topTable)
Visualisation using heatmaps and volcano plots
Functional analysis
Gene Ontology Analysis - using enrichGO(), setReadable(), and barplot()
KEGG Analysis - using enrichKEGG() and dotplot()
Gene-Concept Network - using enrichDGN(), setReadable(), and cnetplot()
STRING analysis
Transcription Factor Analysis - downloading data from MSigDB (GSEA), using cnetplot()
Survival Analysis - using GEPIA to produce survival plots, beginning to interpret survival plots
Focusing on the biological implications of the results of the functional analysis
Creating a presentation to showcase my final deliverables, explaining the steps in the bioinformatics pipeline whilst maintaining a focus on the biological importance of the data analysis
Goals for the future:
Develop further understanding of different bioinformatics pipeline designs so I can adapt my current knowledge to different datasets
Learn to use different tools such as oligo (instead of just affy) to also be able to analyse different datasets
Continue to practise using R! Develop further confidence coding and think about how I can apply computational biology to my undergraduate research and beyond
Carry forward the soft skills I have learned during STEMaway webinars, such as networking and personal branding