Overview of Things Learned:
- Technical
- Plotting with ggplot, Unzipping data files, Using the affy package
- Tools
- Importing and exporting in R Studio
- Using Stack Overflow to answer questions
- Soft skills
- Time management (getting on top of things before office hours), Communication with my team
Three Achievements:
- Properly grouping by cancer and batch on my plots
- Creating the model matrix and doing batch correction
- Figuring out how to format the PCA plots cleanly
Meetings Attended:
- All meetings: Team 2 Meeting, Biology Webinar, Office Hours, Happy Hour
Goals for this week:
- Collaborate with my team more while working on the deliverables
- Successfully create enhanced volcano plot and heatmap
Tasks Done:
- Met with my team to prepare for the presentation
- Went through QC, normalization, and visualization with few issues and was able to identify the outliers. (Using the help function frequently)
- Originally had a lot of difficulty with the batch correction. Once I had been given an explanation of the arguments and the use of the model matrix, I was able to work it out much more easily.
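A minimal base R sketch of the model matrix step, under stated assumptions: the sample sheet, its column names (`cancer`, `batch`), and the group labels are all hypothetical and not the actual metadata from the deliverable. The point it illustrates is that the model matrix describes the biology to keep (cancer status), while the batch labels are supplied separately to the correction function.

```r
# Hypothetical sample sheet: six arrays from two batches (all values are made up)
meta <- data.frame(
  sample = paste0("S", 1:6),
  cancer = factor(c("normal", "tumor", "normal", "tumor", "normal", "tumor")),
  batch  = factor(c("A", "A", "A", "B", "B", "B"))
)

# Design matrix describing only the biological grouping to preserve.
# Factor levels sort alphabetically, so "normal" becomes the reference level.
design <- model.matrix(~ cancer, data = meta)
rownames(design) <- meta$sample

print(design)
```

A matrix like this is what batch-correction functions usually take as their model argument (for example `mod` in `sva::ComBat()` or `design` in `limma::removeBatchEffect()`). Which function the deliverable used is not stated here, so treat those names as examples only.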
Week: 8/4/2020
Overview of Things Learned:
Technical area: Using R for differential gene expression analysis. Specifically, using the limma package and creating the enhanced volcano plot were new things for me this week.
Tools: This was my first time using Github to turn in deliverables, so I learned a little bit about that.
Soft skills: Communication with team when I have questions, Being forward with my questions at office hours
Achievement Highlights:
- Using the select() function to remove duplicate probeIDs and gene symbols. Figuring out what the arguments should be in this function.
- Manipulating the matrix/data frame frequently to make sure that it contained the necessary data. There were multiple times where I had to insert or delete columns / rows or change the row and column names.
- Successfully creating and analyzing my volcano plot. My heatmap and volcano plot seemed to match up in terms of which genes are up and down regulated
Meetings Attended: (All of Them)
Team meeting, Presentation meeting, Github Webinar, Office Hours, Happy Hour
Goals for this week:
- Do a more thorough analysis of the upregulated genes and their functions
- Organize the team well as the task lead for the week
- Create well defined and aesthetic plots
- Have a good understanding of how to use the String database
Detailed Tasks Done:
- Annotation
- Was confused about how to drop duplicate ProbeIDs, so I had to attend office hours
- Originally, I was able to get the select() function to run, but I was using the wrong inputs and got a table of NAs. Did some research online and collaborated with teammates to figure this out.
- Gene filtering
- Was rather successful in completing this step.
- Analysis with limma
- It was pretty easy for me to follow the directions on the deliverables document
- One issue was that I was initially using coef=2, but after doing research on what that really meant, I realized that it should be 1 because there’s only one contrast.
- Heatmap
- Originally could not figure out how to get gene symbols on the right side, but after looking online, I realized that I should not be using the cor() function
- Volcano Plot
- Presentation
- Worked efficiently with Tanish to create the presentation
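The duplicate-removal part of the annotation step can be sketched in base R. This is an illustration under assumptions, not the code from the deliverable: the probe IDs and the `GENE_X` symbol are made up, and the table only imitates the shape of what `select()` returns (one row per probe-to-symbol mapping).

```r
# Hypothetical annotation table shaped like select() output.
# Probe "p1" maps to two symbols, "p3" has no symbol, and MMP10 has two probes.
annot <- data.frame(
  PROBEID = c("p1", "p1", "p2", "p3", "p4"),
  SYMBOL  = c("FOXQ1", "GENE_X", "MMP10", NA, "MMP10"),
  stringsAsFactors = FALSE
)

annot <- annot[!is.na(annot$SYMBOL), ]        # drop probes without a symbol
annot <- annot[!duplicated(annot$PROBEID), ]  # keep the first symbol per probe
annot <- annot[!duplicated(annot$SYMBOL), ]   # keep the first probe per symbol

print(annot)
```

Keeping the first match is an arbitrary choice made for brevity. A common alternative is to keep, for each symbol, the probe with the highest average expression.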
#R #Github #Communication #LimmaAnalysis
Week: 8/11/2020
Overview of Things Learned:
Technical area: enrichGO and enrichKEGG analysis, barplots and cnetplots, analyzing networks to find gene hubs
Tools: StringDB, RStudio
Soft skills: Communication with my team, Maintaining a positive attitude even when my R program hasn’t functioned properly
Achievement Highlights:
- Identified the hub genes on StringDB based on the genes that had the most connections.
- Reasoned that I needed two different gene vectors: one that is specifically for the upregulated genes and one for the unfiltered full set. Successfully created both.
- Created aesthetically pleasing circular cnetplots. Determined the top categories and related genes.
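Picking hub genes by "most connections" amounts to counting each gene's degree in the network. A minimal base R sketch, assuming the interactions have been exported as a two-column edge list; the gene names below are placeholders and not real results:

```r
# Hypothetical interaction list (placeholder gene names)
edges <- data.frame(
  from = c("G1", "G1", "G1", "G2", "G3"),
  to   = c("G2", "G3", "G4", "G3", "G5"),
  stringsAsFactors = FALSE
)

# Degree = number of edges touching each gene, counting both endpoints
degree <- sort(table(c(edges$from, edges$to)), decreasing = TRUE)

# Hub genes = the gene(s) with the highest degree
hubs <- names(degree)[as.vector(degree == max(degree))]

print(degree)
print(hubs)
```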
Meetings Attended:
Team meeting, Python Webinar, Github Webinar, Presentation meeting, Deliverables Webinar, Happy Hour
Goals for the Week:
- Work on the final project with little debugging in the code
- Currently, I haven’t had success using the enricher() function because it’s giving me a result with dimension 0x9. I hope to figure out why this error is occurring and fix it.
- Create a thorough hypothesis with my group for the presentation on Friday
- Analyze the survival curves of input genes
Detailed Tasks Done:
- DEGs Vector
- Originally, I made a matrix with two columns, one with logFC and the other with entrezid
- After playing around and looking online, I realized that I needed to make a named vector
- Gene Ontology with enrichGO
- Did this step pretty easily and got plots
- KEGG Analysis
- This was also pretty easy
- Gene-Concept Network
- Had trouble originally inserting the fold change because I didn’t have a named vector, but once this was fixed, I was able to make good netplots
- StringDB
- GSEA
- Still having difficulties with using enricher()
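The switch from a two-column matrix to a named vector can be sketched as follows. The Entrez IDs, fold changes, and the logFC cutoff of 1 are hypothetical values chosen for illustration.

```r
# Hypothetical DEG table (IDs and fold changes are made up)
degs <- data.frame(
  entrezid = c("1001", "1002", "1003", "1004"),
  logFC    = c(1.8, -2.3, 0.4, 3.1),
  stringsAsFactors = FALSE
)

# Named numeric vector: values are logFC, names are Entrez IDs
gene_list <- degs$logFC
names(gene_list) <- degs$entrezid

# Sort in decreasing order, which GSEA-style functions expect
gene_list <- sort(gene_list, decreasing = TRUE)

# IDs of the upregulated genes only (the cutoff of 1 is an assumption)
up_genes <- names(gene_list)[gene_list > 1]

print(gene_list)
print(up_genes)
```

The names are what let a plotting function match each gene to its fold change, which is why a plain matrix did not work for the gene-concept network.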
Week: 8/18/2020
Overview of Things Learned:
Technical area: analyzing survival plot, GSEA, heatplots, transcriptional factor analysis
Tools: GEPIA
Soft skills: Communication and teamwork, Presentation skills
Achievement Highlights:
- Successfully used the GSEA() function after diagnosing what I had been doing wrong and plotted the GSEA plot and heatplot.
- Utilized GEPIA software to look at the survival plots of FOXQ1 and MMP10. Discovered that low MMP10 expression was associated with a lower survival rate, which went against my prediction.
- Assembled the metadata for my dataset, which included more than two groups. Used this to make my model matrix.
Meetings Attended:
Team meeting, Office Hours (8/11), Functional Analysis Webinar, Presentation Meeting
Goals for the Week:
- Determine the similarities in gene expression between COPD and healthy smokers
- Be able to field all questions/be knowledgeable during my final presentation
- Finish functional analysis for my final project. Create aesthetically pleasing plots
- Determine good ways to extend this research with COPD and bioinformatics into the future
Detailed Tasks Done:
- GSEA
- This is something that gave me a lot of trouble, but with a suggestion from another participant, I realized that I could get results by changing the p-value cutoff
- Made heatplot pretty easily
- Transcriptional Factor Analysis
- Similar problem to GSEA, but for this one I had to change the cutoff for logFC in order to get any enriched terms
- My plot never really ended up looking like the example in the deliverables document
- Used GEPIA to look at a variety of genes and played around with the different functions on the website
- Worked on a hypothesis and conclusion with Tanish for the presentation
- Selected COPD dataset for final project
- I looked into a bunch of the datasets before actually choosing one
- Originally, I wanted to do batch correction, but the datasets I liked didn’t align with other ones
- Created metadata
- Worked on the entire pipeline up until the functional analysis for this dataset
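An empty enrichment table (such as a result with 0 rows) is consistent with no term passing the significance cutoff after multiple-testing correction, rather than with a crash. A minimal base R sketch of that effect, with made-up term names and p-values; the Benjamini-Hochberg method and both cutoffs are assumptions chosen for illustration:

```r
# Hypothetical enrichment results (terms and p-values are made up)
res <- data.frame(
  term   = paste0("TERM_", 1:5),
  pvalue = c(0.02, 0.04, 0.06, 0.30, 0.50)
)

# Benjamini-Hochberg adjustment for multiple testing
res$p.adjust <- p.adjust(res$pvalue, method = "BH")

strict  <- res[res$p.adjust < 0.05, ]  # strict cutoff: nothing survives
relaxed <- res[res$p.adjust < 0.20, ]  # relaxed cutoff: some terms survive

print(dim(strict))
print(dim(relaxed))
```

Relaxing a cutoff makes results appear, but the terms it lets through are weaker evidence and should be reported with the cutoff that was used.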
Week: 8/25/2020
Overview of Things Learned:
Technical area: Creating metadata, Pipeline with multiple contrasts (using makeContrasts()), Debugging DEGs vector
Tools: RStudio, Github
Soft skills: Communication with my team, Time Management, Using a growth mindset
Achievement Highlights:
- Able to make many connections between the biological aspects of COPD and the online data set. The data analysis sections matched my hypothesis that gene expression would be more similar between COPD and “healthy” smokers than between COPD and nonsmokers
- Put together and presented a final presentation. Originally it ran double the allotted time, so I made important decisions on what information to cut out.
- Made plans of how I can use the skills learned in this internship moving forward in my career.
Meetings Attended:
Team meeting, Final Presentation, Team Meeting (8/24)
Note: I was traveling the week of 8/16, which made it challenging to attend as many meetings as I would’ve liked to.
Goals for the Week:
- Read through the COPD paper to see if they reached conclusions similar to mine
- Use a similar pipeline to investigate gene expression of C. albicans
Detailed Tasks Done:
- Conducted a lot of research on the biological aspects of COPD to see what types of genes I would expect to see most expressed
- Predicted it would be ones related to the immune system response
- Created three DEGs vectors for the three contrasts
- Used enrichGO and groupGO
- Performed KEGG analysis and created a category network plot
- Performed transcriptional factor analysis
- Found the hub genes on the String database
- At first, I had trouble saving the list of gene names in a way that would make it easy to put them into String
- Had to create a new vector to do this
- Prepared and practiced for my presentation
- Gave a 13 minute presentation on COPD and my data analysis
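A pipeline with three groups and three contrasts can be sketched in base R. The group names follow the COPD dataset described above, but the sample layout is hypothetical, and which three comparisons were actually run is not stated, so the pairwise set below is an assumption. The contrast matrix is built by hand to show its structure; `makeContrasts()` produces a matrix of this shape.

```r
# Hypothetical sample layout: two samples per group
group <- factor(
  c("COPD", "COPD", "smoker", "smoker", "nonsmoker", "nonsmoker"),
  levels = c("nonsmoker", "smoker", "COPD")
)

# Design without an intercept: one column per group mean
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Contrast matrix: rows are groups, columns are comparisons.
# Each column sums to zero, so it is a difference between group means.
contrasts <- cbind(
  COPD_vs_nonsmoker   = c(nonsmoker = -1, smoker =  0, COPD = 1),
  COPD_vs_smoker      = c(nonsmoker =  0, smoker = -1, COPD = 1),
  smoker_vs_nonsmoker = c(nonsmoker = -1, smoker =  1, COPD = 0)
)

print(design)
print(contrasts)
```

Each column of the contrast matrix yields its own table of differentially expressed genes, which is why three DEGs vectors are needed.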
Final Bioinformatics Presentation (1).pdf (2.5 MB)