Became familiar with coding, debugging, and data visualization.
Read scientific papers in an efficient yet comprehensive manner.
Tools:
R
RStudio
STEM-Away platform
YouTube
Soft Skills:
Time management: Learned how to manage multiple projects at the same time.
Troubleshooting: Developed my problem-solving skills by working through the error-filled R script.
Communication: Interacted with people from various backgrounds, which helped develop my speaking skills.
Achievement highlights:
Installed RStudio and the required packages, and wrote code with them.
Successfully read and summarized complex scientific papers.
Became well-versed in bioinformatics concepts.
Tasks/Hurdles:
I had trouble finding the appropriate version of R. With the help of tutorials on YouTube, I was able to install the correct version.
I followed the resources provided and became comfortable with coding.
While importing one of the data sets, the code in the PDF wasn’t working for me. By re-reading and understanding each component, I was able to write alternative code to complete the task.
I successfully created and customized box and volcano plots using ggplot2 and EnhancedVolcano.
I also went through the presentations on scientific reading and bioinformatics which gave me a deeper understanding of the project.
Explored the GEO database and downloaded the dataset.
Imported the metadata and .CEL files into RStudio and saved them.
Tools:
RStudio
YouTube
STEM-Away
GEO
MS Excel
Soft Skills:
Communication: I was able to resolve all my questions about the internship by contacting the project leads.
Troubleshooting
Achievement Highlights:
Navigated the GEO database and downloaded the required data.
Installed and explored additional packages.
Read and understood the Long et al. paper.
Tasks/Hurdles:
I explored the GEO database and successfully downloaded all the required data.
At first, I wasn’t able to understand all the information provided in the metadata text file, but after spending more time on it, I was able to follow it.
I had trouble using the ReadAffy() function since I kept getting an error. However, after watching Anya’s video, I was able to resolve the error and successfully import my data.
In addition, I used GEO2R to visualize the datasets.
I also watched Ali’s video, which covered GEO and gave an overview of the paper. It really helped me understand the results and methodology of the study.
Performed limma analysis and generated a topTable of the top 50 differentially expressed genes (DEGs).
Visualised the DEGs using a volcano plot and a heatmap.
Tools:
RStudio
pheatmap
EnhancedVolcano
hgu133plus2.db
limma
Stack Overflow
GEO2R
Soft skills:
Time management
Troubleshooting
Critical thinking
Prepared a presentation on Module 4
Achievement highlights:
Successfully identified significant differentially expressed genes and made tables of them.
Plotted a volcano plot showing up-regulated and down-regulated genes.
Compared my results with those of the paper using the GEO2R analysis tool.
Tasks/Hurdles:
Initially, I wasn’t getting the desired results for gene annotation, but after seeking help from Anya and consulting various online resources, I overcame the difficulty.
I came across several errors while performing the tasks given in this module such as flipped plots, missing genes, and incorrect code. However, I successfully worked through all of them by spending some time on my code.
In contrast to the previous week, I was able to color code and format my heatmap with ease in order to make it visually pleasing.
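The volcano plot and the up-regulated/down-regulated labels mentioned above come down to two numbers per gene: the log2 fold change and the adjusted p-value. The following is a minimal Python sketch of that idea only, not the R code used in this module. The cutoffs, the function names, and the example rows are all assumptions made for illustration; the write-up does not state which thresholds were used.

```python
import math

# Hypothetical thresholds; the write-up does not state the cutoffs that were used.
LOGFC_CUTOFF = 1.0    # |log2 fold change| of 1 corresponds to a two-fold change
ADJ_P_CUTOFF = 0.05   # adjusted p-value threshold


def classify_gene(log_fc, adj_p, logfc_cutoff=LOGFC_CUTOFF, adj_p_cutoff=ADJ_P_CUTOFF):
    """Label one gene the way a volcano plot typically colours it."""
    if adj_p < adj_p_cutoff and log_fc >= logfc_cutoff:
        return "up"
    if adj_p < adj_p_cutoff and log_fc <= -logfc_cutoff:
        return "down"
    return "not significant"


def volcano_point(log_fc, adj_p):
    """Return the (x, y) position of a gene on a volcano plot."""
    return (log_fc, -math.log10(adj_p))


# Made-up rows for illustration only: (gene, log2 fold change, adjusted p-value)
example_rows = [
    ("GENE_A", 2.5, 0.001),
    ("GENE_B", -1.8, 0.010),
    ("GENE_C", 0.3, 0.0001),
    ("GENE_D", 3.0, 0.20),
]

labels = {gene: classify_gene(fc, p) for gene, fc, p in example_rows}

for gene, fc, p in example_rows:
    print(gene, labels[gene], volcano_point(fc, p))
```

Because the label depends on the sign of the log fold change, reversing the comparison (for example, normal vs cancer instead of cancer vs normal) negates every fold change and mirrors the plot. That is one common cause of a flipped volcano plot; whether it was the cause here is not stated.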
Performed previously learned steps (Quality control, Normalisation, Statistical Analysis, and Functional Analysis) on a new dataset.
Validated my findings with data mining using several scientific platforms.
Tools:
RStudio
GitHub
PubChem
Google Scholar
GEO
Soft Skills:
Problem solving: Identifying and selecting comparable samples from the dataset took a lot of trial and error.
Presentation skills: By giving a presentation on my capstone project to mentors and fellow team members, I was able to work on my speaking skills.
Time management: Since this module consisted of applying skills from all the previous modules, it took quite a while to complete.
Achievement highlights:
Selected and imported a new dataset from GEO.
Successfully carried out all the steps and identified differentially expressed genes.
Conducted a literature review to support my outcomes.
Tasks/Hurdles:
I selected dataset GSE22598 and used it to compare gene expression in cancer vs normal tissues of colorectal cancer.
To remove outliers and reduce data anomalies, I carried out quality control and normalization. I visualized the effect using PCA plots of the raw and normalized data.
Performed gene annotation, filtering, and limma analysis so that I could generate a top table containing my most up-regulated and down-regulated genes.
With the help of a heatmap, a volcano plot, and publications, I cross-checked my top DEGs and identified the ENC1 gene as a potential prognostic biomarker for colorectal cancer.
Furthermore, I carried out overrepresentation analysis and GSEA so that I could identify and compare the various enriched pathways.
During the entire process, I also got the opportunity to explore the pathogenesis and other associated genes of colorectal cancer.
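The overrepresentation analysis mentioned above is commonly framed as a hypergeometric test: given a list of DEGs, how likely is it to see at least the observed number of pathway genes by chance? The following is a minimal Python sketch under that assumption, not the R workflow used in the capstone. The function name and all the numbers in the example are hypothetical.

```python
from math import comb


def ora_p_value(universe_size, pathway_size, deg_count, overlap):
    """One-sided hypergeometric p-value for overrepresentation.

    Probability of drawing at least `overlap` pathway genes when
    `deg_count` genes are drawn without replacement from a universe of
    `universe_size` genes, `pathway_size` of which belong to the pathway.
    """
    upper = min(pathway_size, deg_count)
    total = comb(universe_size, deg_count)
    tail = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, deg_count - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers for illustration only: 20,000 genes measured,
# 100 genes in the pathway, 200 DEGs, 10 of which fall in the pathway.
p = ora_p_value(universe_size=20000, pathway_size=100, deg_count=200, overlap=10)
print(f"overrepresentation p-value: {p:.3g}")
```

GSEA differs from this test in that it works on the full ranked gene list instead of a thresholded DEG list, which is one reason the two methods can highlight different pathways.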