Module 1 (REPOST)
- Installed R, RStudio, and other packages.
- Became familiar with coding, debugging, and data visualization.
- Read scientific papers in an efficient yet comprehensive manner.
- STEM-Away platform
- Time management: Learned how to manage multiple projects at the same time.
- Troubleshooting: Developed my problem-solving skills by working through the error-filled R script.
- Communication: Interacted with people from various backgrounds, which helped develop my speaking skills.
- Installed RStudio and its various packages and wrote code with them.
- Successfully read and summarized complex scientific papers.
- Became well-versed in concepts in bioinformatics.
- I had trouble finding the appropriate version of R. With the help of tutorials on YouTube, I was able to install the correct version.
- I followed the resources provided and became comfortable with coding.
- While importing one of the data sets, the code in the PDF wasn't working for me. By re-reading and understanding each component, I was able to write alternative code to complete the task.
- I successfully created and customized box and volcano plots using ggplot2 and EnhancedVolcano.
- I also went through the presentations on scientific reading and bioinformatics which gave me a deeper understanding of the project.
Module 2
- Explored the GEO database and downloaded a dataset from it.
- Imported and saved metadata and .CEL files to RStudio.
- Communication: I was able to resolve all my questions about the internship by contacting the project leads.
- Navigated the GEO database and downloaded the required data.
- Installed and explored additional packages.
- Read and understood the Long et al. paper.
- I explored the GEO database and successfully downloaded all the required data.
- At first, I wasn't able to understand all the information provided in the metadata text file, but after spending enough time on it, I managed to follow it.
- I had trouble using the ReadAffy() function since I kept getting an error. However, after watching Anya's video, I was able to resolve the error and successfully import my data.
- In addition, I used GEO2R to visualize the datasets.
- I also watched Ali's video, which covered GEO and gave an overview of the paper. It really helped me understand the results and methodology of the study.
Module 3
- Performed QC analysis with the help of affyPLM, simpleaffy, arrayQualityMetrics, and affyQCreport.
- Carried out normalisation and background correction using RMA.
- Visualized data using boxplots, a heatmap, and a PCA scatterplot.
- Communication: Reached out to several mentors and teammates for help.
- Time management
- Identified outliers with the help of QC reports.
- Understood the concept behind quality control of datasets.
- Created various plots of raw and normalized data.
- Because my computer has little RAM, generating a QC report using arrayQualityMetrics became a huge task for me.
- Colour coding my heatmap took a lot of time and troubleshooting.
- This module taught me the importance of paying attention to even the smallest of details while coding.
- I also learnt how to read and infer relevant information from the data.
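The RMA step mentioned above combines background correction, quantile normalisation, and probe-set summarisation. As a rough illustration of the quantile normalisation part only, here is a minimal sketch in plain Python. It is not the code used in the module (that was done with R packages); the function name and the toy values are made up for illustration, and ties are broken by position rather than averaged.

```python
def quantile_normalize(columns):
    """Quantile-normalise a list of equal-length columns (one per array).

    Each value is replaced by the mean of the values that share its rank
    across all arrays, so every array ends up with the same distribution.
    Illustrative sketch: ties are broken by position, not averaged.
    """
    n_values = len(columns[0])
    n_arrays = len(columns)

    # Sort each array independently, then average rank by rank.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [
        sum(col[k] for col in sorted_cols) / n_arrays for k in range(n_values)
    ]

    normalized = []
    for col in columns:
        # Indices of this array's values, from smallest to largest.
        order = sorted(range(n_values), key=lambda i: col[i])
        out = [0.0] * n_values
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


# Toy intensities for two hypothetical arrays (not real data).
toy = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
toy_normalized = quantile_normalize(toy)
```

After normalisation, the sorted values of every array are identical, which is why boxplots of normalised data line up.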
Module 4
- Removed outliers from data.
- Carried out gene annotation and gene filtration.
- Performed limma analysis and generated a topTable for the top 50 differentially expressed genes (DEGs).
- Visualised the DEGs using a volcano plot and a heatmap.
- Stack Overflow
- Time management
- Critical thinking
- Prepared a presentation on Module 4.
- Successfully identified and made tables for significant differentially expressed genes.
- Plotted a volcano plot showing up-regulated and down-regulated genes.
- Compared my results with those of the paper using the GEO2R analysis tool.
- Initially, I wasn’t getting the desired results for gene annotation. But after seeking help from Anya and various online resources, I overcame the difficulty.
- I came across several errors while performing the tasks given in this module such as flipped plots, missing genes, and incorrect code. However, I successfully worked through all of them by spending some time on my code.
- In contrast to the previous week, I was able to color code and format my heatmap with ease in order to make it visually pleasing.
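A volcano plot like the one described above separates genes by fold change and by multiple-testing-adjusted p-value. The sketch below shows that logic in plain Python: a Benjamini-Hochberg adjustment (the default adjustment method in limma's topTable) followed by an up/down/not-significant label. The cutoffs (|logFC| of at least 1, adjusted p below 0.05), the function names, and the toy numbers are assumptions for illustration, not the values used in the module.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Walk from the largest p-value to the smallest, keeping a running
    # minimum so the adjusted values stay monotone.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for pos, idx in enumerate(order):
        rank = m - pos  # 1-based rank of this p-value in ascending order
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def classify_gene(log_fc, adj_p, lfc_cutoff=1.0, p_cutoff=0.05):
    """Label a gene for a volcano plot. Cutoffs are assumed, not from the source."""
    if adj_p < p_cutoff and log_fc >= lfc_cutoff:
        return "up"
    if adj_p < p_cutoff and log_fc <= -lfc_cutoff:
        return "down"
    return "not significant"


# Toy results for four hypothetical genes (not real data).
toy_log_fc = [2.3, -1.8, 0.4, -2.1]
toy_pvalues = [0.01, 0.04, 0.03, 0.005]
toy_adjusted = bh_adjust(toy_pvalues)
toy_labels = [classify_gene(fc, p) for fc, p in zip(toy_log_fc, toy_adjusted)]
```

A gene with a small p-value but a small fold change (the third toy gene) stays "not significant", which is why both thresholds matter when reading the plot.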
Module 5
- Performed functional enrichment analysis using the Gene Ontology and KEGG databases and showed the results using barplots and dotplots.
- Visualized the gene-concept network of enriched up-regulated genes.
- Carried out GSEA and plotted the results for key gene sets.
- Generated a gene-concept network of involved transcription factors.
- Prepared a text file containing gene symbols (sorted by logFC) for functional analysis using web-based tools.
- Time management
- Identified enriched pathways for up-regulated genes.
- Plotted numerous graphs for the results of the GO, KEGG, and GSEA analyses.
- Compared my key findings with those of the original paper.
- I was able to write all the code in this module without much difficulty.
- My dotplot for the KEGG analysis wasn't coming out right, but after changing the threshold value for fold change, I was able to rectify my mistake.
- I had to invest some time in order to understand and make valuable conclusions from the various graphs.
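The GO and KEGG enrichment described above is a form of over-representation analysis, which is commonly based on a hypergeometric test: given a background of N genes, K of which belong to a pathway, how surprising is it to see at least k pathway genes among n selected DEGs? The sketch below computes that p-value in plain Python. The function name and the toy numbers are illustrative assumptions, and it does not reproduce the output of any particular R package.

```python
from math import comb


def enrichment_p_value(background, in_pathway, selected, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    background: number of genes in the background set (N)
    in_pathway: background genes annotated to the pathway (K)
    selected:   number of DEGs tested (n)
    overlap:    DEGs that fall in the pathway (k)
    """
    max_overlap = min(in_pathway, selected)
    total = comb(background, selected)
    favourable = sum(
        comb(in_pathway, i) * comb(background - in_pathway, selected - i)
        for i in range(overlap, max_overlap + 1)
    )
    return favourable / total


# Toy numbers (not real data): 10 background genes, 4 in the pathway,
# 3 DEGs selected, 2 of which are pathway members.
toy_p = enrichment_p_value(background=10, in_pathway=4, selected=3, overlap=2)
```

Because the p-value depends on how many DEGs are selected, changing the fold-change threshold changes the enrichment results, which matches the dotplot behaviour noted above.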
Module 6
- Exported top DEGs for functional analysis using web-based tools.
- Explored and performed various analyses using EnrichR, Metascape, and GEPIA.
- Analytical skills: I compared and validated my previously generated results against the ones generated by web-based tools.
- Generated plots for BP and KEGG pathways using EnrichR.
- Conducted enrichment analysis of pathways, processes, and protein-protein interactions using Metascape.
- Used GEPIA to perform survival analysis for the COL10A1 gene.
- It took me a while to navigate and understand the various analyses that could be done using these external tools.
- I also spent some time analysing and comparing the results that were generated.
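Web-based tools of this kind generally take a plain list of gene symbols, one per line. The export step could be sketched as follows in plain Python, assuming the DEG table is a list of records with `symbol` and `logFC` fields. The function name, the field names, and the optional `top_n` argument are hypothetical and chosen for illustration only.

```python
from pathlib import Path


def export_ranked_symbols(deg_table, out_path, top_n=None):
    """Write gene symbols to a text file, sorted by logFC (highest first).

    deg_table: list of dicts with "symbol" and "logFC" keys (assumed layout)
    out_path:  destination text file, one symbol per line
    top_n:     optionally keep only the first top_n symbols
    """
    ranked = sorted(deg_table, key=lambda row: row["logFC"], reverse=True)
    if top_n is not None:
        ranked = ranked[:top_n]
    symbols = [row["symbol"] for row in ranked]
    Path(out_path).write_text("\n".join(symbols) + "\n", encoding="utf-8")
    return symbols
```

Calling `export_ranked_symbols(table, "ranked_genes.txt")` with such a table would produce a file that can be pasted or uploaded into a web tool's input box; the file name here is just a placeholder.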
Module 7
- Performed previously learned steps (Quality control, Normalisation, Statistical Analysis, and Functional Analysis) on a new dataset.
- Validated my findings with data mining using several scientific platforms.
- Google Scholar
- Problem solving: Identifying and selecting comparable samples from the dataset took a lot of trial and error.
- Presentation skills: By giving a presentation on my capstone project to mentors and fellow team members, I was able to work on my speaking skills.
- Time management: Since this module consisted of applying skills from all the previous modules, it took quite a while to complete.
- Selected and imported a new dataset from GEO.
- Successfully carried out all the steps and identified differentially expressed genes.
- Conducted a literature review to support my outcomes.
- I selected dataset GSE22598 and used it to compare gene expression between cancerous and normal tissue in colorectal cancer.
- In order to remove outliers and reduce data anomalies, I carried out quality control and normalization. I visualized the effect using PCA plots of the raw and normalized data.
- Performed gene annotation, filtering, and limma analysis so that I could generate a top table containing my most up-regulated and down-regulated genes.
- With the help of a heatmap, a volcano plot, and publications, I cross-checked my top DEGs and identified the ENC1 gene as a potential prognostic biomarker for colorectal cancer.
- Furthermore, I carried out overrepresentation analysis and GSEA so that I could identify and compare the various enriched pathways.
- During the entire process, I also got the opportunity to explore the pathogenesis and other associated genes of colorectal cancer.