Dravie - Bioinformatics (Level 1) Pathway

Dravie · July 13, 2021, 12:20pm

Module 1 (REPOST)

Technical Area:

Installed R, RStudio, and other packages.
Became familiar with coding, debugging, and data visualization.
Read scientific papers in an efficient yet comprehensive manner.

Tools:

R
RStudio
STEM-Away platform
YouTube

Soft Skills:

Time management - Learned to manage multiple projects at the same time.
Troubleshooting - Developed my problem-solving skills by working through the error-filled R script.
Communication - Interacted with people from various backgrounds, which helped develop my speaking skills.

Achievement highlights:

Installed RStudio and several packages, and wrote code using them.
Successfully read and summarized complex scientific papers.
Became well-versed with concepts in bioinformatics.

Tasks/Hurdles:

I had trouble finding the appropriate version of R. With the help of tutorials on YouTube, I was able to install the correct version.
I followed the resources provided and became comfortable with coding.
While importing one of the data sets, the code in the PDF didn’t work for me. By re-reading it and understanding each component, I was able to write alternative code to complete the task.
I successfully created and customized box and volcano plots using ggplot2 and EnhancedVolcano.
I also went through the presentations on scientific reading and bioinformatics which gave me a deeper understanding of the project.

Dravie · July 13, 2021, 12:27pm

Module 2

Technical Area:

Explored the GEO database and downloaded a dataset.
Saved the metadata and .CEL files and imported them into RStudio.

Tools:

RStudio
YouTube
STEM-Away
GEO
MS Excel

Soft Skills:

Communication - I resolved all my questions about the internship by contacting the project leads.
Troubleshooting

Achievement Highlights:

Navigated the GEO database and downloaded the required data.
Installed and explored additional packages.
Read and understood the Long et al. paper.

Tasks/Hurdles:

I explored the GEO database and successfully downloaded all the required data.
At first, I wasn’t able to understand all the information that was provided in the metadata text file. But after spending sufficient time on it, I managed to follow the information.
I had trouble using the ReadAffy() function because I kept getting an error. However, after watching Anya’s video, I was able to resolve it and successfully import my data.
In addition, I used GEO2R to visualize the datasets.
I also watched Ali’s video which talked about GEO and gave an overview of the paper. It really helped me understand the results and methodology of the study.
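
As a rough illustration of what the metadata text file contains, the sketch below assumes it follows the GEO series matrix layout, where each `!Sample_...` line holds one tab-separated, quoted value per sample. The module itself used R; Python is used here only as a stand-in. The function name and the two-sample excerpt are hypothetical, not real GEO data.

```python
import csv
import io


def parse_series_matrix_metadata(text):
    """Collect '!Sample_*' rows of series-matrix-style text into a dict."""
    fields = {}
    reader = csv.reader(io.StringIO(text), delimiter="\t", quotechar='"')
    for row in reader:
        if not row or not row[0].startswith("!Sample_"):
            continue  # skip series-level lines and blank lines
        key = row[0][len("!Sample_"):]
        # Some keys (e.g. characteristics_ch1) can repeat; keep every occurrence.
        fields.setdefault(key, []).append(row[1:])
    return fields


# Hypothetical two-sample excerpt, made up for illustration.
example = (
    '!Series_title\t"Illustrative series"\n'
    '!Sample_title\t"tumor_1"\t"normal_1"\n'
    '!Sample_geo_accession\t"GSM0000001"\t"GSM0000002"\n'
    '!Sample_characteristics_ch1\t"tissue: tumor"\t"tissue: normal"\n'
)

metadata = parse_series_matrix_metadata(example)
# Map each sample accession to its first characteristics entry.
samples = dict(zip(metadata["geo_accession"][0], metadata["characteristics_ch1"][0]))
print(samples)
```

Reading the file column by column like this makes it easier to see which sample belongs to which group before importing the .CEL files.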

Dravie · July 20, 2021, 2:14pm

Module 3

Technical Area:

Performed QC analysis with the help of affyPLM, simpleaffy, arrayQualityMetrics, and affyQCreport.
Carried out normalisation and background correction using RMA.
Visualized data using boxplots, heatmap, and scatterplot (PCA).
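
RMA combines background correction, quantile normalization, and probe-set summarization. The sketch below covers only the quantile normalization step, to show why boxplots of normalized arrays line up. It is a simplified Python stand-in for what the R packages do, with a hypothetical function name and made-up values, and it breaks ties by position rather than averaging them.

```python
def quantile_normalize(columns):
    """Quantile-normalize equal-length columns (one list of values per array).

    Simplification: ties are broken by original order instead of being averaged.
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean across arrays at each rank.
    reference = [sum(col[i] for col in sorted_cols) / len(columns) for i in range(n)]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]  # replace each value by the reference at its rank
        normalized.append(out)
    return normalized


# Two hypothetical arrays with three probes each (log-scale values).
arrays = [[2.0, 4.0, 6.0], [9.0, 3.0, 6.0]]
normalized = quantile_normalize(arrays)
print(normalized)
```

After this step every array has the same set of values, only in a different order, which is why the normalized boxplots look identical in spread.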

Tools:

RStudio
RCloud
GitHub
YouTube

Soft Skills:

Communication - Reached out to several mentors and teammates for help.
Teamwork
Time management
Patience

Achievement highlights:

Identified outliers with the help of QC reports.
Understood the concept behind quality control of datasets.
Created various plots of raw and normalized data.

Tasks/Hurdles:

Because my computer has little RAM, generating a QC report with arrayQualityMetrics was a major challenge.
Colour coding my heatmap took a lot of time and troubleshooting.
This module taught me the importance of paying attention to even the smallest of details while coding.
I also learnt how to read and infer relevant information from the data.

Dravie · July 28, 2021, 7:50pm

Module 4

Technical Area:

Removed outliers from data.
Carried out gene annotation and gene filtration.
Performed limma analysis and generated a topTable for the top 50 DEGs.
Visualised DEGs using a volcano plot and a heatmap.
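
To illustrate how a top table turns into the up-regulated and down-regulated groups of a volcano plot, here is a minimal sketch. It assumes Benjamini-Hochberg adjusted p-values (the default adjustment in limma's topTable) and cutoffs of |logFC| >= 1 and adjusted p < 0.05. The cutoffs, gene names, and numbers are hypothetical, and the moderated statistics that limma computes are not reproduced here.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def classify(log_fc, adj_p, fc_cutoff=1.0, p_cutoff=0.05):
    """Label a gene the way a volcano plot would colour it (assumed cutoffs)."""
    if adj_p < p_cutoff and log_fc >= fc_cutoff:
        return "up"
    if adj_p < p_cutoff and log_fc <= -fc_cutoff:
        return "down"
    return "not significant"


# Hypothetical genes: (symbol, logFC, raw p-value).
genes = [("GENE_A", 2.5, 0.001), ("GENE_B", -1.8, 0.004),
         ("GENE_C", 0.3, 0.03), ("GENE_D", 1.2, 0.5)]
adj = benjamini_hochberg([p for _, _, p in genes])
labels = {sym: classify(fc, q) for (sym, fc, _), q in zip(genes, adj)}
print(labels)
```

A gene needs to pass both cutoffs to be counted: GENE_C is significant but changes too little, and GENE_D changes enough but is not significant.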

Tools:

RStudio
pheatmap
EnhancedVolcano
hgu133plus2.db
limma
Stack Overflow
GEO2R

Soft skills:

Time management
Troubleshooting
Critical thinking
Presentation skills - Prepared a presentation on Module 4.

Achievement highlights:

Successfully identified significant differentially expressed genes and tabulated them.
Plotted a volcano plot showing up-regulated and down-regulated genes.
Compared my results with those of the paper using the GEO2R analysis tool.

Tasks/Hurdles:

Initially, I wasn’t getting the desired results for gene annotation. But after seeking help from Anya and various online resources, I overcame the difficulty.
I came across several errors while performing the tasks given in this module such as flipped plots, missing genes, and incorrect code. However, I successfully worked through all of them by spending some time on my code.
In contrast to the previous week, I was able to color code and format my heatmap with ease in order to make it visually pleasing.
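
One common cause of a mirrored volcano plot is the order of the groups in the contrast. Whether that was the cause in this module is not stated, so the sketch below is only an assumption-based illustration. It treats logFC as the difference of mean log2 expression between two groups, with made-up values and hypothetical group names.

```python
from statistics import mean


def log_fold_change(numerator, denominator):
    """Difference of mean log2 expression: numerator group minus denominator group."""
    return mean(numerator) - mean(denominator)


# Hypothetical log2 expression values for one gene.
tumor = [9.0, 9.5, 10.0]
normal = [7.0, 7.5, 8.0]

tumor_vs_normal = log_fold_change(tumor, normal)
normal_vs_tumor = log_fold_change(normal, tumor)
print(tumor_vs_normal, normal_vs_tumor)
```

Swapping the two groups only changes the sign, so a gene that appears up-regulated under one contrast appears down-regulated under the other, and the whole plot flips left to right.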

Dravie · August 6, 2021, 7:37pm

Module 5

Technical Area:

Performed functional enrichment analysis using the Gene Ontology and KEGG databases and showed the results using barplots and dotplots.
Visualized the gene-concept network of enriched up-regulated genes.
Carried out GSEA and plotted the results for key gene sets.
Generated a gene-concept network of involved transcription factors.
Prepared a text file containing gene symbols (sorted by logFC) for functional analysis using web-based tools.
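
Overrepresentation analysis of GO terms or KEGG pathways is typically based on a hypergeometric test. The sketch below shows that calculation in plain Python under this assumption. The function name and the counts are hypothetical, and multiple-testing correction across terms is left out.

```python
from math import comb


def hypergeom_enrichment_p(universe, in_set, selected, overlap):
    """P(X >= overlap) when `selected` genes are drawn from `universe` genes,
    of which `in_set` belong to the gene set of interest."""
    upper = min(in_set, selected)
    total = comb(universe, selected)
    tail = sum(
        comb(in_set, k) * comb(universe - in_set, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20 background genes, 5 in the pathway,
# 4 DEGs selected, 3 of which fall in the pathway.
p = hypergeom_enrichment_p(universe=20, in_set=5, selected=4, overlap=3)
print(round(p, 4))
```

The smaller this p-value, the less likely it is that the overlap between the DEG list and the pathway arose by chance, which is what the dotplot colour scale usually encodes.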

Tools:

RStudio
org.Hs.eg.db
clusterProfiler
enrichplot
msigdb

Soft skills:

Time management
Communication

Achievement highlights:

Identified enriched pathways for up-regulated genes.
Plotted numerous graphs for the results of the GO, KEGG, and GSEA analyses.
Compared my key findings with those of the original paper.

Tasks/Hurdles:

I was able to write all the code in this module without much difficulty.
My dotplot for the KEGG analysis wasn’t coming out right, but after changing the fold-change threshold, I was able to fix it.
I had to invest some time in order to understand and make valuable conclusions from the various graphs.
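
For reading GSEA plots, it can help to see how the running enrichment curve is built. The sketch below is a simplified, unweighted version: real GSEA weights each hit by its ranking metric and assesses significance by permutation, and neither is done here. The gene names and the function name are hypothetical.

```python
def running_enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum statistic over a ranked gene list.

    Steps up at each gene in the set and down otherwise; the enrichment
    score is the point of largest absolute deviation from zero.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("need at least one gene inside and one outside the set")
    running = 0.0
    curve = []
    for hit in hits:
        running += 1.0 / n_hits if hit else -1.0 / n_miss
        curve.append(running)
    score = max(curve, key=abs)
    return score, curve


# Hypothetical list ranked by decreasing logFC, and a hypothetical gene set.
ranked = ["G1", "G2", "G3", "G4", "G5", "G6"]
score, curve = running_enrichment_score(ranked, {"G1", "G2", "G4"})
print(round(score, 3))
```

A peak near the start of the ranked list suggests the gene set is concentrated among the most up-regulated genes, and a trough near the end suggests the opposite.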

Dravie · August 28, 2021, 5:37pm

Module 6

Technical Area:

Exported top DEGs for functional analysis using web-based tools.
Explored and performed various analyses using EnrichR, Metascape, and GEPIA.

Tools:

RStudio
EnrichR
Metascape
GEPIA
GitHub

Soft Skills:

Analytical - I compared and validated my previously generated results with the ones generated by web-based tools.
Troubleshooting

Achievement highlights:

Generated plots for BP and KEGG pathways using EnrichR.
Conducted enrichment analysis of pathways, processes, and protein-protein interactions using Metascape.
Used GEPIA to perform survival analysis for the COL10A1 gene.

Tasks/Hurdles:

It took me a while to navigate and understand the various analyses that could be done using these external tools.
I also spent some time analysing and comparing the results that were generated.

Dravie · August 28, 2021, 6:15pm

Module 7

Technical Area:

Performed previously learned steps (Quality control, Normalisation, Statistical Analysis, and Functional Analysis) on a new dataset.
Validated my findings with data mining using several scientific platforms.

Tools:

RStudio
GitHub
PubChem
Google Scholar
GEO

Soft Skills:

Problem solving - Identifying and selecting comparable samples from the dataset took a lot of trial and error.
Presentation skills - By giving a presentation on my capstone project to mentors and fellow team members, I was able to work on my speaking skills.
Time management - Since this module consisted of applying skills from all the previous modules, it took quite a while to complete.

Achievement highlights:

Selected and imported a new dataset from GEO.
Successfully carried out all the steps and identified differentially expressed genes.
Conducted a literature review to support my outcomes.

Tasks/Hurdles:

I selected dataset GSE22598 and used it to compare gene expression between tumor and normal tissue in colorectal cancer.
In order to remove outliers and reduce data anomalies, I carried out quality control and normalization. I visualized the effect using PCA plots of the raw and normalized data.
I performed gene annotation, filtering, and limma analysis so that I could generate a top table containing my most up-regulated and down-regulated genes.
With the help of a heatmap, a volcano plot, and publications, I cross-checked my top DEGs and identified the ENC1 gene as a potential prognostic biomarker for colorectal cancer.
Furthermore, I carried out overrepresentation analysis and GSEA so that I could identify and compare the various enriched pathways.
During the entire process, I also got the opportunity to explore the pathogenesis and other associated genes of colorectal cancer.