Bioinformatics: Matthew Wang, Level 1

Detailed Tasks

  • I downloaded and unzipped the GSE19804 dataset and metadata files.
  • I imported the resulting files into R.
  • I created a model matrix using the GSE19804 dataset.
  • I read the paper for the GSE19804 dataset.
  • I conducted quality control for the GSE19804 dataset.

Technical Skills

  • PCA: I created a PCA graph for the first time.


  • Open question: Is it possible to view the data after normalization and background correction? If so, I am still working out how to do it.

Soft Skills

  • Task management: Watching the recommended StatQuest video on PCA before building my PCA graph made the work more efficient.

Achievements and what I learned

I learned how to construct a scatterplot in R.

I constructed a boxplot and heatmap representing the normalized and raw data.

I interpreted my outliers using the visuals I constructed.

I continued to conduct quality control using the affyPLM package.


I used the plot function to display the sample clusters in a PCA graph.

Skills Learned

Quality control


Task management


Future Goals

Making a complete PCA Graph

Identifying DEGs (differentially expressed genes)

Achievements & Detailed List of Tasks

I revised my PCA graph representing the normalized and raw data.

I installed the Rtools software.

Skills and What I Learned

R programming: I learned that on Windows, Rtools provides the tools needed to build R packages from source. Without it, some packages cannot be built, so installing or loading a package such as ggplot2 can produce errors.

PCA: I learned that the easiest way to create a scatter plot with the samples colored by group is with ggplot2, not the base plot function in R. The plot function is generic and can draw many kinds of plots, but grouping the points takes more manual work. The video explaining Module 3 was clear and easy to follow.
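
The contrast between the base plot function and ggplot2 can be sketched roughly as follows. This is only an illustration: the expression matrix, sample names, and group labels are simulated placeholders rather than the GSE19804 data, and the ggplot2 part runs only if that package is installed.

```r
# Simulated stand-in for an expression matrix: rows = probes, columns = samples.
set.seed(42)
expr <- matrix(rnorm(200 * 6), nrow = 200,
               dimnames = list(paste0("probe", 1:200), paste0("sample", 1:6)))
group <- factor(rep(c("tumor", "normal"), each = 3))  # hypothetical labels

# prcomp expects samples in rows, so transpose the matrix first.
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)

# Percent of variance explained by each principal component.
var_explained <- round(100 * pca$sdev^2 / sum(pca$sdev^2), 1)

pca_df <- data.frame(PC1 = pca$x[, 1], PC2 = pca$x[, 2],
                     group = group, sample = rownames(pca$x))

# Base R: the colors and the legend have to be mapped to the groups by hand.
plot(pca_df$PC1, pca_df$PC2, col = as.integer(pca_df$group), pch = 19,
     xlab = paste0("PC1 (", var_explained[1], "%)"),
     ylab = paste0("PC2 (", var_explained[2], "%)"))
legend("topright", legend = levels(group),
       col = seq_along(levels(group)), pch = 19)

# ggplot2: the color mapping and the legend come from the color aesthetic.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(pca_df, ggplot2::aes(x = PC1, y = PC2, color = group)) +
    ggplot2::geom_point(size = 3) +
    ggplot2::labs(x = paste0("PC1 (", var_explained[1], "%)"),
                  y = paste0("PC2 (", var_explained[2], "%)"))
  print(p)
}
```

The same code can be run once on the raw matrix and once on the normalized matrix to compare the two PCA graphs.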

Future Goals

Revising a heatmap.

Identifying DEGs.

Creating a Volcano Plot.

Achievements, Skills Learned, Detailed Tasks

Correlation heatmap: I successfully created a heatmap representing the normalized and raw data. To do this, I needed to set the row names of my data frame containing the tissue type to the names of the sample files.

Research: I learned how to interpret a heatmap, including its outliers. To identify the outliers in the PCA graphs, I looked at both PCA graphs (raw data and normalized data) and identified which samples did not fit into a data cluster. I used the outliers in my heatmap and PCA graphs to make a decision about the overall sample set.
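
A minimal sketch of the row name step follows. It assumes the heatmap was drawn with the pheatmap package, which this log does not state; the matrix, sample names, and tissue labels are simulated placeholders, and the plotting call runs only if pheatmap is installed.

```r
# Simulated expression matrix: rows = probes, columns = samples (placeholder names).
set.seed(1)
expr <- matrix(rnorm(60), nrow = 10,
               dimnames = list(paste0("probe", 1:10), paste0("sample", 1:6)))

# Sample-to-sample correlation matrix (6 x 6).
sample_cor <- cor(expr)

# Annotation data frame holding the tissue type (hypothetical labels).
annotation <- data.frame(tissue = rep(c("tumor", "normal"), each = 3))

# The row names must match the sample names used in the matrix,
# otherwise the annotation cannot be lined up with the columns.
rownames(annotation) <- colnames(expr)
stopifnot(identical(rownames(annotation), colnames(sample_cor)))

# Assumption: pheatmap is the plotting package; skipped if it is not installed.
if (requireNamespace("pheatmap", quietly = TRUE)) {
  pheatmap::pheatmap(sample_cor, annotation_col = annotation)
}
```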


Explore new methods: I’m still figuring out how to use the select function in R correctly. I wonder what the error message I received is telling me.

Future Goals

Identify DEGs with limma analysis, gene annotation, and gene filtration.

Create a volcano plot representing the DEGs.

Achievements and what I learned

I identified which probe IDs contain duplicate gene symbols.

I annotated my GSE19804 dataset. I learned that the annotation database cannot be passed as the keys, columns, or keytype argument of the select function. The keytype argument takes a string, the columns argument takes one or more strings, and the keys argument takes a vector of probe IDs.
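
The argument roles can be sketched as below. This assumes the select function in question is AnnotationDbi::select and that the matching annotation package is hgu133plus2.db (for the Affymetrix HG-U133 Plus 2.0 array); neither is named in this log, so treat both as assumptions. The lookup runs only if both packages are installed.

```r
# A few example probe IDs; in practice these would be the row names
# of the normalized expression matrix.
probe_ids <- c("1007_s_at", "1053_at", "117_at")

# Assumption: AnnotationDbi and hgu133plus2.db are the packages in use.
if (requireNamespace("AnnotationDbi", quietly = TRUE) &&
    requireNamespace("hgu133plus2.db", quietly = TRUE)) {
  annotation <- AnnotationDbi::select(
    hgu133plus2.db::hgu133plus2.db,  # x: the annotation database object
    keys    = probe_ids,             # character vector of probe IDs
    columns = "SYMBOL",              # string(s) naming the columns to return
    keytype = "PROBEID"              # string saying what kind of ID the keys are
  )
  print(annotation)
}
```

The database object goes only in the first position; the remaining three arguments are plain character values.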


Future Goals

Create a volcano plot.

Identify DEGs with limma analysis, gene annotation, and gene filtration.

Where I’m currently at

I am removing the duplicate gene symbols from my GSE19804 dataset.
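
One possible way to do this is sketched below. The rule used here, keeping the probe with the highest mean expression for each gene symbol, is an assumption chosen for illustration and is not necessarily the approach the module asks for. All probe IDs, symbols, and values are made up.

```r
# Toy expression matrix: rows = probes, columns = samples.
expr <- matrix(c(5, 6, 7,
                 9, 8, 9,
                 2, 3, 1,
                 4, 4, 5),
               nrow = 4, byrow = TRUE,
               dimnames = list(c("p1", "p2", "p3", "p4"), c("s1", "s2", "s3")))

# Toy annotation: p1 and p2 map to the same gene symbol.
annot <- data.frame(PROBEID = c("p1", "p2", "p3", "p4"),
                    SYMBOL = c("GENE_A", "GENE_A", "GENE_B", "GENE_C"),
                    stringsAsFactors = FALSE)

# Rank the probes by mean expression, highest first.
annot$mean_expr <- rowMeans(expr[annot$PROBEID, , drop = FALSE])
annot <- annot[order(annot$mean_expr, decreasing = TRUE), ]

# duplicated() flags the second and later occurrences of a symbol,
# so after sorting, the top-ranked probe per symbol is the one kept.
annot_unique <- annot[!duplicated(annot$SYMBOL), ]

# Subset the matrix to the kept probes and label the rows by gene symbol.
expr_unique <- expr[annot_unique$PROBEID, , drop = FALSE]
rownames(expr_unique) <- annot_unique$SYMBOL
```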


I hope to finish Module 4 by Thursday. I hope to finish Module 5 by Friday.


I learned how the merge function in R works. I take two data frames, specify the column to match on, and combine them. The result includes only the probe IDs that appear in both data frames, along with their symbols. I tested the quantile and rowMeans functions, and I learned that rowMeans only works on numeric input, such as a numeric matrix or a data frame whose columns are all numeric.
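
A small sketch of merge, rowMeans, and quantile working together is shown here. The data frames, column names, and values are invented for illustration.

```r
# Toy expression data: one row per probe, one numeric column per sample.
expr_df <- data.frame(PROBEID = c("p1", "p2", "p3"),
                      s1 = c(5.1, 7.2, 3.3),
                      s2 = c(5.4, 7.0, 3.1),
                      stringsAsFactors = FALSE)

# Toy annotation: p1 has no symbol, and p4 has no expression data.
annot_df <- data.frame(PROBEID = c("p2", "p3", "p4"),
                       SYMBOL = c("GENE_A", "GENE_B", "GENE_C"),
                       stringsAsFactors = FALSE)

# By default merge keeps only the probe IDs found in both data frames.
merged <- merge(expr_df, annot_df, by = "PROBEID")

# rowMeans needs numeric input, so pass only the sample columns,
# not the PROBEID or SYMBOL text columns.
sample_cols <- c("s1", "s2")
merged$mean_expr <- rowMeans(merged[, sample_cols])

# quantile returns cutoffs that could be used for filtering, here the median.
cutoff <- quantile(merged$mean_expr, probs = 0.5)
```

Passing the whole merged data frame to rowMeans would fail because of the text columns, which is why the sample columns are selected first.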


I’m not sure if I loaded the normalized dataset into R correctly. From watching YouTube, I know that to load data I first have to save it to a file and then import it into R using the appropriate command. My data normalization was done inside R, so this case seems a little different. I tried using the save and load commands to achieve my goal, but this didn’t work.
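
For reference, here is a hedged sketch of three common ways to write an object created in R to a file and read it back. The matrix is a made-up stand-in for the normalized data, and temporary files are used in place of real paths. It does not diagnose why the earlier attempt failed, but one frequent stumbling block is that load() returns the names of the restored objects rather than the data itself.

```r
# Made-up stand-in for a normalized expression matrix.
norm_expr <- matrix(c(7.1, 6.8, 5.2, 5.9), nrow = 2,
                    dimnames = list(c("p1", "p2"), c("s1", "s2")))

# Option 1: saveRDS/readRDS store a single object,
# and the name is chosen when it is read back.
rds_file <- tempfile(fileext = ".rds")
saveRDS(norm_expr, rds_file)
norm_from_rds <- readRDS(rds_file)

# Option 2: save/load restore the object under its original name.
# load() returns the object names, not the data.
rdata_file <- tempfile(fileext = ".RData")
save(norm_expr, file = rdata_file)
rm(norm_expr)
loaded_names <- load(rdata_file)  # recreates `norm_expr` in the workspace

# Option 3: a CSV file can also be opened outside R.
csv_file <- tempfile(fileext = ".csv")
write.csv(norm_expr, csv_file)
norm_from_csv <- as.matrix(read.csv(csv_file, row.names = 1))
```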