Progress Summary - Alzheimer’s Disease Analysis Project - Daniel Drucker
- The data in this set was downloaded in a different format from the cancerous sample used in our earlier analyses, so normalization had to be done by different means. Justifying how the samples were assigned to a “batch” during the batch correction stage was also confusing, because the data’s series matrix file offers no apparent, convenient way to label each sample by batch. My first thought was that the batches were supposed to be the Braak stages of the samples, but this doesn’t make sense: using the Braak stage to label batches removed the distinction between Braak stages in my PCA plots, when exactly the opposite is desired. Ultimately I have output in which the data appears to be grouped appropriately by Braak stage in my PCA plots, but I have a difficult time justifying to myself how the process worked.
- I had to keep track of two different sets of differentials through the differential gene expression step, one each for the Braak-3 and Braak-6 samples, each compared against the healthy sample. This didn’t cause any errors I couldn’t debug independently, nor do I think it added a great deal of complexity, but it was certainly more onerous.
- I remain generally uncertain about my understanding of the gene concept network. I wound up leaving this portion out of my presentation simply because I didn’t believe I’d be capable of explaining or justifying the work I did on it.
- In the gene ontology enrichment step, I found that the gene symbols in the data set were not standardized, so the enrichment functions could not read them. Ultimately, I very tediously renamed the appropriate entries in my data table after researching the non-standard aliases. I did this for the genes with the most significant differential expression values, which were the ones given to the enrichment functions to produce functional analysis plots.
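The symbol renaming described in the last item can be sketched as a small lookup step. This is only an illustration in Python, not the code used in the project: the function name `standardize_symbols` and the alias table are hypothetical, and the alias pairs shown are well-known renamings chosen as examples, not entries taken from this data set.

```python
# Illustrative alias table. These pairs are real symbol renamings, but they are
# examples only and are NOT taken from the project's data.
ALIAS_TO_OFFICIAL = {
    "MLL": "KMT2A",
    "SEPT9": "SEPTIN9",
    "MARCH1": "MARCHF1",
}


def standardize_symbols(symbols, alias_map):
    """Replace known aliases with their official symbols.

    Returns the cleaned list plus a record of every replacement made,
    so the renaming can be reviewed instead of done silently.
    """
    standardized = []
    replaced = []
    for raw in symbols:
        symbol = raw.strip()  # tolerate stray whitespace from the data table
        official = alias_map.get(symbol, symbol)  # unknown symbols pass through
        if official != symbol:
            replaced.append((symbol, official))
        standardized.append(official)
    return standardized, replaced


if __name__ == "__main__":
    cleaned, changes = standardize_symbols(["APP", "MLL", " SEPT9 "], ALIAS_TO_OFFICIAL)
    print(cleaned)
    print(changes)
```

Keeping the list of replacements makes it easier to justify, after the fact, which entries were renamed before being passed to the enrichment functions.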
Summary of Work
- Normalized data from the set
- Corrected batch effects, this time with an additional grouping of the data by condition: the healthy sample, the Braak stage 3 sample, and the Braak stage 6 sample
- Calculated differential gene expression using lmFit, and the statistical significance of each value using eBayes
- Performed gene ontology enrichment to identify and visualize the cellular functions affected, based on the genes with the highest-magnitude differential expression values
- Performed transcription factor analysis to find associations between gene functions
- All the above was done for both the Braak-3 and Braak-6 stage samples
- I feel as though I certainly could’ve done more in this project for the functional analysis step. I find my lack of intuition about the network plots that visualize gene function associations especially frustrating. With a stronger biological background, I would have loved to understand the cellular functions affected by Alzheimer’s, because neural diseases remain something of a mystery to me.
- I intend to continue independently working on these frustrations and add to my work after the end of this program.
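The batch-labeling puzzle from the batch correction step can be reproduced on toy numbers. The sketch below is a deliberately crude stand-in for batch correction (it only shifts each batch to the overall mean), written in Python with made-up expression values for a single gene; it is not the method used in this project. It shows why labeling batches by Braak stage erases the difference between stages, while a batch label unrelated to stage leaves most of that difference in place.

```python
from statistics import mean


def center_by_batch(values, batches):
    """Shift each batch so that its mean equals the overall mean.

    A simplified stand-in for batch correction: whatever separates the
    batches from one another is removed.
    """
    overall = mean(values)
    batch_means = {
        b: mean([v for v, label in zip(values, batches) if label == b])
        for b in set(batches)
    }
    return [v - batch_means[b] + overall for v, b in zip(values, batches)]


def group_mean(values, labels, wanted):
    """Mean of the values whose label equals `wanted`."""
    return mean([v for v, label in zip(values, labels) if label == wanted])


# Made-up expression values for one gene: three healthy, three Braak-6 samples.
expression = [5.0, 5.2, 4.8, 8.0, 8.2, 7.8]
stage = ["healthy"] * 3 + ["braak6"] * 3

# Case 1: the Braak stage itself is (wrongly) used as the batch label.
by_stage = center_by_batch(expression, stage)
gap_stage = abs(group_mean(by_stage, stage, "braak6") - group_mean(by_stage, stage, "healthy"))

# Case 2: a hypothetical batch label that is unrelated to Braak stage.
run = ["a", "b", "a", "b", "a", "b"]
by_run = center_by_batch(expression, run)
gap_run = abs(group_mean(by_run, stage, "braak6") - group_mean(by_run, stage, "healthy"))

if __name__ == "__main__":
    print("gap between stages, batch = Braak stage:", round(gap_stage, 3))
    print("gap between stages, batch = unrelated run:", round(gap_run, 3))
```

In the first case the gap between stages collapses to zero, which matches what the PCA plots showed when Braak stage was used as the batch label. In the second case the stages stay clearly apart.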