Advanced Topics for the Bioinformatics Pathway

This thread will be a growing list of advanced topics which may add some bioinformatic depth to your project! If you would like Ali or me to host an in-depth round table discussion on one of these topics, give it a like!

(Also, since I’m the one posting these topics and I don’t know much about Ali’s work, some descriptions may seem lacking, so I apologize. If you want more details about any topic, let me know and I can get more information from Ali!)

Please feel free to propose your own topics!

If you’re interested in just exploring possible avenues, check out The Broad Institute! They’re a research institution located in Cambridge, Massachusetts (right next to Boston), and they have a ton of resources for learning about cutting-edge methods. They have also developed many common bioinformatics tools, like GATK and IGV, and they host many useful public datasets.

Here’s a link to their YouTube where they have recordings of seminars and some tutorials on novel research discoveries and methods.

Fun Fact: (Can you tell I really like the Broad?) The founding director and former president of the Broad, Eric Lander, is now the Director of the White House Office of Science and Technology Policy!

The following link is another good resource to learn more about Next Generation Sequencing analysis. This webpage contains a bunch of workshop materials and tutorials created by the Harvard Chan School of Public Health.


Integration of Datasets and Batch Adjustment

Some of you may have already done this in the pathway hub with 2 datasets, but researchers typically look at many more than that! In his PhD thesis, Ali looked at hundreds!
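To give a feel for what "batch adjustment" means, here is a minimal sketch of the simplest possible version: centering one gene's values within each dataset so that the datasets share the same mean. The function name `center_by_batch` and the toy numbers are made up for illustration. This is not the method used in the pathway hub or in Ali's thesis, and real projects usually reach for dedicated tools such as ComBat (in the R package sva) or limma's `removeBatchEffect`.

```python
from collections import defaultdict
from statistics import mean


def center_by_batch(values, batches):
    """Shift each batch so its mean matches the overall mean (hypothetical helper)."""
    overall_mean = mean(values)
    grouped = defaultdict(list)
    for value, batch in zip(values, batches):
        grouped[batch].append(value)
    batch_means = {batch: mean(vals) for batch, vals in grouped.items()}
    # Remove the batch-specific offset, then restore the overall level.
    return [v - batch_means[b] + overall_mean for v, b in zip(values, batches)]


# Made-up log-expression values for one gene measured in two datasets.
values = [5.0, 5.2, 4.8, 7.0, 7.2, 6.8]
batches = ["A", "A", "A", "B", "B", "B"]

adjusted = center_by_batch(values, batches)
print([round(v, 2) for v in adjusted])
```

One caveat worth keeping in mind: if one dataset contains mostly cases and the other mostly controls, this kind of centering would also remove the real biological difference, which is part of why integration gets tricky with many datasets.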

Multivariable Models

Oftentimes in research, age, sex, ethnicity, disease stage/severity, and many other variables must be considered to obtain accurate results. Ali can lead a discussion on how to handle and include all these different factors in a statistical model.
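As a rough illustration of why these variables matter, here is a small sketch with a made-up cohort in which the cases happen to be older than the controls. The data, the effect sizes (1.5 for disease, 0.1 per year of age), and the helper names `fit_ols` and `solve_linear_system` are all assumptions for the example. It uses plain Python and ordinary least squares so it runs anywhere; in practice you would more likely use a statistics package.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(design, y):
    """Least-squares coefficients from the normal equations (X^T X) b = X^T y."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Made-up cohort: 4 controls (disease = 0) and 4 cases (disease = 1).
disease = [0, 0, 0, 0, 1, 1, 1, 1]
age = [30, 35, 40, 45, 50, 55, 60, 65]
# Simulated expression with no noise, so the true effects are known.
expression = [2.0 + 1.5 * d + 0.1 * a for d, a in zip(disease, age)]

# Naive comparison: difference in mean expression, ignoring age.
cases = [e for e, d in zip(expression, disease) if d == 1]
controls = [e for e, d in zip(expression, disease) if d == 0]
naive_effect = sum(cases) / len(cases) - sum(controls) / len(controls)

# Multivariable model: intercept + disease + age.
design = [[1.0, float(d), float(a)] for d, a in zip(disease, age)]
intercept, disease_effect, age_effect = fit_ols(design, expression)

print(f"naive disease effect:    {naive_effect:.2f}")
print(f"adjusted disease effect: {disease_effect:.2f}")
```

In this toy setup the naive comparison mixes the age difference between the groups into the disease effect, while the model that includes age recovers the value that was simulated.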

Multivariable Models Extended

Depending on interest, I can also provide insight on the caveats and common pitfalls of including age, sex, and ethnicity in these models. The inclusion of these variables in models and equations is hotly debated in disease research, as researchers are becoming more aware of the biases accumulated through a past preference for Caucasian study participants and the systematic exclusion of certain demographics - African Americans, Native Americans, and other minorities; women; trans folk and members of the LGBT+ community; etc.

One example from my research area is the inclusion of race in the calculation of eGFR (estimated glomerular filtration rate). A lower eGFR indicates worse kidney function, and the diagnosis of kidney disease depends on this eGFR equation, which includes a coefficient determined by whether or not the patient is African American (AA). If a patient is AA, the calculated eGFR is systematically higher, so their actual kidney function needs to be worse before they are classified as having kidney disease. This means that doctors who use this equation blindly are more likely to delay diagnosis and treatment for AA patients.
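To make the effect of that coefficient concrete, here is a sketch using the 2009 CKD-EPI creatinine equation. That choice is my assumption, since the post above does not say which eGFR equation is meant. The function name and the example patient (a 60-year-old man with a serum creatinine of 1.4 mg/dL) are made up, and the cutoff of 60 mL/min/1.73 m² is used here only as a commonly cited threshold for reduced kidney function. This is for illustration only, not for clinical use.

```python
def egfr_ckd_epi_2009(serum_creatinine_mg_dl, age_years, is_female, is_black):
    """eGFR in mL/min/1.73 m^2 from the 2009 CKD-EPI creatinine equation."""
    kappa = 0.7 if is_female else 0.9
    alpha = -0.329 if is_female else -0.411
    ratio = serum_creatinine_mg_dl / kappa
    egfr = (141.0
            * min(ratio, 1.0) ** alpha
            * max(ratio, 1.0) ** -1.209
            * 0.993 ** age_years)
    if is_female:
        egfr *= 1.018
    if is_black:
        egfr *= 1.159  # the race coefficient discussed above
    return egfr


# Same hypothetical lab values, differing only in the race input.
without_coefficient = egfr_ckd_epi_2009(1.4, 60, is_female=False, is_black=False)
with_coefficient = egfr_ckd_epi_2009(1.4, 60, is_female=False, is_black=True)

THRESHOLD = 60.0  # assumed cutoff for this illustration
for label, value in [("without race coefficient", without_coefficient),
                     ("with race coefficient", with_coefficient)]:
    status = "below" if value < THRESHOLD else "at or above"
    print(f"{label}: eGFR = {value:.1f} ({status} {THRESHOLD:.0f})")
```

With identical lab values, the only thing that changes the result is the race input, which multiplies the estimate by 1.159 and can move a patient from one side of the cutoff to the other.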

Transcript Level DGE

You’re currently analyzing gene-level data. The next level of analysis would be transcript-level data.

Variants, single cell analysis, and expression quantitative trait loci (Maybe)

My current research focuses on variant-level analysis at single-cell resolution, which looks at the actual sequences of the genome (per single cell) and the variations within them. I’m currently working on developing novel methods for comparative analysis of orthogonal datasets at the gene, variant, and trait level.

I can host a discussion about this multiomics integrative approach. However, it would be hard to implement this topic in your projects given the 5-8 week timeframe (it’s taken me 6+ months to get to the integration part and I’m still trying to figure it out :sweat_smile:). Nevertheless, it’s a really fascinating problem and I’d be happy to introduce you to these different levels of analysis!