Uyenle - Machine Learning (Level 3) Pathway

uyenle · July 5, 2021, 5:38pm

Module 1 and 2:

1. Overview of things learned:

Technical Area:
- Deeper understanding of bioinformatics and of how to read a research paper for later replication.
- Getting used to the concepts of dependency parsing and the dependency matrix
- Learning to retrieve data for replication
Tools:
- Git for team coding collaboration
- VS Code as IDE
- pubmed parser for reading .xml files from PubMed
- spaCy for tokenizing drug and gene names and their relationships
- Stanford parser for dependency parsing
Soft Skills:
- Teamwork and communication
- Task management and assignment
- Sprint work scheme
- Journal club
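
To make the dependency-path and dependency-matrix idea from the Technical Area list concrete, here is a minimal sketch in plain Python. The parse is hand-written toy data (an assumption, not real Stanford parser or spaCy output), and the names `dependency_path` and `build_matrix` are hypothetical helpers for illustration only. The path format (relation labels with an arrow marking direction) is also a simplification.

```python
from collections import Counter, deque

# Toy dependency parse of "Imatinib strongly inhibits ABL1".
# Each edge is (head, relation, dependent). Hand-written for illustration.
EDGES = [
    ("inhibits", "nsubj", "imatinib"),
    ("inhibits", "dobj", "ABL1"),
    ("inhibits", "advmod", "strongly"),
]


def dependency_path(edges, source, target):
    """Return the relation labels on the shortest path from source to target."""
    graph = {}
    for head, rel, dep in edges:
        graph.setdefault(head, []).append((dep, rel + ">"))  # head to dependent
        graph.setdefault(dep, []).append((head, "<" + rel))  # dependent to head
    queue = deque([(source, [])])
    seen = {source}
    while queue:  # breadth-first search finds the shortest path
        node, path = queue.popleft()
        if node == target:
            return tuple(path)
        for neighbour, label in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, path + [label]))
    return None  # the two words are not connected


def build_matrix(sentences):
    """Count how often each (drug, gene) pair occurs with each dependency path."""
    matrix = Counter()
    for edges, drug, gene in sentences:
        path = dependency_path(edges, drug, gene)
        if path is not None:
            matrix[((drug, gene), path)] += 1
    return matrix


matrix = build_matrix([(EDGES, "imatinib", "ABL1")])
print(matrix)
```

Rows of the matrix are drug-gene pairs and columns are paths; a sparse counter is used here because most pair/path combinations never occur.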

2. Three achievement highlights

Implementing pubmed parser, spaCy and the Stanford parser correctly
Getting familiar with the new environment: VS Code and GitHub
Planning and assigning tasks effectively among the team to achieve the desired results

3. Goals for upcoming week

Understanding and applying the EBC algorithm
Learning Docker and AWS

4. Detailed statement of tasks done

Understanding and retrieving the required databases for replication (except DrugBank, because of account verification)
Setting up the working pipeline so the team stays on track and on the same page
[Image: Screenshot 2021-07-05 at 19.37.24]
Hurdles:

Different time zones: it was difficult to schedule a group meeting for the whole team => recording meetings and documenting them for later review; dividing into sub-teams with close time zones

Understanding new concepts in bioinformatics and NLP: no member had prior knowledge of bioinformatics, and each member has a different technical level, so sometimes we were not on the same page => learning new material in journal-club style is more efficient and less time-consuming; dividing tasks by required skills so each member can choose what he/she wants to take responsibility for; reaching out to the mentors when unsure or in need of help

uyenle · July 28, 2021, 4:57am

Module 3:

1. Overview of things learned:

Technical Area:
- Learning and applying Jython to run stanford-parser from Python code on the Java platform.
- Learning and applying Dask to process a big dataset in parallel and in independent chunks without exhausting the computer/laptop.
- Understanding the EBC algorithm for working out the relationship between a drug and a gene, and between drug-gene pairs that share the same dependency path; the results come from an unsupervised step and a supervised step, and the EBC score function is used for evaluation.
Tools:
- jython for stanford-parser
- Dask
- EBC implementation in Python
- Java VM
Soft Skills:
- Time management
- Task management and assignment

2. Three achievement highlights

Being able to run stanford-parser with Jython
Trying Dask to understand how it works
Running EBC to get the co-occurrence matrix
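
As a rough illustration of what a co-occurrence matrix can look like, the sketch below assumes it records how often two drug-gene pairs land in the same cluster across repeated clustering runs. The cluster labels in `RUNS` are made up, `cooccurrence` is a hypothetical helper, and the actual EBC implementation may compute this differently.

```python
from collections import Counter
from itertools import combinations

# Made-up cluster assignments from three clustering runs (assumption:
# each run maps a drug-gene pair to the label of the cluster it fell into).
RUNS = [
    {"pairA": 0, "pairB": 0, "pairC": 1},
    {"pairA": 1, "pairB": 1, "pairC": 1},
    {"pairA": 0, "pairB": 1, "pairC": 1},
]


def cooccurrence(runs):
    """Count, for every two items, the number of runs in which they shared a cluster."""
    counts = Counter()
    for assignment in runs:
        for a, b in combinations(sorted(assignment), 2):
            if assignment[a] == assignment[b]:
                counts[(a, b)] += 1
    return counts


counts = cooccurrence(RUNS)
for pair, n in sorted(counts.items()):
    print(pair, n)
```

Pairs that cluster together in most runs get a high count, which is the kind of signal a later scoring step can rank on.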

3. Goals for upcoming week

Working on the supervised step of EBC

uyenle · August 10, 2021, 5:37am

Module 4:

1. Overview of things learned:

Technical Area:
- Understanding hierarchical agglomerative clustering to build up the dendrogram
- Casting data and computing minimax-linkage hierarchical clustering from the data correlation.
- Plotting the dendrogram and exploring more plot arguments to replicate the paper’s dendrogram
Tools:
- RStudio: data.table, purrr, protoclust, ape
Soft Skills:
- Time management
- Task management and assignment
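
The actual work was done in R with protoclust; the sketch below is only a naive Python illustration of the minimax linkage definition, not protoclust's algorithm. Under minimax linkage, the height of a merged cluster is the smallest radius around any one of its members (the prototype) that still covers every other member. Using 1 minus the correlation as the distance, the toy `CORR` values, and the helper names are all assumptions.

```python
# Toy correlation matrix for three items (made-up values).
CORR = [
    [1.0, 0.9, 0.2],
    [0.9, 1.0, 0.3],
    [0.2, 0.3, 1.0],
]

# Assumption: turn correlation into a distance with 1 - correlation.
DIST = [[1.0 - c for c in row] for row in CORR]


def minimax_radius(cluster, dist):
    """Smallest, over candidate prototypes, of the largest distance to any member."""
    return min(max(dist[p][q] for q in cluster) for p in cluster)


def minimax_cluster(dist):
    """Naive agglomerative clustering; returns merges as (left, right, height)."""
    clusters = [(i,) for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        best = None
        # Try every pair of clusters and keep the merge with the lowest height.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                height = minimax_radius(clusters[i] + clusters[j], dist)
                if best is None or height < best[0]:
                    best = (height, i, j)
        height, i, j = best
        merges.append((clusters[i], clusters[j], height))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


merges = minimax_cluster(DIST)
for left, right, height in merges:
    print(left, right, round(height, 3))
```

The list of merges with their heights is exactly the information a dendrogram plot draws.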

2. Three achievement highlights

Being able to plot the dendrogram

3. Upcoming Goals

Interpreting the dendrogram as a landscape of the drug-gene dataset