Uyenle - Machine Learning (Level 3) Pathway

Modules 1 and 2:

1. Overview of things learned:

  • Technical Area:
    • Gaining a deeper understanding of bioinformatics and of how to read a research paper with a view to replicating it later.
    • Getting used to the concepts of dependency parsing and the dependency matrix
    • Learning to retrieve data for replication
  • Tools:
    • Git for team coding collaboration
    • VS Code as IDE
    • pubmed parser for reading .xml files from PubMed
    • spaCy for tokenizing drug and gene names and their relationships
    • Stanford parser for dependency parsing
  • Soft Skills:
    • Teamwork and communication
    • Task management and assignment
    • Sprint work scheme
    • Journal club

2. Three achievement highlights

  • Implementing pubmed parser, spaCy and Stanford parser correctly
  • Getting familiar with a new environment: VS Code and GitHub
  • Planning and assigning tasks effectively among the team to achieve the desired results
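
To make the dependency-parsing idea concrete, here is a minimal sketch of how the path between a drug and a gene can be read off a dependency parse. The sentence, the edge list and the helper names (`build_graph`, `dependency_path`) are hypothetical and hand-written for illustration; in the project the edges would come from the Stanford parser, and this is not the pipeline we actually ran.

```python
from collections import deque

# Hypothetical, hand-written dependency parse of:
#   "Aspirin strongly inhibits COX1 in platelets"
# Each edge is (head, relation, dependent); a real parser would produce these.
EDGES = [
    ("inhibits", "nsubj", "Aspirin"),
    ("inhibits", "advmod", "strongly"),
    ("inhibits", "dobj", "COX1"),
    ("inhibits", "prep_in", "platelets"),
]


def build_graph(edges):
    """Turn directed dependency edges into an undirected adjacency map."""
    graph = {}
    for head, rel, dep in edges:
        graph.setdefault(head, []).append((dep, rel))
        graph.setdefault(dep, []).append((head, rel))
    return graph


def dependency_path(edges, start, end):
    """Breadth-first search for the shortest token path from start to end."""
    graph = build_graph(edges)
    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == end:
            return path
        for neighbour, rel in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, path + [rel, neighbour]))
    return None  # the two tokens are not connected


print(dependency_path(EDGES, "Aspirin", "COX1"))
```

The returned list alternates tokens and relation labels, so the same path shape can be compared across different drug-gene pairs.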

3. Goals for the upcoming week

  • Understanding and applying the EBC algorithm
  • Learning Docker and AWS

4. Detailed statement of tasks done

Different time zones: it was difficult for the whole team to schedule group meetings => we recorded meetings and documented them for later review, and divided into sub-teams of members in nearby time zones.

Understanding new concepts in bioinformatics and NLP: no member had prior knowledge of bioinformatics, and each member has a different technical level, so sometimes we were not on the same page => learning new material in Journal club style was more efficient and less time-consuming; we divided tasks by required skills so each member could choose what he/she wanted to take responsibility for; we reached out to the mentors when unsure or in need of help.


Module 3:

1. Overview of things learned:

  • Technical Area:

    • Learning and applying Jython to run stanford-parser from Python on the Java platform.
    • Learning and applying Dask to process a big dataset in parallel, in independent chunks, without exhausting the computer/laptop's resources.
    • Understanding the EBC algorithm, which models the relationship between drugs and genes and groups drug-gene pairs that share the same dependency paths; the results come from an unsupervised step and a supervised step, and an EBC score function is used to evaluate them.
  • Tools:

    • jython for stanford-parser
    • Dask
    • EBC implementation in Python
    • Java VM
  • Soft Skills:

    • Time management
    • Task management and assignment

2. Three achievement highlights

  • Being able to run stanford-parser with Jython
  • Trying Dask to understand how it works
  • Running EBC to get the co-occurrence matrix
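
As an illustration of what the co-occurrence matrix looks like, here is a small sketch that counts how often each drug-gene pair appears with each dependency path. The observations and the function name `cooccurrence_matrix` are made up for the example, and the sketch keeps raw counts; whether the EBC input should be raw counts or a binary matrix is an assumption left open here.

```python
from collections import Counter

# Hypothetical (drug, gene, dependency path) observations, for illustration only.
OBSERVATIONS = [
    ("aspirin", "COX1", "nsubj|inhibits|dobj"),
    ("aspirin", "COX1", "nsubj|inhibits|dobj"),
    ("aspirin", "COX1", "nsubj|blocks|dobj"),
    ("warfarin", "VKORC1", "nsubj|inhibits|dobj"),
]


def cooccurrence_matrix(observations):
    """Build a matrix with drug-gene pairs as rows and dependency paths as columns."""
    counts = Counter(((drug, gene), path) for drug, gene, path in observations)
    rows = sorted({pair for pair, _ in counts})
    cols = sorted({path for _, path in counts})
    # Counter returns 0 for pair/path combinations that were never observed.
    matrix = [[counts[(row, col)] for col in cols] for row in rows]
    return rows, cols, matrix


rows, cols, matrix = cooccurrence_matrix(OBSERVATIONS)
for row, values in zip(rows, matrix):
    print(row, values)
```

Pairs whose rows look alike share the same dependency paths, which is the signal the clustering step relies on.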

3. Goals for the upcoming week

  • Working on the supervised step of EBC

Module 4:

1. Overview of things learned:

  • Technical Area:

    • Understanding hierarchical agglomerative clustering, which is used to build the dendrogram
    • Casting data and computing minimax-linkage hierarchical clustering from the data correlation.
    • Plotting the dendrogram and exploring additional plotting arguments to replicate the paper’s dendrogram
  • Tools:

    • RStudio: data.table, purrr, protoclust, ape
  • Soft Skills:

    • Time management
    • Task management and assignment

2. Three achievement highlights

  • Being able to plot the cluster dendrogram
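
The work itself was done in R with protoclust, but the minimax-linkage idea can be sketched in a few lines of plain Python. This is a naive illustration, not the protoclust algorithm: the correlation matrix is invented, the conversion "distance = 1 - correlation" is an assumption, and the function names are hypothetical.

```python
from itertools import combinations

# Hypothetical correlation matrix for four items (illustration only).
CORR = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
# Assumption: turn correlation into a dissimilarity with 1 - correlation.
DIST = [[1.0 - c for c in row] for row in CORR]


def minimax_distance(cluster_a, cluster_b, dist):
    """Minimax linkage: the smallest radius, over all candidate centres in the
    merged cluster, needed to reach every other member of that cluster."""
    merged = cluster_a + cluster_b
    return min(max(dist[i][j] for j in merged) for i in merged)


def agglomerate(dist):
    """Naive agglomerative clustering; returns (left, right, height) merges."""
    clusters = [[i] for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest minimax distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: minimax_distance(clusters[p[0]], clusters[p[1]], dist),
        )
        height = minimax_distance(clusters[a], clusters[b], dist)
        merges.append((clusters[a], clusters[b], height))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges


for left, right, height in agglomerate(DIST):
    print(left, right, round(height, 3))
```

Each merge height is what the dendrogram draws on its vertical axis, so the list of merges is enough to reconstruct the tree.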

3. Upcoming goals

  • Interpreting the dendrogram as a landscape of the drug-gene dataset