BAISHALI_SOW_MONDAL - Machine Learning (Level 3) Pathway

Overview of things that I learned

Technical Area:

  • Learned the core concepts of machine learning and several machine learning algorithms from the lecture videos
  • Learned fundamental ideas in NLP and related topics such as distributional semantics, linguistic problems, and EBC
  • Learned about web scraping and data mining


  • Natural Language Processing (NLP)
  • Bioinformatics
  • EBC
  • Medline

Soft Skills:

  • Became better prepared for machine learning and NLP as a whole
  • Became more familiar with applying machine learning in the field of bioinformatics
  • Medline: learned how the Medline database is organized and how to extract data from it.
  • EBC: learned what Ensemble Biclustering for Classification (EBC) and hierarchical clustering algorithms are

Achievements and tasks:

  • Learned about concepts of Machine Learning and web scraping
  • Built a virtual environment
  • Became more familiar with machine learning, NLP, bioinformatics, and the biomedical field
  • Read the research paper and completed journal tasks based on it

Module-1 Overview:

  • Technical skills:
    • Parsed the raw data from Medline with the Pubmed parser
    • Understood how to use the Stanford Parser
    • Read and understood the assigned scientific papers
    • Learned more of the foundations of dependency parsing
    • Ran the dependency parser from Java
  • Tools/Libraries:
    • Java: downloaded it and used it to parse the .txt file.
    • VS Code: installed it and began learning how to use it.
    • Stanford Dependency parser
    • Jython 2.7.2: installed successfully.
  • Soft Skills:
    • Natural Language Processing (NLP)
    • Working to understand dependency parsing
    • Getting more familiar with VS Code and with debugging a Python file in it
    • Gained a basic understanding of the parsed database
    • Attended all the teamwork sessions and discussed the work.
    • Virtual collaboration: actively participated in training/Q&A sessions held by Colin.

Achievement Highlights:

  • Learned how dependency parsing works and the foundations of the neural transition-based parser.
  • Finished reading the Stanford Parser manual to gain a deeper understanding of the grammatical relationships between words and of the different output formats and styles.

Tasks Completed:

  • Completed Medline database parsing.
  • Became familiar with big data analysis
  • Parsed the database using the Stanford parser in Java
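
As a rough illustration of what the Medline parsing step involves, the sketch below reads the PMID, title, and abstract from a tiny inline record using only Python's standard library. The element names follow the MEDLINE/PubMed XML layout; the sample records, their PMIDs, and the helper name parse_medline are made up for illustration, and the actual work relied on the Pubmed parser rather than this code.

```python
import xml.etree.ElementTree as ET

# Tiny made-up records in the MEDLINE/PubMed XML layout (illustrative only).
SAMPLE_XML = """
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>1</PMID>
      <Article>
        <ArticleTitle>Example title one</ArticleTitle>
        <Abstract>
          <AbstractText>Warfarin is metabolized by CYP2C9.</AbstractText>
        </Abstract>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>2</PMID>
      <Article>
        <ArticleTitle>Example title two</ArticleTitle>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>
"""

def parse_medline(xml_text):
    """Return one dict per article with pmid, title, and abstract ('' if absent)."""
    root = ET.fromstring(xml_text)
    records = []
    for citation in root.iter("MedlineCitation"):
        # An abstract may be split over several AbstractText elements.
        abstract_parts = [
            "".join(node.itertext()).strip()
            for node in citation.iter("AbstractText")
        ]
        records.append({
            "pmid": citation.findtext("PMID", default=""),
            "title": citation.findtext("Article/ArticleTitle", default=""),
            "abstract": " ".join(abstract_parts),
        })
    return records

records = parse_medline(SAMPLE_XML)
# Keep only the publications that actually carry an abstract.
with_abstracts = [r for r in records if r["abstract"]]
```

The second sample record has no abstract, so it appears in records but not in with_abstracts.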

Goals for The Upcoming Week:

  • Connect the output of the Pubmed parser to the Stanford parser and integrate the result with EBC.

Module-2 Overview:

  • Technical skills:
    • Built the sparse dependency matrix using the Stanford Parser
    • Used spaCy, a common NLP library, for dependency parsing.
    • Learned more about sparse matrices in machine learning algorithms and how they can be used in dependency parsing.
    • Understood how Stanford NLP works and how it can be used in a pipeline that converts a string of human-language text into lists of sentences and words, generates the base forms, parts of speech, and morphological features of those words, and produces a syntactic dependency parse. The pipeline is designed to be parallel across more than 70 languages.
  • Soft skills:
    • Stanford NLP
    • Dependency Parsing
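
To make the sparse dependency matrix concrete, here is a minimal sketch in plain Python. It assumes rows are entity pairs and columns are dependency paths, with each cell counting how often a pair was seen with a path. That layout, the toy observations, the path strings, and the function name build_sparse_matrix are all assumptions for illustration, not the project's actual data or code. Only nonzero cells are stored, which is the idea behind dictionary-of-keys sparse formats.

```python
from collections import defaultdict

# Toy (pair, dependency path) observations; invented for illustration.
observations = [
    (("warfarin", "CYP2C9"), "nsubjpass<-metabolized->agent"),
    (("warfarin", "CYP2C9"), "nsubjpass<-metabolized->agent"),
    (("warfarin", "VKORC1"), "nsubj<-inhibits->dobj"),
    (("clopidogrel", "CYP2C19"), "nsubjpass<-metabolized->agent"),
]

def build_sparse_matrix(observations):
    """Count co-occurrences, storing only nonzero cells as {(row, col): count}."""
    row_index, col_index = {}, {}
    cells = defaultdict(int)
    for pair, path in observations:
        # Assign the next free index the first time a pair or path is seen.
        row = row_index.setdefault(pair, len(row_index))
        col = col_index.setdefault(path, len(col_index))
        cells[(row, col)] += 1
    return row_index, col_index, dict(cells)

row_index, col_index, cells = build_sparse_matrix(observations)
shape = (len(row_index), len(col_index))
# Fraction of cells that are nonzero; real dependency matrices are far sparser.
density = len(cells) / (shape[0] * shape[1])
```

With real data the number of distinct paths grows quickly, so most cells stay empty and storing only the nonzero ones saves a great deal of memory.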

Achievement Highlights:

  • Collaboration with other teams
  • Documenting progress

Goals for The Upcoming Week:

  • Extract the research papers that contain abstracts, then extract the drug-gene pairs along with their dependency paths

Tasks Done:

  • Sparse Matrix
  • Dependency parsing

Module-3 Overview:

  • Technical skills:

    • Filtered the Medline publications down to those that contain abstracts, using the PubMed parser
    • Learned to use string matching to extract the sentences that contain drug-gene pairs, as input to the Stanford parser
    • Successfully extracted drug-gene pairs using DrugBank (for drugs) and PharmGKB (for genes)
    • Learned how to use the Stanford parser to extract the dependency paths of the drug-gene pairs.
    • Biclustered the dependency matrix using the Ensemble Biclustering Algorithm.
    • Successfully constructed a graph using an arbitrary number of data files in Dask
  • Soft skills:

    • Biclustering
    • EBC
    • Dask
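
The string-matching step listed above can be sketched as follows. The two sets stand in for name lists drawn from DrugBank and PharmGKB; their entries, the naive sentence splitter, and the function names find_terms and find_pair_sentences are placeholders for illustration rather than the project's actual lexicons or code.

```python
import re
from itertools import product

# Placeholder lexicons; real lists would come from DrugBank and PharmGKB.
DRUGS = {"warfarin", "clopidogrel"}
GENES = {"CYP2C9", "CYP2C19", "VKORC1"}

def find_terms(sentence, terms):
    """Return the terms that occur in the sentence as whole words (case-insensitive)."""
    found = []
    for term in sorted(terms):
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, sentence, flags=re.IGNORECASE):
            found.append(term)
    return found

def find_pair_sentences(abstract):
    """Yield (drug, gene, sentence) for each sentence mentioning both a drug and a gene."""
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", abstract.strip())
    for sentence in sentences:
        drugs = find_terms(sentence, DRUGS)
        genes = find_terms(sentence, GENES)
        for drug, gene in product(drugs, genes):
            yield drug, gene, sentence

abstract = (
    "Warfarin is metabolized by CYP2C9. "
    "Dosing varies between patients. "
    "Variants in VKORC1 also affect warfarin response."
)
matches = list(find_pair_sentences(abstract))
```

Whole-word matching keeps a short name such as CYP2C9 from matching inside a longer token. The sentences in matches are the ones that would then be passed to the dependency parser.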

Achievement Highlights:

  • Collaboration with other teams
  • Documenting progress

Goals for The Upcoming Week:

  • Compute a final set of clusters and visualize via dendrograms
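
A dendrogram is a drawing of the merge order produced by agglomerative hierarchical clustering. As a hedged sketch of that idea, the code below runs single-linkage clustering on a few made-up one-dimensional points and records each merge. The data, the choice of single linkage, and the function name single_linkage are assumptions for illustration; the real clusters would be computed from the EBC output.

```python
def single_linkage(points):
    """Agglomerative clustering on 1-D points.

    Returns the merges in order as (cluster_a, cluster_b, distance),
    where each cluster is a tuple of point indices.
    """
    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the two closest members.
                dist = min(abs(points[i] - points[j])
                           for i in clusters[a] for j in clusters[b])
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        dist, a, b = best
        merges.append((clusters[a], clusters[b], dist))
        merged = clusters[a] + clusters[b]
        # Replace the two merged clusters with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges

points = [1.0, 2.0, 6.0, 9.0]
merges = single_linkage(points)
```

Each entry in merges corresponds to one join in the dendrogram, and its distance gives the height at which the two branches meet.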

Tasks Done:

  • Data filtering
  • String matching
  • Dependency path extraction
  • Biclustered the dependency matrix
  • Constructed a graph
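
Dependency path extraction amounts to finding the route between the two entity tokens in a sentence's dependency tree. The sketch below assumes the parse is already available as (head, relation, dependent) triples. The hand-written triples approximate what a dependency parser might return for one sentence and are not actual Stanford parser output, and dependency_path is a hypothetical helper.

```python
from collections import deque

# Hand-written parse of "Warfarin is metabolized by CYP2C9" as
# (head, relation, dependent) triples; an approximation, not real parser output.
TRIPLES = [
    ("metabolized", "nsubjpass", "Warfarin"),
    ("metabolized", "auxpass", "is"),
    ("metabolized", "nmod", "CYP2C9"),
    ("CYP2C9", "case", "by"),
]

def dependency_path(triples, start, end):
    """Breadth-first search for the shortest path between two tokens.

    Each step is (relation, direction, token): "up" moves from a dependent
    to its head, "down" from a head to a dependent. Returns None if the
    tokens are not connected. Tokens are identified by their text, which
    only works when each word is unique in the sentence; real code would
    key on token positions instead.
    """
    neighbours = {}
    for head, rel, dep in triples:
        neighbours.setdefault(head, []).append((rel, "down", dep))
        neighbours.setdefault(dep, []).append((rel, "up", head))

    queue = deque([(start, [])])
    seen = {start}
    while queue:
        token, path = queue.popleft()
        if token == end:
            return path
        for rel, direction, nxt in neighbours.get(token, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(rel, direction, nxt)]))
    return None

path = dependency_path(TRIPLES, "Warfarin", "CYP2C9")
```

The resulting path climbs from the drug to the shared verb and descends to the gene. Paths of this kind are what would label the columns of the dependency matrix.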