Akansha - Machine Learning (Level 3) Pathway

Module 1

Overview of things I learned

Technical Area

  • Reviewed web scraping with the requests library
  • Familiarised myself with some of the basic concepts of bioinformatics by going through the prerequisites.
  • Learned more about vector space models (VSMs) and distributional semantics.

Tools

  • Jupyter Notebook
  • BeautifulSoup
  • STEM-Away forum

Soft skills

  • Learned how to communicate effectively with internationally diverse teams
  • Learned effective time management

Three Achievement Highlights

  • Scraped and parsed the raw data from the MEDLINE website.
  • Worked through the project paper and gained a better overall understanding of the project.
  • Worked through the prerequisites and gained a basic understanding of biomedical relationships.

Goals for the upcoming weeks

Preprocessing the raw data and making the dependency matrix

Tasks completed

  • Scraped the raw data from the MEDLINE database and parsed it using the PubMed parser.
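
A minimal sketch of the parsing step is shown here. It assumes the scraped data is MEDLINE-style XML and uses Python's built-in xml.etree.ElementTree as a stand-in for the PubMed parser used in the project. The sample record and the function name extract_abstracts are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed MEDLINE-style record (not real data).
SAMPLE_XML = """
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>0000001</PMID>
      <Article>
        <ArticleTitle>Example title</ArticleTitle>
        <Abstract>
          <AbstractText>Drug A inhibits gene B.</AbstractText>
          <AbstractText>This effect was dose dependent.</AbstractText>
        </Abstract>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>
"""


def extract_abstracts(xml_text):
    """Return a list of (pmid, abstract) tuples from MEDLINE-style XML."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for article in root.iter("PubmedArticle"):
        pmid = article.findtext(".//PMID", default="")
        # An abstract may be split into several AbstractText sections.
        parts = [node.text or "" for node in article.iter("AbstractText")]
        abstract = " ".join(p.strip() for p in parts if p.strip())
        if abstract:
            records.append((pmid, abstract))
    return records


records = extract_abstracts(SAMPLE_XML)
for pmid, abstract in records:
    print(pmid, abstract)
```

Records without an abstract are skipped, since only abstract text is needed for the later sentence-level steps.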

Module 2

Overview of things I learned

Technical area

  • Reviewed the use of the NLTK library for text processing
  • Familiarised myself with the concept of dependency parsing.
  • Familiarised myself with terms used in dependency relationships

Tools

  • Google Colab
  • Stanford parser
  • Jython

Soft skills

  • Learned how to collaborate with people in different timezones
  • Learned how to effectively go through a research paper to get the most information.

Three Achievement Highlights

  • Performed data cleaning on MEDLINE abstracts to extract the relevant sentences
  • Went through the STEM-Away resources to get a better understanding of dependency parsing and how it works
  • Successfully set up the Stanford parser and parsed an example sentence

Goals for the upcoming weeks

Parse the extracted MEDLINE sentences to get the dependency paths and build the dependency matrix

Tasks completed

  • Processed the raw abstracts by converting them into sentences.
  • Removed stop words and tokenised the sentences using NLTK
  • Filtered the abstracts by keeping only the sentences that contain a drug-gene pair.
  • Saved the relevant sentences into CSV files.
  • Set up the Stanford parser using the Jython interface and successfully parsed an example sentence.
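
The preprocessing steps above can be sketched roughly as follows. To stay dependency-free, this version replaces NLTK's sentence splitter, tokenizer and stop-word list with simple regular expressions and a tiny hand-written stop-word set. The drug and gene vocabularies, the example abstract and all function names are illustrative assumptions, not the project's actual data or code.

```python
import csv
import io
import re

# Tiny illustrative vocabularies; the real project lists are much larger.
STOP_WORDS = {"the", "a", "an", "of", "in", "and", "was", "is", "by", "to"}
DRUGS = {"warfarin"}
GENES = {"cyp2c9", "vkorc1"}


def split_sentences(abstract):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", abstract)
    return [s.strip() for s in parts if s.strip()]


def tokenize(sentence):
    """Lowercase, keep alphanumeric tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def find_pairs(tokens):
    """Return every (drug, gene) pair mentioned together in the tokens."""
    drugs = sorted(set(tokens) & DRUGS)
    genes = sorted(set(tokens) & GENES)
    return [(d, g) for d in drugs for g in genes]


def filter_to_csv(abstract):
    """Write sentences that mention a drug-gene pair as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["drug", "gene", "sentence"])
    for sentence in split_sentences(abstract):
        for drug, gene in find_pairs(tokenize(sentence)):
            writer.writerow([drug, gene, sentence])
    return buffer.getvalue()


# Made-up abstract used only to exercise the functions.
abstract = (
    "Warfarin is metabolized by CYP2C9. "
    "The study enrolled 40 patients. "
    "Warfarin dose was associated with VKORC1."
)
csv_text = filter_to_csv(abstract)
print(csv_text)
```

Only the first and third sentences are kept, because the second mentions neither a drug nor a gene from the vocabularies.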

Module 2

Overview of things I learned

Technical Area

  • Familiarised myself with the Stanford parser
  • Reviewed the use of graphs and how to find the shortest path between two nodes
  • Familiarised myself with the nltk.parse library

Tools

  • Jupyter Notebook
  • Stanford parser
  • nltk.parse

Soft skills

  • Learned how to effectively share my findings with the rest of the team
  • Learned how to effectively debug

Three Achievement Highlights

  • Successfully set up the Stanford parser in Python
  • Successfully parsed the cleaned MEDLINE sentences to get the dependency relationships
  • Successfully created the dependency matrix

Goals for the upcoming weeks

Start working on the EBC algorithm and get the co-occurrence matrix

Tasks completed

  • Set up the Stanford parser in Python with the help of the resources provided
  • Went through the nltk.parse library to find the information I needed to implement the parser
  • Built a NetworkX graph and found the shortest path between the drug and the gene in each pair
  • Made the dependency matrix
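
A rough sketch of the last two steps follows. The dependency edges are hand-written for a made-up sentence rather than taken from the Stanford parser, and a plain breadth-first search stands in for the NetworkX shortest-path call. The matrix layout (rows as drug-gene pairs, columns as dependency paths, cells as counts) is an assumption, as are all names in the code.

```python
from collections import Counter, deque

# Hand-written dependency edges (head, dependent, relation) for the made-up
# sentence "Warfarin is metabolized by CYP2C9"; not actual parser output.
EDGES = [
    ("metabolized", "warfarin", "nsubjpass"),
    ("metabolized", "is", "auxpass"),
    ("metabolized", "cyp2c9", "agent"),
]


def build_graph(edges):
    """Build an undirected adjacency map that remembers relation labels."""
    graph = {}
    for head, dependent, relation in edges:
        graph.setdefault(head, {})[dependent] = relation
        graph.setdefault(dependent, {})[head] = relation
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the node list or None if unreachable."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbour in graph[path[-1]]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


def path_signature(graph, path):
    """Join the relation labels along a node path into one string."""
    return "|".join(graph[a][b] for a, b in zip(path, path[1:]))


graph = build_graph(EDGES)
path = shortest_path(graph, "warfarin", "cyp2c9")
signature = path_signature(graph, path)

# Dependency matrix as nested counts: matrix[(drug, gene)][signature] = count.
# The same observation is repeated twice here purely to show the counting.
observations = [(("warfarin", "cyp2c9"), signature)] * 2
matrix = {}
for pair, sig in observations:
    matrix.setdefault(pair, Counter())[sig] += 1
```

Storing the matrix as nested counters keeps it sparse; it can be converted to a dense array once the full set of paths is known.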

Module 3

Overview of things I learned

Technical Area

  • Familiarised myself with the concept of biclustering and how it works
  • Familiarised myself with the ITCC algorithm by going through the resources
  • Familiarised myself with the EBC repository and how to use it to get the clusters

Tools

  • Google Colab
  • EBC
  • GitHub

Soft skills

  • Learned how to do effective research to understand a particular concept

Three Achievement Highlights

  • Went through the STEM-Away resources to get a better understanding of the EBC algorithm
  • Successfully set up EBC by porting its Python 2 code to Python 3
  • Successfully ran the EBC algorithm on the dependency matrix and got the co-occurrence matrix

Goals for upcoming weeks

Complete the supervised portion of EBC

Tasks completed

  • Set up the EBC algorithm so that it can be used on the dependency matrix
  • Ran the EBC algorithm 1,000 times on the dependency matrix to get the co-clustering counts
  • Built the co-occurrence matrix by counting how often drug-gene pair [i] co-clusters with drug-gene pair [j]
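
The counting step can be sketched as below. The function fake_clustering_run is a hypothetical stand-in that assigns random cluster labels; in the project each run would instead come from the EBC code, whose interface is not reproduced here. Only the co-occurrence counting itself is meant to be illustrative.

```python
import random
from itertools import combinations


def fake_clustering_run(n_pairs, n_clusters, rng):
    """Stand-in for one EBC run: gives each row a random cluster label."""
    return [rng.randrange(n_clusters) for _ in range(n_pairs)]


def co_occurrence(runs, n_pairs):
    """Count, for every i and j, how many runs put them in one cluster."""
    counts = [[0] * n_pairs for _ in range(n_pairs)]
    for labels in runs:
        for i, j in combinations(range(n_pairs), 2):
            if labels[i] == labels[j]:
                counts[i][j] += 1
                counts[j][i] += 1
    # The diagonal is left at 0: a pair trivially co-clusters with itself.
    return counts


rng = random.Random(0)
n_pairs, n_runs = 6, 1000
runs = [fake_clustering_run(n_pairs, 3, rng) for _ in range(n_runs)]
matrix = co_occurrence(runs, n_pairs)
```

Entry [i][j] is therefore a number between 0 and the number of runs, and the matrix is symmetric by construction.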

Module 3

Overview of things I learned

Technical Area

  • Familiarised myself with the DrugBank database
  • Familiarised myself with how EBC scoring works and the math behind it.
  • Reviewed the STEM-Away recording to get a better understanding of how to perform the scoring

Tools

  • Google Colab
  • EBC
  • DrugBank

Soft skills

  • Learned better time management skills

Three Achievement Highlights

  • Went through the original paper to get a better understanding of how EBC scoring works
  • Successfully created the test sets and the seed sets
  • Successfully extracted the EBC scores for different seed set sizes

Goals for the upcoming weeks

Work on the data visualization and make the dendrogram.

Tasks completed

  • Extracted the ground-truth drug-gene relationships from DrugBank
  • Cleaned the DrugBank data to only get the relevant information
  • Created the test set of size 100 by taking 50 pairs from DrugBank and 50 from our matrix
  • Created seed sets of varying sizes by taking pairs only from DrugBank, keeping the test set and the seed sets mutually exclusive
  • Performed EBC scoring on the test set and successfully extracted the scores
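
The set construction can be sketched as follows. It assumes that the 50 pairs taken "from our matrix" are pairs absent from DrugBank, which the notes above do not state explicitly. The score function is a deliberately simplified stand-in (a sum of co-clustering counts with the seed pairs) and is not claimed to be the scoring formula from the paper. All pair names and counts are placeholders.

```python
import random


def make_sets(known_pairs, unknown_pairs, seed_size, rng, per_class=50):
    """Build a test set and a seed set that share no pairs.

    known_pairs: pairs found in the ground truth (DrugBank in the project).
    unknown_pairs: pairs from the matrix assumed to be absent from it.
    """
    test_known = rng.sample(known_pairs, per_class)
    test_unknown = rng.sample(unknown_pairs, per_class)
    test_set = test_known + test_unknown
    # Seeds come only from known pairs that were not used for testing.
    held_out = set(test_known)
    remaining = [p for p in known_pairs if p not in held_out]
    seed_set = rng.sample(remaining, seed_size)
    return test_set, seed_set


def score(pair, seed_set, co_counts):
    """Simplified score: total co-clustering count between pair and seeds."""
    return sum(co_counts.get(frozenset((pair, seed)), 0) for seed in seed_set)


rng = random.Random(0)
# Placeholder pair names; the two lists are disjoint by construction.
known = [(f"drug{i}", f"gene{i}") for i in range(120)]
unknown = [(f"drug{i}", f"gene{i + 1}") for i in range(120)]
test_set, seed_set = make_sets(known, unknown, seed_size=10, rng=rng)

# Placeholder counts; in practice these come from the co-occurrence matrix.
co_counts = {
    frozenset((t, s)): rng.randrange(1000) for t in test_set for s in seed_set
}
scores = {pair: score(pair, seed_set, co_counts) for pair in test_set}
```

Calling make_sets again with a different seed_size gives the seed sets of varying sizes, each disjoint from the test set drawn in the same call.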

Module 4

Overview of things I learned

Technical Area

  • Familiarised myself with RStudio
  • Familiarised myself with the concept of hierarchical clustering
  • Familiarised myself with R syntax and how to plot a dendrogram

Tools

  • RStudio

Soft skills

  • Learned how to research effectively to get the results I want

Three Achievement Highlights

  • Went through the resources to get started on building a dendrogram
  • Researched online to better understand how dendrograms are built and how to extend them
  • Successfully built a final dendrogram

Goals for the upcoming weeks

Work on the final presentation

Tasks completed

  • Set up R and RStudio
  • Looked through R syntax to understand how to implement what I needed
  • Created a fan dendrogram with the help of STEM-Away resources
  • Went through the paper to see how they extended the dendrogram in the final product
  • Used the Willeerd library to get the tip markers
  • Made the final dendrogram
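
The dendrogram itself was built in R. The idea behind hierarchical clustering can still be illustrated with a small pure-Python sketch: repeatedly merge the two closest clusters and record each merge, which is the information a dendrogram draws. Single linkage is used here only because it is the simplest rule, not because it is the linkage used in the project. The distance matrix, the labels and the function name are hypothetical.

```python
def single_linkage(distances, labels):
    """Agglomerative clustering; returns merges as (left, right, distance)."""
    # Each cluster holds its member indices and a printable nested name.
    clusters = {i: ((i,), labels[i]) for i in range(len(labels))}
    merges = []
    next_id = len(labels)
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                # Single linkage: distance between the closest members.
                d = min(
                    distances[i][j]
                    for i in clusters[a][0]
                    for j in clusters[b][0]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        members_a, name_a = clusters.pop(a)
        members_b, name_b = clusters.pop(b)
        merges.append((name_a, name_b, d))
        clusters[next_id] = (members_a + members_b, f"({name_a},{name_b})")
        next_id += 1
    return merges


# Hypothetical symmetric distance matrix for three items.
labels = ["A", "B", "C"]
distances = [
    [0, 1, 4],
    [1, 0, 3],
    [4, 3, 0],
]
merges = single_linkage(distances, labels)
for left, right, height in merges:
    print(f"merge {left} + {right} at distance {height}")
```

Each recorded distance corresponds to the height at which two branches join in the dendrogram.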