Mandar Mhaske - Machine Learning Self Assessment

Week of 7/22/2020

Reviewing the CodeChef discussion forums and writing a report on them.

Everyone collaborated to create the report on CodeChef and discuss its viability as a suitable data source for our project.

Things learned:
Slack
Asana

All meetings attended.

Week of 7/27/2020

Scraping data from Codecademy discussion forums.

Used BeautifulSoup and Selenium to scrape data from Codecademy forums.
Collaborated with the other members to improve the scraping algorithm, and held meetings of our own to decide which data was appropriate to scrape.
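
As a rough illustration of the parsing half of this work, the sketch below runs BeautifulSoup over a small inline HTML snippet. The markup, the class names (topic, title, category) and the helper parse_topics are all hypothetical stand-ins; the real structure of the Codecademy forum pages is not recorded in this report.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a forum topic list. The real page
# structure and class names are assumptions made for this example.
SAMPLE_HTML = """
<div class="topic-list">
  <div class="topic">
    <a class="title" href="/t/loops-help/101">Help with loops</a>
    <span class="category">Python</span>
  </div>
  <div class="topic">
    <a class="title" href="/t/css-grid/102">CSS grid question</a>
    <span class="category">Web Development</span>
  </div>
</div>
"""


def parse_topics(html):
    """Return one dict per topic block found in the given HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    topics = []
    for node in soup.select("div.topic"):
        link = node.select_one("a.title")
        category = node.select_one("span.category")
        topics.append({
            "title": link.get_text(strip=True),
            "url": link["href"],
            "category": category.get_text(strip=True),
        })
    return topics


topics = parse_topics(SAMPLE_HTML)
for topic in topics:
    print(topic["category"], "|", topic["title"], "|", topic["url"])
```

For pages that render their content with JavaScript, Selenium can load the page in a browser first, and the HTML string from the driver's page_source attribute can then be passed to a function like parse_topics.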

Things Learned:
BeautifulSoup and Selenium for webpage scraping
Online collaboration

2 Group Meetings Attended
2 CodeChef group meetings attended

Week of 8/3/2020

Data Cleaning

Preprocessed and cleaned the data so that models could be trained on it.

Things learned:
Data cleaning techniques:

  • Removing punctuation and newlines
  • Tokenization
  • Removing stop words
  • Lemmatization
  • Stemming
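
The steps listed above can be sketched as a small pipeline. This is a minimal, standard-library-only illustration, not the code used in the project: the stop word set, the lemma lookup table and the suffix-stripping stemmer are toy assumptions, and a real pipeline would normally rely on an NLP library for them. Lemmatization and stemming are often used as alternatives; both are applied here only to show each step.

```python
import string

# Toy resources for illustration only (assumptions, not the project's data).
STOP_WORDS = {"the", "a", "an", "is", "are", "in", "on", "to", "of", "and", "my"}
LEMMAS = {"ran": "run", "better": "good", "mice": "mouse"}  # hypothetical lookup
SUFFIXES = ("ing", "ed", "s")  # naive suffix stripping, not a Porter stemmer


def clean(text):
    """Lowercase the text and remove newlines and punctuation."""
    text = text.replace("\n", " ").lower()
    return text.translate(str.maketrans("", "", string.punctuation))


def tokenize(text):
    """Split on whitespace."""
    return text.split()


def remove_stop_words(tokens):
    return [t for t in tokens if t not in STOP_WORDS]


def lemmatize(tokens):
    """Map each token to its dictionary form when the toy table knows it."""
    return [LEMMAS.get(t, t) for t in tokens]


def stem(tokens):
    """Strip the first matching suffix, keeping at least three characters."""
    stemmed = []
    for t in tokens:
        for suffix in SUFFIXES:
            if t.endswith(suffix) and len(t) - len(suffix) >= 3:
                t = t[: -len(suffix)]
                break
        stemmed.append(t)
    return stemmed


def preprocess(text):
    return stem(lemmatize(remove_stop_words(tokenize(clean(text)))))


tokens = preprocess("My loops are failing!\nThe mice ran in circles.")
print(tokens)  # ['loop', 'fail', 'mouse', 'run', 'circle']
```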

All meetings attended

Week of 8/10/2020

Studying the model to be trained on the dataset. I studied simpletransformers. Our group created a PowerPoint presentation on simpletransformers.

Things learned:
How to use simpletransformers
Different uses of simpletransformers

Could not attend the meeting on 8/14/2020 because I was moving apartments.

Performed the final implementation of the topic recommendation system using BERT and DistilBERT. Calculated performance metrics using Logistic Regression, OneVsRest classification, and Random Forest classification.
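
A comparison of this kind can be sketched with scikit-learn as shown below. This is an assumption-laden illustration rather than the project's code: the feature matrix is synthetic random data standing in for BERT or DistilBERT sentence embeddings, the number of topics is arbitrary, and accuracy and macro F1 are chosen only as example metrics since the report does not say which metrics were calculated.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

# Synthetic stand-in for sentence embeddings (assumption: real features
# would come from a BERT or DistilBERT encoder, which is not run here).
rng = np.random.default_rng(0)
n_samples, n_features, n_topics = 300, 32, 3
y = rng.integers(0, n_topics, size=n_samples)
centers = rng.normal(size=(n_topics, n_features))
X = centers[y] + rng.normal(scale=0.5, size=(n_samples, n_features))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# The three classifier setups named in the report.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "one_vs_rest": OneVsRestClassifier(LogisticRegression(max_iter=1000)),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    scores[name] = {
        "accuracy": accuracy_score(y_test, predictions),
        "macro_f1": f1_score(y_test, predictions, average="macro"),
    }

for name, result in scores.items():
    print(name, result)
```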

Had the final presentation of the project on 4 September 2020.