Yeszoey - Machine Learning Pathway

  1. Overview of Things Learned

Technical:

  • Webscraping fundamentals
  • Data pre-processing techniques
  • ML models for NLP applications

Tools:

  • Webscraping tools such as BeautifulSoup and Selenium
  • Pandas, JSON
  • scikit-learn ML library
  • GitHub

Soft Skills:

  • Improved presentation skills
  • Better at searching for resources online

  2. Achievement Highlights

  • Successfully scraped data from the Amazon and Flowster Discourse forums
  • Applied Logistic Regression to classify Discourse topics into their categories and reached around 90% accuracy on the Amazon dataset
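
For readers curious about the scraping step, here is a minimal sketch of parsing a topic list with BeautifulSoup. The HTML snippet, the class names (`topic-list-item`, `title`, `category-name`), and the helper `parse_topics` are assumptions made up for illustration, not the actual markup or code used in this project; a real page rendered with JavaScript would first need to be loaded with something like Selenium.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified HTML standing in for a rendered forum topic list.
SAMPLE_HTML = """
<table>
  <tr class="topic-list-item">
    <td><a class="title" href="/t/shipping-delay/101">Shipping delay</a>
        <span class="category-name">Logistics</span></td>
  </tr>
  <tr class="topic-list-item">
    <td><a class="title" href="/t/listing-tips/102">Listing tips</a>
        <span class="category-name">Marketing</span></td>
  </tr>
</table>
"""


def parse_topics(html):
    """Return one dict per topic row with its title, link, and category."""
    soup = BeautifulSoup(html, "html.parser")
    topics = []
    for row in soup.select("tr.topic-list-item"):
        link = row.select_one("a.title")
        category = row.select_one("span.category-name")
        if link is None:
            # Skip rows without a title link (e.g. separators).
            continue
        topics.append({
            "title": link.get_text(strip=True),
            "url": link.get("href"),
            "category": category.get_text(strip=True) if category else None,
        })
    return topics


topics = parse_topics(SAMPLE_HTML)
for topic in topics:
    print(topic)
```

Collecting each topic as a dict makes it easy to dump the results to JSON or load them into a Pandas DataFrame afterwards.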

List of Meetings/Training Attended:

  • All team meetings except one and the GitHub workshop

Goals for the Coming Week:

  • Continue to investigate methods to improve the Logistic Regression classifier's accuracy on the Flowster dataset
  • Learn more about data augmentation

Detailed Statement of Tasks Done:

  • Used BeautifulSoup and Selenium to scrape data from both the Flowster and Amazon websites, collecting 260 items from Flowster and around 16k from Amazon.
  • Trained Logistic Regression with the one-vs-rest method on the Flowster dataset and got only 0.48 accuracy. I then scraped data from the Amazon website to test whether the low accuracy was caused by the small size of the dataset. Luckily, I got around 90% accuracy on the Amazon dataset.
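
The one-vs-rest setup described above can be sketched with scikit-learn roughly as follows. This is only an illustration under stated assumptions: the TF-IDF features, the `max_iter` value, and the toy texts and labels are all made up here and are not the features, settings, or data used on the Flowster or Amazon datasets.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline

# Toy, made-up topic titles and categories (assumption, not project data).
texts = [
    "package arrived late and tracking was wrong",
    "carrier lost my shipment again",
    "how to write a better product listing title",
    "keywords and photos for listing optimization",
    "refund was charged twice on my account",
    "payment failed but money left my account",
]
labels = [
    "shipping", "shipping",
    "listing", "listing",
    "billing", "billing",
]

# TF-IDF turns each text into a sparse vector; one-vs-rest then fits one
# binary Logistic Regression per category and picks the highest-scoring one.
model = make_pipeline(
    TfidfVectorizer(),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
model.fit(texts, labels)

new_topics = ["my shipment is late", "listing photos advice"]
predictions = model.predict(new_topics)
print(list(zip(new_topics, predictions)))
```

With only a few hundred samples spread over several categories, each binary classifier sees very few positive examples, which is one plausible reason a small dataset scores much lower than a large one.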

Thanks a lot to Sara and Rohit for being great mentors and leaders!
