Hosted team meetings and added a weekly games event
Team updates & To Do List
GitHub overview
Set module due dates
Module 2 web scraping and EDA tutorial overview
Attended mentor meetings hosted by Sara and Anubhav
Achievements:
Understood how to scrape data from a website and store it in a CSV file.
Learned how to do data cleaning and data visualization.
Provided detailed team structure and resource outline to team members
Gained a better understanding of machine learning
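The scrape-and-store step mentioned in the achievements can be sketched as follows. This is a minimal illustration using only the Python standard library, not the tutorial's actual code: the HTML snippet, the `post-title` class name, and the `PostTitleParser` and `titles_to_csv` helpers are all hypothetical.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would download this instead.
SAMPLE_HTML = """
<ul>
  <li class="post-title">How do I freeze layers?</li>
  <li class="post-title">DataLoader is slow on Windows</li>
</ul>
"""


class PostTitleParser(HTMLParser):
    """Collects the text of every element whose class is 'post-title'."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if dict(attrs).get("class") == "post-title":
            self._in_title = True

    def handle_endtag(self, tag):
        self._in_title = False

    def handle_data(self, data):
        if self._in_title and data.strip():
            self.titles.append(data.strip())


def titles_to_csv(html, out_file):
    """Parse titles out of `html` and write them as rows of a CSV file."""
    parser = PostTitleParser()
    parser.feed(html)
    writer = csv.writer(out_file)
    writer.writerow(["title"])  # header row
    for title in parser.titles:
        writer.writerow([title])
    return parser.titles


buffer = io.StringIO()  # stands in for open("posts.csv", "w", newline="")
scraped = titles_to_csv(SAMPLE_HTML, buffer)
```

Writing to an in-memory buffer keeps the sketch runnable without touching the disk; swapping in a real file handle opened with `newline=""` would produce the CSV file itself.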
Goals for the upcoming week:
Module 3, referring back to the Module 1 and 2 resources and the notes I took
Check the daily team updates channel and Jira for team progress.
A more in-depth look at basic recommenders and classifiers
Tasks Completed:
Pushed testing file to our team GitHub repo
Familiarized myself with the PyTorch Community
Installed geckodriver on Windows and added it to the PATH.
Scraped data from the PyTorch Community. Problem faced: the data set is huge
(45,000+ posts). Scraping took a very long time and timed out whenever the
next post did not load within 5 minutes. To solve this, I scraped by category: I did 5 runs
(around 1,000-2,000 posts per run), then combined the 5 CSV files into one
CSV file and edited it.
Performed data cleaning and EDA on the data gathered.
Trained Naive Bayes, Linear SVM, Logistic Regression, Decision Tree, Random Forest,
XGBoost, and LightGBM on the collected data and analyzed their performance.
Trained Logistic Regression, Random Forest, and XGBoost on Doc2Vec and TF-IDF features.
TF-IDF with XGBoost performed best.
Reviewed the recorded final presentations by the session 1 teams and gained a better understanding of the project
Communicated with other leads regarding project direction
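The step of combining the per-category CSV files into one file could look roughly like the sketch below. It is not the script actually used: the column names (`post_id`, `category`, `title`), the file names, and the `combine_csv_files` helper are hypothetical, and dropping rows with a repeated `post_id` is an assumption about what the later editing might involve.

```python
import csv
import tempfile
from pathlib import Path


def combine_csv_files(paths, out_path, key="post_id"):
    """Merge CSV files that share a header, dropping rows with a repeated key."""
    seen = set()
    fieldnames = None
    rows = []
    for path in paths:
        with open(path, newline="", encoding="utf-8") as f:
            reader = csv.DictReader(f)
            if fieldnames is None:
                fieldnames = reader.fieldnames  # header of the first file
            for row in reader:
                if row[key] in seen:
                    continue  # same post scraped in more than one run
                seen.add(row[key])
                rows.append(row)
    if fieldnames is None:
        raise ValueError("no input files with a header were given")
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


# Demo with two tiny hypothetical per-category files.
workdir = Path(tempfile.mkdtemp())
(workdir / "vision.csv").write_text(
    "post_id,category,title\n1,vision,Resize bug\n2,vision,GPU OOM\n",
    encoding="utf-8",
)
(workdir / "nlp.csv").write_text(
    "post_id,category,title\n2,vision,GPU OOM\n3,nlp,Tokenizer question\n",
    encoding="utf-8",
)

total = combine_csv_files(
    [workdir / "vision.csv", workdir / "nlp.csv"],
    workdir / "all_posts.csv",
)
```

Keeping each run in its own file means a timed-out category can be re-scraped on its own without repeating the runs that already succeeded.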
Goals for the upcoming week:
Team Presentation to mentors
Module 4, referring back to the Module 1, 2, and 3 resources and the notes I took
A closer look at cosine similarity, BERT, and development of the web app
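As a starting point for the cosine similarity goal, here is a small standard-library sketch that builds plain TF-IDF vectors and compares them. It is illustrative only: the example posts and the `tfidf_vectors` and `cosine_similarity` helpers are made up, and the unsmoothed `tf * log(N / df)` weighting is an assumption (library implementations often use smoothed variants).

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents each term appears in.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical forum post titles.
posts = [
    "cuda out of memory error",
    "cuda memory leak in training loop",
    "how to install geckodriver",
]
vecs = tfidf_vectors(posts)
sim_related = cosine_similarity(vecs[0], vecs[1])    # shares "cuda", "memory"
sim_unrelated = cosine_similarity(vecs[0], vecs[2])  # no shared terms
```

Posts that share terms get a similarity above zero, while posts with no terms in common score exactly zero, which is the property a basic content-based recommender relies on.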