Machine Learning - Level 1 Week 5 Self Assessment
Technical Areas worked on :
1. Trained different combinations of word embeddings and classifiers on the data to identify the best-performing one
(as performed on the CSV file selected as the team CSV) :
a. TF-IDF + Naive Bayes
b. TF-IDF + SVM
c. Word2Vec + Naive Bayes
d. Word2Vec + SVM
e. TF-IDF + Logistic Regression
f. TF-IDF + SVM (along with hyperparameter tuning)
g. TF-IDF + Random Forest (along with hyperparameter tuning)
h. TF-IDF + Bagging Trees (along with hyperparameter tuning)
i. TF-IDF + XGBoost (along with hyperparameter tuning)
j. Additionally, ran RoBERTa on the data I personally scraped.
The combination of TF-IDF and SVM with hyperparameter tuning yielded the best results.
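The best-performing combination above could be sketched as a scikit-learn pipeline; the texts and labels below are placeholders, not the team CSV, and the parameter grid is a small illustrative subset of what a real search would cover.

```python
# Sketch: TF-IDF + SVM with GridSearchCV hyperparameter tuning.
# The texts/labels are toy placeholders standing in for the team CSV.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.model_selection import GridSearchCV

texts = ["good product", "terrible service", "loved it", "awful experience"] * 5
labels = [1, 0, 1, 0] * 5

pipe = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("svm", LinearSVC()),
])

# Illustrative grid; the real search would sweep more values.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "svm__C": [0.1, 1, 10],
}

search = GridSearchCV(pipe, param_grid, cv=3, scoring="accuracy")
search.fit(texts, labels)
print(search.best_params_, search.best_score_)
```

Using a `Pipeline` keeps the vectorizer inside the cross-validation loop, so the TF-IDF vocabulary is fit only on each training fold and the tuning scores are not optimistically biased.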
- Gained deeper insight into BERT and the BERT family of classifiers (RoBERTa, XLNet, DistilBERT) and their architecture via several articles.
- Jupyter Notebook
- GridSearchCV
Soft Skills :
- Met the co-leads multiple times over the week and finalized:
a. The problem statement for the recommender system
b. Sub-teams for the classifiers and the recommendation system, to improve team efficiency
c. One common, most comprehensive CSV file for the whole team to work on
- Explained the week's deliverables during the team meeting, stressed the importance of improving inter-team communication, and compiled a comprehensive document of deliverables to be completed for the week (along with resources)
- Attended office hours with Anubhav to clarify doubts with respect to the ML pipeline and the problem statement.
Three achievement highlights
- Read about and understood the bidirectional nature of the BERT classifier and gained deeper insight into its architecture.
- Ran a range of classification models and learned how to implement GridSearchCV for hyperparameter tuning.
- Clarified team members’ questions / doubts via Discord and Scrum.
Goals for the week :
- Explore more classification models/ensemble models and pipelines for the data, with a focus on improving accuracy.
- Explore deep learning frameworks/neural networks for classification of the data
- Run more sophisticated word embeddings (such as TF-IDF + BERT)
- In tandem, start working on building the recommender system.
- Run confusion matrices after training to gain an overall picture of model performance (and not just accuracy in isolation)
- Trace team progress and work towards integrating everyone’s contributions into one complete team deliverable.
- Some models proved very computationally expensive (BERT, XGBoost); figure out workarounds for that.
- Towards the end of the week, focus on how to deploy the web app for our deliverable.
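The confusion-matrix goal above could be sketched as follows; the `y_true`/`y_pred` labels are placeholders rather than actual model predictions on the team data.

```python
# Sketch: evaluating beyond plain accuracy with a confusion matrix
# and per-class metrics. Labels here are toy placeholders.
from sklearn.metrics import confusion_matrix, classification_report

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

cm = confusion_matrix(y_true, y_pred)
print(cm)  # rows = true class, columns = predicted class
print(classification_report(y_true, y_pred))  # precision/recall/F1 per class
```

Unlike a single accuracy number, the off-diagonal cells show which class is being confused with which, and the per-class precision/recall surfaces imbalance problems that accuracy hides.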
(Snapshot of TF-IDF + SVM hyperparameter tuning, which yielded an accuracy of 96%)