Machine Learning - Level One Module Three - Shreya
Concise Overview Of Things Learned
- Transformed textual data into a meaningful word vector (bag of words)
- Calculated a distance metric (cosine similarity)
- Recommended the top 5 most similar posts, given the title of a previously liked post
- Identified simple machine learning classification models and trained them on my data
- Benchmarked these models by calculating different metrics
- Picked the best-performing model and tested it by feeding it input data and evaluating its output
- Google Colaboratory
- pandas, matplotlib, seaborn, xgboost, etc.
- VS Code
- Knowing when to ask for help and being able to learn from others
- Effective problem solving and persistence
- Attention to detail
- Made a basic recommender system that recommends the top 5 most similar posts
- Trained multiple simple classification models by following our mentor’s guide
- Tested the best performing model and plotted the results
Detailed Statement of Tasks Completed
- Became more familiar with Google Colaboratory.
- Converted the cleaned data into a bag-of-words representation, then calculated the cosine similarity matrix.
- Used the index of the input post to look up its cosine similarity scores and sorted them in descending order, so that the highest scores came first and position 0 was the input post itself (similarity 1). Then returned the titles of the next five most similar posts (slice 1:6).
- One issue that came up was that the recommender would recommend other pages of the same post (e.g. page 3 of a large, multipage post), so I went back and cleaned my data further by identifying and removing the extra pages of each post.
- Learned about and then trained multiple machine learning models (e.g. logistic regression, naive Bayes) on my data.
- Evaluated the models by calculating metrics such as accuracy, precision, recall, and F1-score.
- Chose the best-performing model, then tested it and evaluated a plot of actual vs. predicted results.
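
The recommender steps in the list above (bag of words, cosine similarity, sort in descending order, skip the input post, return five titles) can be sketched roughly as follows. This is a minimal illustration in plain Python, not the notebook's actual code: the function names (`bag_of_words`, `cosine_similarity`, `recommend`), the simple regex tokenizer, and the sample `posts` are all assumptions made up for the example.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text, split it into word tokens, and count each token."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty post is treated as similar to nothing
    return dot / (norm_a * norm_b)


def recommend(liked_title, posts, top_n=5):
    """Return the titles of the top_n posts most similar to the liked post.

    posts is a list of (title, cleaned_text) pairs (a hypothetical layout).
    """
    titles = [title for title, _ in posts]
    liked_index = titles.index(liked_title)
    vectors = [bag_of_words(text) for _, text in posts]
    scores = [cosine_similarity(vectors[liked_index], v) for v in vectors]
    # Rank every other post by similarity, highest first.
    ranked = sorted(
        (i for i in range(len(posts)) if i != liked_index),
        key=lambda i: scores[i],
        reverse=True,
    )
    return [titles[i] for i in ranked[:top_n]]


# Made-up sample data, for illustration only.
posts = [
    ("Sourdough basics", "how to bake sourdough bread at home"),
    ("Rye bread", "bake rye bread at home with a sourdough starter"),
    ("Bike repair", "fix a flat bike tire"),
    ("Pizza dough", "how to make pizza dough at home"),
]
print(recommend("Sourdough basics", posts, top_n=2))
# ['Rye bread', 'Pizza dough']
```

The sketch drops the input post by index instead of slicing 1:6 after sorting. The two approaches give the same result whenever the input post ranks first, but the explicit filter still works if another post happens to tie at similarity 1.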
To Be Continued
- Generate word vectors differently to see if that improves accuracy
- Investigate possible class imbalance
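
One possible starting point for the class imbalance check is to compare each class's share of the data with the per-class precision, recall, and F1-score. The sketch below is only an illustration under stated assumptions: the label names `class_a` and `class_b`, the helper names, and the sample labels are hypothetical and are not taken from the project's data.

```python
from collections import Counter


def class_distribution(labels):
    """Return each class's share of the dataset, most common class first."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.most_common()}


def accuracy(actual, predicted):
    """Fraction of predictions that match the actual labels."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def precision_recall_f1(actual, predicted, positive):
    """Precision, recall, and F1-score for one class treated as positive."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Made-up labels, for illustration only: 8 of one class, 2 of the other.
actual = ["class_a"] * 8 + ["class_b"] * 2
# A model that always predicts the majority class.
predicted = ["class_a"] * len(actual)

print(class_distribution(actual))
print(accuracy(actual, predicted))
print(precision_recall_f1(actual, predicted, positive="class_b"))
```

In this toy case the always-majority predictor reaches an accuracy equal to the majority class's share while its recall on the minority class is zero, which is why the per-class metrics are worth checking alongside accuracy when classes are imbalanced.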