Build the recommender component and train basic machine learning models that classify forum posts into categories.
Now that you have scraped, visualized, and cleaned the data you need, it is time to train simple machine learning models and build a basic content-based recommender system.
To do this, you will need to:
- Make sure you pick only the important columns. Example:
| Topic Title | Category | Tags | Leading Post | Post Replies | Created_at | Replies |
- Clean your data well.
- Transform your textual data into meaningful word vectors or word embeddings (see the Module 1 NLP webinars for more on this). Examples: bag of words or TF-IDF.
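To make the vectorization step concrete, here is a minimal pure-Python sketch of TF-IDF. The sample posts, the function names `tokenize` and `tfidf_vectors`, and the exact weighting formula (term frequency times `log(N / df) + 1`) are assumptions chosen for illustration; several TF-IDF variants exist, and a library such as scikit-learn (`TfidfVectorizer`) is the more practical choice on a real dataset, though its weights will differ from the ones below.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length, idf = log(N / df) + 1.
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero for empty posts
        vectors.append({
            term: (count / total) * (math.log(n_docs / df[term]) + 1.0)
            for term, count in counts.items()
        })
    return vectors


# Hypothetical posts, for illustration only.
posts = [
    "How to install Python on Windows",
    "Python install fails on Windows 10",
    "Best pasta recipe for dinner",
]
vectors = tfidf_vectors(posts)
```

In this sketch, a term that appears in many posts (such as "python") receives a lower weight than a term unique to one post (such as "how"), which is the property that makes TF-IDF more informative than raw counts.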
For the basic recommender system:
- Calculate a distance metric. Example: Cosine similarity.
- Recommend a post using the title of a previously liked post (go for a top 10 recommendation).
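The two recommender steps above can be sketched as follows. This is an illustrative example rather than a reference implementation: the `posts` dictionary (title mapped to leading-post text), the helper names, and the use of plain bag-of-words counts are all assumptions. TF-IDF weights or embeddings could be swapped in for `bag_of_words` without changing the rest.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count lowercase alphanumeric tokens in the text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty post is similar to nothing
    return dot / (norm_a * norm_b)


def recommend(liked_title, posts, top_n=10):
    """Return up to top_n (title, score) pairs most similar to the liked post."""
    if liked_title not in posts:
        raise KeyError(f"Unknown post title: {liked_title!r}")
    liked_vec = bag_of_words(posts[liked_title])
    scores = [
        (title, cosine_similarity(liked_vec, bag_of_words(text)))
        for title, text in posts.items()
        if title != liked_title  # never recommend the liked post itself
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


# Hypothetical data, for illustration only.
posts = {
    "Install Python on Windows": "how to install python on windows step by step",
    "Python install error": "python install fails on windows with an error",
    "Pasta recipe": "best pasta recipe for a quick dinner",
    "Pip not found": "pip command not found after python install",
}
recommendations = recommend("Install Python on Windows", posts, top_n=2)
```

Looking up the liked post by its title and ranking every other post by similarity is the simplest design; on a large forum you would likely precompute the vectors once instead of rebuilding them on every call.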
For the simple classification model:
- Identify 5 simple machine learning classification models and train them on your data.
- Benchmark these models by calculating metrics such as accuracy, precision, recall, and F1-score.
- Pick the best-performing model and tune its hyperparameters, or change the way you generate your word vectors or embeddings.
- Test your model by feeding it your own input data and checking whether its output meets your expectations.
- Be as proactive as you can: investigate why your model or similarity results perform well or badly.
- Investigate things like class imbalance, dropping some columns, or adding others, and check whether ensemble methods improve your accuracy.
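To see why the benchmark should report more than accuracy, especially under class imbalance, here is a small pure-Python sketch of the four metrics. The function name `classification_metrics`, the macro averaging (an unweighted mean over classes), and the label values are assumptions for illustration; scikit-learn's `classification_report` provides the same kind of summary for real experiments.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall, and F1."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # A metric with an empty denominator is reported as 0.0 here.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical imbalanced data: 8 "support" posts and 2 "bugs" posts,
# scored against a model that always predicts the majority class.
y_true = ["support"] * 8 + ["bugs"] * 2
y_pred = ["support"] * 10
metrics = classification_metrics(y_true, y_pred)
```

In this example the always-majority model reaches 80% accuracy while its macro recall is only 0.5 and its macro F1 is lower still, because it never finds a single "bugs" post. Comparing your five models on all four metrics helps expose this kind of failure.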
- Basic machine learning model training, with some data visualization and cleaning:
dataCleaning_Exploration_basicModeling.htm (1.0 MB)
- Ensemble ML model training and some other embedding methods:
Ensemble_AdvancedEmbeddings_Methods.html (1.6 MB)
- Check out the linked movie recommender notebook from "Intro to Content-Based recommendation systems - Recommendation Models" by Sara EL-ATEIF.