- Python foundations: coding, packages, debugging
- Machine learning foundations: defining the problem, collecting data, training and evaluating the ML models, deploying the solution
- Introduction to recommender systems
What we are trying to build is a forum post recommender system that recommends posts similar to a given post or within a given category.
How are we going to do this?
- Collect the data: We will use the DiscourseHub Community forums as our main data source. You can scrape one forum or several.
- Explore your data (EDA): After gathering the data, we will analyze it to understand it and get familiar with it before feeding it to ML models.
- Calculate similarity between posts and recommend: We will vectorize our data, compute the similarity matrix, and then use it to recommend posts similar to a given post.
- Train ML classifiers to classify a post into its appropriate category.
- Compare and choose the best approach: evaluate the results of steps 3 and 4 and choose the best one.
- Build a simple web app using Flask or Streamlit.
- Deploy your web app using AWS or Heroku or some other service (if we have enough time).
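To make the "vectorize, compute the similarity matrix, recommend" step more concrete, here is a minimal sketch using only the Python standard library. It assumes TF-IDF vectors and cosine similarity, which is one common choice and not necessarily the one we will end up with. The sample posts and all function names (`tfidf_vectors`, `similarity_matrix`, `recommend`) are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Turn each document into a sparse {token: tf-idf weight} dict."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each token once per document
    # Smoothed inverse document frequency (an assumption, one of several variants)
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return [{t: c * idf[t] for t, c in Counter(tokens).items()} for tokens in tokenized]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(vectors):
    """Pairwise cosine similarity between all posts."""
    return [[cosine(a, b) for b in vectors] for a in vectors]


def recommend(post_index, sim, top_k=2):
    """Return the indices of the top_k posts most similar to post_index."""
    scores = [(j, s) for j, s in enumerate(sim[post_index]) if j != post_index]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [j for j, _ in scores[:top_k]]


# Toy posts, invented for this example (not real forum data)
posts = [
    "How do I install Python packages with pip?",
    "pip fails to install packages on Windows",
    "Best hiking trails near the lake",
    "Which trails are open for hiking this weekend?",
]
vectors = tfidf_vectors(posts)
sim = similarity_matrix(vectors)
print(recommend(0, sim, top_k=1))
```

On real forum data you would probably replace the hand-written vectorizer with a library implementation or with sentence embeddings, but the shape of the pipeline stays the same.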
- Familiarity with Colab or Jupyter notebooks (for people planning to work in their local environment) and some basic knowledge of Python.
- Install or check that you have these lower-level technical requirements:
- Beautiful Soup (Python library for parsing HTML text)
- Selenium (Python library for dynamically interacting with web pages)
- WebDriver (works with Selenium to drive the browser of your choice, preferably Chrome or Firefox)
- Editor/IDE on the local machine (preferably VS Code) OR Jupyter Notebooks (from Anaconda preferably)
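If you are not sure what is already installed, a small script like the one below can tell you. It is only a sketch: it checks whether each package can be found by Python, without importing it. The import names (`bs4`, `selenium`, `spacy`, `sentence_transformers`, `transformers`, `torch`, `tensorflow`) are the usual ones for these libraries, and the helper `check_packages` is a name invented for this example.

```python
from importlib.util import find_spec

# Display name -> import name. The display names come from this post;
# the import names are the usual ones for each library.
PACKAGES = {
    "Beautiful Soup": "bs4",
    "Selenium": "selenium",
    "spaCy": "spacy",
    "sentence-transformers": "sentence_transformers",
    "Transformers": "transformers",
    "PyTorch": "torch",
    "TensorFlow": "tensorflow",
}


def check_packages(packages):
    """Return {display name: True/False} depending on whether the module is found."""
    found = {}
    for label, module in packages.items():
        try:
            found[label] = find_spec(module) is not None
        except (ImportError, ValueError):
            # Treat a module that cannot be resolved as missing
            found[label] = False
    return found


for label, ok in check_packages(PACKAGES).items():
    print(f"{label}: {'installed' if ok else 'missing'}")
```

Anything reported as missing can usually be installed with `pip install <package>` (for Beautiful Soup the package name is `beautifulsoup4`).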
Nice to have (especially if you aren’t planning to use Colab):
- Beautiful Soup (Python library for parsing HTML text)
- spaCy (Python library for general NLP)
- sentence-transformers (Python library for creating sentence embeddings based on BERT encoding)
- Transformers (Python library for working with transformer-based architectures like BERT)
- Access to a GPU, or use Colab (preferably 16 GB of memory)
- Deep learning library (preferably PyTorch or TensorFlow)
- Go over the instructions to prepare your workspace
- Watch the webinars in the resources to get familiar with the concepts we will need (at least the first four)
- Play around with scraping until you are familiar with its concepts; use the DiscourseHub Community forums as a playground
- Get familiar with what ML is and what a typical ML project workflow looks like
- Read about what NLP is, how it is used nowadays, and how we can use it to build our project
- Train a basic ML model like logistic regression to classify a textual input into a negative or positive sentiment
- Write your self-assessment
- Everyone has their own way of learning: some of us read books, others watch videos or play around with source code. Find the way you learn best and practice until you feel comfortable enough to tackle the tasks needed to build the project.
- When you encounter an error or have a technical question, google it first. If you still can't find the answer you were looking for, ask in the forums politely and try to be as precise as you can.
- I am here to help, so tag me if you need me for anything and I will try to help as much as I can.
- Don’t worry if you don’t yet understand how to use some ML packages or have no experience in deploying ML models. We’ll be going more in depth on these things in the upcoming modules.
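For the sentiment-classification exercise in the task list above, here is a rough sketch of what logistic regression does under the hood, written with the standard library only. It assumes bag-of-words features and plain stochastic gradient descent. The four training sentences are invented toy data, and the function names and hyperparameters (`lr`, `epochs`) are arbitrary choices for illustration. In practice you would most likely use an ML library rather than hand-rolled code.

```python
import math
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocab(texts):
    """Map every token seen in the training texts to a column index."""
    vocab = {}
    for text in texts:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab


def featurize(text, vocab):
    """Bag-of-words count vector; tokens outside the vocabulary are ignored."""
    vec = [0.0] * len(vocab)
    for tok in tokenize(text):
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    return vec


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(X, y, lr=0.1, epochs=300):
    """Fit weights and bias with stochastic gradient descent on the log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of the log loss with respect to the logit
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def predict_proba(text, vocab, w, b):
    """Probability that the text is positive (label 1)."""
    x = featurize(text, vocab)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy training data, invented for this example (1 = positive, 0 = negative)
train_texts = [
    "I love this forum, very helpful answers",
    "great explanation, thank you",
    "this is terrible and confusing",
    "awful experience, nothing works",
]
train_labels = [1, 1, 0, 0]

vocab = build_vocab(train_texts)
X = [featurize(t, vocab) for t in train_texts]
w, b = train(X, train_labels)

for text in ["great helpful answers", "terrible awful"]:
    label = "positive" if predict_proba(text, vocab, w, b) > 0.5 else "negative"
    print(f"{text!r} -> {label}")
```

With a real dataset you would also hold out a test set and report a metric such as accuracy, which is part of the typical ML workflow mentioned above.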
Understand recommender systems: Intro to Content-Based recommendation systems - Recommendation Models by Sara EL-ATEIF
Learn web scraping: Intro to Web Crawling/Data Scraping - Data Mining by Maleeha Imran
Get familiar with NLP (resources shared by Colin, our industry mentor):
Industry Mentor Webinars: NLP Basics series
Great NLP Resources.pdf (44.8 KB)
PS: Let me know if you need any clarification or don’t understand something.