Level 1: Module 1 - Self Assessment and Preparation

Project Overview

Requirements

  • Python foundations: coding, packages, debugging
  • Machine learning foundations: defining the problem, collecting data, training and evaluating the ML models, deploying the solution
  • Introduction to recommender systems

Problem statement

We want to build a forum post recommender system that recommends posts similar to a given post or within a given category.

How are we going to do this?

  1. Collect the data: We will use the DiscourseHub Community forums as our main data source. You can scrape one or several forums.
  2. Explore your data (EDA): After gathering the data, we will analyze it to understand it and get familiar with it before feeding it to ML models.
  3. Calculate similarity between posts and recommend: We will vectorize our data, compute the similarity matrix, and then use it to recommend posts similar to a given post.
  4. Train ML classifiers and classify a post into its appropriate category.
  5. Compare and choose the best approach: evaluate the results of steps 3 and 4 and choose the best one.
  6. Build a simple web app using Flask or Streamlit.
  7. Deploy your web app using AWS or Heroku or some other service (if we have enough time).
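To make step 3 more concrete, here is a minimal sketch of the vectorize, compare, and recommend idea. It is only an illustration, not the project's reference implementation: it assumes TF-IDF as the vectorizer and cosine similarity as the measure (the steps above do not fix either choice), uses only the Python standard library, and the sample posts and the helper names (`tfidf_vectors`, `similarity_matrix`, `recommend`) are made up for this example.

```python
import math
from collections import Counter

PUNCTUATION = ".,!?;:()\"'"

def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (smoothed TF-IDF)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many posts does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def similarity_matrix(docs):
    """Pairwise cosine similarity between all posts."""
    vectors = tfidf_vectors(docs)
    return [[cosine(a, b) for b in vectors] for a in vectors]

def recommend(post_index, sim, top_k=2):
    """Indices of the top_k posts most similar to post_index (excluding itself)."""
    scores = [(j, s) for j, s in enumerate(sim[post_index]) if j != post_index]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [j for j, _ in scores[:top_k]]

# Hypothetical posts, invented for illustration only.
posts = [
    "How do I scrape a forum with Beautiful Soup?",
    "Beautiful Soup cannot parse my scraped forum page",
    "Best GPU for training deep learning models",
    "Training a deep learning model on a small GPU",
]
sim = similarity_matrix(posts)
print(recommend(0, sim, top_k=1))
```

On real scraped data you would probably swap the hand-written vectorizer for a library implementation or for sentence embeddings, but the overall flow (vectors, then a similarity matrix, then a ranked lookup) stays the same.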

Instructions

  1. Get familiar with Colab or Jupyter notebooks (for people planning to work in their local environment) and pick up some basic knowledge of Python.
  2. Install, or check that you already have, these lower-level technical requirements:
  • Minimum:

    1. Beautiful Soup (Python library for parsing HTML text)
    2. Selenium (Python library for interacting dynamically with webpages)
    3. WebDriver (works with Selenium to drive the browser of your choice, preferably Chrome or Firefox)
    4. Editor/IDE on the local machine (preferably VS Code) OR Jupyter Notebooks (preferably from Anaconda)
  • Nice to have (especially if you aren’t planning to use Colab):

    1. Beautiful Soup (Python library for parsing HTML text)
    2. spaCy (Python library for general NLP)
    3. Sentence_transformers (Python library for creating sentence embeddings based on BERT encoding)
    4. Transformers (Python library for working with transformer-based architectures like BERT)
    5. Access to a GPU, or use Colab (preferably 16 GB of memory)
    6. Deep learning library (preferably PyTorch or TensorFlow)
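One way to check which of these Python libraries are already installed is sketched below. It is only a convenience script, assuming the usual import names (`bs4` for Beautiful Soup, `torch` for PyTorch); the helper names `is_installed` and `report` are made up for this example. It does not check the WebDriver binary or your editor, since those are not Python packages.

```python
import importlib.util

# Import names (not pip package names) of the libraries listed above.
# Assumption: Beautiful Soup is imported as "bs4" and PyTorch as "torch".
MINIMUM = ["bs4", "selenium"]
NICE_TO_HAVE = ["spacy", "sentence_transformers", "transformers", "torch", "tensorflow"]

def is_installed(module_name):
    """True if the top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None

def report(names):
    """Map each module name to whether it is installed."""
    return {name: is_installed(name) for name in names}

for label, names in [("Minimum", MINIMUM), ("Nice to have", NICE_TO_HAVE)]:
    print(label)
    for name, ok in report(names).items():
        print(f"  {name}: {'installed' if ok else 'missing'}")
```

Anything reported as missing can usually be installed with pip or conda from your notebook or terminal.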

Tasks

  • Go over the instructions to prepare your workspace
  • Watch the webinars in the resources to get familiar with the concepts we will need (at least the first four)
  • Play around with scraping until you are familiar with its concepts, using the DiscourseHub Community forums as a playground
  • Get familiar with what ML is and what a typical ML project workflow looks like
  • Read about what NLP is, how it is used nowadays, and how we can use it to build our project
  • Train a basic ML model like logistic regression to classify a textual input into a negative or positive sentiment
  • Write your self-assessment
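For the sentiment task, a from-scratch sketch of logistic regression on bag-of-words features may help show what the model is doing. This is a toy under stated assumptions: the six training sentences are invented, the learning rate and epoch count are arbitrary, and the function names are hypothetical. For the actual task you would most likely use a library implementation and a real labelled dataset.

```python
import math

def tokenize(text):
    """Lowercase and split on whitespace."""
    return text.lower().split()

def build_vocab(texts):
    """Assign each distinct token a column index."""
    vocab = {}
    for text in texts:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab

def featurize(text, vocab):
    """Bag-of-words count vector; unknown words are ignored."""
    vec = [0.0] * len(vocab)
    for tok in tokenize(text):
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    return vec

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(X, y, lr=0.5, epochs=200):
    """Fit weights and bias with plain stochastic gradient descent."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = sigmoid(sum(w * v for w, v in zip(weights, xi)) + bias)
            err = pred - yi  # gradient of the log loss w.r.t. the logit
            weights = [w - lr * err * v for w, v in zip(weights, xi)]
            bias -= lr * err
    return weights, bias

def predict(text, vocab, weights, bias):
    """Label a text as 'positive' or 'negative'."""
    x = featurize(text, vocab)
    p = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)
    return "positive" if p >= 0.5 else "negative"

# Toy data invented for illustration: 1 = positive, 0 = negative.
train_texts = [
    "i love this helpful forum",
    "great answer thank you",
    "this tutorial is excellent",
    "i hate this confusing error",
    "terrible documentation and bad examples",
    "this bug is awful",
]
train_labels = [1, 1, 1, 0, 0, 0]

vocab = build_vocab(train_texts)
X = [featurize(t, vocab) for t in train_texts]
weights, bias = train(X, train_labels)

print(predict("great helpful tutorial", vocab, weights, bias))
print(predict("awful confusing bug", vocab, weights, bias))
```

With so little data the model only memorizes which words appeared in which class, which is exactly why the workflow step of collecting enough representative data matters.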

Tips

  1. Everyone has their own way of learning: some of us read books, others watch videos or play around with source code. Find the way you learn best and practice until you feel comfortable enough to tackle the tasks needed to build the project.
  2. When you encounter an error or have a technical question, google it first. If you still can't find the answer you were looking for, ask in the forums politely and be as precise as you can.
  3. I am here to help, so tag me if you need me for anything and I will help as much as I can.
  4. Don’t worry if you don’t understand how to use some ML packages or have no experience deploying ML models yet. We’ll go into these things in more depth in the upcoming modules.

Resources

STEMCasts

Industry Mentor Webinars: NLP Basics series

Other resources

Great NLP Resources.pdf (44.8 KB)

PS: Let me know if you need any clarifications or you don’t understand something.

Thank you Sara for the resources and your recommendation systems presentation. So is there an assignment that we have to do after studying the Resources or will next modules be having assignments?

Hello @YasaminAbbaszadegan,

You are welcome. For now, let us get familiar with the needed technical concepts.
After you finish going through the technical requirements, you will first write what we call a self-assessment post about what you have learned so far. After that, we will begin our first task: scraping the data from the forum(s) of your choice, which you will then use to build the recommender system.

Did I answer your question?

Yes thank you

Hello @Sara_EL-ATEIF! We will be posting our self-assessments in these comments, right?

Hello @saloni,
Yes, please post your self-assessment here as a comment and give it a meaningful title like: Self assessment-Full Name.

Hi @Sara_EL-ATEIF, @saloni,

A small change for self-assessments. Please post in this channel: https://stemaway.com/c/pathway-hubs/self-assessments/178 and then post your topic URL as a reply to this topic.

Example: https://stemaway.com/t/debaleena-machine-learning/6982

This change is being introduced to allow for issuance of certificates to students who complete all modules in a hub and optional linking to a student’s 1-Click® Resume

Thank you @ddas, we did need a specific section to not overflow this one.

When should we do the self-assessment? I haven't finished watching and applying all the topics covered in the module yet.

Take your time @YasaminAbbaszadegan.
I think before November 4th, 2020 would be good.
Let me know if you prefer another date @ddas.

Sounds good to me! If the majority need more time, we can push it out a little more as well.

Hi, I’ve posted my self-assessment on the thread.
Please let me know if I should post it elsewhere.
Watching those fruitful webinars is really a great beginning to this journey.
Thank you all in advance.
https://stemaway.com/t/debaleena-machine-learning/6982?u=angela_ku

Yes @Angela_Ku that is the correct place to share your self assessment.
Thank you and well done.

Great work @Angela_Ku and @YasaminAbbaszadegan

We will give a little more time to other students. If you wish, we can give you some additional tasks while you wait and/or a meeting to go over any questions you have at this time.

FYI: Have moved your assessments into your own topics.

I have written my self-assessment. Here is the link:
https://stemaway.com/t/debaleena-machine-learning/6982/4
The resources are very informative. Kindly review it and share some feedback.

Hi @Sara_EL-ATEIF, I did some rearrangement of the self-assessments. Sorry about the flux, we are all good to go now!

I have moved the self-assessments to individual topics. You can find the 3 self-assessments at:
https://stemaway.com/t/machine-learning-level-1-module-1-angela/6997
https://stemaway.com/t/machine-learning-level-1-module-1-yasamin/6996
https://stemaway.com/t/machine-learning-level-1-module-1-nash/7002

No problem @ddas.
I will go over each of the self-assessments. I am just waiting for more people to submit theirs, and then I will comment on them.
Thank you :nerd_face:.

Hello @Sara_EL-ATEIF, is there a deadline to submit the assessment? I just signed up for the pathway yesterday, and I’m behind on studying the resources for Level 1.
Thank you.

Hello @OumaimaB,
Please do take the time to go over these resources and submit your self-assessment. I hope you can do so before 21 November so that we can all stay on the same track.
Thank you.