Machine Learning - Level 1 Module 1 - Angela

Concise overview of things learned

  • Technical Area:
    • Colab introduction: set up Colab to use a GPU (Runtime → Change runtime type → GPU).
    • Mac environment setting introduction
    • Machine learning introduction:
      • Machine learning can be applied to many areas, including recommendation systems, games, personalized medicine, financial markets, self-driving cars, and voice assistants.
      • To build a program that learns, we learn patterns between the data and the output (the so-called inference framework), as opposed to the traditional approach of simply memorizing the data and the output. An inference framework can be predictive and can help the model deduce new facts from old facts.
      • Basic outline: define objective → gather data → prepare data (data cleaning) → explore data → build a model → evaluate the model → predict
      • Three types of ML:
        • supervised learning: labeled data (labels can be hard to acquire).
        • unsupervised learning: unlabeled data (the machine learns the natural clusters within the dataset).
        • reinforcement learning: the machine learns through a reward-based system.
      • Recommendation systems, e.g., Netflix (based on viewing history), Amazon (based on things you viewed or bought in the past), Spotify (based on your past listening preferences), Instagram Explore page (based on the hashtags of posts you liked), LinkedIn (showing jobs you are interested in)
      • Why recommendation systems? The long tail: niche products can exist because there is no physical display limitation, as opposed to traditional retail. This helps users discover things they want but would have a hard time finding on their own.
      • Recommendation system approaches: (Netflix as an example)
        • Collaborative filtering: recommends based on similar users (requires knowing user preferences first)
        • Content-based filtering: recommends similar movies using the inherent attributes of a movie (title, actors, genre, box office revenue, keywords, summary…). Uses item features to recommend other items similar to what the user likes, based on their previous actions or explicit feedback.
          • Measures of similarity:
            • Cosine similarity: the closer to 1, the more similar.
            • Dot product: the higher the dot product, the higher the similarity.
            • Euclidean distance: the smaller, the more similar. (A numeric sketch of all three measures appears at the end of this overview.)
          • Pros:
            • No need for data about other users → easier to scale.
            • The model can capture the specific interests of a user → can recommend niche items that very few other users are interested in.
          • Cons:
            • Feature representations of the items are hand-engineered to some extent → requires a lot of domain knowledge → the model can only be as good as the hand-engineered features.
            • The model can only make recommendations based on the existing interests of the user → limited ability to expand on the user's existing interests.
      • Content-based recommender: unlike revenue, which is already a number you can feed in as input, how do we build a model from content such as keywords and summaries? NLP is the answer: represent the text as vectors, then use k-nearest neighbors to find the vectors nearby and recommend them, because nearby vectors mean similar items. (A TF-IDF + k-NN sketch appears at the end of this overview.)
      • Word Embeddings: distributed representations of text in an n-dimensional space.
      • How to create a recommender system for Discourse?
        • Task 1: web scraping: scrape the Discourse pages; collect multiple attributes (text, author, tags, date posted, replies, category); organize the data.
        • Task2: build different models for finding the most similar posts for a given post
        • Task 3: ground truth: how do we evaluate our models? Label ground truth → evaluate the performance of the models
      • Two approaches
        • Metrics based evaluation
        • Human based evaluation: A/B testing approach
    • Web scraping/crawling
      • Be ethical! Check the robots.txt file on a website to see if there are any restrictions, and practice on a website that welcomes crawling, for example, http://quotes.toscrape.com. (A robots.txt-checking sketch appears at the end of this overview.)
      • URL stands for Uniform Resource Locator
  • Tools: scrapy, git, Trello
  • Soft Skills: ethical web crawling, project management
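
To make the three similarity measures listed above concrete, here is a minimal numpy sketch; the two feature vectors are made up for illustration:

```python
import numpy as np

# Two toy feature vectors standing in for two movies (made-up values).
a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 5.0])

# Dot product: the higher, the more similar (sensitive to vector length).
dot = np.dot(a, b)

# Cosine similarity: dot product of the normalized vectors, in [-1, 1];
# the closer to 1, the more aligned the directions.
cosine = dot / (np.linalg.norm(a) * np.linalg.norm(b))

# Euclidean distance: the smaller, the more similar.
euclidean = np.linalg.norm(a - b)

print(f"dot={dot:.2f}  cosine={cosine:.4f}  euclidean={euclidean:.4f}")
```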
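
For the content-based recommender, here is a rough sketch of the keywords/summary idea, assuming TF-IDF vectors plus k-nearest neighbors from scikit-learn; the toy summaries are invented, and a real system might use word embeddings instead:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

# Toy movie summaries (made up for illustration).
summaries = [
    "A hacker discovers reality is a simulation",
    "A young wizard attends a school of magic",
    "Rebels fight an evil galactic empire in space",
    "A simulation hides the truth from humanity",
]

# Turn each summary into a TF-IDF vector.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(summaries)

# Index the vectors; cosine distance = 1 - cosine similarity.
knn = NearestNeighbors(n_neighbors=2, metric="cosine").fit(X)

# Find the nearest neighbor of the first summary (the closest match
# is the summary itself, so we take the second index).
distances, indices = knn.kneighbors(X[0])
print("Most similar to summary 0:", indices[0][1])  # expect 3, the other simulation story
```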
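
Finally, robots.txt can also be checked programmatically with Python's standard library; a small sketch against the practice site mentioned above (it simply reads whatever robots.txt the site serves):

```python
from urllib import robotparser

# Point the parser at the site's robots.txt file.
rp = robotparser.RobotFileParser()
rp.set_url("http://quotes.toscrape.com/robots.txt")
rp.read()

# can_fetch reports whether a given user agent may crawl a given path.
print(rp.can_fetch("*", "http://quotes.toscrape.com/page/1/"))
```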

Achievement highlights

Watched the first four webinars as our mentor suggested.
Played with the web crawler using the Colab environment.
Downloaded git and found other resources to learn how to use it correctly.

Detailed statement of tasks completed

Familiarized myself with the basic concepts of machine learning, version control, and project management.

Familiarized myself with the framework of the recommendation system project.

This is the first time I have written a self-assessment; if someone could give me some feedback, I would be very grateful.

Hi Angela, Great work on the self-assessment!

A self-assessment does the following:

  • Helps you keep track of your learning and plan for next steps
  • Shows us that you are passionate about the field. Helps secure your spot in the next internship session. You may also qualify for a certificate of completion (if you do all the levels in a module)
  • Showcases your achievements

Your write-up achieves the first 2 goals perfectly! In terms of showcasing, there is some room for improvement. Our goal at STEM-Away is to help you learn and also showcase yourself. Sharing some tips, hope they help!

Almost all companies have some form of self-assessment. This gets reviewed by your peers and managers and paves your way forward in your career. In any self-assessment (and presentation), it is important to first get the attention of your audience with a concise summary and then go into details.

In your case, you can move most of the details from “Technical Area” to “Detailed Statements…”. Just keep a summary there.

And here is a quick example of how some minimal changes can project your achievements more powerfully. You have to find your own tone when you project yourself, sharing this example just for guidance!

Achievement Highlights

Watched the first four webinars as our mentor suggested.
Played with the web crawler using the Colab environment.
Downloaded git and found other resources to learn how to use it correctly.

Achievement Highlights - edited

  • Developed a good understanding of the overall project by watching all four webinars suggested by our mentor.
  • Experimented with the web crawler program using the Colab environment.
  • Successfully downloaded git and found additional resources to learn how to use it correctly.

Hello @Angela_Ku,

Debaleena has already provided a good overview of how you can improve your self-assessment. One thing I would like to add is to make it more result-driven and less of a detailed summary of what you learned.

Next, please comment your GitHub username in the appropriate post and check out the Module 2 details.

Thank you.

Thank you for your advice! I will keep the summary succinct and not too detailed next time!

Thank you very much for your instruction!
I totally misunderstood the showcase part at the beginning. Thanks for your advice and for the example of the achievement highlights. I'll try my best to fulfill the third goal next time.
Thanks again for your fruitful advice!

Machine Learning - Level 1 Module 2 - Angela
Concise overview of things learned

  • Technical Area: Beautiful Soup, git push
  • Tools: Python, git, GitHub
  • Soft Skills: always checking that my git installation is the most up-to-date version

Achievement highlights

  • Developed a good understanding of the web crawler.
  • Experimented with Beautiful Soup on HTML structure using the Colab environment.
  • Successfully upgraded git on my local machine.

Detailed statement of tasks completed

  • Built a partial web crawler for the Amazon seller services forum (a sketch of the general approach appears below).
  • Cloned the git repo from GitHub to my local machine.
  • Successfully upgraded git on my local machine.
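
For reference, a minimal sketch of what such a crawler can look like with requests and Beautiful Soup; the URL and CSS selectors here are hypothetical placeholders, since the real forum's markup differs and would need adjusting:

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical forum URL, used only for illustration.
URL = "https://example.com/forum/latest"

response = requests.get(URL, headers={"User-Agent": "learning-crawler"})
soup = BeautifulSoup(response.text, "html.parser")

posts = []
for row in soup.select("div.post"):  # hypothetical selector
    posts.append({
        "title": row.select_one("a.title").get_text(strip=True),
        "author": row.select_one("span.author").get_text(strip=True),
        "date": row.select_one("time")["datetime"],
    })

print(f"Scraped {len(posts)} posts")
```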

To be continued

  • Build the complete web crawler; I need to learn more about HTML and how to parse only the text.
  • Learn how to write the data crawled from the website to a CSV file (a possible starting point is sketched below).
  • Learn how to push the repo to GitHub; it seems there are some login issues that need to be resolved.
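
For the CSV step, Python's built-in csv module is probably the simplest starting point; a minimal sketch with made-up rows standing in for real crawled data:

```python
import csv

# Made-up sample of crawled posts; in practice this would come from the crawler.
posts = [
    {"title": "Shipping question", "author": "seller_a", "replies": 3},
    {"title": "Listing removed", "author": "seller_b", "replies": 7},
]

with open("posts.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["title", "author", "replies"])
    writer.writeheader()     # column names on the first line
    writer.writerows(posts)  # one row per crawled post
```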