SindhuSamay - Machine Learning (Level 1) Pathway

Week 1

Things Learned:

1. Technical Area:

  • Machine Learning algorithms and NLP basics

  • Thorough Git and GitHub setup and workflow using VSCode, GitHub, and the command line

  • Web crawling, HTML structure, and data scraping introduction

  • Recommender System introduction and building procedure

2. Tools:

  • VSCode

  • GitHub

  • Trello

  • Scrapy & Beautiful Soup

  • Spacetime.am

  • STEM-Away

3. Soft Skills:

  • Learning from the mentors, communicating and planning with leads, and then transferring it to the team members while learning other things from them

  • Team management and engagement skills (the importance of icebreakers) from ML PM mentors

  • Leading some sections of the team meetings

Three Achievement Highlights:

  1. Created some important topics for our team in the forum and frequently posted updates in them.

  2. Created all the team meetings and added them to teammates’ calendars.

  3. Led the introductions and icebreakers in some of the team meetings.

Goals for Next Week:

  • Scrape the selected forum.

  • Add a checklist for all the cards on Trello to track the team members’ progress.

Week 2

Things Learned:

1. Technical Area:

  • Basic Python

  • HTML classes, tags, and elements

  • Scraping basics with Beautiful Soup and Selenium

2. Tools:

  • VSCode

  • GitHub

  • Python, HTML, CSS, & JavaScript

  • Scrapy, Beautiful Soup, & Selenium

  • Trello

3. Soft Skills:

  • Team management and engagement skills from ML PM mentors

  • Leading some sections of the team meetings

Three Achievement Highlights:

  • Created scrum meetings for the team and added them to teammates’ calendars so they can hop in and chat with leads to ask questions or share their progress.

  • Led icebreakers in some of the team meetings.

  • Added the teammates and a checklist of their names to all the cards on Trello to track their progress.

Tasks completed:

  • Scraped titles, number of likes, and categories using several different methods and tools/libraries.

  • Joined the team GitHub project.
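
The scraping task above (titles, likes, and categories) can be sketched roughly as follows. This is only an illustration, not the code used on the forum: it relies on the standard-library `html.parser` instead of Beautiful Soup so it runs without extra installs, and the HTML snippet, the class names (`topic`, `title`, `category`, `likes`), and the `TopicParser` helper are all hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical markup; a real forum page will use different tags and classes.
SAMPLE_HTML = """
<div class="topic">
  <a class="title">Best way to color line art?</a>
  <span class="category">Art</span>
  <span class="likes">12</span>
</div>
<div class="topic">
  <a class="title">How do I grow my readership?</a>
  <span class="category">Promotion</span>
  <span class="likes">7</span>
</div>
"""

FIELDS = {"title", "category", "likes"}


class TopicParser(HTMLParser):
    """Collect title, category, and like count for each topic block."""

    def __init__(self):
        super().__init__()
        self.topics = []    # one dict per topic
        self._field = None  # field whose text is currently being read

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class == "topic":
            self.topics.append({})
        elif css_class in FIELDS:
            self._field = css_class

    def handle_data(self, data):
        # Only keep text that sits inside one of the fields of interest.
        if self._field and self.topics:
            self.topics[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        self._field = None


def scrape_topics(html):
    parser = TopicParser()
    parser.feed(html)
    for topic in parser.topics:
        topic["likes"] = int(topic["likes"])  # store likes as a number
    return parser.topics


topics = scrape_topics(SAMPLE_HTML)
```

With Beautiful Soup the same idea is usually shorter, since elements can be looked up by tag and class directly instead of tracking parser state by hand.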

Goals for Next Week:

  • Scrape the complete Tapas forum effectively

  • Start doing EDA with my data

  • Completely wrap up module 2 and start module 3

  • Start helping the teammates and start with task management planning

Week 3

Things Learned:

  • Technical Area:

    • Basic Python
    • HTML classes, tags, and elements
    • Scraping basics with Beautiful Soup, Scrapy, and Selenium
  • Tools:

    • VSCode
    • GitHub
    • Python, HTML, CSS, & JavaScript
    • Scrapy & Beautiful Soup
    • STEM-Away
  • Soft Skills:

    • Team management and engagement skills from ML PM mentors
    • Leading some sections of the team meetings

Three Achievement Highlights:

  • Tried to strengthen inter-team communication by suggesting that every team member join at least 2 scrum meetings with the leads per week, and encouraged team members to attend meetings with mentors
  • Added the new member to our Discord, Trello board, and meetings

Tasks completed:

  • Attempted scraping with Beautiful Soup and Scrapy
  • Leads split up leading scrum meeting days

Hurdles Facing:

  • Having trouble with scraping - reached out to a mentor

Goals for Next Week:

  • Finish scraping and EDA
  • Research for Module 3

Week 4

Things Learned:

  • Technical Area:

    • Scraping using Selenium, Beautiful Soup, and Python
    • Different libraries in Python
    • Machine Learning basics
    • Using Chrome and Firefox web drivers
    • Basic EDA
    • .csv and .json files basics
    • Repositories on GitHub
  • Tools:

    • VSCode
    • GitHub
    • Python, HTML, CSS, & JavaScript
    • Selenium & Beautiful Soup libraries
    • Excel
    • Pandas
    • Time and scrolling the webpage
    • Chrome and Firefox web drivers
  • Soft Skills:

    • Team management and engagement skills
    • Leading some sections of the team meetings
    • Communicating with teammates

Three Achievement Highlights:

  • Added the tasks for the week on Trello and added all members and checklist of members to each card.
  • Scraped Tapas!

Tasks completed:

  • Scraped the whole Tapas forum using Selenium, Beautiful Soup, the Chrome web driver, and the Firefox web driver. For each category, scraped every thread's title, number of likes, number of replies, number of views, tags, category, last activity time, creation date, leading post, and replies. Got some help from my peer leads on Selenium syntax. Also used the resources provided by the mentor in Module 2 for scraping.
  • Converted data to .csv files.
  • Pushed files to GitHub.
  • Did some EDA.
  • Attended weekly meetings.
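
The "convert to .csv, then do some EDA" steps above might look something like this. It is a sketch under assumptions: the records and field names are made up, and the standard-library `csv` module stands in for pandas so the example has no dependencies.

```python
import csv
import io
from collections import Counter
from statistics import mean

# Made-up records standing in for scraped threads.
threads = [
    {"title": "Best way to color line art?", "category": "Art", "likes": 12, "replies": 5},
    {"title": "How do I grow my readership?", "category": "Promotion", "likes": 7, "replies": 9},
    {"title": "Favorite brushes for inking", "category": "Art", "likes": 4, "replies": 2},
]


def to_csv(rows):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def summarize(rows):
    """Basic EDA: thread count per category and mean likes per category."""
    counts = Counter(row["category"] for row in rows)
    mean_likes = {
        category: mean(row["likes"] for row in rows if row["category"] == category)
        for category in counts
    }
    return counts, mean_likes


csv_text = to_csv(threads)
counts, mean_likes = summarize(threads)
```

To write a real file instead of a string, the same `DictWriter` can be pointed at `open(path, "w", newline="")`.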

Goals for Next Week:

  • Work on Module 3.
  • Discuss with the leads regarding the transition into teamwork from individual work.

Week 5

Things Learned:

  • Technical Area:

    • Basic feature extraction from text data: number of words, characters, etc.
    • Basic pre-processing of text data: lowercasing; removal of punctuation, stopwords, etc.; tokenization; stemming; and lemmatization
    • Advanced text processing: n-grams, TF-IDF, bag of words, sentiment analysis, word embeddings
    • Making visual diagrams with the cleaned data
    • NLP basics, Word Embeddings, Vanilla Neural Networks, and Attention Model
  • Tools:

    • VSCode
    • GitHub
    • Python
    • Many libraries in Python
  • Soft Skills:

    • Team management and engagement skills
    • Leading some sections of the team meetings
    • Communicating with teammates
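
The basic feature extraction and pre-processing steps listed under Technical Area above can be sketched in plain Python. The stopword list and sample sentence are made up for illustration, and stemming and lemmatization are left out because they need a library such as nltk.

```python
import string

# Tiny illustrative stopword list; libraries such as nltk ship much larger ones.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "for", "my"}


def basic_features(text):
    """Simple counts taken from the raw text before any cleaning."""
    words = text.split()
    return {"num_words": len(words), "num_chars": len(text)}


def preprocess(text):
    """Lowercase, strip punctuation, tokenize on whitespace, drop stopwords."""
    lowered = text.lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    tokens = no_punct.split()
    return [token for token in tokens if token not in STOPWORDS]


sample = "The best way to color my line art, in 2020!"
features = basic_features(sample)
tokens = preprocess(sample)
```

Here `tokens` ends up as `["best", "way", "color", "line", "art", "2020"]`, which is the kind of cleaned input the later steps work on.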

Three Achievement Highlights:

  • Led a section in our team meeting
  • Learned and did EDA on one of the categories I scraped
  • The leads decided whose data the team should move forward with. With the leads, split the team in two: one group for classification and one for the recommender. Planned to form another group later for creating the app.

Tasks completed:

  • Completed all 4 detailed parts of processing textual data using the mentor’s tutorials for a category I scraped.
  • Pushed visuals and code to GitHub
  • Learned many things about NLP and ML
  • Went to a meeting with one of the mentors and got some clarification regarding the project
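
Two of the text-processing ideas from this week, n-grams and TF-IDF, can be written out by hand to see what they compute. This is a sketch with made-up, already tokenized documents, using the plain textbook weighting tf × log(N / df); libraries such as sklearn use smoothed variants, so their numbers will differ.

```python
import math
from collections import Counter

# Made-up, already tokenized documents.
docs = [
    ["color", "line", "art"],
    ["line", "art", "brushes"],
    ["grow", "readership"],
]


def ngrams(tokens, n):
    """Return all contiguous n-token tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(documents):
    """Weight each term by tf * log(N / df), one dict per document."""
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


bigrams = ngrams(docs[0], 2)
weights = tf_idf(docs)
```

A term that appears in only one document ("color") gets a higher weight than one shared across documents ("line"), which is the point of the IDF factor.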

Goals for Next Week:

  • Complete Module 3
  • See how the individuals in the team are moving forward.

Week 6

Things Learned:

  • Technical Area:

    • Basics and usage of Colab
    • Building a content-based recommendation engine
    • Classifying text using multiple classification algorithms: Naive Bayes, SVM, Logistic Regression, LightGBM, Decision Tree, BERT, etc.
    • Making some more visual diagrams with the cleaned data
    • Improving accuracies of classifiers
    • Confusion matrices and classification reports
  • Tools:

    • Google Colaboratory
    • GitHub
    • Python
    • Trello
    • VSCode
    • Many libraries: pandas, textblob, nltk, numpy, sklearn, gensim, requests, matplotlib, collections, seaborn, simpletransformers, tarfile, os, etc.
  • Soft Skills:

Three Achievement Highlights:

  • Built multiple classifiers and recommenders
  • Scraped URLs
  • Started our final presentation slides
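
For the confusion matrices and classification reports mentioned under Technical Area, here is a small hand-rolled sketch of what those numbers mean. The labels are made up, and in practice sklearn's metrics functions produce the same quantities; this only shows the arithmetic.

```python
from collections import Counter

# Made-up true and predicted category labels.
y_true = ["Art", "Art", "Promotion", "Art", "Promotion", "Promotion"]
y_pred = ["Art", "Promotion", "Promotion", "Art", "Promotion", "Art"]


def confusion_counts(true_labels, predicted_labels):
    """Count (true, predicted) label pairs; this is the confusion matrix."""
    return Counter(zip(true_labels, predicted_labels))


def precision_recall(matrix, label):
    """Per-class precision and recall computed from the pair counts."""
    true_pos = matrix[(label, label)]
    predicted = sum(n for (_, pred), n in matrix.items() if pred == label)
    actual = sum(n for (true, _), n in matrix.items() if true == label)
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


matrix = confusion_counts(y_true, y_pred)
# Accuracy is the share of pairs where true and predicted labels agree.
accuracy = sum(n for (t, p), n in matrix.items() if t == p) / len(y_true)
art_precision, art_recall = precision_recall(matrix, "Art")
```

Looking at per-class precision and recall rather than accuracy alone helps show which categories a classifier confuses with each other.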

Tasks completed:

  • Completed the 4th part of processing textual data (N-grams, TF-IDF, Bag of Words, Sentiment Analysis, and Word Embeddings) on Besart’s CSV file, as we chose to move on as a team with his file.
  • Went back and scraped the URLs of all posts from every Tapas category and uploaded the updated CSV files to GitHub so that they can be used to output clickable titles for our final recommendation system.
  • Built, tested, and improved multiple classifiers and recommenders for the data: Naive Bayes, SVM, Logistic Regression, LightGBM, Decision Tree, and BERT classifiers; post-title-based and original-post-based recommenders.
  • Started making the final presentation for our team.
  • Leads met, discussed the final tasks and timeline, and were asked to present next week on Wednesday.
  • Attended our weekly team meeting and updated Trello.
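
A post-title-based recommender of the kind mentioned above can be sketched as bag-of-words vectors compared with cosine similarity. This is an assumption about the general approach, not the team's actual code; the titles, the `example.com` URLs, and the helper names are hypothetical.

```python
import math
from collections import Counter

# Made-up titles and URLs standing in for the scraped posts.
posts = [
    {"title": "best brushes for line art", "url": "https://example.com/1"},
    {"title": "line art coloring tips", "url": "https://example.com/2"},
    {"title": "how to grow your readership", "url": "https://example.com/3"},
]


def bag_of_words(text):
    """Word-count vector for a piece of text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(query, candidates, top_n=2):
    """Rank candidate posts by title similarity to the query text."""
    query_vec = bag_of_words(query)
    scored = [(cosine(query_vec, bag_of_words(p["title"])), p) for p in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop posts that share no words with the query.
    return [post for score, post in scored[:top_n] if score > 0]


results = recommend("tips for coloring line art", posts)
```

Keeping the URL next to each title is what lets the final output show clickable titles, which is why the URLs were scraped separately.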

Goals for Next Week:

  • Put the best classifier in an app with the team
  • Try to work on building a recommender that specifically solves our problem statement
  • Finish, practice, and give our final presentation.

Week 7

Things Learned:

  • Technical Area:

    • A little bit about making an app for the recommender and the classifier
    • A little about making the recommender that satisfies our problem statement
    • Classifying text using the BERT family of models: BERT, RoBERTa, XLNet, XLM, and DistilBERT
    • Improving accuracies of classifiers
  • Tools:

    • Google Colaboratory
    • GitHub
    • Python
    • Trello
    • VSCode
    • Many libraries: pandas, nltk, sklearn, simpletransformers, tarfile, os, shutil, etc.
  • Soft Skills:

Three Achievement Highlights:

  • Created the template for our final team presentation and shared it with the team. Worked on some parts and successfully completed it together with the team.
  • Classified the data with some more classification algorithms

Tasks completed:

  • Built, tested, and improved multiple classifiers and recommenders for the data: BERT, RoBERTa, XLNet, XLM, and DistilBERT classifiers; a basic recommender that satisfies a part of our problem statement.
  • Finished the final presentation with the team.
  • Made some required team meetings for the last two days before our final presentation day to wrap up things.
  • We’re ready to present everything including an app with a recommender and a classifier as a team.

Goals for Next Week:

  • Upload my final Module 3 work to GitHub.
  • Rehearse before the presentation.
  • Give our final presentation.