SHALINI_KUMARI - Machine Learning Pathway

Things learned
Technical area:

  • Web Scraping using libraries like BeautifulSoup and Selenium
  • GitHub and git commands
  • Pre-processing the data
  • Exploratory Data Analysis
  • Overview of different embeddings and Machine Learning Models

Tools:

  • Jupyter Notebook
  • Google Colab
  • Git
  • Asana

Soft Skills:

  • Machine Learning project workflow
  • Learned how to work as a team and collaborate

Meetings attended:

  • Initial Meetings: They were more of an introduction to the project and team members. We had to choose a forum and then develop a model that could classify the posts. We chose the discourse community: e-commerce.
  • Web Scraping Meeting: We were introduced to web scraping and the concept of a sitemap. Two teams, Team Flowster and Team Amazon, worked on their respective forums.
  • Advancements Meeting: We found issues and solved them together. For example, we found a completely non-English category in the Amazon forum and decided to drop it.
    We also found that the forum uses JavaScript to load its content dynamically, so we had to solve the lazy-loading problem to scrape all the data.
  • Pre-processing and Data Cleaning Meeting: We were introduced to pre-processing. Data is very important, so any data lost during pre-processing should be justified by better results. This showed us the importance of studying our data.
  • GitHub Overview Meeting: I would like to thank Sarah for this. She explained git commands, branches, pull requests, merges, and rebasing to us. I also learned how to use Markdown on GitHub and in Jupyter Notebook; it was new to me and interesting.
  • Fun Meeting: After some serious Machine Learning, we played Two Lies and One Truth. It was seriously amazing.
  • EDA (Exploratory Data Analysis) and NLP (Natural Language Processing): We learned many new things. Word clouds and visualizations of the data give real insight into it.
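
As a rough illustration of the lazy-loading problem from the Advancements Meeting, here is a minimal sketch of one common approach: keep scrolling to the bottom of the page until its height stops growing. This is not necessarily what our team did. The function name `scroll_until_stable`, its parameters, and the `forum_url` placeholder are assumptions made for this example. The Selenium wiring is shown only in comments.

```python
import time
from typing import Callable


def scroll_until_stable(
    get_height: Callable[[], int],
    scroll_to_bottom: Callable[[], None],
    pause_seconds: float = 2.0,
    max_rounds: int = 50,
) -> int:
    """Scroll repeatedly until the page height stops growing.

    Returns the number of scrolls that loaded new content.
    """
    loaded_rounds = 0
    last_height = get_height()
    for _ in range(max_rounds):
        scroll_to_bottom()
        time.sleep(pause_seconds)  # give the page's JavaScript time to append posts
        new_height = get_height()
        if new_height == last_height:
            break  # nothing new was loaded, so we assume we reached the end
        last_height = new_height
        loaded_rounds += 1
    return loaded_rounds


# With Selenium, the two callables could be wired up like this (not run here;
# forum_url is a placeholder, not a real address):
#   from selenium import webdriver
#   driver = webdriver.Chrome()
#   driver.get(forum_url)
#   scroll_until_stable(
#       lambda: driver.execute_script("return document.body.scrollHeight"),
#       lambda: driver.execute_script("window.scrollTo(0, document.body.scrollHeight);"),
#   )
#   html = driver.page_source  # should now include the lazily loaded posts
```

The `max_rounds` cap is there so the loop still ends on a page that never stops loading. The pause length is a guess and would need tuning for a real forum.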

Achievements:

  • Web scraping the forums.
  • Git commands and workflow.
  • Pre-processing the data

Goals for the upcoming week
Everybody will give a short presentation on an embedding or model they worked on. This way we have divided the work among us, so that in the end we can compare our results and choose the technique that gives our classifier the highest accuracy.

Things learned

Technical area:

  • Word embeddings and text vectorizers like Word2Vec (CBOW), TF-IDF, Count Vectorizer, etc.
  • Naive Bayes classifier
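
To make the combination of word counts and a Naive Bayes classifier concrete, here is a small from-scratch sketch in plain Python. It is only an illustration of multinomial Naive Bayes with add-one smoothing, not the code our team used (a library implementation would be the usual choice). The class name `NaiveBayes`, the toy posts, and the category names are all made up for this example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; real pre-processing would do more."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # label -> number of posts
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Start from the log prior of the class.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokenize(text):
                if token not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][token]
                # Add-one smoothing avoids log(0) for unseen class/word pairs.
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy posts and category names invented for illustration only.
train_texts = [
    "how do I ship my order faster",
    "shipping label for my order",
    "listing fees for a new product",
    "product listing was removed",
]
train_labels = ["shipping", "shipping", "listing", "listing"]

model = NaiveBayes().fit(train_texts, train_labels)
prediction = model.predict("my order has no shipping label")
```

Working in log space keeps the product of many small probabilities from underflowing, which matters once posts get longer than a few words.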

Tools:

  • Jupyter Notebook
  • Google Colab
  • Git
  • Asana

Soft Skills:

  • Giving a presentation to my team
  • Learned how to work as a team and collaborate

Meetings attended:

  • Daily Meetings: Everybody explained one embedding. This way we divided the work and still explored every embedding technique.
  • Advancements: We looked for the technique giving us the highest accuracy.
  • Team Building Meeting: We played Two Truths and One Lie this time. We also played a drawing game. It was fun.

Achievements:

  • Understanding various embeddings
  • Presenting CBOW to my team
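
For readers new to CBOW (Continuous Bag of Words), the core idea is that the model predicts a word from the words around it. The sketch below only builds the (context, target) training pairs for one sentence. It does not train any embedding. The function name `cbow_pairs`, the `window` parameter, and the example sentence are assumptions made for illustration.

```python
def cbow_pairs(tokens, window=2):
    """Return (context_words, target_word) pairs, one per token position."""
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]    # up to `window` words before
        right = tokens[i + 1:i + 1 + window]   # up to `window` words after
        context = left + right
        if context:  # skip a one-word sentence, which has no context
            pairs.append((context, target))
    return pairs


# Example sentence invented for illustration.
sentence = "we scraped posts from the forum".split()
pairs = cbow_pairs(sentence, window=1)
for context, target in pairs:
    print(context, "->", target)
```

In an actual Word2Vec model, the vectors of the context words are averaged and used to predict the target word, and the learned vectors become the embeddings.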

Goals for the upcoming week

We will try more advanced machine learning models.

Things learned

Technical area:

  • Overview of PyTorch
  • Overview of BERT

Tools:

  • Jupyter Notebook
  • Google Colab
  • Git
  • Asana

Soft Skills:

  • Enriching my machine learning knowledge
  • Learned how to work as a team and collaborate

Meetings attended:

  • Presentation by Team Amazon and Flowster: Findings of the previous week were summarized in a presentation and the results were compared to find the best technique.
  • Data Merging: Data from both forums was merged. We agreed on a common pre-processing of the data and divided the tasks of trying techniques like BERT, TF-IDF, etc.
  • Advancements Meeting: We discussed the advancements and problems that we encountered.
  • Team Building Meeting: We had a fruitful discussion about general things. We again played Two Truths and One Lie. It seems to be our favorite game to play.

Achievements:

  • Understanding various machine learning models
  • Diving deep into the technical algorithms.

Goals for the upcoming week

We will fine-tune our models to get good results.