Things learned
Technical areas:
- Web Scraping using libraries like BeautifulSoup and Selenium
- GitHub and Git commands
- Pre-processing the data
- Exploratory Data Analysis
- Overview of different embeddings and Machine Learning Models
Tools:
- Jupyter Notebook
- Google Colab
- Git
- Asana
Soft Skills:
- Machine Learning project workflow
- Working and collaborating as a team
Meetings attended:
- Initial Meetings: These served as an introduction to the project and the team members. Our task was to choose a forum and then develop a model that could classify its posts. We chose the e-commerce Discourse community.
- Web Scraping Meeting: We were introduced to web scraping and the concept of a site map. Two teams, Team Flowster and Team Amazon, worked on their respective forums.
- Advancements Meeting: We found issues and solved them together. For example, we found an entirely non-English category in the Amazon forum and decided to drop it. We also discovered that the forum is rendered with JavaScript and loads dynamically, so we had to solve the lazy-loading problem to scrape all the data.
- Pre-processing and Data Cleaning Meeting: We were introduced to pre-processing. Data is valuable, so any information lost during pre-processing should be justified by better results. This taught us the importance of studying our data first.
- GitHub Overview Meeting: I would like to thank Sarah for this one. She explained Git commands, branches, pull requests, merges, and rebasing to us. I also learned how to use Markdown on GitHub and in Jupyter notebooks; it was new to me and quite interesting.
- Fun Meeting: After some serious Machine Learning, we played Two Lies and One Truth. It was seriously amazing.
- EDA (Exploratory Data Analysis) and NLP (Natural Language Processing) Meeting: We learned several new techniques. Word clouds and other visualizations of the data give real insight into it.
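The lazy-loading fix mentioned above usually comes down to scrolling the page until its height stops growing. A minimal sketch of that pattern is below; `load_all_posts` and the `max_rounds` parameter are illustrative names, and `driver` stands for a Selenium WebDriver (or anything exposing `execute_script`).

```python
import time

def load_all_posts(driver, pause=1.0, max_rounds=50):
    """Scroll a lazily loaded page until its height stops growing.

    `driver` is a Selenium-style object exposing execute_script();
    checking document.body.scrollHeight after each scroll is a common
    way to detect when an infinite-scroll forum has no more posts.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        # Jump to the bottom so the forum's JavaScript fetches the next batch.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the new posts time to render
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new appeared; we have reached the end
        last_height = new_height
    return last_height
```

The `max_rounds` cap guards against forums that keep growing indefinitely.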
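As a rough illustration of the pre-processing discussed above, here is a minimal text-cleaning sketch. The exact steps (dropping URLs, keeping only letters, collapsing whitespace) are assumptions about what forum posts typically need, not the project's actual pipeline.

```python
import re

def preprocess(text):
    """Lowercase a post, strip URLs and punctuation, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop links
    text = re.sub(r"[^a-z\s]", " ", text)      # keep letters and spaces only
    return re.sub(r"\s+", " ", text).strip()   # collapse runs of whitespace
```

Each step discards information (casing, numbers, links), which is exactly why the meeting stressed studying the data before deciding what to remove.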
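The branch/merge workflow from the GitHub overview can be practised end to end in a throwaway repository. The branch and file names below are illustrative, not the project's.

```shell
# Create a scratch repository to try the workflow safely.
cd "$(mktemp -d)"
git init -q demo && cd demo
git config user.name demo
git config user.email demo@example.com
git commit -q --allow-empty -m "initial commit"
main_branch=$(git rev-parse --abbrev-ref HEAD)

git checkout -qb feature-scraper        # create and switch to a feature branch
echo "print('scrape')" > scraper.py
git add scraper.py
git commit -qm "add scraper script"

git checkout -q "$main_branch"          # back to the main branch
git merge -q feature-scraper            # fast-forward merge brings the commit in
```

On a shared project the `merge` step is typically replaced by pushing the branch and opening a pull request, with `git rebase` used to replay the branch on top of the latest main before review.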
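The word clouds from the EDA meeting are, underneath, just word-frequency counts rendered graphically. A tiny sketch of that counting step, using made-up posts rather than the real forum data:

```python
from collections import Counter

# Toy posts standing in for the scraped e-commerce forum text.
posts = [
    "shipping delayed again need refund",
    "refund processed quickly great support",
    "shipping cost too high",
]

# Tokenize naively on whitespace and count occurrences.
tokens = " ".join(posts).split()
word_counts = Counter(tokens)
top_words = word_counts.most_common(3)  # the words a word cloud draws largest
```

Libraries such as wordcloud take exactly these frequencies and scale each word's size accordingly.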
Achievements:
- Web scraping the forums.
- Git commands and workflow.
- Pre-processing the data.
Goals for the upcoming week
Everybody will give a short presentation on an embedding or model they worked on. This is how we have divided the work among us, so that in the end we can compare our results and choose the strategy or technique that gives our classifier the highest accuracy.
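The planned comparison might look something like the sketch below, assuming scikit-learn is available. The posts, labels, and the choice of count vs. TF-IDF features are all made up for illustration; the real comparison would use the scraped forum data and the embeddings each team member presents.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Toy posts and category labels (0=refunds, 1=payments, 2=shipping).
posts = [
    "where is my refund", "refund still not processed",
    "payment failed twice", "cannot complete payment",
    "package arrived late", "delivery was delayed a week",
    "shipping took too long", "refund request ignored",
]
labels = [0, 0, 1, 1, 2, 2, 2, 0]

# Score each featurization with the same classifier and compare.
results = {}
for name, vectorizer in [("count", CountVectorizer()),
                         ("tfidf", TfidfVectorizer())]:
    pipeline = make_pipeline(vectorizer, LogisticRegression(max_iter=1000))
    scores = cross_val_score(pipeline, posts, labels, cv=2)
    results[name] = scores.mean()
```

Keeping the classifier fixed while swapping the text representation makes the accuracy numbers directly comparable across techniques.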