Technical Area:
- Familiarized myself with the stemAWAY website and forums.
- Learned the possible approaches for building a recommender system: content-based and collaborative filtering (a small content-based sketch follows this list).
- Learned the pros, cons and requirements for building a recommender system using the content-based approach (diversity, explainability, and relevancy).
- Explored the possible models for filtering data in this approach (classification and regression models).
- Learned about the typical workflow of handling data for ML (choosing a dataset, pre-processing, tabulation, creating word representations, and vectorizing the data).
- Built a web scraper for the quotes.toscrape site and learned how to use it for basic data mining tasks.
- Learned the basics of version control, Git, and GitHub.
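
To make the content-based idea concrete, here is a minimal sketch (not project code): items are represented as text, turned into TF-IDF vectors, and ranked by cosine similarity to an item the user already liked. scikit-learn and the toy list of posts are assumptions for illustration only.

```python
# Minimal content-based filtering sketch: vectorize item text with TF-IDF,
# then recommend the items most similar to one the user already liked.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

posts = [
    "Intro to web scraping with Beautiful Soup",          # item the user liked
    "Building recommender systems with collaborative filtering",
    "Sentiment analysis with BERT and logistic regression",
    "Getting started with Git and GitHub",
]

vectors = TfidfVectorizer(stop_words="english").fit_transform(posts)
scores = cosine_similarity(vectors[0], vectors).ravel()   # similarity of post 0 to all posts

# Rank by similarity, skipping the liked item itself
for idx in scores.argsort()[::-1][1:]:
    print(f"{scores[idx]:.2f}  {posts[idx]}")
```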
Tools: VS Code, GitHub
Soft Skills: time management, problem solving (the Additional Resources section helped a lot in understanding how to use Beautiful Soup and the complete NLP workflow, especially Alice Zhao’s YouTube lectures).
Tasks Completed:
- I prepared my workspaces. VS Code was already set up for me, but I had very little experience with Jupyter notebooks and Colab. I installed the Beautiful Soup, Selenium, spaCy, sentence-transformers, transformers, and PyTorch libraries; a quick import check is sketched below.
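As a quick sanity check that the environment was ready, something like the following can be run. This is only a sketch: it assumes the packages were installed with pip (e.g. `pip install beautifulsoup4 selenium spacy sentence-transformers transformers torch`) and simply imports each one and prints its version.

```python
# Import each library from the stack and report its installed version.
import bs4, selenium, spacy, sentence_transformers, transformers, torch

for module in (bs4, selenium, spacy, sentence_transformers, transformers, torch):
    print(f"{module.__name__:>22}  {module.__version__}")
```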
- I familiarized myself with the Beautiful Soup and Requests libraries. I am new to Python, but my experience with JavaScript helped me get a good enough grasp to follow along with the Web Scraping intro. I also wrote my own web spider; a simplified version is sketched below.
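A simplified version of the spider, assuming the standard page structure of quotes.toscrape.com (my actual script’s selectors and pagination handling may differ): fetch a page with Requests, parse it with Beautiful Soup, print each quote, and follow the “Next” link until there are no more pages.

```python
# Scrape all quotes from quotes.toscrape.com, following pagination links.
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

url = "http://quotes.toscrape.com/"
while url:
    soup = BeautifulSoup(requests.get(url).text, "html.parser")
    for quote in soup.select("div.quote"):
        text = quote.select_one("span.text").get_text(strip=True)
        author = quote.select_one("small.author").get_text(strip=True)
        print(f"{author}: {text}")
    next_link = soup.select_one("li.next > a")            # "Next" pagination link, if any
    url = urljoin(url, next_link["href"]) if next_link else None
```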
- I attempted to follow along with the workflow for preparing a dataset for an ML program, but I could only get as far as the pre-processing and word-representation steps (I could understand the video beyond that point, but couldn’t replicate it). A condensed version of the steps I did manage is below.
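This is a rough sketch of those steps, not my exact notebook; scikit-learn, pandas, and the toy corpus are assumptions for illustration. The text is lowercased and stripped of punctuation, then a document-term matrix (the word representation) is built with CountVectorizer.

```python
# Pre-processing + word representations: clean raw text, then build a
# document-term matrix where each row is a document and each column a word count.
import re

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

corpus = [
    "The product is great, I loved it!",
    "Terrible quality... would not buy again.",
    "Decent value for the price.",
]

def clean(text):
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)     # drop punctuation and digits
    return re.sub(r"\s+", " ", text).strip()  # collapse extra whitespace

cleaned = [clean(doc) for doc in corpus]

cv = CountVectorizer(stop_words="english")
dtm = pd.DataFrame(cv.fit_transform(cleaned).toarray(),
                   columns=cv.get_feature_names_out())
print(dtm)
```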
- I read about how BERT and a logistic regression model can be used together to evaluate positive/negative sentiment; the basic idea is sketched below.
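A rough sketch of that idea, assuming scikit-learn and a tiny toy dataset for illustration: BERT’s [CLS] token embedding serves as a fixed-size feature vector for each sentence, and a logistic regression classifier is trained on those vectors to predict positive or negative sentiment.

```python
# Use BERT as a frozen feature extractor, then train logistic regression
# on the [CLS] embeddings for binary sentiment classification.
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

texts = ["I loved this movie", "Absolutely fantastic", "Worst film ever", "I hated it"]
labels = [1, 1, 0, 0]   # 1 = positive, 0 = negative

# Encode each sentence and keep the [CLS] token's hidden state as its feature vector
with torch.no_grad():
    tokens = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    features = model(**tokens).last_hidden_state[:, 0, :].numpy()

clf = LogisticRegression(max_iter=1000).fit(features, labels)
print(clf.predict(features))   # sanity check on the training sentences
```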