Weeks 1 + 2
Concise Overview of Things Learned:
Technical Area:
- Count Vectorizer
- TF-IDF
- Introduction to LSTM and RNN Models
Tools:
- sklearn API
- PyTorch
Soft Skills:
- Presentation over Digital Medium
Three Achievement Highlights:
- Used CountVectorizer with Logistic Regression on the Amazon dataset.
- Built a TF-IDF + Logistic Regression model on the Amazon dataset, which yielded 74% accuracy.
- Presented the CountVectorizer model to my team and presented Team Amazon's findings across all the models run.
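The TF-IDF + Logistic Regression highlight above can be sketched roughly as follows. This is a minimal illustration, not the actual project code: the review texts here are a toy stand-in for the Amazon dataset, and the split and pipeline settings are assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Toy stand-in for the Amazon review data (texts + sentiment labels).
texts = ["great product, works well", "terrible, broke after a day",
         "love it, highly recommend", "waste of money, very poor",
         "excellent quality and fast shipping", "awful experience, do not buy"] * 10
labels = [1, 0, 1, 0, 1, 0] * 10

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=0)

# TF-IDF features fed into a logistic regression classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
print(f"test accuracy: {model.score(X_test, y_test):.2f}")
```

Wrapping the vectorizer and classifier in one pipeline keeps the TF-IDF vocabulary fit on the training split only, which avoids leaking test data into the features.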
Meetings Attended:
- Updates on Webscraping 6/8
- Data Cleaning + Last Minute Webscraping Problems 6/9
- Review and Refine our Data for ML 6/12
- Embedding Techniques + ML Algorithms 6/16
- Embedding Techniques + ML Algorithms 6/18
- Week 5 Planning + Team Building 6/19
- Presentation of Week 3 Teams Work 6/22
- Discuss Data Merging and BERT vs TF-IDF 6/24
Tasks Completed:
- I learned about the CountVectorizer word embedding and built some initial models to see how it functions.
- I then created a presentation and walked my team through what I learned, most notably that CountVectorizer produces raw term counts, essentially TF-IDF without the inverse-document-frequency weighting, so TF-IDF should generally be preferred.
- One hurdle this week was a heavy workload from my school courses, since one of my summer classes was ending. I was originally scheduled to present my word embedding on Monday 6/15, but thankfully Rohit and Sara allowed me to present on 6/16 instead.
- I was also able to research LSTM and BERT models, which function differently from the models we have discussed so far. I look forward to working with them in the next portion of the project.
Goals for Next Week:
- Work on the TF-IDF + Logistic Regression model on the combined dataset from the Flowster and Amazon teams.
- Also experiment with BERT models to learn how they differ from the previous methods.
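As background for the LSTM work mentioned above, a minimal PyTorch sketch of how an LSTM classifier consumes token sequences, in contrast to the bag-of-words models used so far. All sizes and the random token batch are hypothetical placeholders, not project values.

```python
import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    """Minimal sketch: embed token ids, run an LSTM over the sequence,
    and classify from the final hidden state."""
    def __init__(self, vocab_size=1000, embed_dim=32, hidden_dim=64, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, n_classes)

    def forward(self, token_ids):            # (batch, seq_len)
        x = self.embed(token_ids)            # (batch, seq_len, embed_dim)
        _, (h_n, _) = self.lstm(x)           # h_n: (1, batch, hidden_dim)
        return self.fc(h_n[-1])              # (batch, n_classes)

model = LSTMClassifier()
batch = torch.randint(0, 1000, (4, 12))      # 4 fake reviews, 12 tokens each
logits = model(batch)
print(logits.shape)                          # torch.Size([4, 2])
```

Unlike CountVectorizer or TF-IDF, which discard word order entirely, the LSTM reads tokens left to right and carries state between them, which is why it behaves so differently from the earlier models.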