Modules 1 and 2
Learned about web scraping and web crawling, and got familiar with libraries for obtaining data. Used Scrapy to build web spiders. Learned to extract data from HTML and reformat it to fit my needs.
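The extract-and-reformat step can be sketched with Beautiful Soup. The HTML snippet, class names, and fields below are made-up examples, not the actual pages I scraped:

```python
# A minimal sketch of pulling structured records out of HTML with Beautiful Soup.
# The snippet and selectors here are hypothetical examples.
from bs4 import BeautifulSoup

html = """
<ul class="products">
  <li><span class="name">Widget</span><span class="price">$9.99</span></li>
  <li><span class="name">Gadget</span><span class="price">$19.99</span></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# Reformat each <li> into a dict so the data is easy to work with later
items = [
    {
        "name": li.select_one(".name").get_text(strip=True),
        "price": li.select_one(".price").get_text(strip=True),
    }
    for li in soup.select("ul.products li")
]
print(items)
```

In a real spider the same parsing logic runs inside Scrapy's `parse` callback instead of on a hard-coded string.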
Trello, Scrapy, Beautiful Soup, Selenium
Joined Trello to stay on track with tasks and my teammates, and prepared a GitHub repo for further work.
Set up my environment for data mining via web scraping using the module tutorials (they were super helpful!) and prepared my GitHub environment with Sara’s repo.
I would like to get more comfortable completing web scraping tasks by practicing the technical skills I learned during the first two modules.
Revisited my web scraping skills and explored how to use the collected data within different classification models. Loaded data into CSV files for later analysis.
Colab, Beautiful Soup, Selenium, NumPy
Staying in contact with team members and communicating with team leads about schedule changes.
I was able to load data into CSV files. I also learned about classification models for later use.
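The CSV step can be sketched with pandas. The records below are dummy placeholders standing in for the scraped data:

```python
# A small sketch of saving scraped records to CSV with pandas,
# using made-up example data in place of the real scrape results.
import pandas as pd

records = [
    {"title": "Post A", "category": "tech", "score": 42},
    {"title": "Post B", "category": "news", "score": 17},
]

df = pd.DataFrame(records)
df.to_csv("records.csv", index=False)  # index=False keeps the row index out of the file

# Reload to confirm the round trip for later analysis
loaded = pd.read_csv("records.csv")
print(loaded.shape)
```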
Cleaning data, looking into ML terminology, and developing a deeper understanding of ML.
I need to work on categorizing my findings to better feed the ML algorithm and the recommender.
Understood the principles of pretrained models such as BERT. Used the provided code to test its capabilities through the Simple Transformers library.
pandas, NumPy, Jupyter Notebook, Selenium
Watching tutorials to better comprehend classification models, practicing our final presentation!
I successfully built some classification models using logistic regression and decision trees, then improved on them with BERT.
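The two baseline classifiers can be sketched with scikit-learn. I use a synthetic dataset here as a stand-in; the real features and labels come from our scraped data:

```python
# A hedged sketch of the logistic regression and decision tree baselines,
# trained on synthetic data (the real project data differs).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-class dataset standing in for the scraped, labeled data
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

logreg = LogisticRegression(max_iter=1000).fit(X_train, y_train)
tree = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_train, y_train)

logreg_acc = logreg.score(X_test, y_test)
tree_acc = tree.score(X_test, y_test)
print(logreg_acc, tree_acc)
```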
Compared different classification methods, testing with the BERT model. Reduced the class imbalance in the data. Achieved up to 85% accuracy!
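One common way to reduce class imbalance is to upsample the minority class; a sketch with scikit-learn's `resample` is below. The class counts are hypothetical, and our actual rebalancing approach may differ:

```python
# A sketch of reducing class imbalance by upsampling the minority class
# with sklearn.utils.resample; the 90/10 split here is a made-up example.
import numpy as np
from sklearn.utils import resample

X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 90 + [1] * 10)  # heavily imbalanced labels

X_maj, y_maj = X[y == 0], y[y == 0]
X_min, y_min = X[y == 1], y[y == 1]

# Sample the minority class with replacement up to the majority class size
X_up, y_up = resample(X_min, y_min, replace=True,
                      n_samples=len(y_maj), random_state=0)

X_bal = np.vstack([X_maj, X_up])
y_bal = np.concatenate([y_maj, y_up])
print(np.bincount(y_bal))  # → [90 90]
```

Downsampling the majority class or passing `class_weight="balanced"` to the classifier are alternatives that avoid duplicating rows.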
Use cross-validation to get a more reliable estimate of model accuracy.
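A minimal cross-validation sketch with scikit-learn, again on a synthetic stand-in dataset: averaging scores over 5 folds gives a steadier accuracy estimate than a single train/test split.

```python
# 5-fold cross-validation with scikit-learn on synthetic data
# (a stand-in for our real features and labels).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores.mean(), scores.std())
```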
Look into Flask to turn this into a web app.
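A bare-bones sketch of what the Flask app might look like. The route name and the `predict` stub are hypothetical placeholders; the real app would call the trained classifier:

```python
# A minimal Flask sketch for serving classifier predictions.
# The /classify route and predict() stub are hypothetical placeholders.
from flask import Flask, jsonify, request

app = Flask(__name__)

def predict(text):
    # Placeholder standing in for the trained model; returns a dummy label.
    return "positive"

@app.route("/classify", methods=["POST"])
def classify():
    data = request.get_json(force=True)
    return jsonify({"label": predict(data.get("text", ""))})
```

In development the app can be started with Flask's built-in server; `app.test_client()` lets you exercise the route without running a server at all.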