Overview of Things Learned
Technical:
- Webscraping fundamentals
- Word embeddings for NLP applications
- ML models for NLP applications
Tools:
- Webscraping tools such as BeautifulSoup and Selenium
- Datalogging with Pandas and JSON
- scikit-learn ML library
- Google Colab with GitHub functionality
Soft Skills:
- Remote teamwork and communication
- Understanding the value of maintaining a personal profile on sites such as Medium
Achievement Highlights
- Developed two modular webscrapers for the Amazon and Flowster Discourse forums
- Implemented the Naïve Bayes algorithm to classify Discourse topics into the correct categories
- Improved classifier accuracy to ~65% by experimenting with different data pre-processing techniques and word embeddings
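
For reference, the classification approach in the highlights above can be sketched in a few lines. This is a from-scratch illustration only, not the project code, which may well have relied on scikit-learn's implementation instead. The stop-word list, the example posts and the category names are all made up for the example, and the model uses plain word counts with Laplace smoothing rather than any of the word embeddings that were tried.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list; a real project would likely use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "to", "my", "i", "how", "do", "for", "of", "on", "in"}


def preprocess(text):
    """Lowercase, split into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


class NaiveBayes:
    """Multinomial Naive Bayes over word counts with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per category
        self.word_counts = defaultdict(Counter)  # word counts per category
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = preprocess(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        # Words never seen in training carry no information, so skip them.
        tokens = [t for t in preprocess(text) if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # log P(category) + sum of log P(word | category)
            score = math.log(n_docs / total_docs)
            denom = sum(self.word_counts[label].values()) + self.alpha * len(self.vocab)
            for t in tokens:
                score += math.log((self.word_counts[label][t] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training posts and categories, purely for illustration.
texts = [
    "How do I reset my password",
    "Cannot log in to my account",
    "Shipping delayed for my order",
    "Where is my package tracking number",
]
labels = ["account", "account", "shipping", "shipping"]

model = NaiveBayes().fit(texts, labels)
print(model.predict("password reset not working"))  # account
print(model.predict("package delayed"))             # shipping
```

Changing what preprocess() keeps or drops is the kind of pre-processing experiment referred to above, since it directly alters the counts the classifier sees.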
List of Meetings/Training Attended
- All team meetings, work sessions and socials
Goals for the Coming Week
- Continue to investigate methods to improve Naïve Bayes classifier accuracy
- Investigate more complex word embeddings and experiment with logistic regression classifier models, with the possibility of expanding to a simple neural network
Detailed Statement of Tasks Done
- BeautifulSoup alone was not enough to scrape data from dynamic webpages, since it only parses the HTML it is given and does not load content that the page renders afterwards. The solution was to use a Selenium webdriver to load each page fully and then pass the rendered HTML to BeautifulSoup. Solved together with teammates, with validation from our leads.
- The Naïve Bayes classifier proved to be fairly inaccurate upon the initial implementation. Received good feedback from leads on possible ways to pre-process the training data to improve accuracy, as well as on increasing the amount of available data through augmentation.
- Received good feedback from leads and the rest of the team on training the NB classifier on the Amazon data instead of the Flowster data, since the Amazon set has more samples. The result was more consistent accuracy.
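
The Selenium-plus-BeautifulSoup approach from the first task above can be sketched roughly as follows. This is a minimal illustration and not the team's actual scraper: the function names, the "a.title" CSS selector and the choice of Chrome are all assumptions, and the right selector depends on the forum's real markup.

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url, wait_selector, timeout=10):
    """Load a dynamic page in a real browser and return the rendered HTML."""
    # Selenium is imported here so that parse_topics() below can be used
    # on saved HTML without a browser installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumes Chrome and its driver are available
    try:
        driver.get(url)
        # Wait until at least one element matching the selector has appeared.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_selector))
        )
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even on errors


def parse_topics(html, link_selector="a.title"):
    """Extract topic titles and links from already-rendered HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        {"title": a.get_text(strip=True), "url": a.get("href")}
        for a in soup.select(link_selector)
    ]
```

Keeping the page loading and the parsing in separate functions means the parsing step can be checked against a saved HTML file, which fits the modular design mentioned in the highlights.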
Request Change of Role
I wouldn’t mind becoming a task lead if required. Please let me know the responsibilities ahead of time.