Self-Assessment 3

Full Session Assessment:

Concise overview of things learned. Break it up into Technical Area, Tools, Soft Skills:

Technical: Learned the end-to-end process of scraping a webpage, organizing the data into a pandas dataframe, cleaning and processing it, feeding it into BERT (specifically DistilBERT), and implementing classification algorithms to develop a recommender system.

Tools:

  • Slack
  • GitHub
  • Asana
  • Python
  • Selenium WebDriver and Beautiful Soup
  • pandas library
  • DistilBERT

Soft Skills: Effective communication and collaboration within a team.

Three achievement highlights:

  1. Successfully scraped the Fulfillment by Amazon category of the Amazon Seller Forums using Selenium and Beautiful Soup
  2. Organized the scraped data into a pandas dataframe, then cleaned and processed it
  3. Implemented DistilBERT and a logistic regression classification algorithm to classify forum posts by category with an accuracy of about 87%

List of meetings/training attended including social team events:

Team Meetings: attended all team meetings and the final project presentation

Training Sessions/Webinars: attended the Git Webinar; watched recordings of other webinars (Python Training Webinar, NLP Webinar, etc.)

Detailed statement of tasks done. State each task, hurdles faced if any and how you solved the hurdle. You need to clearly mark whether the hurdles were solved with the help of training webinars, some help from project leads or significant help from project leads.

Task 1: GSuite, Slack, Asana, and GitHub setup

  • Task 1 Hurdles: We had some technical difficulties setting up our GSuite accounts, but the team leads figured out the problem.

Task 2: Scraped the Fulfillment by Amazon category of the Amazon Seller Forums using Selenium and Beautiful Soup

  • Task 2 Hurdles: Struggled a bit with extracting certain elements from the webpage, but worked it out using tutorial resources from the team leads and by working through the code.
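
The core of "extracting certain elements" is locating tags by name and class in the page source and keeping only their text. The sketch below illustrates that idea under stated assumptions: it uses Python's standard-library html.parser as a stand-in for Beautiful Soup so it runs without extra installs, and the markup and the "post-title" class are hypothetical placeholders, not the forum's actual structure. With Beautiful Soup the equivalent lookup would be roughly soup.find_all("a", class_="post-title") followed by get_text().

```python
from html.parser import HTMLParser


class ClassTextExtractor(HTMLParser):
    """Collect the text inside every <target_tag> whose class list contains target_class."""

    def __init__(self, target_tag, target_class):
        super().__init__()
        self.target_tag = target_tag
        self.target_class = target_class
        self.depth = 0      # nesting depth of target_tag while inside a match
        self.parts = []     # text fragments of the element being collected
        self.results = []   # one whitespace-normalized string per matched element

    def handle_starttag(self, tag, attrs):
        if tag != self.target_tag:
            return
        if self.depth:
            self.depth += 1  # same tag nested inside a matched element
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag != self.target_tag or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            # Join fragments (including text in child tags) and collapse whitespace.
            self.results.append(" ".join("".join(self.parts).split()))
            self.parts = []

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def extract_text(html, tag, css_class):
    parser = ClassTextExtractor(tag, css_class)
    parser.feed(html)
    parser.close()
    return parser.results


# Hypothetical markup standing in for a page source captured with Selenium
# (for example via driver.page_source); the real class names are not known here.
SAMPLE = """
<div class="thread"><a class="post-title" href="/t/1">Inbound shipment <b>stuck</b></a></div>
<div class="thread"><a class="post-title" href="/t/2">FBA   fee question</a></div>
<a class="nav">Next page</a>
"""

titles = extract_text(SAMPLE, "a", "post-title")
print(titles)  # ['Inbound shipment stuck', 'FBA fee question']
```

Text inside child tags (the <b> above) is kept, while links with other classes, such as the navigation link, are skipped.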

Task 3: Cleaned and processed the data after organizing it into a pandas dataframe

  • Task 3 Hurdles: Had no prior knowledge of NLP or of the preprocessing required before feeding our data into BERT, but tutorial resources from the team leads helped me figure out what to do. Also, some posts were too long for the model's input limit, so we had to truncate them to a maximum length during tokenization.
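
Truncating to a maximum length usually goes together with adding special tokens, padding shorter posts, and building an attention mask so every input has the same shape. The sketch below shows those steps on already-tokenized ids. It assumes the bert-base-uncased special-token ids ([CLS]=101, [SEP]=102, [PAD]=0), which distilbert-base-uncased shares, and a 512-token limit; the example ids are hypothetical. In practice a Hugging Face tokenizer does this in one call, along the lines of tokenizer(texts, truncation=True, max_length=512, padding="max_length").

```python
# Assumed special-token ids from the bert-base-uncased vocabulary, which
# distilbert-base-uncased shares; check your tokenizer before relying on them.
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0


def truncate_and_pad(token_ids, max_length=512):
    """Fit one post's WordPiece ids into a fixed-length model input.

    Keeps the first max_length - 2 ids (dropping the tail of long posts),
    wraps them in [CLS] ... [SEP], and right-pads with [PAD].
    Returns (input_ids, attention_mask), both exactly max_length long.
    """
    if max_length < 2:
        raise ValueError("max_length must leave room for [CLS] and [SEP]")
    body = list(token_ids)[: max_length - 2]
    input_ids = [CLS_ID] + body + [SEP_ID]
    attention_mask = [1] * len(input_ids)  # 1 = real token the model attends to
    padding = max_length - len(input_ids)
    input_ids += [PAD_ID] * padding
    attention_mask += [0] * padding        # 0 = padding the model ignores
    return input_ids, attention_mask


# Hypothetical ids for a short post and for a post that is too long.
short_ids, short_mask = truncate_and_pad([2023, 2003, 1037, 2460], max_length=8)
print(short_ids)   # [101, 2023, 2003, 1037, 2460, 102, 0, 0]
print(short_mask)  # [1, 1, 1, 1, 1, 1, 0, 0]

long_ids, long_mask = truncate_and_pad(range(1000, 1020), max_length=8)
print(long_ids)    # [101, 1000, 1001, 1002, 1003, 1004, 1005, 102]
```

Keeping the beginning of each post and dropping the tail is the simplest truncation strategy, on the assumption that a post's opening usually states its topic.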

Task 4: Implemented DistilBERT and a logistic regression classification algorithm to classify posts by category

  • Task 4 Hurdles: Again, I had no prior knowledge of NLP, BERT, or the different classification algorithms we could use, but I learned them using resources from the team leads, webinars, and articles I found online.
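
A common way to pair DistilBERT with logistic regression is to use the model's [CLS] embedding for each post as a fixed feature vector and train a logistic regression classifier on those vectors. The sketch below shows only the classifier half, as a multinomial (softmax) logistic regression trained by gradient descent in plain Python. The 2-D toy vectors are hypothetical stand-ins for 768-dimensional DistilBERT embeddings, and the three categories are made up. A library implementation such as scikit-learn's LogisticRegression would normally be used instead.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train_logreg(X, y, n_classes, lr=0.5, epochs=500):
    """Multinomial logistic regression fit by full-batch gradient descent.

    X: list of feature vectors (stand-ins for DistilBERT [CLS] embeddings).
    y: list of integer category labels in range(n_classes).
    Returns (W, b): one weight vector and one bias per class.
    """
    n_features = len(X[0])
    W = [[0.0] * n_features for _ in range(n_classes)]
    b = [0.0] * n_classes
    n = len(X)
    for _ in range(epochs):
        grad_W = [[0.0] * n_features for _ in range(n_classes)]
        grad_b = [0.0] * n_classes
        for x, label in zip(X, y):
            scores = [sum(w * xi for w, xi in zip(W[k], x)) + b[k] for k in range(n_classes)]
            probs = softmax(scores)
            for k in range(n_classes):
                # Gradient of cross-entropy loss w.r.t. score k: p_k - 1[k == label]
                err = probs[k] - (1.0 if k == label else 0.0)
                grad_b[k] += err
                for j in range(n_features):
                    grad_W[k][j] += err * x[j]
        for k in range(n_classes):
            b[k] -= lr * grad_b[k] / n
            for j in range(n_features):
                W[k][j] -= lr * grad_W[k][j] / n
    return W, b


def predict(W, b, x):
    scores = [sum(w * xi for w, xi in zip(Wk, x)) + bk for Wk, bk in zip(W, b)]
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(W, b, X, y):
    return sum(predict(W, b, x) == label for x, label in zip(X, y)) / len(y)


# Toy 2-D "embeddings" for three hypothetical post categories; not real forum data.
X_train = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9], [-1.0, -0.9], [-0.8, -1.0]]
y_train = [0, 0, 1, 1, 2, 2]
W, b = train_logreg(X_train, y_train, n_classes=3)

X_test = [[0.95, 0.0], [0.0, 0.95], [-0.9, -0.95]]
y_test = [0, 1, 2]
print(accuracy(W, b, X_test, y_test))
```

Keeping DistilBERT frozen and training only this small classifier is cheap, which is one reason the embedding-plus-logistic-regression combination is a popular first baseline before fine-tuning the whole model.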