Keerthikamath - Machine Learning Pathway

Concise overview of things learned. Break it up into Technical Area, Tools, Soft Skills:

Technical : Gained more experience programming in Python; learned how to use Selenium and Beautiful Soup for web scraping; learned how to create a dataframe using pandas

Tools : Learned how to use Asana and learned more about GitHub

Soft Skills : Team collaboration

Three achievement highlights:

  1. Successfully scraped the Fulfillment by Amazon forum using Selenium and Beautiful Soup
  2. Was able to organize all data into a dataframe using pandas
  3. Communicated and collaborated well with teammates to make progress on the project

List of meetings/training attended including social team events:

Attended all team meetings on Zoom and the Git Webinar (watched recordings of other webinars)

Goals for the upcoming week

Learn how to use BERT and develop an ML model

Detailed statement of tasks done

Task 1: Set up GSuite, Slack, Asana, and GitHub

  • Task 1 Hurdles: Unable to set up GSuite and GitHub at first, but the team leaders helped fix the problem

Task 2: Scraped the Fulfillment by Amazon forum using Python, Beautiful Soup, and Selenium

  • Task 2 Hurdles: Had little experience in Python and no experience in web scraping prior to the internship, so I learned through online tutorials, videos, and examples from the team leaders. Also struggled a bit with extracting certain elements from the webpage, but eventually figured it out by working through the code.
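
A minimal sketch of the element-extraction step, for illustration only. The HTML snippet, the class names (thread, thread-title, reply-count), and the helper extract_threads are all hypothetical and do not reflect the real forum markup or the code written for the project. With Selenium, the HTML string would typically come from the driver's page_source attribute once the page has loaded; here a static string stands in for it.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for one page of forum threads.
# The real forum's tags and class names are not known here.
SAMPLE_HTML = """
<div class="thread">
  <a class="thread-title" href="/t/1">Inbound shipment delayed</a>
  <span class="reply-count">12</span>
</div>
<div class="thread">
  <a class="thread-title" href="/t/2">FBA fee question</a>
</div>
"""


def extract_threads(html):
    """Return one dict per thread, tolerating missing child elements."""
    soup = BeautifulSoup(html, "html.parser")
    threads = []
    for node in soup.select("div.thread"):
        title = node.select_one("a.thread-title")
        replies = node.select_one("span.reply-count")
        threads.append({
            # select_one returns None when nothing matches, so guard each lookup
            "title": title.get_text(strip=True) if title else None,
            "url": title.get("href") if title else None,
            "replies": int(replies.get_text(strip=True)) if replies else None,
        })
    return threads


threads = extract_threads(SAMPLE_HTML)
```

Checking each lookup for None is one way to handle elements that appear on some threads but not others, which is a common source of trouble when extracting from real pages.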

Task 3: Organized data into a pandas dataframe and saved it as a CSV file

  • Task 3 Hurdles: No issues with this task
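
A short sketch of the dataframe and CSV step, under stated assumptions. The column names (title, body, replies) and the two sample rows are made up for illustration; the actual columns collected from the forum are not listed in this report. An in-memory buffer is used instead of a file so the example leaves nothing on disk; passing a file path to to_csv works the same way.

```python
import io

import pandas as pd

# Hypothetical scraped records; the real column names are assumptions.
records = [
    {"title": "Inbound shipment delayed",
     "body": "My shipment has been stuck for a week.",
     "replies": 12},
    {"title": "FBA fee question",
     "body": "How are storage fees calculated?",
     "replies": 3},
]

# A list of dicts becomes one row per dict, one column per key.
df = pd.DataFrame(records)

# index=False keeps the row index out of the CSV output.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

# Reading the CSV back is a quick check that nothing was lost.
restored = pd.read_csv(io.StringIO(csv_text))
```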

Concise overview of things learned. Break it up into Technical Area, Tools, Soft Skills:

Technical: Learned more about NLP and how BERT works
Tools: DistilBERT
Soft Skills: Teamwork and effective communication

Three achievement highlights:

  1. Collected and modified our data
  2. Learned about NLP and BERT for the first time
  3. Began learning how to use DistilBERT to train a model

List of meetings/training attended including social team events:

Meetings: Attended all team meetings on Zoom and watched recordings of the NLP webinar

Goals for the upcoming week:

Use DistilBERT to process and classify data

Detailed statement of tasks done. State each task, hurdles faced if any and how you solved the hurdle. You need to clearly mark whether the hurdles were solved with the help of training webinars, some help from project leads or significant help from project leads.

  1. Task: Learning about BERT and beginning to implement it
    Hurdles: Had no prior knowledge of BERT, but I’m beginning to figure it out through articles, videos, webinars, and resources from project leads

Full Session Assessment:

Concise overview of things learned. Break it up into Technical Area, Tools, Soft Skills:

Technical : Learned the end-to-end process of scraping a webpage, organizing the data into a dataframe, cleaning and processing it, feeding it into BERT, and implementing classification algorithms to develop a recommender system

Tools :

  • Slack
  • GitHub
  • Asana
  • Python
  • Selenium WebDriver and Beautiful Soup
  • Pandas Library
  • DistilBERT

Soft Skills : Effective communication and collaboration with a team

Three achievement highlights:

  1. Successfully scraped the Fulfillment by Amazon category of the Amazon Seller Forums using Selenium and Beautiful Soup
  2. Organized data into a pandas dataframe and cleaned/processed it
  3. Implemented DistilBERT and a logistic regression classification algorithm to classify different posts by category, with an accuracy of about 87%

List of meetings/training attended including social team events:

Team Meetings: attended all team meetings and final project presentation

Training Sessions/Webinars: attended Git Webinar, watched recordings of other webinars (Python Training Webinar, NLP Webinar, etc.)

Detailed statement of tasks done. State each task, hurdles faced if any and how you solved the hurdle. You need to clearly mark whether the hurdles were solved with the help of training webinars, some help from project leads or significant help from project leads.

Task 1: GSuite, Slack, Asana, and GitHub setup

  • Task 1 Hurdles: We had some technical difficulties with setting up our GSuite accounts, but the team leads figured out the problem

Task 2: Scraped the Fulfillment by Amazon category of the Amazon Seller Forums using Selenium and Beautiful Soup

  • Task 2 Hurdles: Struggled a bit with extracting certain elements of the webpage, but figured it out through tutorial resources from team leads and by working through the code

Task 3: Cleaned and processed data after organizing it into a pandas dataframe

  • Task 3 Hurdles: Had no prior knowledge about NLP or the data processing we would have to complete before feeding our data into BERT, but tutorial resources from team leads helped me figure out what to do. Also, some of the posts were too long, so we had to truncate posts to a maximum length during tokenization.
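
The truncation step mentioned above can be sketched in plain Python. This is an illustration of the idea, not the tokenizer code used in the project: the function truncate_and_pad, the padding id 0, and the maximum length of 6 are assumptions chosen to keep the example small. Tokenizer libraries usually offer truncation and padding as built-in options and also handle special tokens, which this sketch ignores.

```python
def truncate_and_pad(token_ids, max_length, pad_id=0):
    """Cut a token id list to max_length, then pad short lists to that length.

    Returns the fixed-length ids and an attention mask that marks
    real tokens with 1 and padding with 0.
    """
    kept = token_ids[:max_length]          # drop everything past max_length
    padding = max_length - len(kept)       # 0 when the post was long enough
    ids = kept + [pad_id] * padding
    mask = [1] * len(kept) + [0] * padding
    return ids, mask


# A "long post" of 10 tokens is cut to 6; a short one is padded to 6.
long_ids, long_mask = truncate_and_pad(list(range(1, 11)), max_length=6)
short_ids, short_mask = truncate_and_pad([7, 8, 9], max_length=6)
```

Giving every post the same length is what allows a batch of posts to be stacked into a single array before it is passed to the model.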

Task 4: Implemented DistilBERT and a logistic regression classification algorithm to classify posts by category

  • Task 4 Hurdles: Again, had no prior knowledge about NLP, BERT, or the different classification algorithms we could use, but I learned using resources from team leads, webinars, and articles I found online
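
A rough sketch of the classification stage, assuming the common pattern of using the language model only as a feature extractor and training logistic regression on its output vectors. To stay self-contained, random synthetic vectors stand in for the DistilBERT features, so the feature size (32), the two categories, and the accuracy this produces are all artifacts of the toy data and say nothing about the roughly 87% figure reported above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for per-post feature vectors (assumption: in the real
# pipeline these would be produced by DistilBERT, one vector per post).
n_per_class, dim = 100, 32
features = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(n_per_class, dim)),  # category 0
    rng.normal(loc=1.0, scale=1.0, size=(n_per_class, dim)),  # category 1
])
labels = np.array([0] * n_per_class + [1] * n_per_class)

# Hold out 25% of the posts to measure accuracy on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.25, random_state=0, stratify=labels
)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

# score() returns the fraction of held-out posts classified correctly.
accuracy = clf.score(X_test, y_test)
```

Measuring accuracy on held-out posts rather than on the training posts is what makes a reported accuracy figure meaningful.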