Geoffrey2k - Machine Learning Pathway

Name: Pratik Geoffrey Saxena
Team: ML Team 8, or as we call ourselves, Team Bertinator

Overview of Things Learned:

Technical Area: I learned a lot in these 3 weeks. Apart from the non-technical skills I picked up, I learned how to scrape websites using Scrapy and explored the developer tools in Chrome extensively. I also picked up some data cleansing skills and learned to work with Pandas. I have explored various other libraries as well, especially newer Deep Learning models such as BERT and DistilBERT.

Tools Used: Python, Pandas, JSON, Scrapy, Git, Google Colaboratory, Jupyter Notebook, Visual Studio Code, LSTMs, BERT.

Soft Skills: I’ve learned how to communicate professionally with people around the world, how to approach people for help, and how to manage working across different time zones.

Achievement Highlights

  1. I’ve successfully scraped the Airline and Hopscotch forums with the help of Scrapy (100,000+ entries with 13 features per entry).
  2. Found a workaround for the infinite scrolling problem.
  3. Merged the datasets and cleaned the unwanted HTML code.
  4. Learned how to classify text with LSTMs and how to apply the same approach with BERT.

List of Meetings Attended:

I’ve attended all the meetings except the 1st one, which I missed because I wasn’t yet used to the time zone change.

Goals for the Upcoming Week

Train my model using BERT and test it to see if I get the desired output.

Tasks Done

I’ve scraped the data from the Airline and Hopscotch forums as instructed. This wasn’t an easy task as I was completely new to this, but my leads were very helpful and I could grasp the techniques fast.

I then cleaned the data and got rid of all the leftover HTML code. I used Pandas to help me with this, and that is when I realized how fast it is.
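For anyone curious what the HTML cleaning step can look like, here is a minimal sketch. It is not the exact code I used: the function name strip_html and the helper class are made up for illustration, and it relies only on Python's standard html.parser module. It keeps the visible text, drops anything inside script or style tags, and collapses extra whitespace.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text and skips the contents of <script>/<style>."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Joining with a space keeps words from adjacent tags apart;
        # the trade-off is that a word split by an inline tag gains a space.
        return " ".join(self._chunks)


def strip_html(raw):
    """Hypothetical helper: return the plain text of an HTML fragment."""
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()
    # Collapse runs of whitespace left behind by the removed markup.
    return re.sub(r"\s+", " ", parser.text()).strip()


print(strip_html("<p>Hello <b>world</b></p>"))
```

With Pandas, a function like this could be applied to a whole column in one line, for example df["post_text"].fillna("").map(strip_html), where post_text is an assumed column name and not necessarily one from our dataset.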

I’ve merged the data frames uploaded by the team and I have trained my model using BERT.
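The merge step can be sketched roughly as below. This is an assumption about one reasonable way to do it, not our team's exact code: the function name merge_scraped_frames, the column names, and the sample rows are all invented for illustration. It stacks the frames with pd.concat and drops rows that repeat on the chosen key columns.

```python
import pandas as pd


def merge_scraped_frames(frames, key_columns):
    """Hypothetical helper: stack data frames and drop duplicate rows."""
    # ignore_index=True renumbers rows instead of keeping each frame's index.
    merged = pd.concat(frames, ignore_index=True)
    # Keep the first copy of any row that repeats on the key columns.
    merged = merged.drop_duplicates(subset=key_columns, keep="first")
    return merged.reset_index(drop=True)


# Invented sample data: two teammates scraped overlapping posts.
part_a = pd.DataFrame({"post_id": [1, 2], "text": ["late flight", "lost bag"]})
part_b = pd.DataFrame({"post_id": [2, 3], "text": ["lost bag", "great crew"]})

combined = merge_scraped_frames([part_a, part_b], ["post_id", "text"])
print(combined)
```

Deduplicating on explicit key columns, instead of on every column, is a design choice: it still catches repeats when two uploads differ only in something incidental such as the scrape timestamp.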