Name: Pratik Geoffrey Saxena
Team: ML Team 8, or as we call ourselves, Team Bertinator
Overview of Things Learned:
Technical Area: I learned a lot in these 3 weeks. Apart from the non-technical skills I picked up, I learned how to scrape websites using Scrapy. I explored the developer tools in Chrome extensively, picked up a few data cleansing skills, and learned to work with Pandas. I also explored various other libraries, and especially newer deep learning models such as BERT and DistilBERT.
Tools Used: Python, Pandas, JSON, Scrapy, Git, Google Colaboratory, Jupyter Notebook, Visual Studio Code, LSTMs, BERT.
Soft Skills: I’ve learned how to communicate professionally with people around the world, how to approach people for help, and how to manage working across different time zones.
Achievement Highlights:
- Successfully scraped the Airline and Hopscotch forums with Scrapy (100,000+ entries with 13 features per entry).
- Found a workaround for the infinite scrolling problem.
- Merged the datasets and removed the unwanted HTML code.
- Learned how to classify text with LSTMs and am applying the same approach to BERT.
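The report does not say which workaround was used for infinite scrolling. One common approach, sketched here as an assumption, is to find the paginated request the page makes behind the scenes (visible in the Network tab of Chrome's developer tools) and request page after page until an empty one comes back. The names `iter_pages`, `fake_fetch`, and `FAKE_FORUM` are hypothetical, and `fake_fetch` stands in for a real HTTP request so the sketch runs offline.

```python
from typing import Callable, Iterator, List


def iter_pages(fetch_page: Callable[[int], List[dict]],
               start: int = 1,
               max_pages: int = 1000) -> Iterator[dict]:
    """Yield entries page by page until a page comes back empty."""
    for page in range(start, start + max_pages):
        entries = fetch_page(page)
        if not entries:  # an empty page means there is nothing left to load
            break
        yield from entries


# Hypothetical sample data standing in for the forum's paginated responses.
FAKE_FORUM = {1: [{"id": 1}, {"id": 2}], 2: [{"id": 3}]}


def fake_fetch(page: int) -> List[dict]:
    """Stand-in for a real request to the paginated endpoint."""
    return FAKE_FORUM.get(page, [])


posts = list(iter_pages(fake_fetch))
print(len(posts))
```

In a Scrapy spider the same idea would usually mean yielding a request for the next page from the parse callback until a response contains no entries. The `max_pages` cap is only a safety net against an endpoint that never returns an empty page.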
List of Meetings Attended:
I’ve attended all the meetings except for the 1st one. (I wasn’t yet used to the time zone changes.)
Goals for the Upcoming Week:
Train my model using BERT and test it to see whether it produces the desired output.
Tasks Done:
I’ve scraped the data from the Airline and the Hopscotch forums as instructed. This wasn’t an easy task as I was completely new to this, but my leads were very helpful and I could grasp the techniques quickly.
I then cleaned the data and got rid of all the leftover HTML markup. I used Pandas to help me with this, and that is when I realized how fast it is.
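A minimal sketch of this kind of cleanup in Pandas, assuming the scraped text sits in a column called `post` (a hypothetical name) and that a simple regular expression is good enough to drop the tags. The sample rows are made up for illustration, and the actual cleaning steps used on the forum data may have differed.

```python
import html

import pandas as pd

# Made-up sample rows; the real data came from the scraped forums.
df = pd.DataFrame(
    {"post": ["<p>Flight was <b>late</b></p>", "Great&nbsp;service<br/>"]}
)

cleaned = (
    df["post"]
    .str.replace(r"<[^>]+>", " ", regex=True)  # replace each tag with a space
    .map(html.unescape)                        # turn entities such as &nbsp; into characters
    .str.replace(r"\s+", " ", regex=True)      # collapse runs of whitespace
    .str.strip()                               # trim leading and trailing spaces
)
df["post_clean"] = cleaned
print(df["post_clean"].tolist())
```

The regex approach is quick because the string methods run over the whole column at once, but it can misfire on malformed markup. A real HTML parser is the safer choice when the scraped pages are messy.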
I’ve merged the data frames uploaded by the team and I have trained my model using BERT.
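Merging data frames that share the same columns can be sketched as below, assuming the team's frames are simply stacked and exact duplicate rows are dropped. The column names and sample rows are hypothetical, and the real merge may have used different keys or logic.

```python
import pandas as pd

# Made-up frames standing in for the data uploaded by each team member.
frames = [
    pd.DataFrame({"forum": ["airline", "airline"],
                  "post": ["sample post A", "sample post B"]}),
    pd.DataFrame({"forum": ["hopscotch", "airline"],
                  "post": ["sample post C", "sample post B"]}),
]

merged = (
    pd.concat(frames, ignore_index=True)  # stack the frames row-wise
    .drop_duplicates()                    # drop rows scraped more than once
    .reset_index(drop=True)               # renumber rows after dropping
)
print(len(merged))
```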