Hrithik Malhotra - Machine Learning Pathway - Self Assessment

Week: 7/27

Overview of Things Learned:

  • Technical Area: Web Scraping, Data Cleaning

  • Tools: Scrapy, Requests, Pandas, Beautiful Soup

  • Soft Skills: #communication #teamwork #internationalcollaboration

Achievement Highlights

  • Used Beautiful Soup, and later Scrapy, to scrape over 13,000 posts from the Community CarTalk forum.

  • Familiarised myself with collaboration tools such as Jupyter and Google Colab.

  • Pre-processed my scraped data to clean it before applying machine learning algorithms.

  • Spent hours debugging my code, consulting dozens of sources to track down the errors. I am now comfortable with data scraping.

Meetings attended

  • Introduction to Web Scraping
  • Web Scraping Check-in
  • Web Scraping and Preprocessing presentations

Goals for the Upcoming Week

  • Refine my processed data
  • Learn about TF-IDF and BERT
  • Collaborate on ideas and techniques with fellow team members

Tasks Done

  • Web Scraping: Scraped data from the Community CarTalk forum. I had some issues with XPath expressions, but later resolved them by using the JSON module, which made fetching the tags much easier and more convenient. Pushed the resulting CSV files to the team repository on GitHub.
  • Pre-processing: Cleaned my scraped data using pandas and other libraries such as re. I am still refining the data.
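
A minimal sketch of the two tasks above, assuming the forum data arrives as JSON with each post body stored as an HTML string. The payload shape, the field names (posts, id, cooked), and the helpers clean_post and parse_posts are hypothetical and used only for illustration; the actual scraper and cleaning rules are not shown in this report. Only the standard-library json, re, and html modules are used here.

```python
import html
import json
import re

# Hypothetical payload: the real forum response and its field names are assumptions.
RAW = '''
{"posts": [
  {"id": 1, "cooked": "<p>Brakes &amp; rotors squeal!  See https://example.com/help</p>"},
  {"id": 2, "cooked": "<p>Check the <b>battery</b> first.</p>"}
]}
'''

TAG_RE = re.compile(r"<[^>]+>")          # HTML tags
URL_RE = re.compile(r"https?://\S+")     # bare URLs
NON_ALPHA_RE = re.compile(r"[^a-z\s]")   # anything except lowercase letters and whitespace


def clean_post(raw_html: str) -> str:
    """Strip tags, entities, URLs, and punctuation; lowercase; collapse whitespace."""
    text = TAG_RE.sub(" ", raw_html)      # replace tags with spaces
    text = html.unescape(text)            # turn &amp; into &, etc.
    text = URL_RE.sub(" ", text)          # drop links
    text = NON_ALPHA_RE.sub(" ", text.lower())
    return " ".join(text.split())


def parse_posts(payload: str) -> list:
    """Read the JSON payload and return one cleaned record per post."""
    data = json.loads(payload)
    return [{"id": p["id"], "text": clean_post(p["cooked"])} for p in data["posts"]]


rows = parse_posts(RAW)
for row in rows:
    print(row["id"], row["text"])
```

With pandas, the same function could be applied to a whole column, for example df["text"] = df["cooked"].map(clean_post), before writing the frame out with df.to_csv(...). The column names in that line are likewise assumptions.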

Week: 8/10

Overview of Things Learned:

Meetings attended

  • Pre-processing Check-in
  • TF-IDF Check-in
  • BERT Presentation

Goals for the Upcoming Week

  • Complete the BERT implementation.
  • Collaborate on ideas and techniques with fellow team members

Tasks Done

  • Built the TF, IDF, and TF-IDF data frames for all the reviews in my pre-processed data from the Car Talk Community forum.
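
For reference, a small sketch of how the three tables relate, using one common definition: TF is a term's count divided by the document length, IDF is log(N / df) where N is the number of documents and df is the number of documents containing the term, and TF-IDF is their product. The exact formulas used in this project are not stated here, so this variant is an assumption, and the three documents are toy stand-ins rather than real forum data.

```python
import math
from collections import Counter

# Toy stand-ins for pre-processed forum posts (not real data).
docs = [
    "brakes squeal when cold",
    "battery dead when cold",
    "brakes brakes worn",
]

tokenized = [d.split() for d in docs]
n_docs = len(tokenized)

# TF: term count divided by the number of tokens in that document.
tf = [
    {term: count / len(tokens) for term, count in Counter(tokens).items()}
    for tokens in tokenized
]

# IDF: log(N / df), where df is the number of documents containing the term.
doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
idf = {term: math.log(n_docs / count) for term, count in doc_freq.items()}

# TF-IDF: product of TF and IDF, per document and term.
tfidf = [
    {term: weight * idf[term] for term, weight in doc_tf.items()}
    for doc_tf in tf
]

for i, scores in enumerate(tfidf):
    print(i, {term: round(score, 3) for term, score in scores.items()})
```

Each list of dictionaries can be turned into a data frame with pd.DataFrame(tfidf).fillna(0), giving one row per document and one column per term. Library implementations such as scikit-learn's TfidfVectorizer use a smoothed IDF by default, so their numbers will differ from this unsmoothed version.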