Overview of Things Learned:
Technical Area: Web Scraping, Data Cleaning
Tools: Scrapy, Requests, Pandas, Beautiful Soup
Used Beautiful Soup and then Scrapy to scrape more than 13,000 posts from the Community CarTalk forum.
Familiarised myself with collaboration tools such as Jupyter and Google Colab.
Pre-processed the scraped data to clean it before applying machine learning algorithms.
Debugged my code for hours on end, consulting dozens of sources to track down its errors. Finally comfortable with data scraping.
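The pre-processing step can be illustrated with a minimal sketch that uses only Python's standard `re` and `html` modules. It is not the exact pipeline used here. The rules it applies (stripping leftover HTML tags, links and punctuation, then lowercasing and collapsing whitespace) and the function name `clean_post` are illustrative assumptions about what a later machine learning step would want.

```python
import html
import re

# Hypothetical cleaning rules, compiled once at module level.
TAG_RE = re.compile(r"<[^>]+>")             # crude HTML-tag stripper
URL_RE = re.compile(r"https?://\S+")        # bare links
NON_ALNUM_RE = re.compile(r"[^a-z0-9'\s]")  # anything not a letter, digit, apostrophe or whitespace
SPACE_RE = re.compile(r"\s+")               # runs of whitespace


def clean_post(raw: str) -> str:
    """Normalise one scraped post body into lowercase plain text."""
    text = TAG_RE.sub(" ", raw)          # drop leftover HTML tags
    text = html.unescape(text)           # turn entities such as &amp; and &#39; into & and '
    text = text.lower()
    text = URL_RE.sub(" ", text)         # remove links
    text = NON_ALNUM_RE.sub(" ", text)   # strip punctuation and symbols
    return SPACE_RE.sub(" ", text).strip()


if __name__ == "__main__":
    sample = "<p>My 2009 Civic makes a <b>grinding</b> noise &amp; shakes. See https://example.com/x</p>"
    print(clean_post(sample))
```

With pandas, the same function can be applied to a whole column, e.g. `df["text"] = df["text"].astype(str).map(clean_post)`, where `df` and the `"text"` column name are hypothetical. Keeping the compiled patterns at module level puts all the cleaning rules in one place, so they are easy to adjust while the data is being refined.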
- Introduction to Web Scraping
- Web Scraping Check-in
- Web Scraping and Preprocessing presentations
Goals for the Upcoming Week:
- Refine my processed data
- Learn about TF-IDF and BERT
- Collaborate on ideas and techniques with fellow team members
- Web Scraping: Scraped data from the Community CarTalk forum. Had some issues with XPath expressions, but resolved them by switching to the JSON module, which made fetching the tags much easier and more convenient. Pushed the resulting CSV files to the team repository on GitHub.
- Pre-processing: Cleaning my scraped data using pandas and other libraries such as re. Still refining the data.
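The switch from XPath to JSON parsing can be sketched as follows. This assumes, without confirmation from the notes above, that the forum exposes each topic as a JSON document with a Discourse-style layout, where posts sit under `post_stream.posts` and each post has `id`, `username`, `created_at` and `cooked` (HTML body) fields. The function names, field list and sample payload are hypothetical, and only standard-library modules (`json`, `csv`, `urllib.request`) are used.

```python
import csv
import io
import json
import urllib.request
from typing import TextIO

# Hypothetical fields to keep from each post (assumed Discourse-style keys).
FIELDS = ["id", "username", "created_at", "cooked"]


def fetch_json(url: str) -> str:
    """Download a JSON document (network call; not used in the demo below)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8")


def posts_from_topic_json(payload: str) -> list[dict]:
    """Read post fields by key from one topic's JSON instead of walking HTML with XPath."""
    data = json.loads(payload)
    posts = data.get("post_stream", {}).get("posts", [])
    # Missing keys become empty strings so every row has the same columns.
    return [{field: post.get(field, "") for field in FIELDS} for post in posts]


def write_posts_csv(rows: list[dict], out: TextIO) -> None:
    """Write the extracted rows as CSV to any text file-like object."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    sample = json.dumps({
        "post_stream": {"posts": [
            {"id": 1, "username": "driver1", "created_at": "2020-06-01T10:00:00Z",
             "cooked": "<p>Check engine light is on.</p>"},
        ]}
    })
    buf = io.StringIO()
    write_posts_csv(posts_from_topic_json(sample), buf)
    print(buf.getvalue())
```

When writing to a real file rather than a `StringIO` buffer, open it with `newline=""` as the `csv` module documentation recommends, e.g. `with open("posts.csv", "w", newline="", encoding="utf-8") as f: write_posts_csv(rows, f)` (the file name is hypothetical). Looking values up by key in parsed JSON avoids XPath expressions that can break whenever the page markup changes.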