Week: 7/27 – 8/01/2020
Overview of Things Learned:
Technical Area: Web scraping, Preprocessing
Tools: Python libraries (Requests, BeautifulSoup, re, and Pandas) and the forum website
- Web scraping: used Python and the tools above to build a web crawler that extracted tags and key information, such as titles, authors, last-updated dates, and comments, from the Mac Power Users forum.
- Preprocessing: removed HTML tags, symbols, and unnecessary information from the scraped HTML and stored the cleaned data in a CSV file.
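A minimal sketch of the scrape, clean, and save steps described above, assuming the forum pages are served as static HTML. The URL, the CSS selectors (`div.topic`, `.title`, `.author`, `time`, `.comment`), and the function names are hypothetical placeholders, not the forum's actual markup, and the regex allow-list is just one illustrative choice of which symbols to keep.

```python
import re
from typing import Dict, List

import pandas as pd
import requests
from bs4 import BeautifulSoup

# Hypothetical URL: replace with the real forum listing page.
FORUM_URL = "https://example.com/latest"


def fetch_html(url: str) -> str:
    """Download a page and return its HTML, raising on HTTP errors."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def clean_text(raw: str) -> str:
    """Strip leftover HTML tags, drop stray symbols, and collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", raw)            # replace any tags with a space
    text = re.sub(r"[^\w\s.,:;!?'/-]", " ", text)  # drop symbols outside this allow-list
    return re.sub(r"\s+", " ", text).strip()       # collapse runs of whitespace


def parse_topics(html: str) -> List[Dict[str, str]]:
    """Extract title, author, last-updated date, and comment from each topic."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for topic in soup.select("div.topic"):  # hypothetical container selector
        title = topic.select_one(".title")
        author = topic.select_one(".author")
        updated = topic.select_one("time")
        comment = topic.select_one(".comment")
        rows.append({
            "title": clean_text(title.get_text(" ")) if title else "",
            "author": clean_text(author.get_text(" ")) if author else "",
            # Assumes the date is stored in a <time datetime="..."> attribute.
            "last_updated": updated.get("datetime", "") if updated else "",
            "comment": clean_text(comment.get_text(" ")) if comment else "",
        })
    return rows


def save_csv(rows: List[Dict[str, str]], path: str) -> pd.DataFrame:
    """Store the cleaned rows in a CSV file and return them as a DataFrame."""
    df = pd.DataFrame(rows, columns=["title", "author", "last_updated", "comment"])
    df.to_csv(path, index=False)
    return df
```

Typical usage would chain the steps, e.g. `save_csv(parse_topics(fetch_html(FORUM_URL)), "forum_posts.csv")`. Keeping fetching, parsing, and cleaning in separate functions makes it possible to test the parser on saved HTML without hitting the site on every run.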
7/23 – ML team kick-off meeting
7/27 – Team-4 meeting and introduction
7/29 – web scraping check-in
7/31 – web scraping check-in and preprocessing skills
Goals for the Upcoming Week
Explore the BERT library
Scraped and preprocessed data from the Mac Power Users forum, stored it in a CSV file, and uploaded the file to GitHub.