THINGS LEARNED:
Technical:
I learned how to use Scrapy and BeautifulSoup for web scraping. I also learned text preprocessing concepts such as tokenization and lemmatization, along with several libraries that implement them.
Tools: I have a better understanding of web scraping and text preprocessing, as well as of Git and uploading files to GitHub.
Soft skills: Collaborated with teammates through GitHub using version control.
ACHIEVEMENTS:
- Performed web scraping using BeautifulSoup to crawl through multiple pages and extract data from their HTML.
- Learned about version control using Git: set up and contributed to a repository, and learned and used various Git commands.
- Used Pandas to compile and preprocess the scraped data.
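The multi-page scraping step can be sketched roughly as follows. This is only an illustration, not the project's actual scraper: the inline HTML strings, the class names (`topic`, `title`, `replies`), and the helper names `parse_topics` and `scrape_pages` are all assumptions made up for the example, and a real crawl would download each page instead.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-ins for two fetched listing pages; a real crawl would
# download each page's HTML instead of using inline strings.
PAGES = [
    """<html><body>
       <div class="topic"><a class="title" href="/t/1">Hub offline</a>
       <span class="replies">4</span></div>
       <div class="topic"><a class="title" href="/t/2">Sensor pairing</a>
       <span class="replies">12</span></div>
       </body></html>""",
    """<html><body>
       <div class="topic"><a class="title" href="/t/3">Automation ideas</a>
       <span class="replies">7</span></div>
       </body></html>""",
]


def parse_topics(html):
    """Extract one record per topic block from a single page of HTML."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for block in soup.find_all("div", class_="topic"):
        link = block.find("a", class_="title")
        replies = block.find("span", class_="replies")
        records.append({
            "title": link.get_text(strip=True),
            "url": link["href"],
            "replies": int(replies.get_text(strip=True)),
        })
    return records


def scrape_pages(pages):
    """Parse every page in turn and collect all records into one list."""
    all_records = []
    for html in pages:
        all_records.extend(parse_topics(html))
    return all_records


records = scrape_pages(PAGES)
for row in records:
    print(row)
```

Because `records` is a list of dictionaries with the same keys, passing it to `pandas.DataFrame(records)` would give a single table with one row per topic.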
MEETINGS ATTENDED:
Week 1 Team Meeting
Week 2 Team Meeting
Week 3 Team Meeting
Web Scraping Webinar
Git Webinar
GOALS:
- Perform exploratory data analysis of the complete dataset, scraped from the SmartThings Community forum on Discourse.
- Continue with data preprocessing and begin text classification by performing part-of-speech tagging using lexicon-based and deep learning methods.
- Implement term frequency for later use in the recommendation system.
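As a rough idea of what the term frequency goal might involve, here is a minimal sketch using only the standard library. It assumes the normalized definition (a term's count divided by the total number of tokens in the document); other definitions, such as raw counts, are also common. The function name `term_frequency` and the sample tokens are made up for illustration.

```python
from collections import Counter


def term_frequency(tokens):
    """Return each term's count divided by the total number of tokens."""
    total = len(tokens)
    if total == 0:
        return {}  # an empty document has no terms to score
    counts = Counter(tokens)
    return {term: count / total for term, count in counts.items()}


# Hypothetical tokens from one already-preprocessed forum post.
tokens = ["hub", "offline", "hub", "reset"]
tf = term_frequency(tokens)
print(tf)
```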
TASKS COMPLETED:
In week 1, I performed web scraping using BeautifulSoup. Xianbo was very helpful in explaining how to circumvent issues with the infinite scrolling bar. The data were then compiled into a single DataFrame using Pandas, and preprocessing was completed, including stop-word removal, tokenization, and lemmatization.
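The three preprocessing steps named above can be illustrated with a small standard-library sketch. It is not the code used in the project: the stop-word set and the lemma lookup table are tiny hypothetical stand-ins for what a library such as NLTK or spaCy would provide, and the function name `preprocess` is invented for the example.

```python
import re

# Tiny illustrative stop-word set and lemma table (both hypothetical);
# an NLP library would supply complete versions of each.
STOP_WORDS = {"the", "is", "are", "a", "an", "and", "to", "my", "of"}
LEMMAS = {"sensors": "sensor", "pairing": "pair", "devices": "device"}


def preprocess(text):
    """Tokenize, drop stop words, then map each token to its lemma."""
    # Tokenization: lowercase the text and pull out runs of word characters.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    # Stop-word removal: discard very common words that carry little meaning.
    kept = [t for t in tokens if t not in STOP_WORDS]
    # Lemmatization: replace a token with its base form when one is known.
    return [LEMMAS.get(t, t) for t in kept]


sample = "The sensors are pairing to my devices"
print(preprocess(sample))
```

A table lookup is used here only to keep the example self-contained; a real lemmatizer relies on a full dictionary and part-of-speech information.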