Afink - Machine Learning Pathway

THINGS LEARNED:
Technical:
I learned how to use Scrapy and BeautifulSoup for web scraping. I also learned to use several text preprocessing libraries and the concepts behind them, such as tokenization and lemmatization.

Tools: I have a better understanding of web scraping and text preprocessing, as well as of Git and uploading files to GitHub.

Soft skills: Collaborating with others through GitHub using version control.

ACHIEVEMENTS:

  1. Performed web scraping using BeautifulSoup to crawl through multiple pages and extract data from their HTML.
  2. Learned about version control using Git: set up and contributed to a repository, and learned and used various Git commands.
  3. Used Pandas to compile and preprocess the scraped data.

MEETINGS ATTENDED:

Week 1 Team Meeting
Week 2 Team Meeting
Week 3 Team Meeting
Web Scraping Webinar
Git Webinar

GOALS:

  1. Perform exploratory data analysis of the complete dataset, scraped from the SmartThings community on Discourse.
  2. Continue with data preprocessing and begin text classification by performing part-of-speech tagging using lexicon-based methods and deep learning methods.
  3. Implement term frequency to be used later for a recommendation system.
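
As a rough illustration of goal 3, term frequency can be computed as each term's count divided by the total number of tokens in a post. The sketch below is only one possible formulation, not the implementation planned for the project; the function name term_frequency and the sample tokens are hypothetical.

```python
from collections import Counter


def term_frequency(tokens):
    """Return each term's count divided by the total number of tokens."""
    total = len(tokens)
    if total == 0:
        # An empty post has no terms, so return an empty mapping.
        return {}
    counts = Counter(tokens)
    return {term: count / total for term, count in counts.items()}


# Hypothetical tokens, as they might look after preprocessing one post.
tf = term_frequency(["sensor", "hub", "sensor", "offline"])
print(tf)
```

Dividing by the post length keeps long posts from dominating simply because they contain more words, which matters if the scores are later compared across posts.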

TASKS COMPLETED:

For Week 1, I performed web scraping using BeautifulSoup. Xianbo was very helpful in explaining how to circumvent issues with infinite scrolling. The data were then compiled into a single dataframe using Pandas, and preprocessing was completed, including stop-word removal, tokenization, and lemmatization.
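
The three preprocessing steps named above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for the project: the stop-word set and lemma table are tiny hypothetical stand-ins for what a library such as NLTK or spaCy would provide, and the function names are made up for the example.

```python
import re

# Hypothetical, deliberately tiny resources. A real pipeline would load
# a full stop-word list and a proper lemmatizer from an NLP library.
STOP_WORDS = {"the", "a", "an", "is", "are", "my", "to", "and", "of", "in"}
LEMMAS = {
    "sensors": "sensor",
    "devices": "device",
    "stopped": "stop",
    "responding": "respond",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def preprocess(text):
    """Tokenize, drop stop words, then map each token to its lemma."""
    tokens = tokenize(text)
    kept = [t for t in tokens if t not in STOP_WORDS]
    # Tokens missing from the lookup table are left unchanged.
    return [LEMMAS.get(t, t) for t in kept]


print(preprocess("My sensors stopped responding to the hub"))
# prints ['sensor', 'stop', 'respond', 'hub']
```

With Pandas, a function like this could be applied to a text column of the compiled dataframe, for example through Series.apply.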