What I learned:
- Utilizing Python for Data Science
- Soft Skills
Three Achievement Highlights:
- Web scraped on my own for the first time
- Learned how to use Jupyter Notebook
- Created a word cloud after data cleaning
List of Training/Meetings Attended:
- Team’s Monday Meetings
Goals for the upcoming week:
- Communicate and work well with my team on the upcoming text analysis task.
- Learn more about text analysis techniques to apply to my data set.
Detailed statement of tasks done:
Task 1: Web scraping
I used BeautifulSoup in Jupyter Notebook to scrape a SmartThings topic post and retrieved the username and content of each post. The only hurdle was that I initially used Chrome Driver, following the example notebook that was provided, and one of my project leads asked me to change my code after I had already completed data cleaning and EDA. The same project lead gave me a bit of advice, but I solved the problem primarily on my own.
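A minimal sketch of the extraction step described above. The HTML string, the tag names, and the class names (post, username, content) are hypothetical stand-ins, not the real page's markup, and fetching the page is left out entirely.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a forum topic page. The real page's
# tags and class names will differ and should be checked in the browser.
SAMPLE_HTML = """
<div class="post">
  <span class="username">alice</span>
  <div class="content"><p>My hub keeps going offline.</p></div>
</div>
<div class="post">
  <span class="username">bob</span>
  <div class="content"><p>Try a <b>factory reset</b> first.</p></div>
</div>
"""

def extract_posts(html):
    """Return one {'username', 'content'} dict per post found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    posts = []
    for post in soup.find_all("div", class_="post"):
        user = post.find("span", class_="username")
        body = post.find("div", class_="content")
        if user is None or body is None:
            continue  # skip malformed posts instead of raising
        posts.append({
            "username": user.get_text(strip=True),
            # Join text fragments with a space so inline tags do not glue words.
            "content": body.get_text(" ", strip=True),
        })
    return posts

posts = extract_posts(SAMPLE_HTML)
for p in posts:
    print(p["username"], "->", p["content"])
```

Keeping the parsing in a function that takes an HTML string means the way the page is fetched can change without touching the extraction code.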
Task 2: Data Cleaning
I removed HTML, stop words, and punctuation while keeping any URLs found in the content of a post, using a regular expression I researched for that purpose. I then applied a lemmatizer and a Porter stemmer and compared their effects on the data I collected. I decided to use the Porter stemmer since it gave better results, and I consulted a lead to confirm the choice. The only hurdle was my misconception of what data cleaning involved, but after talking to the project leads on Slack and going to office hours, I understood what I had to do.
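A rough sketch of a cleaning pass that keeps URLs, using only the standard library. The regular expressions, the tiny stop word list, and the function name clean_post are illustrative assumptions, not the ones used in the project, and the stemming and lemmatizing step is not shown.

```python
import re
import string

# Small illustrative stop word list (an assumption); a real project would
# more likely use a list supplied by an NLP library.
STOP_WORDS = {"a", "an", "the", "is", "to", "and", "of", "my", "it", "at", "in"}

URL_RE = re.compile(r"https?://\S+")   # simple URL pattern, not exhaustive
TAG_RE = re.compile(r"<[^>]+>")        # naive HTML tag matcher
PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def clean_post(text):
    """Strip HTML tags, punctuation, and stop words, keeping URLs intact."""
    text = TAG_RE.sub(" ", text)
    tokens = []
    for token in text.split():
        # Drop sentence punctuation that trails a URL before testing it.
        candidate = token.rstrip(".,;:!?)")
        if URL_RE.fullmatch(candidate):
            tokens.append(candidate)  # keep the URL exactly as written
            continue
        word = token.lower().translate(PUNCT_TABLE)
        if word and word not in STOP_WORDS:
            tokens.append(word)
    return tokens

cleaned = clean_post("<p>Check the guide at https://example.com/setup, it is great!</p>")
print(cleaned)
```

Checking each token for a URL before stripping punctuation is what protects the slashes, colons, and dots inside the link.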
Task 3: EDA
I created bar graphs to show the most frequent user on the topic post as well as the most frequent words written. I also created a sentiment graph, but it didn’t make much sense to me, so I decided not to showcase it. Finally, since I didn’t want two frequency bar graphs, I replaced the word-frequency bar graph with a word cloud, following a suggestion from one of the project leads. The only hurdle was my lack of knowledge of EDA, but my project leads gave me resources to learn from.
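The frequency counts behind both the bar graph and the word cloud could be computed along these lines. The records below are made-up sample data, and the field names username and tokens are assumptions about how the cleaned posts are stored.

```python
from collections import Counter

# Hypothetical cleaned records: one dict per post, tokens already cleaned.
posts = [
    {"username": "alice", "tokens": ["hub", "offline", "reset"]},
    {"username": "bob", "tokens": ["reset", "hub", "firmware"]},
    {"username": "alice", "tokens": ["hub", "update"]},
]

# Number of posts written by each user.
posts_per_user = Counter(p["username"] for p in posts)

# Number of times each word appears across all posts.
word_counts = Counter(t for p in posts for t in p["tokens"])

print(posts_per_user.most_common(1))  # most frequent user
print(word_counts.most_common(2))     # two most frequent words
```

A Counter behaves like a dictionary mapping each item to its count, so the same counts can feed either a bar chart or a word cloud built from frequencies.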
I would also like to move from being an observer to being a participant.