Technical: I have completed a solid review of Python so far and have learned how to use the BeautifulSoup and Pandas libraries.
Tools: I now have an understanding of Git and of uploading files to GitHub. I have been using Visual Studio Code for the first time to edit my Python files, debug, and push to Git. I have also become familiar with Jupyter Notebook.
- Successfully cloning a GitHub repository and pushing a Python file back to GitHub
- Following the article links that a web page leads to (basic web crawling)
- Extracting relevant data from the HTML of a web page (e.g., author, title)
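The metadata extraction described above can be sketched with BeautifulSoup. The HTML snippet and the class names in it are hypothetical stand-ins; the real markup on community.smartthings.com will differ.

```python
# Sketch: pulling title and author out of a page's HTML with BeautifulSoup.
# The snippet and class names below are made up for illustration.
from bs4 import BeautifulSoup

html = """
<html><body>
  <h1 class="topic-title">Automating my porch lights</h1>
  <span class="author">jdoe</span>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
title = soup.find("h1", class_="topic-title").get_text(strip=True)
author = soup.find("span", class_="author").get_text(strip=True)
print(title, "by", author)
```

On a real article page, the same `find` calls would target whatever tags and classes the site actually uses for its title and author fields.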
Week 1 Team Meeting (1.5 Hours)
Week 2 Team Meeting (1 Hour)
Week 3 Team Meeting (1 Hour)
Intermediate Python Training Session (1.5 Hours)
Git Webinar (3 Hours)
- Make a word cloud using the articles from community.smartthings.com as data analysis practice
- Implement a parts of speech tagging script on our practice data to help with text classification
- Use the bag of words method to find the inverse document frequency of each word in our practice data and calculate word vectors
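The bag-of-words and inverse-document-frequency goal above can be sketched in plain Python. The toy "documents" below stand in for scraped forum posts; this is an illustration of the idea, not the team's actual pipeline.

```python
# Sketch of bag-of-words + inverse document frequency (IDF) on toy data.
# The three strings stand in for scraped forum posts.
import math
from collections import Counter

docs = [
    "smart lights turn on at sunset",
    "motion sensor turns lights on",
    "hub offline after firmware update",
]

tokenized = [d.split() for d in docs]
vocab = sorted({w for doc in tokenized for w in doc})

# idf(w) = log(N / df(w)), where df(w) counts documents containing w
N = len(tokenized)
idf = {w: math.log(N / sum(w in doc for doc in tokenized)) for w in vocab}

def tfidf_vector(doc):
    """Turn one tokenized document into a tf-idf vector over vocab."""
    counts = Counter(doc)
    return [counts[w] * idf[w] for w in vocab]

vectors = [tfidf_vector(doc) for doc in tokenized]
```

Words that appear in every document get an IDF of log(1) = 0, so they contribute nothing to the vectors, which is exactly why rare, distinctive words dominate this representation.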
This week, I performed web scraping and data cleaning on the community.smartthings.com site. One hurdle was accessing content that the site loads through infinite scrolling. With some help from the project leads, I obtained the URL format for each "page", which let my script step through the infinitely scrolling content directly.
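The paging approach above can be sketched as a small crawler. The `?page=N` URL pattern here is an assumption modeled on Discourse-style forums; the actual format was supplied by the project leads.

```python
# Sketch: stepping through an infinite-scroll forum by requesting
# numbered "pages" directly. The ?page=N pattern is an assumption.
from urllib.error import HTTPError
from urllib.request import urlopen

BASE = "https://community.smartthings.com/latest"

def page_url(base, page):
    """Build the URL for one 'page' of the infinitely scrolling list."""
    return f"{base}?page={page}"

def crawl(base, max_pages=3):
    """Yield the raw HTML of each page until the site runs out."""
    for n in range(max_pages):
        try:
            with urlopen(page_url(base, n), timeout=10) as resp:
                yield resp.read().decode("utf-8", errors="replace")
        except HTTPError:
            break  # no more pages

if __name__ == "__main__":
    for html in crawl(BASE):
        print(len(html))
```

Each yielded HTML string can then be handed to BeautifulSoup for the link and metadata extraction described earlier.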
Using our scripts, the team leads compiled about 3,000 test cases to train our recommendation system later. I faced a hurdle uploading my script to GitHub, as I had not used it before. However, with some internet research and persistence, I successfully uploaded a Python file and a Jupyter Notebook file.
Note: I would like to continue as a participant in this project.