Concise Overview of Things Learned
Technical Area: I learned how to use Beautiful Soup and its methods to parse web pages and extract the text and information I was assigned to collect. I also learned how to use Selenium to automate clicking and navigating between pages in a browser. Combined, these tools let me collect a large amount of information from the Codecademy forum and build a dataset of posts, comments, etc.
Tools: I became familiar with Slack, a team messaging tool that lets teammates message each other and report their status on a project. I also became familiar with Asana, which I used to organize all of my tasks on one well-designed dashboard.
Soft Skills: Before the internship, talking to strangers was a nerve-racking experience. Now I am comfortable talking with my teammates.
-Successfully scraped the Codecademy forum and built a dataset of posts, comments, etc.
-Improved communication with teammates
-Developed an interest in Natural Language Processing and read over 30 articles on it
List of Meetings/Trainings Attended, Including Social Team Events
- Team Meetings: 7/20, 7/23, 7/27, 8/3, 8/5
- Webinars: [OH] ML by Pavitra - 7/7, [OH] ML by Akanksha - 7/8
Goals for the Upcoming Week
- Further my knowledge of web scraping/automation
- Learn NLP/ML techniques to process the dataset
- Attend more team meetings and webinars
Detailed Statement of Completed Tasks
-Setting up the STEM-Away platform, Slack, GitHub, G Suite
-Hurdles faced: I had trouble getting into a team at the beginning, but I contacted the team leads and sorted out all of the issues.
-Learning Beautiful Soup and Selenium
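-Hurdles faced: The most common problem I encountered was getting ‘NoneType’ errors when trying to extract information from websites. These typically appear when a lookup finds no matching element and returns None, and the code then tries to use the result. To solve this issue, I got into the habit of printing out every variable after modifying it.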
-Outputting the data obtained from web scraping into a CSV format
-Hurdles faced: The only issue was formatting the dataset properly, but after looking through resources, I was able to resolve it.
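
The two hurdles above can be illustrated with a minimal sketch. This is not the project's actual code: the column names, the helper names `text_or_default` and `write_rows`, and the sample rows are all hypothetical. It assumes `element` is whatever a Beautiful Soup lookup such as `soup.find(...)` returned, which is None when nothing matches, and it uses Python's standard `csv` module so that commas and quotes inside a post do not break the columns.

```python
import csv
import io

# Hypothetical column names for the scraped dataset.
FIELDS = ["title", "author", "body"]


def text_or_default(element, default=""):
    """Return the element's stripped text, or `default` if the lookup found nothing.

    `element` is expected to be the result of a Beautiful Soup lookup such as
    soup.find(...): either a Tag or None.
    """
    if element is None:
        return default
    return element.get_text(strip=True)


def write_rows(rows, fileobj, fieldnames=FIELDS):
    """Write dict rows as CSV with a header; missing or None values become empty cells."""
    writer = csv.DictWriter(fileobj, fieldnames=fieldnames)
    writer.writeheader()
    for row in rows:
        cleaned = {}
        for name in fieldnames:
            value = row.get(name)
            cleaned[name] = "" if value is None else value
        writer.writerow(cleaned)


# Demo: an in-memory buffer stands in for a real file opened with
# open("posts.csv", "w", newline="", encoding="utf-8").
buffer = io.StringIO()
write_rows(
    [
        # The author lookup "failed" here, so the default is used instead of crashing.
        {"title": "Loops, help", "author": text_or_default(None, "unknown"), "body": 'He said "hi"'},
        {"title": "Second post", "author": "sam", "body": None},
    ],
    buffer,
)
csv_text = buffer.getvalue()
```

Checking for None once, in a helper, avoids scattering print statements through the scraper, and letting the `csv` module handle quoting is usually safer than joining fields with commas by hand.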