Samarth_Shah - Machine Learning Pathway

Samarth_Shah · August 5, 2020, 5:51pm

Concise Overview of Things Learned

Technical Area: I learned how to use Beautiful Soup and its various methods to scrape text and other assigned information from web pages. I also learned how to use Selenium to automate clicking and page navigation in a browser. Together, these let me collect a large amount of information from the Codecademy forum and ultimately build a dataset of posts, comments, etc.
Tools: I became familiar with Slack, a team messaging tool that lets users message teammates and report their status on a project. I also got familiar with Asana, which I used to organize all of my tasks on one well-designed dashboard.
Soft Skills: Before the internship, talking to strangers was a nerve-racking experience. Now I am comfortable talking with my teammates.

Three Achievements/Highlights

-Successfully scraped the Codecademy forum and built a dataset of posts, comments, etc.
-Improved communication with teammates
-Developed an interest in Natural Language Processing and read over 30 articles on it

List of Meetings/Training Attended, Including Social Team Events

Team Meetings: 7/20, 7/23, 7/27, 8/3, 8/5
Webinars: [OH] ML by Pavitra - 7/7, [OH] ML by Akanksha - 7/8

Goals for the Upcoming Week

Further my knowledge of web scraping/automation
Learn NLP/ML techniques to process the dataset
Attend more team meetings and webinars

Detailed Statement of Completed Tasks

-Setting up the STEM-Away platform, Slack, GitHub, and G Suite
-Hurdles faced: I had trouble getting onto a team at the beginning, but I contacted the team leads to sort out all of the issues.

-Learning Beautiful Soup and Selenium
-Hurdles faced: The most common problem I encountered was getting ‘NoneType’ errors when trying to extract information from websites. To solve this issue, I got into the habit of printing out every modified variable.
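
For anyone hitting the same errors: in Beautiful Soup, `find()` returns `None` when nothing matches, so calling a method on the result raises an error mentioning 'NoneType'. Besides printing variables, one option is to check for `None` before using the result. The snippet below is only a minimal sketch of that idea. The HTML, the class names, and the helper `text_or_default` are made up for illustration and are not taken from the actual forum or from my scraper.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a scraped page; the class names are invented.
SAMPLE_HTML = """
<div class="post"><h2 class="title">First post</h2><span class="author">alice</span></div>
<div class="post"><h2 class="title">Second post</h2></div>
"""


def text_or_default(parent, tag, class_name, default=""):
    """Return the stripped text of the first matching child, or `default` if none exists."""
    node = parent.find(tag, class_=class_name)  # find() returns None when nothing matches
    if node is None:
        return default
    return node.get_text(strip=True)


def extract_posts(html):
    """Collect one dict per post, tolerating posts that lack an author element."""
    soup = BeautifulSoup(html, "html.parser")
    posts = []
    for post in soup.find_all("div", class_="post"):
        posts.append({
            "title": text_or_default(post, "h2", "title"),
            "author": text_or_default(post, "span", "author", default="unknown"),
        })
    return posts


posts = extract_posts(SAMPLE_HTML)
print(posts)
```

The second post has no author element, so a bare `post.find(...).get_text()` would fail there, while the helper falls back to the default value.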

-Outputting the data obtained from web scraping into a CSV format
-Hurdles faced: The only issue was formatting the dataset properly, but after looking through resources, I was able to resolve it.
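
As a rough illustration of the CSV step, here is a minimal sketch using Python's built-in `csv` module. It assumes the formatting trouble is the common kind, where scraped text contains commas, quotes, or line breaks that break the columns. The field names and rows are hypothetical and are not the real dataset schema. `csv.DictWriter` quotes such fields automatically, and reading the data back is a quick way to check that the columns survived.

```python
import csv
import io

# Hypothetical rows; the field names are illustrative, not the actual dataset schema.
FIELDNAMES = ["title", "author", "body"]
rows = [
    {"title": "Help with loops", "author": "alice", "body": "Line one,\nline two"},
    {"title": 'Quote "test"', "author": "bob", "body": "plain text"},
]


def write_rows(rows, fileobj):
    """Write a header plus one CSV record per dict to an open text file object."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


# An in-memory buffer keeps the sketch self-contained. For a real file, use
# open(path, "w", newline="", encoding="utf-8") as the csv module docs recommend.
buffer = io.StringIO()
write_rows(rows, buffer)
csv_text = buffer.getvalue()

# Read the text back to confirm that commas, quotes, and newlines were preserved.
round_trip = list(csv.DictReader(io.StringIO(csv_text)))
print(round_trip == rows)
```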