Concise overview of things learned. Break it up into Technical Area, Tools, Soft Skills:
Technical: I have learned how to scrape a website for data using the Selenium library in Python.
Tools: I have learned how to collaborate using Slack. I am also still learning how to make fuller use of GitHub.
Soft Skills: I have learned how to effectively communicate technical ideas within my sub-team.
Three achievement highlights:
- Successfully scraped a Discourse forum and organized the data efficiently in a pandas DataFrame.
- Collaborated with my team and sub-teams to debug our code and improve its runtime efficiency.
- Cleaned the data, stored it in CSV files, and uploaded the CSV files along with the corresponding code to my sub-team's branch on GitHub.
List of meetings/training attended, including social team events:
All team meetings: 6/1, 6/2, 6/9, 6/13
STEMcasts: Overview of ML and project, Data Mining, Recommendation Models, Git
Training: First Python Training Session
Goals for the upcoming week. The next self-assessment will be due on the following Tuesday, 06/23.
My goals for next week are to learn how to use BERT and how to use the data we collected to train a model.
Detailed statement of tasks done. State each task, hurdles faced if any, and how you solved the hurdle. You need to mark whether the hurdles were solved with the help of training webinars, some help from project leads or significant help from project leads:
Task 1 Completed: Signed up for Slack, Asana, and GSuite. Became familiar with the project goals. Formed sub-teams.
Hurdles: My GSuite account was not set up. The leads set it up for me, and I was able to log in by the next day. The leads also split the whole team into skill-balanced sub-teams.
Task 2 Completed: Wrote code to extract data from the “Account Health” category of the Amazon Seller Discourse forum. We created two DataFrames: one containing the title, category, sub-category, original post content, and URL of each post, and the other containing the URL and all responses for each post.
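A minimal sketch of the two-table layout described above, assuming the scraper has already produced one Python dict per post. The field names, sample records, and example.com URLs are hypothetical, and no Selenium calls are shown because the actual scraping code is not reproduced here. The sketch stores one row per response (via pandas `DataFrame.explode`), which is one possible reading of "all responses for each post"; the URL is the key that links the two tables.

```python
import pandas as pd

# Hypothetical scraped records: one dict per forum post.
# In the real project these would come from the Selenium scraper.
scraped_posts = [
    {
        "title": "Account suspended",
        "category": "Account Health",
        "sub_category": "Suspensions",
        "content": "My account was suspended without notice.",
        "url": "https://example.com/t/1",
        "responses": ["Check your performance notifications.", "Submit a plan of action."],
    },
    {
        "title": "Late shipment rate",
        "category": "Account Health",
        "sub_category": "Metrics",
        "content": "How is the late shipment rate calculated?",
        "url": "https://example.com/t/2",
        "responses": ["It is based on confirmed ship dates."],
    },
]


def build_tables(posts):
    """Split scraped posts into a post table and a response table keyed by URL."""
    post_cols = ["title", "category", "sub_category", "content", "url"]
    # Passing columns= keeps only those keys from each dict, in this order.
    posts_df = pd.DataFrame(posts, columns=post_cols)

    # One row per (url, response) pair: explode turns each list element into its own row.
    responses_df = (
        pd.DataFrame(posts, columns=["url", "responses"])
        .explode("responses")
        .rename(columns={"responses": "response"})
        .reset_index(drop=True)
    )
    return posts_df, responses_df


posts_df, responses_df = build_tables(scraped_posts)
print(posts_df.shape)      # (2, 5)
print(responses_df.shape)  # (3, 2)
```

Keeping responses in a separate table avoids repeating the post content once per reply; the two tables can be rejoined on `url` with `posts_df.merge(responses_df, on="url")`.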
Hurdles: I had no prior experience with web scraping, so I had to teach myself the core concepts. To do this, I re-watched the Data Mining STEMcast and followed along. Jenny, the technical lead, also gave a tutorial at one of our meetings in which she walked us through a scraping example using Selenium. From that, I was able to figure out how to use Selenium to scrape data for our project.
Task 3 Completed: Cleaned the data by removing all ‘\n’ (newline) characters and any non-English characters.
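A minimal sketch of this cleaning step, assuming "non-English characters" can be approximated as any non-ASCII character (an assumption: this also drops accented letters, curly quotes, and emoji). Rather than deleting each ‘\n’ outright, the sketch replaces it with a space so words on adjacent lines do not run together, then collapses repeated whitespace; this is a design choice, not necessarily what the project code did. The helper names `clean_text` and `clean_frame` are hypothetical.

```python
import re

import pandas as pd

# Assumption: "non-English" is approximated as anything outside the ASCII range.
NON_ASCII = re.compile(r"[^\x00-\x7F]+")


def clean_text(text):
    """Replace newlines with spaces, drop non-ASCII characters, and collapse whitespace."""
    if not isinstance(text, str):
        return text  # leave missing values (NaN) untouched
    text = text.replace("\n", " ")
    text = NON_ASCII.sub("", text)
    return " ".join(text.split())


def clean_frame(df, columns):
    """Return a copy of df with clean_text applied to the given text columns."""
    cleaned = df.copy()
    for col in columns:
        cleaned[col] = cleaned[col].map(clean_text)
    return cleaned


raw = pd.DataFrame({"content": ["Line one\nLine two", "Café ☕ hours\n\n"]})
print(clean_frame(raw, ["content"])["content"].tolist())
# ['Line one Line two', 'Caf hours']
```

The second example shows the cost of the ASCII approximation: "Café" becomes "Caf". A stricter language filter would need a dedicated language-detection library.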
Task 4 Completed: Stored the cleaned data in two separate CSV files, one for each DataFrame, and uploaded the CSV files and the corresponding code to the sub-team’s GitHub branch. Collaborated on code with my teammates through GitHub.
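A minimal sketch of saving the two DataFrames to separate CSV files and reading one back to confirm nothing was lost. The file names `posts.csv` and `responses.csv`, the helper `save_tables`, and the one-row sample data are hypothetical; a temporary directory stands in for the repository folder.

```python
import os
import tempfile

import pandas as pd


def save_tables(posts_df, responses_df, out_dir):
    """Write each DataFrame to its own CSV file and return the two paths."""
    posts_path = os.path.join(out_dir, "posts.csv")
    responses_path = os.path.join(out_dir, "responses.csv")
    # index=False keeps pandas' row index out of the files.
    posts_df.to_csv(posts_path, index=False)
    responses_df.to_csv(responses_path, index=False)
    return posts_path, responses_path


posts_df = pd.DataFrame({"title": ["Account suspended"], "url": ["https://example.com/t/1"]})
responses_df = pd.DataFrame({"url": ["https://example.com/t/1"], "response": ["Submit a plan of action."]})

with tempfile.TemporaryDirectory() as tmp:
    paths = save_tables(posts_df, responses_df, tmp)
    # Read the response table back to confirm the round trip preserved the data.
    reloaded = pd.read_csv(paths[1])
    print(reloaded.to_dict("records") == responses_df.to_dict("records"))  # True
```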
Hurdle: I had never used GitHub before, so I was very confused about how to use it. My teammate helped out by creating our branch and uploading his version of the code first. From there, I was able to make the necessary changes and upload an updated version. To get more familiar with GitHub, I re-watched a few clips from the Git STEMcast.
Task 5 Completed: Started keeping a running log to track project progress within my sub-team.
Request change of role if it applies. You may request to become a task lead. Or switch between participant and observer roles.
If my team leads need more support, I would be glad to move up from participant to task lead.