Priyanka Shah - Machine Learning Self-assessment

Week 1:

A brief overview of the tools I learned and the tasks I completed in the first week of the ML internship.

Tools: I learned how to use Slack channels, GitHub, and Asana. I also learned the pros and cons of Google Colab for code development.

List of meetings: I attended two meetings in the first week, on 7/20 and 7/23. In the first meeting we discussed the tasks for the week, and the second meeting was a follow-up.

Tasks: I followed resources from the July team about recommendation systems and gained basic knowledge of NLP.

Created accounts in Slack, Asana, and GSuite.

Understood the project goals and introduction.

Formed sub-teams within Team 6.

Wrote a report comparing the CodeChef and STEM-Away forums. According to the comparison, the CodeChef forum was not suitable for the project.

Hurdles:

My username on the STEM-Away website consisted only of numbers, so team members couldn’t see my name in the Hangouts meeting.

Hurdles solution:

The Leads helped me change my username on the STEM-Away portal.

Week 2:

A brief overview of the soft skills I learned and the tasks I completed in the second week of the ML internship.

Soft Skills: I learned how to scrape data from websites using three tools: Selenium, BeautifulSoup, and Scrapy. I found BeautifulSoup the easiest to understand and to use for scraping.

List of meetings: I attended two meetings in the second week, on 7/27 and 7/29. In the first meeting we discussed the tasks for the week, and the second meeting was a follow-up.

Tasks: I followed resources from the July team about BeautifulSoup, and the Leads also provided a few resources for understanding each web scraping method.

Created a Python script to extract data from the “Codecademy” forum. Each member of the sub-team extracted data from different subcategories, and we combined all the data into a single CSV file. The resulting DataFrame has the following columns: “Title”, “Categories”, “Tags”, “Number of replies”, “Post”, and “Replies”.
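The extraction step can be illustrated with a minimal BeautifulSoup sketch. It parses an inline HTML string rather than the live forum, and the tag names, class names, sample rows, and the `parse_topics` helper are all hypothetical, chosen only to show how a few of the columns listed above might be filled. It is not the script the sub-team actually wrote.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a forum topic listing; the tag and class
# names below are invented for illustration and are not the real forum markup.
SAMPLE_HTML = """
<div class="topic">
  <a class="title" href="/t/1">How do I reverse a list?</a>
  <span class="category">Python</span>
  <span class="tag">lists</span><span class="tag">beginner</span>
  <span class="replies">3</span>
</div>
<div class="topic">
  <a class="title" href="/t/2">CSS grid not centering</a>
  <span class="category">Web Development</span>
  <span class="tag">css</span>
  <span class="replies">0</span>
</div>
"""


def parse_topics(html):
    """Return one dict per topic with a subset of the columns from the report."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for topic in soup.find_all("div", class_="topic"):
        rows.append({
            "Title": topic.find("a", class_="title").get_text(strip=True),
            "Categories": topic.find("span", class_="category").get_text(strip=True),
            "Tags": [t.get_text(strip=True) for t in topic.find_all("span", class_="tag")],
            "Number of replies": int(topic.find("span", class_="replies").get_text(strip=True)),
        })
    return rows


topics = parse_topics(SAMPLE_HTML)
```

A list of dicts like `topics` can be passed directly to `pandas.DataFrame` and written out with `to_csv`, which is one way the per-member results could be combined into a single file.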

Hurdles:

I was new to web scraping, so I used YouTube videos and Towards Data Science articles on the topic.

I was able to scrape all the data except the posts.

Hurdles solution:

Sub-team members communicated with each other in Hangouts meetings and worked out how to get the post data.

Week 3:

A brief overview of the soft skills I learned and the tasks I completed in the third week of the ML internship.

Soft Skills: I learned how to clean data for NLP. Team 6 decided to work on the “Stack Exchange” CSV file because it has much more data than the Codecademy forum.

List of meetings: I attended two meetings in the third week, on 08/03 and 08/05. In the first meeting we discussed the tasks for the week, and the second meeting was a follow-up.

Tasks: I followed resources from the July team, and the Leads also provided a few resources related to data cleaning.

Understood the Stack Exchange CSV file and removed irrelevant data such as the “Number of Answers” and “Date_posted” columns. Additionally, removed punctuation and \n characters from the data.
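The cleaning steps above can be sketched in pandas roughly as follows. The sample rows, the "Title" and "Post" column choices, and the `clean_text` helper are assumptions made for illustration; only the two dropped column names come from the report.

```python
import string

import pandas as pd

# Hypothetical rows; only "Number of Answers" and "Date_posted" are column
# names taken from the report, the rest are invented for illustration.
df = pd.DataFrame({
    "Title": ["How to sort?\n", "What is NLP!"],
    "Post": ["Use sorted(), it works.\nThanks!", "Natural language processing..."],
    "Number of Answers": [2, 0],
    "Date_posted": ["2020-08-01", "2020-08-02"],
})

# Drop the columns judged irrelevant for the NLP task.
df = df.drop(columns=["Number of Answers", "Date_posted"])

# Translation table that deletes every ASCII punctuation character.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_text(text):
    """Replace newlines with spaces, strip punctuation, and collapse whitespace."""
    text = text.replace("\n", " ").translate(PUNCT_TABLE)
    return " ".join(text.split())


for col in ["Title", "Post"]:
    df[col] = df[col].apply(clean_text)
```

Replacing the newline with a space before removing punctuation keeps words on either side of a line break from being glued together.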

Hurdles:

I was trying to replace NaN values with “Not Available” and delete the “Number of Answers” column. The column was removed, but the NaN values were still in the data.

I was using Google Colab for coding and had to reconnect repeatedly.

Hurdles solution:

I communicated with the Lead, who explained each step of data cleaning and also provided a script to keep Google Colab connected for a long time. It was very helpful.
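One common cause of the NaN symptom described in the hurdles, which may or may not be what happened here, is that pandas `fillna` returns a new DataFrame instead of modifying the original unless the result is assigned back. The sketch below uses hypothetical data to show the difference.

```python
import numpy as np
import pandas as pd

# Hypothetical data; "Number of Answers" is the only name taken from the report.
df = pd.DataFrame({
    "Post": ["first post", np.nan],
    "Tags": [np.nan, "python"],
    "Number of Answers": [1, 0],
})

# Calling fillna without keeping the result leaves df unchanged.
df.fillna("Not Available")
still_missing = int(df.isna().sum().sum())

# Assigning the results back makes both changes stick.
df = df.drop(columns=["Number of Answers"])
df = df.fillna("Not Available")
missing_after = int(df.isna().sum().sum())
```

Here `still_missing` counts the NaN values that survive the unassigned call, and `missing_after` counts what remains once the result is assigned back.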

Week 4:

A brief overview of the soft skills I learned and the tasks I completed in the fourth week of the ML internship.

Soft Skills: I learned different modelling methods.

List of meetings: I attended meetings in the fourth week on 08/10, 08/13, 08/14, 08/14, and 08/15. In the first meeting we discussed the tasks for the week, and after that three sub-teams gave presentations on modelling.

Tasks: I followed resources from the July team, and the Leads also provided a few resources related to data modelling.

I learned the tf-idf, simple transformer, and BERT methods. Additionally, I participated in preparing a presentation on BERT.
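As a small illustration of the tf-idf idea, the sketch below computes weights in plain Python with the textbook formula tf × log(N / df). The mini-corpus and the `tf_idf` helper are hypothetical, and library implementations often use smoothed variants, so their values can differ from these.

```python
import math
from collections import Counter

# Hypothetical mini-corpus standing in for cleaned forum posts.
docs = [
    "python list sort",
    "python dictionary keys",
    "css grid layout",
]


def tf_idf(corpus):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.split() for doc in corpus]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


weights = tf_idf(docs)
```

In this corpus "python" appears in two of the three documents, so it receives a lower weight than "list", which appears in only one. That down-weighting of common terms is what makes tf-idf useful for matching posts by their distinctive words.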