Self Assessment till 05/08/2020
Overview of Tasks and Skills achieved till now:
- How to use GitHub and Slack for team collaboration.
- Learned how to do web scraping using Requests, BeautifulSoup, and Selenium.
- Tried to grasp, theoretically, what the BERT model is.
- Learned how to implement TF-IDF on a scraped CSV file.
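The scrape-and-save workflow above can be sketched as follows. This is a minimal illustration, not the team's actual scraper: the `post` div and the `h2`/`p` selectors are hypothetical, and the hardcoded HTML stands in for a page that would really come from `requests.get(forum_url).text` (or Selenium's `driver.page_source` for JavaScript-heavy pages).

```python
import csv
from bs4 import BeautifulSoup

# Placeholder for HTML fetched from the forum; structure is assumed.
html = """
<div class="post"><h2>First post</h2><p>Hello forum</p></div>
<div class="post"><h2>Second post</h2><p>Scraping with BeautifulSoup</p></div>
"""

soup = BeautifulSoup(html, "html.parser")
rows = []
for post in soup.find_all("div", class_="post"):
    rows.append({
        "title": post.find("h2").get_text(strip=True),
        "body": post.find("p").get_text(strip=True),
    })

# Write the scraped posts to a CSV file, one row per post.
with open("posts.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["title", "body"])
    writer.writeheader()
    writer.writerows(rows)
```

Looping this over every thread page (and paginating until roughly 4000 posts are collected) yields the CSV used in the later steps.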
Tools used to implement the above: Selenium, Google Drive, Jupyter Notebook, GitHub, Slack, Beautiful Soup.
Soft skills I gained during these 3 weeks: teamwork and collaboration with teammates and team leads, and learning to take suggestions and ask for help wherever I got stuck.
- Read articles about BERT.
- Made a CSV data file of all the data scraped from 4000 posts on the forum.
- Implemented TF-IDF on the data in that CSV file.
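The TF-IDF step can be sketched from scratch as below; the two short documents are placeholders for post texts read out of the CSV file, and in a real notebook scikit-learn's `TfidfVectorizer` would be the usual shortcut (that library is an assumption, not one of the tools listed above).

```python
import math
from collections import Counter

# Placeholder texts standing in for rows read from the scraped CSV.
docs = [
    "bert is a language model",
    "tf idf weighs terms by document frequency",
]

tokenized = [d.split() for d in docs]
n_docs = len(tokenized)

# Document frequency: in how many documents each term appears.
df = Counter(term for doc in tokenized for term in set(doc))

def tfidf(doc):
    """Term frequency times inverse document frequency for one document."""
    tf = Counter(doc)
    return {
        term: (count / len(doc)) * math.log(n_docs / df[term])
        for term, count in tf.items()
    }

weights = [tfidf(doc) for doc in tokenized]
```

Terms appearing in every document get an IDF of log(1) = 0, so only distinctive terms carry weight.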
7/21/20 - Team Meeting (W1)
7/28/20 - Team Meeting (W2)
7/31/20 - Office hours (for TF-IDF, BERT, and related doubts).
8/4/20 - Team Meeting (W3)
- Review the office-hours material and implement BERT.
- Try to understand the different variations of BERT.
- Ask my teammates and leads if I have any queries related to it.
The first week’s tasks included studying BERT word embeddings, researching the introductory material and the model theoretically. Web scraping was the main task for week 1. I scraped data from the web forum chosen by our team. We were supposed to build a CSV file from the scraped data, which we did. We faced a few issues, which were recently resolved with the help of our leads; in my case, the issue was extending the scraping to 4000 posts and generating a CSV file from it. In the second week, I read articles on BERT, TF-IDF, and word embeddings to get a deeper understanding of them. I chose TF-IDF and implemented it on the scraped data, successfully, with the help of my leads and teammates. I also learned how important it is to raise doubts and post questions quickly; that really saved a lot of my time.
For the upcoming week, I am supposed to deepen my understanding of BERT and try to implement it on the scraped data. If I have any spare time, I will also try to implement variations of it.
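A minimal sketch of what that BERT step could look like, assuming the Hugging Face `transformers` library and the pretrained `bert-base-uncased` checkpoint (neither is named in this report): each scraped post is embedded by mean-pooling BERT's final hidden states.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Load a pretrained BERT tokenizer and encoder (assumed checkpoint).
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

# Placeholder for one post text read from the scraped CSV.
post = "Example text from one scraped forum post."
inputs = tokenizer(post, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():
    outputs = model(**inputs)

# One 768-dimensional vector per post, averaged over its tokens.
embedding = outputs.last_hidden_state.mean(dim=1)
```

Running this over every row of the CSV would give per-post vectors that can then be compared or clustered, which is one way the TF-IDF results from week 2 could be extended.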