Gursimran Kaur - Machine Learning Pathway - Self Assessment

Self Assessment till - 05/08/2020

Overview of Tasks and Skills achieved till now:

Technical Areas:

  • How to use GitHub and Slack for team collaboration.
  • Learned how to do web scraping using Requests, BeautifulSoup, and Selenium.
  • Tried to understand, at a theoretical level, what the BERT model is.
  • Learned how to implement TF-IDF on a scraped CSV file.
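
To illustrate the scraping-to-CSV step from the list above, here is a minimal sketch. It is not the project's code: it uses Python's built-in html.parser instead of Requests and BeautifulSoup so that it runs without extra installs, and the HTML snippet, the class names "title" and "body", and the helper names PostParser and posts_to_csv are all assumptions made for this example.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical forum markup; the real forum's tags and class names will differ.
SAMPLE_HTML = """
<div class="post"><h2 class="title">First post</h2><p class="body">Hello forum</p></div>
<div class="post"><h2 class="title">Second post</h2><p class="body">Another message</p></div>
"""


class PostParser(HTMLParser):
    """Collects (title, body) pairs from elements with class 'title' or 'body'."""

    def __init__(self):
        super().__init__()
        self.posts = []
        self._field = None   # field currently being read, if any
        self._current = {}   # text collected for the post in progress

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if cls in ("title", "body"):
            self._field = cls
            self._current[cls] = ""

    def handle_data(self, data):
        if self._field:
            self._current[self._field] += data

    def handle_endtag(self, tag):
        # A closing </p> while reading the body marks the end of one post.
        if self._field == "body" and tag == "p":
            title = self._current.get("title", "").strip()
            body = self._current.get("body", "").strip()
            self.posts.append((title, body))
            self._current = {}
        if tag in ("h2", "p"):
            self._field = None


def posts_to_csv(posts):
    """Serialize (title, body) pairs to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["title", "body"])
    writer.writerows(posts)
    return buffer.getvalue()


parser = PostParser()
parser.feed(SAMPLE_HTML)
csv_text = posts_to_csv(parser.posts)
print(csv_text)
```

In a real scraper the HTML would come from the forum's pages (for example via Requests), and the parsing rules would need to match that site's actual structure.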

Tools used to implement the above: Selenium, Google Drive, Jupyter Notebook, GitHub, Slack, Beautiful Soup.

Soft skills I gained during these 3 weeks: Teamwork and collaboration with teammates and team leads, including learning to take suggestions and to ask for help whenever I got stuck.

Achievements:

  • Read articles about BERT.
  • Built a CSV file of the data scraped from 4000 articles on the same forum.
  • Implemented TF-IDF on the data from that CSV file.
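
For readers unfamiliar with TF-IDF, the step above can be sketched roughly as follows. This is an illustrative example only, not the code used in the project: the sample CSV, the column name "body", and the tf_idf helper are assumptions, and the weighting is the plain tf * ln(N / df) form, so libraries that apply smoothing or normalization will give different numbers.

```python
import csv
import io
import math
from collections import Counter

# Hypothetical stand-in for the scraped CSV; the real file has about 4000 rows.
SAMPLE_CSV = """title,body
First post,the cat sat on the mat
Second post,the dog sat on the log
Third post,cats and dogs
"""


def tf_idf(documents):
    """Return one {term: weight} dict per document, using tf * ln(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
scores = tf_idf([row["body"] for row in rows])

# Show the highest-weighted term for each post.
for row, weights in zip(rows, scores):
    top_term = max(weights, key=weights.get)
    print(row["title"], "->", top_term)
```

Terms that appear in only one post (such as "cat") receive a higher weight than terms shared across posts (such as "sat"), which is the point of the inverse document frequency factor.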

Meetings attended:
7/21/20 - Team Meeting (W1)
7/28/20 - Team Meeting (W2)
7/31/20 - Office hours (for TF-IDF, BERT, and related doubts)
8/4/20 - Team Meeting (W3)

Goals:

  • Look at the office hours and implement BERT.
  • Try and see if I can work on understanding different variations of BERT.
  • Ask my teammates and leads if I have any queries related to it.

Tasks completed:
The first week’s tasks included studying BERT word embeddings and doing some introductory, theoretical reading about the models. Web scraping was the main task for week 1. I scraped data from Webforum, which was chosen by our team. We were supposed to make a CSV file from the scraped data, which we were able to do. We faced a few issues, which were recently resolved with the help of our leads. My own difficulty was extending the scraping to 4000 posts and generating a CSV file from them.
In the second week, I read articles on BERT, TF-IDF, and word embeddings to get a deeper understanding of them. I chose TF-IDF and implemented it on the scraped data, which I managed to do with the help of my leads and teammates. I also learned how important it is to ask about doubts and raise queries quickly; doing so saved me a lot of time.
For the upcoming week, I am supposed to understand BERT in more depth and try to implement it on the scraped data. If I have any spare time, I will also try to implement variations of it.


Self Assessment till - 11/08/2020

Overview of Tasks and Skills achieved till now:

Technical Areas:

  • We were taught to scrape data more efficiently using different libraries, and to pre-process the data scraped in earlier weeks so that it is better suited for implementing BERT.
  • Researched more about basic BERT and its models.
  • Researched GPUs.
  • Implemented the basic BERT model taught during the tutorial (DistilBERT) and checked its accuracy.

Tools used to implement the above: GitHub, Google Drive, Jupyter Notebook, Google Colab, Slack.

Soft skills I gained during these 3 weeks: Teamwork and collaboration with teammates and team leads. I also learned how to present scientific data and facts in a visually appealing way.

Achievements:

  • Read articles about BERT variations and how this state-of-the-art NLP language model works.
  • Studied variations of BERT and how they work.
  • Presented the final results in a PPT.

Meetings attended:
Week 4, 2 hours 2 meetings.
Week 5, 1 hour 1 meeting. (Presentation of final BERT model results)

Goals:

  • Apply DistilBERT to a bigger dataset of scraped data and measure its accuracy.
  • Study the differences between RoBERTa and DistilBERT.
  • Look at the office hours and try implementing RoBERTa (which has stark differences in performance).
  • Learn about AWS services.

Tasks completed:
Over the past weeks I’ve been working on the dataset scraped in the initial weeks of the project and adding different features to it, as well as understanding and implementing BERT in more depth. I tried implementing DistilBERT following what our technical lead showed us. I had a few issues handling the GPUs, which were recently resolved with the help of our leads. I was able to measure and present the accuracy of the DistilBERT model on the scraped data. We were also instructed to start watching tutorials and set up an account on AWS, in order to create an answering API for the BERT model.