Kifena - Machine Learning Pathway

Things learned:
Technical: Refreshed data visualization skills; learned web scraping with BeautifulSoup, and used Requests and Selenium to collect data from multiple pages; understood and implemented TF-IDF and word embeddings. Gained experience inspecting web pages to find the tags and information needed.

Tools: GitHub, Google Colab.

Achievements:

  • Collected data from 4000+ posts across 50+ webpages and built a CSV dataset with 7 features

  • Implemented TF-IDF on the data

  • Visualized the findings using matplotlib
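As a reminder of what TF-IDF computes, here is a minimal from-scratch sketch. It is not the code used in the project: the `tf_idf` helper and the sample posts are made up for illustration, and it uses the plain `tf * ln(N / df)` weighting with whitespace tokenization. Library implementations such as scikit-learn's `TfidfVectorizer` apply a smoothed, normalized variant by default, so their numbers will differ.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf  = term count / number of tokens in the document
    idf = ln(N / df), where df is the number of documents containing the term
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: count each term once per document.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical posts, invented for this example.
posts = [
    "how to scrape forum posts",
    "how to plot forum data",
    "bert for forum posts",
]
scores = tf_idf(posts)
top_term_first_post = max(scores[0], key=scores[0].get)
```

A term that appears in every post (here "forum") gets a weight of zero, while a term unique to one post (here "scrape") scores highest in that post, which is the behavior that makes TF-IDF useful for picking out distinctive words.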

Meetings attended:
Week 1 Team meetings, 2 hours (x2)
Week 2 Team meeting, 1 hour
Office Hour, 1
Week 3 Team meeting, 1 hour

Goals

  1. To implement word embeddings on the data set and plot the embeddings in a way reflecting their corresponding words’ meaning.
  2. Implement BERT.
  3. Get started with variations of BERT.

Tasks completed
This week I’ve been working on improving the quality of the dataset I put together last week. I researched BERT, word embeddings and TF-IDF, and implemented TF-IDF on the dataset. The biggest challenges were finding an appropriate format for visualizing the results and scraping a few thousand additional posts. With the help of the coding demos, office hours and patience, I managed to overcome these obstacles. I have also learned to take initiative and meet extremely tight deadlines.
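The multi-page scraping described above can be sketched roughly as follows. This is an illustration under assumptions, not the project's scraper: the HTML structure (`div.post`, `a.title`, `data-category`), the `parse_posts` and `scrape` helpers, and the sample pages are all hypothetical. Page fetching is passed in as a function so the parsing logic can be tried without network access.

```python
from bs4 import BeautifulSoup


def parse_posts(html):
    """Extract one dict per post from a listing page (hypothetical markup)."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for post in soup.find_all("div", class_="post"):
        title = post.find("a", class_="title")
        rows.append({
            "title": title.get_text(strip=True),
            "url": title["href"],
            "category": post.get("data-category", ""),
        })
    return rows


def scrape(fetch_page, max_pages):
    """Collect posts from pages 1..max_pages, stopping at the first empty page."""
    all_rows = []
    for page in range(1, max_pages + 1):
        rows = parse_posts(fetch_page(page))
        if not rows:
            break
        all_rows.extend(rows)
    return all_rows


# Invented sample pages standing in for real HTTP responses.
SAMPLE_PAGES = {
    1: '<div class="post" data-category="help">'
       '<a class="title" href="/t/1">First post</a></div>'
       '<div class="post" data-category="news">'
       '<a class="title" href="/t/2">Second post</a></div>',
    2: '<div class="post" data-category="help">'
       '<a class="title" href="/t/3">Third post</a></div>',
}

rows = scrape(lambda page: SAMPLE_PAGES.get(page, ""), max_pages=5)
```

Against a live site, `fetch_page` could wrap something like `requests.get(url, params={"page": page}).text`, or return Selenium's `driver.page_source` when the page is rendered by JavaScript. The resulting list of dicts can be written out with `csv.DictWriter`.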

Things learned:
Technical: learned to scrape data from multiple categories, change the orientation of datasets, and implement BERT.

Tools: GitHub, Jupyter Notebook, Google Colab.

Achievements:

  • Collected data from 4200+ posts across 50+ webpages and 9 categories, and built a CSV dataset with 7 features.

Meetings attended:
Week 4, 2 meetings

Goals

  1. To implement word embeddings on the data set and plot the embeddings in a way reflecting their corresponding words’ meaning.
  2. Implement BERT on a bigger dataset.
  3. Get started with variations of BERT.
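For the first goal, a plot only reflects meaning if related words have vectors that point in similar directions, and cosine similarity is a common way to check that before projecting to 2D. The sketch below uses made-up 3-dimensional vectors purely for illustration; real embeddings would come from a trained model and have many more dimensions. The `cosine_similarity` and `nearest` helpers are hypothetical names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors, invented for this example.
embeddings = {
    "python": [0.9, 0.1, 0.0],
    "java": [0.8, 0.2, 0.1],
    "resume": [0.0, 0.2, 0.9],
}


def nearest(word):
    """Return the other word whose vector is most similar to `word`."""
    return max(
        (w for w in embeddings if w != word),
        key=lambda w: cosine_similarity(embeddings[word], embeddings[w]),
    )


closest_to_python = nearest("python")
```

With these toy vectors, "python" and "java" are far more similar to each other than either is to "resume", so a faithful 2D plot should place the first two close together.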

Tasks completed
This week I’ve been working on the dataset and adding multiple features to it, as well as understanding and implementing BERT.

Things learned:
Technical: theory behind BERT, and GPUs.

Tools: GitHub, Jupyter Notebook, Google Colab, AWS.

Achievements:

  • Implemented BERT on the big dataset, started working on DistilBERT, and began learning how to use Amazon Web Services.

Meetings attended:
Week 5, Friday meeting

Goals

  1. Implement DistilBERT.
  2. Upload the model to AWS, and create a question-answering API.

Tasks completed
Understanding and implementing BERT and DistilBERT. Watching and reading tutorials about AWS.