YasaswiS - Machine Learning (Level 1) Pathway

Week 1:

Overview of Things I Learned


Technical Area:

  • Saved data to CSV file using pandas
  • Reviewed web scraping methods and modules besides BeautifulSoup4

Tools:

  • Jupyter Notebook/Lab (Programming Environment)
  • PyTorch (for Neural Networks)
  • BeautifulSoup, Selenium, Scrapy (Python web scraping modules)
  • Pandas (Data management)
  • Git/GitHub (To organize code)

Soft Skills:

I improved my ability to use Git and GitHub, along with keeping track of solutions I tried while debugging an issue. I also found more resources to refer to while programming.


Achievements Completed:

  • Scraped multiple Wikipedia pages for data
  • Stored the scraped data as CSV and txt files so the data would not need to be collected more than once
  • Implemented a basic text classifier in PyTorch using online resources

Detailed statement of tasks completed: My biggest tasks were implementing the concepts I had researched earlier and building on the code I had already written. I also organized my code in the repository for future reference. I programmed a small recurrent neural network using LSTMs in PyTorch; however, CUDA version errors kept me from training on the GPU, so I trained on the CPU instead.
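A minimal sketch of an LSTM text classifier with a CPU fallback, not the actual code from the repository. The class name `LSTMClassifier`, the layer sizes, and the random input batch are all assumptions made for illustration. Note that `torch.cuda.is_available()` only chooses a device; it does not repair a CUDA version mismatch.

```python
import torch
import torch.nn as nn


class LSTMClassifier(nn.Module):
    """Hypothetical model: embedding -> single-layer LSTM -> linear output layer."""

    def __init__(self, vocab_size, embed_dim, hidden_dim, num_classes):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)    # (batch, seq_len, embed_dim)
        _, (hidden, _) = self.lstm(embedded)    # hidden: (1, batch, hidden_dim)
        return self.fc(hidden[-1])              # (batch, num_classes)


# Use the GPU only when PyTorch reports CUDA as available; otherwise use the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Illustrative sizes only (assumed, not taken from the project).
model = LSTMClassifier(vocab_size=1000, embed_dim=32, hidden_dim=64, num_classes=4).to(device)
batch = torch.randint(0, 1000, (8, 20), device=device)  # 8 sequences of 20 token ids
logits = model(batch)
```

Keeping the device in one variable means the same script runs unchanged once the GPU setup works.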

Week 2:

Overview of Things I Learned


Technical Area:

  • Saved data to CSV file using pandas
  • Reviewed web scraping methods and modules besides BeautifulSoup4

Tools:

  • Jupyter Notebook/Lab (Programming Environment)
  • PyTorch (for Neural Networks)
  • BeautifulSoup, Selenium, Scrapy (Python web scraping modules)
  • Pandas (Data management)
  • Git/GitHub (To organize code)

Soft Skills:

I improved my ability to use Git and GitHub, along with keeping track of solutions I tried while debugging an issue. I also found more resources to refer to while programming.


Achievements Completed:

  • Scraped multiple forum pages for data
  • Stored the scraped data as CSV and txt files so the data would not need to be collected more than once
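The CSV caching idea from the list above can be sketched with pandas as follows. This is an illustration under assumptions: `load_or_scrape` and `fake_scrape` are hypothetical names, and the rows are made up rather than real scraped data.

```python
import os
import tempfile

import pandas as pd


def load_or_scrape(csv_path, scrape_fn):
    """Return cached rows from csv_path, or call scrape_fn once and cache the result."""
    if os.path.exists(csv_path):
        return pd.read_csv(csv_path)
    df = pd.DataFrame(scrape_fn())
    df.to_csv(csv_path, index=False)  # index=False keeps the row index out of the file
    return df


def fake_scrape():
    # Hypothetical stand-in for the real scraper; these rows are invented.
    return [
        {"title": "Example post A", "category": "help"},
        {"title": "Example post B", "category": "news"},
    ]


cache_path = os.path.join(tempfile.mkdtemp(), "posts.csv")
first = load_or_scrape(cache_path, fake_scrape)   # scrapes, then writes the CSV
second = load_or_scrape(cache_path, fake_scrape)  # reads the CSV back instead
```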

Detailed statement of tasks completed: My biggest tasks were implementing the concepts I had researched earlier and building on the code I had already written. I also organized my code in the repository for future reference. In addition, I had to balance the data across classes where the class sizes were uneven.
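The log does not say which balancing technique was used. As one possible approach, here is a sketch of random undersampling using only the standard library; the function name `undersample` and the example posts are assumptions.

```python
import random
from collections import Counter, defaultdict


def undersample(rows, label_key, seed=0):
    """Randomly drop rows so every class keeps as many rows as the smallest class."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    target = min(len(group) for group in by_label.values())
    rng = random.Random(seed)  # fixed seed so the sample is reproducible

    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, target))
    rng.shuffle(balanced)
    return balanced


# Made-up data: 90 posts in class "a" and 10 posts in class "b".
posts = [{"text": f"post {i}", "label": "a"} for i in range(90)]
posts += [{"text": f"post {i}", "label": "b"} for i in range(90, 100)]

balanced_posts = undersample(posts, "label")
label_counts = Counter(post["label"] for post in balanced_posts)
```

Undersampling throws data away, so oversampling the smaller classes or weighting the loss may be preferable when the smallest class is tiny.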

Week 3:

Overview of Things I Learned


Technical Area:

  • Implemented multiple types of recommender models (e.g., collaborative filtering-based recommender systems)

Tools:

  • Jupyter Notebook/Lab (Programming Environment)
  • TensorFlow/Keras (for Neural Networks)
  • NumPy (for recommender system implementation)
  • Pandas (Data management)
  • Git/GitHub (To organize code)

Soft Skills:

I found resources more useful than blog posts, such as video tutorials that explain the math and reasoning behind ML concepts like recommender systems. I also learned better code formatting practices for writing readable code, which makes collaborating in teams easier.


Achievements Completed:

  • Implemented multiple recommender systems on the data scraped from forums
  • Analyzed recommendations from models to rank them

Detailed statement of tasks completed: I researched the types of models commonly used for recommendations and came across an article that explained the different types of recommender systems, how they work, and the pros and cons of each one. I then worked with my team to come up with different ways to evaluate a recommender model’s performance.
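As a small illustration of one of these model types, here is a sketch of user-based collaborative filtering with cosine similarity in NumPy. The `recommend` function and the ratings matrix are hypothetical and are not the models or data from the project.

```python
import numpy as np


def recommend(ratings, user, k=2):
    """Rank items the user has not rated by similarity-weighted ratings of other users."""
    norms = np.linalg.norm(ratings, axis=1, keepdims=True)
    normalized = ratings / np.where(norms == 0, 1.0, norms)  # avoid dividing by zero
    similarity = normalized @ normalized.T                   # cosine similarity between users
    np.fill_diagonal(similarity, 0.0)                        # ignore self-similarity

    scores = similarity[user] @ ratings                      # weighted sum of others' ratings
    scores[ratings[user] > 0] = -np.inf                      # skip items already rated
    return np.argsort(-scores)[:k].tolist()


# Made-up matrix: rows are users, columns are items, 0 means "not rated".
ratings = np.array([
    [5, 4, 0, 0],
    [5, 5, 4, 0],
    [0, 0, 5, 4],
], dtype=float)

print(recommend(ratings, user=0, k=1))  # [2]
```

User 0 rates items like user 1 does, so item 2, which user 1 rated highly, comes out on top.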

Week 4:

Overview of Things I Learned


Technical Area:

  • Implemented multiple types of recommender models (e.g., collaborative filtering-based recommender systems)

Tools:

  • Jupyter Notebook/Lab (Programming Environment)
  • NumPy (for recommender system implementation)
  • Pandas (Data management)
  • Git/GitHub (To organize code)

Soft Skills:

I got better at defining what I want my models to produce, which lets me choose metrics that measure a model’s success and evaluate the models more accurately.


Achievements Completed:

  • Implemented multiple recommender systems on the data scraped from forums
  • Analyzed recommendations from models to rank them

Detailed statement of tasks completed: I implemented the models I had researched the previous week and evaluated them using metrics that I came up with while collaborating and researching.
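The log does not name the metrics that were used. Purely as an assumed example of a recommender metric, precision@k measures what fraction of the top k recommendations are relevant. The item ids below are made up.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that appear in the relevant set."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


# Hypothetical ranked recommendations and the items a user actually engaged with.
recommended = ["t1", "t2", "t3", "t4"]
relevant = {"t2", "t4", "t9"}

p_at_3 = precision_at_k(recommended, relevant, k=3)  # only "t2" is a hit in the top 3
p_at_4 = precision_at_k(recommended, relevant, k=4)  # "t2" and "t4" are hits
```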

Week 5:

Overview of Things I Learned


Technical Area:

  • Implemented multiple classifier models
  • Evaluated models based on multiple metrics

Tools:

  • Jupyter Notebook/Lab (Programming Environment)
  • TensorFlow/Keras (for Neural Networks)
  • NumPy
  • Pandas (Data management)
  • Git/GitHub (To organize code)

Soft Skills:

I practiced finding the resources I needed and applying them to the specific problem I was facing. I also got better at describing the kinds of problems I ran into.


Achievements Completed:

  • Balanced out the data among classes from
  • Implemented a text-based classifier
  • Used metrics like accuracy, F1 score, etc. to evaluate the performance of the model

Detailed statement of tasks completed: First, I tried to balance the number of posts in each class so that the data would be more evenly distributed. I then implemented a classifier on the data I had collected by scraping forums and evaluated it with metrics beyond accuracy, since accuracy alone is misleading when the classes are imbalanced.
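A small worked example of why accuracy misleads on imbalanced classes, using made-up labels and only the standard library. A model that always predicts the majority class scores 95% accuracy but gets an F1 score of 0 on the minority class.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive):
    """F1 score for one class, treating `positive` as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives means precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up dataset: 95 "common" posts, 5 "rare" posts.
y_true = ["common"] * 95 + ["rare"] * 5
y_pred = ["common"] * 100  # a model that always predicts the majority class

acc = accuracy(y_true, y_pred)                     # 0.95
rare_f1 = f1_score(y_true, y_pred, positive="rare")  # 0.0
```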

Week 6:

Overview of Things I Learned


Technical Area:

  • Started analyzing where the model was having issues and came up with possible solutions

Tools:

  • Jupyter Notebook/Lab (Programming Environment)
  • TensorFlow/Keras (for Neural Networks)
  • NumPy
  • Pandas (Data management)
  • Git/GitHub (To organize code)

Soft Skills:

I was able to use my coding environment (Google Colab) more efficiently to rapidly prototype multiple models to see if my changes resulted in an improvement to the model’s performance.


Achievements Completed:

  • Retrained classifier
  • Tried different model architectures
  • Used metrics like accuracy, F1 score, etc. to evaluate the performance of the models

Detailed statement of tasks completed: I retrained the classifier from before, changing inputs and parameters such as the amount of text given to the model, the title, and the model’s architecture. I also tried new techniques to reduce the class imbalance in the dataset.

Week 7:

Overview of Things I Learned


Technical Area:

  • Analyzed the dataset and created visuals to show patterns in it
  • Reviewed the preprocessing applied before training the classifier to check whether it was helpful or could be improved

Tools:

  • Jupyter Notebook/Lab (Programming Environment)
  • Matplotlib (for Visualization)
  • NumPy
  • Pandas (Data management)
  • Git/GitHub (To organize code)

Soft Skills:

I learned to think about the overall pipeline the machine learning model would end up in when deciding whether a given preprocessing technique was useful.


Achievements Completed:

  • Modified preprocessing steps
  • Created embeddings of the words in the dataset
  • Visualized the relationships between the word embeddings using t-SNE

Detailed statement of tasks completed: I used the word2vec model to generate embeddings for every word that occurred more than 150 times across the titles of the posts I scraped. I then used t-SNE to plot the embeddings so that related words appear close together and unrelated words appear far apart, which shows trends in the dataset that the classifier model learned.
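The frequency cutoff described above can be sketched with the standard library. The function name `frequent_words`, the lowercase whitespace tokenization, and the example titles are assumptions. The example uses a threshold of 1 so the tiny dataset produces output, whereas the project used 150. Training word2vec and running t-SNE on the selected words are not shown.

```python
from collections import Counter


def frequent_words(titles, min_count):
    """Return the words that occur more than min_count times across all titles."""
    counts = Counter(word for title in titles for word in title.lower().split())
    return {word for word, n in counts.items() if n > min_count}


# Made-up post titles standing in for the scraped forum data.
titles = [
    "GPU driver crash",
    "driver update help",
    "help with GPU driver",
]

vocab = frequent_words(titles, min_count=1)  # words appearing at least twice
```

Restricting the vocabulary this way keeps rare words, whose embeddings tend to be noisy, out of the plot.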