Jdutta - Machine Learning (Level 1) Pathway

Technical Area:

  • Prepared working environment
  • Gained a basic understanding of machine learning and NLP
  • Learned the basics of Python
  • Learned the basics of web scraping and crawling, using Beautiful Soup and Scrapy

Tools:

  • Python
  • Anaconda (Jupyter Notebooks)
  • Google Colab (I will be using this in the future, instead of running on my local environment)
  • Beautiful Soup
  • Scrapy
  • Jira

Soft Skills:

  • Communicating with teammates
  • Keeping team updated with Jira

Achievements:

  • Learning and using Python
  • Learning the basics of machine learning

Self-Assessment for Module 2

Technical Area:

  • Reviewed Python, followed Module 2 tutorials
  • Explored the PyTorch forum website
  • Learned how to extract data from a website
  • Scraped data from PyTorch forum (storing attributes in a CSV file)
  • Performed basic data cleaning and EDA
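
To illustrate the extract-and-store step listed above, here is a minimal sketch that pulls a few attributes out of topic markup and writes them as CSV. It is not the code used for the project: the HTML snippet, the class names (`topic`, `title`, `category`), and the helper names are all hypothetical, and the standard library's `html.parser` stands in for Beautiful Soup/Selenium so the example runs without extra installs.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical markup; the real forum's HTML structure is not shown in this post.
SAMPLE_HTML = """
<div class="topic"><a class="title" href="/t/1">CUDA out of memory</a><span class="category">vision</span></div>
<div class="topic"><a class="title" href="/t/2">Loss is NaN</a><span class="category">autograd</span></div>
"""


class TopicParser(HTMLParser):
    """Collects one dict per topic with its title, url, and category."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._field = None  # field whose text is currently being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        cls = attrs.get("class")
        if tag == "div" and cls == "topic":
            # A new topic starts: open an empty row for it.
            self.rows.append({"title": "", "url": "", "category": ""})
        elif tag == "a" and cls == "title" and self.rows:
            self.rows[-1]["url"] = attrs.get("href", "")
            self._field = "title"
        elif tag == "span" and cls == "category" and self.rows:
            self._field = "category"

    def handle_data(self, data):
        # Only keep text that sits inside a tracked field.
        if self._field and self.rows:
            self.rows[-1][self._field] += data.strip()

    def handle_endtag(self, tag):
        if tag in ("a", "span"):
            self._field = None


def topics_to_csv(html):
    """Parse the HTML and return (rows, csv_text)."""
    parser = TopicParser()
    parser.feed(html)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["title", "url", "category"])
    writer.writeheader()
    writer.writerows(parser.rows)
    return parser.rows, buffer.getvalue()


rows, csv_text = topics_to_csv(SAMPLE_HTML)
```

In a real scrape the `io.StringIO` buffer would be replaced by a file opened with `newline=""`, and the page source would come from the browser driver rather than a string constant.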

Tools:

  • Selenium/Firefox Webdriver (geckodriver)
  • Visual Studio Code
  • pandas, beautifulsoup4
  • Python
  • HTML
  • GitHub
  • Jira

Soft Skills:

  • Communication with team on Discord
  • Attended team meetings and office hours

Achievements:

  • Learned how to inspect elements of webpages
  • Learned how data can be extracted from a website
  • Learned about storing data attributes into a CSV file
  • Scraped thousands of posts from the PyTorch Forums (currently cleaning and analyzing the data)

Self-Assessment for Module 3

Technical Area:

  • Followed Module 3 tutorials
  • Performed basic data cleaning and EDA on the data
  • Trained basic ML models (Naive Bayes Classifier, Linear Support Vector Machine, Logistic Regression, and Decision Tree), analyzing their accuracy
  • Trained ensemble ML models (LogisticRegression, RandomForest, and XGBoost)
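
As a rough picture of what training a basic classifier and checking its accuracy involves, here is a small multinomial Naive Bayes written in plain Python. It is only a sketch under assumptions: the post does not show the actual training code, so this hand-written version stands in for whatever library implementation was used, and the toy documents and the labels `cuda` and `data` are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Count documents per class and words per class from tokenized docs."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, tokens):
    """Return the class with the highest log-probability for the tokens."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            if tok not in vocab:
                continue  # skip words never seen in training
            # Laplace (add-one) smoothing avoids log(0) for unseen pairs.
            count = word_counts[label][tok] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, docs, labels):
    """Fraction of documents whose predicted label matches the true one."""
    correct = sum(predict(model, d) == y for d, y in zip(docs, labels))
    return correct / len(labels)


# Invented toy data: tokenized post titles with a made-up category each.
train_docs = [
    ["cuda", "gpu", "memory", "error"],
    ["gpu", "device", "cuda", "tensor"],
    ["dataloader", "batch", "dataset", "workers"],
    ["dataset", "transform", "batch", "shuffle"],
]
train_labels = ["cuda", "cuda", "data", "data"]
test_docs = [["cuda", "memory"], ["dataset", "workers"]]
test_labels = ["cuda", "data"]

model = train_naive_bayes(train_docs, train_labels)
test_accuracy = accuracy(model, test_docs, test_labels)
```

The same train/predict/score loop applies to the other models listed; only the model behind `predict` changes.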

Tools:

  • Jupyter Notebook
  • Python
  • pandas, matplotlib, numpy
  • GitHub
  • Jira

Soft Skills:

  • Communication with team on Discord
  • Attended team meetings

Achievements:

  • Performed data cleaning and some basic visualizations of the data scraped from the PyTorch Forums
  • Learned the basics of pandas DataFrames
  • Learned how to train basic and ensemble ML models

Self-Assessment for Module 4

Technical Area:

  • Followed Module 4 tutorials
  • Watched videos from Module 1 about NLP concepts
  • Learned how to train BERT, RoBERTa, DistilBERT, and XLNet models using the Simple Transformers library
  • Learned how to combine an advanced model (BERT) with a simple model (Logistic Regression)
  • Learned the concepts of building and dockerizing a web application

Tools:

  • Jupyter Notebooks (local)
  • Google Colab (for running advanced models on a GPU)
  • Python
  • Python libraries: pandas, sklearn, Simple Transformers, Tokenizers, re, tarfile
  • Discord
  • STEM-Away Platform

Soft Skills:

  • Delivered two presentations (including final presentation)
  • Communication with team on Discord
  • Attended team meetings

Achievements:

  • Learned how to train advanced ML models
  • Learned how to build and dockerize a web application