Justin Ngo - Machine Learning Self Assessment

Week: Weeks 1-3, 7/20-8/10

Overview of Things Learned

  • Web Scraping using BeautifulSoup
  • Data Analysis
  • Data Pre-processing

Technical Area

  • Determining a forum suitable for scraping
  • Web scraping data from a forum

Tools

  • BeautifulSoup
  • Asana

Soft Skills

  • Communication and working with a team using Slack

Achievement Highlights

  • Scraped the Codecademy forum to create a dataset containing titles, posts, comments, categories, and tags
  • Worked together with the Codecademy team
  • Analyzed data for pre-processing

Meetings Attended

  • Weekly ML Team 6 Meetings: 7/20, 7/23, 7/27, 7/29, 8/3, 8/5

Goals for the Upcoming Week

  • Finish data pre-processing and cleaning on the Stack Exchange data
  • Gain a better understanding of the NLP and machine learning techniques to be used on the dataset

Detailed Statement of Completed Tasks

  • Worked with a team to determine that the Codecademy online forum provided suitable data for training the recommender system
  • Created a plan with the team for web scraping the website
  • Scraped titles, posts, comments, categories, and tags from the Codecademy online forum and stored the data in a CSV file
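
A minimal sketch of the scraping step described above, using BeautifulSoup and Python's csv module. The HTML snippet, class names, and the `data-category` attribute are hypothetical stand-ins, since the forum's real markup is not recorded in this report; the sketch parses an inline string rather than fetching live pages.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical markup: the real forum's HTML structure is not given in this report.
SAMPLE_HTML = """
<div class="topic" data-category="Python">
  <h2 class="title">Loop not printing</h2>
  <span class="tag">loops</span>
  <span class="tag">beginner</span>
  <div class="post">My for loop prints nothing.</div>
  <div class="comment">Check your indentation.</div>
  <div class="comment">Is the list empty?</div>
</div>
"""

FIELDS = ["title", "category", "tags", "post", "comments"]


def parse_topics(html):
    """Extract one row per topic from a page of (assumed) forum markup."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for topic in soup.find_all("div", class_="topic"):
        tags = [t.get_text(strip=True) for t in topic.find_all(class_="tag")]
        comments = [c.get_text(strip=True) for c in topic.find_all(class_="comment")]
        rows.append({
            "title": topic.find(class_="title").get_text(strip=True),
            "category": topic.get("data-category", ""),
            "tags": "|".join(tags),          # multi-valued fields joined with "|"
            "post": topic.find(class_="post").get_text(strip=True),
            "comments": "|".join(comments),
        })
    return rows


def write_csv(rows, handle):
    """Write the scraped rows to any file-like object as CSV."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


rows = parse_topics(SAMPLE_HTML)
buffer = io.StringIO()   # stands in for an open CSV file
write_csv(rows, buffer)
```

Joining tags and comments with "|" keeps each topic on one CSV row; a separate table per field would be another reasonable layout.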

Week: Weeks 4-6, 8/10-8/31

Overview of Things Learned

  • Data Pre-processing
  • Machine Learning Models: BERT and Simple Transformers
  • Data Modeling
  • Analyzing Results

Technical Area

  • Data Pre-processing: tokenization, stopword removal, stemming, and lemmatization
  • Implementing BERT, Simple Transformers, and TF-IDF
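
The pre-processing steps listed above can be sketched as follows. This is an illustration only: the stopword list, the lemma lookup, and the suffix-stripping stemmer are toy stand-ins written for this example, not the tools used in the project (an NLP library would normally supply these). Stemming and lemmatization are applied in sequence here simply to show each step.

```python
import re

# Toy resources, assumed for illustration; a real pipeline would use a library's lists.
STOPWORDS = {"a", "an", "the", "is", "are", "my", "in", "to", "and", "of", "it"}
LEMMAS = {"ran": "run", "better": "good", "mice": "mouse"}
SUFFIXES = ("ing", "ed", "s")


def tokenize(text):
    """Strip HTML tags and unwanted characters, lowercase, and split into word tokens."""
    text = re.sub(r"<[^>]+>", " ", text)
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stopwords(tokens):
    """Drop very common words that carry little topical meaning."""
    return [t for t in tokens if t not in STOPWORDS]


def lemmatize(token):
    """Map a token to its dictionary form using the toy lookup table."""
    return LEMMAS.get(token, token)


def stem(token):
    """Naive suffix stripping; keeps at least three characters of the word."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Run tokenization, stopword removal, lemmatization, and stemming in order."""
    tokens = remove_stopwords(tokenize(text))
    return [stem(lemmatize(t)) for t in tokens]


cleaned = preprocess("My loops are <b>printing</b> the wrong values!")
print(cleaned)  # ['loop', 'print', 'wrong', 'value']
```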

Tools

  • Google Colaboratory (Colab)
  • Jupyter Notebook
  • Microsoft Excel

Soft Skills

  • Learned new software and machine learning concepts at a fast pace
  • Problem solving
  • Communicating research and results to a team
  • Presenting technical research and results

Achievement Highlights

  • Implemented and worked with machine learning models
  • Researched methods of data modeling
  • Collaborated with a team to successfully conduct data modeling
  • Collaborated on a research presentation that displayed and analyzed results

Meetings Attended

  • Weekly ML Team 6 Meetings: 8/10, 8/12, 8/13, 8/14, 8/15, 8/17, 8/19, 8/21, 8/24, 8/26, 8/27, 8/28

Detailed Statement of Completed Tasks

  • Data Pre-processing: utilized tokenization, stopword removal, stemming, lemmatization, and removal of unwanted characters to clean data
  • Data Modeling: researched and implemented BERT, Simple Transformers, and TF-IDF models to build a recommender system
  • Presented and analyzed the final results of our finished NLP topic recommender system
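
To illustrate the TF-IDF part of the data modeling work, here is a minimal sketch of a TF-IDF recommender over already pre-processed posts. It is written in plain Python with a smoothed IDF and cosine similarity; the sample posts, function names, and weighting formula are assumptions made for this example and do not reproduce the team's actual implementation. The BERT and Simple Transformers models are not shown.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Turn tokenized documents into sparse TF-IDF vectors (dicts of term -> weight)."""
    n = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF so that terms appearing in every document still get a positive weight.
    idf = {term: math.log((1 + n) / (1 + df)) + 1 for term, df in doc_freq.items()}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({t: (c / len(doc)) * idf[t] for t, c in counts.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(query_index, vectors, top_n=2):
    """Return indices of the posts most similar to the query post."""
    scores = [
        (cosine(vectors[query_index], vec), i)
        for i, vec in enumerate(vectors)
        if i != query_index
    ]
    scores.sort(reverse=True)
    return [i for _, i in scores[:top_n]]


# Hypothetical pre-processed forum posts (token lists).
posts = [
    ["loop", "print", "wrong", "value"],
    ["loop", "list", "index", "error"],
    ["css", "flexbox", "center", "div"],
    ["print", "loop", "range", "value"],
]
vectors = tf_idf_vectors(posts)
recommended = recommend(0, vectors, top_n=2)
print(recommended)  # [3, 1]
```

Post 3 ranks first because it shares three terms with post 0, while post 1 shares only the common term "loop".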