Xiaofang - Machine Learning (Level 1) Pathway

Technical Areas:

  • Familiarized myself with fundamentals of EDA
  • Learned how to navigate HTML and extract certain components of the website
  • Got more practice with scraping: extracting text and dates
  • Basic data cleaning and analysis

Tools:

  • Jupyter Notebook
  • BeautifulSoup, spaCy, pandas, NumPy, csv, Requests
  • Git / GitHub

Soft Skills:

  • Navigating forums like Stack Overflow to find solutions to errors
  • Demonstrating code to team members

Tasks Completed:

  • Scraped the messages from one of the boards into a CSV and removed the HTML tags
  • Basic data analysis - compared the relative lengths of the comments and graphed them (frequency of long/medium/short messages)
  • Basic data visualization
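
The cleaning and length-comparison tasks above can be sketched roughly as follows. This is a minimal illustration, not the code used in the project: the standard library's html.parser stands in for whatever tag stripping was actually done (BeautifulSoup's get_text() would serve the same purpose), and the function names and the 50- and 200-character thresholds are assumptions chosen only for the example.

```python
import csv
from collections import Counter
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(html):
    """Return the text of an HTML fragment, text nodes joined by single spaces."""
    parser = _TextOnly()
    parser.feed(html)
    parser.close()
    # Join the text nodes, then collapse any run of whitespace to one space.
    return " ".join(" ".join(parser.parts).split())


def length_bucket(text, short_max=50, medium_max=200):
    """Label a message by character count (thresholds are assumed, not from the project)."""
    n = len(text)
    if n <= short_max:
        return "short"
    if n <= medium_max:
        return "medium"
    return "long"


def bucket_counts(messages):
    """Count how many raw HTML messages fall into each length bucket."""
    return Counter(length_bucket(strip_tags(m)) for m in messages)


def write_csv(messages, handle):
    """Write cleaned messages with their length and bucket to an open file handle."""
    writer = csv.writer(handle)
    writer.writerow(["message", "length", "bucket"])
    for raw in messages:
        text = strip_tags(raw)
        writer.writerow([text, len(text), length_bucket(text)])
```

The counts returned by bucket_counts are what a frequency bar chart of short, medium, and long messages would be drawn from. When writing to a real file, open it with newline="" as the csv module documentation recommends.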

Module 1

Concise overview of things learned:

Technical area:

  • Set up the Google Colab environment
  • Set up a new local test environment
  • Learned the basics of machine learning
  • Learned the definition of classification
  • Learned the basics of the logistic regression model
  • Learned the basics of NLP (by reading a PDF)
  • Learned web scraping (Selenium, ChromeDriver, and Beautiful Soup)

Tools: Beautiful Soup, Selenium, ChromeDriver, GitHub, Jupyter Notebook, Python

Soft skills: Learned about project management

Achievement highlights:

  • Successfully set up a new test environment for the project
  • Successfully used Selenium and Beautiful Soup to get data from the website https://www.discoursehub.com/

Detailed statement of tasks completed:

  1. Used Selenium and ChromeDriver to click through the website automatically
  2. Navigated to the e-commerce section
  3. Used Beautiful Soup to collect data from the page: the topics of the latest reviews in the e-commerce section
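
The three steps above might look something like the sketch below. It is an illustration under assumptions, not the script used in the module: the link text passed to fetch_section_html and the "title" class used to find topic links are hypothetical and would need to be checked against the site's actual markup, and both function names are made up for this example.

```python
from bs4 import BeautifulSoup


def fetch_section_html(base_url, section_link_text, title_class="title"):
    """Open the site in Chrome, click through to a section, and return its HTML."""
    # Imported here so the parsing helper below can be used without a browser.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(base_url)
        # Hypothetical locator: assumes the section is reachable through a
        # link whose visible text contains section_link_text.
        driver.find_element(By.PARTIAL_LINK_TEXT, section_link_text).click()
        # Wait up to 10 seconds for at least one topic link to appear
        # (the CSS class is an assumption about the page).
        WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, f"a.{title_class}"))
        )
        return driver.page_source
    finally:
        driver.quit()


def extract_topic_titles(html, title_class="title"):
    """Return the text of every <a> element carrying the assumed title class."""
    soup = BeautifulSoup(html, "html.parser")
    links = soup.find_all("a", class_=title_class)
    return [link.get_text(strip=True) for link in links]
```

Keeping the browser step and the parsing step in separate functions means the parsing can be tried on saved HTML without launching Chrome each time.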

Module 3

Technical

  • Word cloud
  • Logistic regression model
  • Random Forest Model
  • XGBoost Model
  • Recommender

Tools: Jupyter Notebook, Word2Vec

Achievement

Gained an understanding of a Python package that was new to me, word2vec

Successfully built three models

Challenge

The accuracy of the three models was not very good

Module 4

Technical

  • BERT classification model
  • Simple Transformers library
  • Compared the results of the various models, such as BERT and Random Forest
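
A comparison like the one in the last item can be reduced to scoring every model's predictions against the same held-out labels. The helper below is a small sketch of that idea in plain Python; the function names are illustrative, and any labels or predictions passed in are the caller's own (no project results are reproduced here).

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty label list")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def compare_models(y_true, predictions):
    """Return (model_name, accuracy) pairs sorted from best to worst.

    predictions maps a model name to its list of predicted labels,
    all made on the same test set as y_true.
    """
    scores = {name: accuracy(y_true, preds) for name, preds in predictions.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Scoring all models on one shared test split keeps the comparison fair; scikit-learn's accuracy_score computes the same quantity if that library is already in use.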

Tools

  • Simple Transformers
  • PyTorch
  • scikit-learn
  • Tokenizers

Achievement

Successfully built a BERT classification model

Challenge

Did not build an ML app
