Phillip Chau - Machine Learning Pathway

Name: Phillip Chau
Team: ML-Team8-June1(Bertinator)

Technical Area:

I learned how to create a web crawler using the Python framework Scrapy. I learned about the web extraction process, the asynchronous nature of Scrapy and its underlying Twisted framework, and how to scrape data from infinitely scrolling pages. I also learned how to clean and manipulate data with the Pandas library, and how to use version control software such as Git. I am currently learning more about natural language processing by reading papers and watching videos about the BERT model and transformers.
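Infinitely scrolling pages usually fetch more content from a backend endpoint as the user scrolls, so a crawler can often request that endpoint directly instead of rendering JavaScript. A minimal sketch of that pagination loop is shown below. It assumes a Discourse-style `/latest.json?page=N` endpoint whose JSON holds topics under `topic_list.topics`; the function name `iter_topics` and the injected `fetch` callable are hypothetical stand-ins for Scrapy's own request scheduling, used here so the loop can run without network access.

```python
import json
from typing import Callable, Iterator


def iter_topics(
    fetch: Callable[[str], str], base_url: str, max_pages: int = 50
) -> Iterator[dict]:
    """Yield topics page by page from a paginated JSON endpoint.

    Assumes (hypothetically) a Discourse-style URL scheme
    `<base_url>/latest.json?page=N` that returns
    {"topic_list": {"topics": [...]}}. `fetch` takes a URL and
    returns the response body as a string.
    """
    for page in range(max_pages):
        body = fetch(f"{base_url}/latest.json?page={page}")
        topics = json.loads(body).get("topic_list", {}).get("topics", [])
        if not topics:  # an empty page means we have reached the end
            break
        yield from topics


if __name__ == "__main__":
    # Fake responses standing in for real HTTP calls.
    pages = {
        "https://forum.example/latest.json?page=0":
            '{"topic_list": {"topics": [{"title": "A"}, {"title": "B"}]}}',
        "https://forum.example/latest.json?page=1":
            '{"topic_list": {"topics": [{"title": "C"}]}}',
        "https://forum.example/latest.json?page=2":
            '{"topic_list": {"topics": []}}',
    }
    topics = iter_topics(pages.__getitem__, "https://forum.example")
    print([t["title"] for t in topics])  # ['A', 'B', 'C']
```

Inside an actual Scrapy spider, the same idea would typically be written as a `parse` callback that yields the items on the current page and then a request for the next page (for example via `response.follow`); Scrapy then schedules those requests asynchronously on top of Twisted rather than waiting for each one in turn.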

Tools Used: Python, Scrapy, Pandas, BeautifulSoup, Git, Colab, Visual Studio Code, Bash

Soft Skills: Collaboration with teammates across the world, time management, effective communication and idea generation.

Achievement Highlights

  1. Successfully scraped the Huel forum on Discourse using Scrapy

  2. Successfully cleaned the scraped dataframes with Pandas

  3. Learned more about version control and natural language processing, specifically the BERT model
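The core operation inside the transformer layers that BERT is built from is scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V. A minimal single-head NumPy sketch is shown below for illustration only: it omits the learned query/key/value projections, multiple heads, and padding masks that a real BERT layer uses, and the function name is hypothetical.

```python
import numpy as np


def scaled_dot_product_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray):
    """Single-head attention: softmax(q @ k.T / sqrt(d_k)) @ v.

    q: (n_queries, d_k), k: (n_keys, d_k), v: (n_keys, d_v).
    Returns (output, weights) with shapes (n_queries, d_v)
    and (n_queries, n_keys).
    """
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)
    # Subtract each row's max before exponentiating for numerical stability.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # each row sums to 1
    return weights @ v, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 8))  # 4 token embeddings of size 8
    out, w = scaled_dot_product_attention(x, x, x)  # self-attention
    print(out.shape, w.shape)  # (4, 8) (4, 4)
```

Each output row is a weighted average of the value vectors, with weights determined by how strongly that query matches each key; the division by sqrt(d_k) keeps the scores from growing with the embedding size and saturating the softmax.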

List of Meetings Attended

I have attended all meetings to date.

Goals for the Upcoming Week

My goal for the upcoming week is to finish building my understanding of the BERT model and then train it on the concatenated datasets so that it can predict which forum a given post belongs to.
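Framed this way, the goal is a sequence classification task: each forum becomes a class label, and the predicted forum is the label with the highest score from the model's classification head. A minimal sketch of that label bookkeeping is shown below. The helper names are hypothetical, every forum name other than "Huel" is a placeholder, and the logits are made-up numbers rather than real BERT outputs. In the Hugging Face Transformers library, mappings like these can be passed as the `id2label` and `label2id` arguments (together with `num_labels`) to `AutoModelForSequenceClassification.from_pretrained`.

```python
import math
from typing import Dict, List, Sequence, Tuple


def build_label_maps(forums: Sequence[str]) -> Tuple[Dict[int, str], Dict[str, int]]:
    """Assign each distinct forum name a stable integer id (sorted order)."""
    names = sorted(set(forums))
    id2label = dict(enumerate(names))
    label2id = {name: i for i, name in id2label.items()}
    return id2label, label2id


def predict_forum(logits: List[float], id2label: Dict[int, str]) -> Tuple[str, float]:
    """Return the highest-scoring forum and its softmax probability."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]  # shift by max for stability
    best = max(range(len(logits)), key=lambda i: logits[i])
    return id2label[best], exps[best] / sum(exps)


if __name__ == "__main__":
    # Hypothetical forum labels, one per post in the concatenated dataset.
    post_forums = ["Huel", "ForumB", "Huel", "ForumC"]
    id2label, label2id = build_label_maps(post_forums)
    labels = [label2id[f] for f in post_forums]
    print(id2label, labels)  # {0: 'ForumB', 1: 'ForumC', 2: 'Huel'} [2, 0, 2, 1]
    # Made-up classifier scores for one post, one score per forum id.
    print(predict_forum([0.1, -1.0, 2.3], id2label))
```

Sorting the names before assigning ids keeps the mapping identical across runs, which matters when a trained model is later reloaded and its output indices have to be translated back into forum names.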

Tasks Done

  • Scraped data from the Huel forum in the required format. This was challenging because I was not familiar with the Scrapy framework and first had to learn how to build a crawler.
  • Cleaned the data to remove HTML tags and then concatenated multiple dataframes using Pandas
  • Learned about BERT and began to understand how to implement the model.
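The cleaning step in the tasks above (stripping HTML tags, then combining several scraped tables) can be sketched in Pandas as follows. The column names `post` and `forum` and the helper `clean_posts` are assumptions for illustration, and the regular expression is a deliberately simple stand-in: a real HTML parser such as BeautifulSoup's `get_text()` copes better with malformed markup.

```python
import html

import pandas as pd

TAG_RE = r"<[^>]+>"  # naive pattern: anything between angle brackets


def clean_posts(df: pd.DataFrame, forum: str) -> pd.DataFrame:
    """Strip HTML from the (hypothetical) `post` column and label rows with their forum."""
    out = df.copy()  # leave the caller's dataframe untouched
    out["post"] = (
        out["post"]
        .str.replace(TAG_RE, " ", regex=True)  # drop tags
        .map(html.unescape)                    # &amp; -> &, etc.
        .str.replace(r"\s+", " ", regex=True)  # collapse whitespace
        .str.strip()
    )
    out["forum"] = forum
    return out


if __name__ == "__main__":
    huel = pd.DataFrame({"post": ["<p>Love the <b>shake</b></p>", "<p>Fish &amp; chips?</p>"]})
    other = pd.DataFrame({"post": ["<div>Hello<br>world</div>"]})
    combined = pd.concat(
        [clean_posts(huel, "Huel"), clean_posts(other, "Other")],
        ignore_index=True,  # renumber rows 0..n-1 across the merged frames
    )
    print(combined)
```

Replacing tags with a space rather than an empty string keeps words on either side of a `<br>` or closing tag from running together; the whitespace pass then removes the extra spaces this introduces.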