Name: Phillip Chau
I learned how to build a web crawler with Scrapy, a Python framework. Along the way I learned about the web-extraction process, the asynchronous nature of Scrapy and the underlying Twisted framework, and how to scrape data from infinitely scrolling pages. I also learned about cleaning and manipulating data with the Pandas library and about version-control software such as Git. I am now learning more about natural language processing by reading papers and watching videos about the BERT model and transformers.
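One practical detail behind the infinitely scrolling pages: Discourse forums serve each page of a topic list as JSON, so a crawler can request successive pages rather than executing JavaScript. The sketch below builds the paginated URLs a Scrapy spider would yield requests for; the base URL, category name, and page count are illustrative, not the actual Huel endpoints.

```python
def discourse_page_urls(base_url: str, category: str, pages: int) -> list[str]:
    """Build paginated Discourse JSON URLs for a category.

    In a Scrapy spider, each of these URLs would become a scrapy.Request
    whose callback parses response.json() for topic data.
    """
    return [f"{base_url}/c/{category}.json?page={n}" for n in range(pages)]


urls = discourse_page_urls("https://forum.example.com", "general", 3)
# One URL per "scroll" of the topic list, starting at page 0.
```

Incrementing the `page` query parameter until the response contains no more topics is the usual stopping condition for this kind of pagination.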
Tools Used: Python, Pandas, BeautifulSoup, Git, Colab, Visual Studio Code, Bash
Soft Skills: Collaboration with teammates across the world, time management, effective communication and idea generation.
Successfully scraped the Huel forum on Discourse using Scrapy
Successfully cleaned the resulting DataFrames with Pandas
Learned more about version control and natural language processing, specifically the BERT model
List of Meetings Attended
I have attended all meetings to date.
Goals for the Upcoming Week
My goal now is to finalize my understanding of the BERT model and then train it on the concatenated datasets so it can predict the appropriate forum for a given post.
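A small preparatory step for that goal: a BERT-style classifier expects integer class labels, so the forum names in the concatenated dataset need to be mapped to label ids before fine-tuning. A minimal Pandas sketch, with toy data and column names that are assumptions rather than the real schema:

```python
import pandas as pd

# Hypothetical concatenated dataset: one row per post, plus the forum
# it came from (stand-in values, not the real Huel data).
data = pd.DataFrame({
    "text": ["Which flavour mixes best?", "My order never arrived"],
    "forum": ["recipes", "support"],
})

# Map forum names to integer label ids for the classifier;
# cat.categories keeps the id -> name mapping for decoding predictions.
forums = data["forum"].astype("category")
data["label"] = forums.cat.codes
label_names = list(forums.cat.categories)
```

Keeping `label_names` around makes it straightforward to turn the model's predicted id back into a forum name at inference time.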
- Scraped the data from the Huel Forum in the required format. This was difficult since I wasn’t familiar with the Scrapy framework and had to learn how to create a crawler.
- Cleaned the data to remove HTML tags, then combined multiple DataFrames using Pandas.
- Learned about BERT and began to understand how to implement the model.
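The cleaning and combining steps above can be sketched as follows. This is a minimal regex-based version with toy data (BeautifulSoup's `get_text()` is the more robust way to strip tags; the regex keeps the sketch dependency-light), and the column names are illustrative, not the actual schema of the scraped data.

```python
import html
import re

import pandas as pd


def strip_html(raw: str) -> str:
    """Strip HTML tags from a post body and normalise whitespace."""
    text = re.sub(r"<[^>]+>", " ", raw)       # drop anything tag-shaped
    return " ".join(html.unescape(text).split())


# Toy frames standing in for the scraped forum pages.
posts_a = pd.DataFrame({"forum": ["recipes"], "body": ["<p>Try <b>oats</b></p>"]})
posts_b = pd.DataFrame({"forum": ["support"], "body": ["<div>Order help</div>"]})

# Concatenate the per-page frames into one dataset, then clean the text.
combined = pd.concat([posts_a, posts_b], ignore_index=True)
combined["body"] = combined["body"].apply(strip_html)
```

`ignore_index=True` gives the combined frame a fresh 0..n-1 index, which avoids duplicate index labels when the per-page frames each start from 0.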