Technical Area:
- Got a solid introduction to Scrapy and BeautifulSoup (bs4)
- Learned a lot about collecting and cleaning data to fit specific formats and transformers
- Learned about different NLP algorithms and how to apply them to the current project, alongside our practice with forums
Tools:
- VSCode
- Vim (small coding edits)
- Google Colab
- Anaconda (mainly for virtual env)
Soft skills:
- Enjoyed debugging my code and got better at reading documentation, in particular for BeautifulSoup
- Learned more about the web scraping element and spiders/web crawlers
Achievements:
- Scraped data from forums and learned the bs4 library, including the .name attribute and the .get() and .find_all() methods
- Cleaned the data and stored it in a DataFrame (df), then exported it to files (.txt and .json), since the storage target does not always need to be a database
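
The bs4 calls and file exports mentioned above can be sketched roughly as follows. This is a minimal illustration, not the code from the project: the HTML snippet, the class names, the helper names `parse_topics` and `export_topics`, and the output file names are all made up for the example, and pandas is assumed to be the library behind the df.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical forum markup, invented for illustration only.
SAMPLE_HTML = """
<div class="topic">
  <a class="title" href="/t/first-post/1">First post</a>
  <span class="author">alice</span>
</div>
<div class="topic">
  <a class="title" href="/t/second-post/2">Second post</a>
  <span class="author">bob</span>
</div>
"""


def parse_topics(html):
    """Return one dict per topic block found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # find_all() returns every matching tag; class_ avoids the Python keyword.
    for topic in soup.find_all("div", class_="topic"):
        link = topic.find("a", class_="title")
        author = topic.find("span", class_="author")
        rows.append({
            "tag": link.name,          # .name is the tag's name, here "a"
            "title": link.get_text(strip=True),
            "url": link.get("href"),   # .get() reads an attribute, None if absent
            "author": author.get_text(strip=True),
        })
    return rows


def export_topics(df, json_path, txt_path):
    """Write the DataFrame to a JSON file and a tab-separated text file."""
    df.to_json(json_path, orient="records", indent=2)
    df.to_csv(txt_path, sep="\t", index=False)


df = pd.DataFrame(parse_topics(SAMPLE_HTML))
```

Calling `export_topics(df, "topics.json", "topics.txt")` would then write both files to the working directory without any database involved.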
Tasks:
- Chose the Anime Network (AN) forum as my Discourse forum
- Scraped the data using the BeautifulSoup and Requests libraries
- Cleaned the data and did some Exploratory Data Analysis (EDA)
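
As a rough sketch of the fetch, clean, and explore steps listed in these tasks, under stated assumptions: the helper names `fetch_html`, `clean_post`, and `top_words` are hypothetical, the post bodies are invented, and a simple word count stands in for whatever EDA was actually done. `fetch_html` is shown but not called, since the real forum URL is not given here.

```python
import re
from collections import Counter

import requests
from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Download a page and return its HTML text, raising on HTTP errors."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text


def clean_post(html_fragment):
    """Strip tags, collapse whitespace, and lowercase a post body."""
    text = BeautifulSoup(html_fragment, "html.parser").get_text(" ")
    return re.sub(r"\s+", " ", text).strip().lower()


def top_words(posts, n=3):
    """Count the most common words across cleaned posts (a simple EDA step)."""
    counts = Counter()
    for post in posts:
        counts.update(re.findall(r"[a-z']+", post))
    return counts.most_common(n)


# Hypothetical post bodies standing in for scraped forum data.
raw_posts = [
    "<p>Great   episode!</p><p>The animation was great.</p>",
    "<p>Animation quality dropped\nthis season.</p>",
]
cleaned = [clean_post(p) for p in raw_posts]
```

Keeping the network call in its own function makes the cleaning and counting steps easy to check on saved HTML before running them against the live forum.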