Overview of what I learned:
- Technical Area: Natural language processing.
- Learned how to do web scraping and web crawling
- Saving data from Discourse forums
- Tools: Python, BeautifulSoup, Selenium
- Soft skills: problem-solving
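Scraping extracts data from a page, while crawling discovers which pages to visit by following links. The sketch below illustrates the crawling half under stated assumptions: it does a breadth-first walk that stays on the starting host, and the function names `crawl` and `LinkExtractor` and the `fetch` callback are hypothetical. `fetch` is passed in (in practice it might wrap `urllib.request.urlopen`) so the crawler can be tried offline, and the standard library's `html.parser` stands in for BeautifulSoup to keep it dependency-free.

```python
import time
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List, Set
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url: str, fetch: Callable[[str], str],
          max_pages: int = 50, delay: float = 1.0) -> Dict[str, str]:
    """Breadth-first crawl restricted to the start URL's host.

    Returns a mapping of visited URL -> raw HTML, ready for a scraper to parse.
    `fetch` (hypothetical) takes a URL and returns its HTML.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen: Set[str] = {start_url}
    pages: Dict[str, str] = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            absolute = urljoin(url, href).split("#")[0]
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
        if queue:
            time.sleep(delay)  # be polite: pause between requests
    return pages
```

Against a real forum it is also worth checking the site's robots.txt and terms of use before crawling.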
The three achievements for this week were:
- Implemented web scraping and web crawling using Python
- Used BeautifulSoup to scrape data
- Collected and organized the data from Discourse forums in a CSV file
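The scraping and CSV steps could look roughly like the minimal sketch below. It assumes each post's text sits inside a `<div>` with the class `post`, which is a hypothetical selector: the real Discourse markup should be inspected first. To keep the sketch dependency-free it uses the standard library's `html.parser` instead of BeautifulSoup; with BeautifulSoup the same extraction would be `BeautifulSoup(html, "html.parser").find_all("div", class_="post")` followed by `get_text(" ", strip=True)` on each result. The class and function names and the CSV column names are illustrative.

```python
import csv
import io
from html.parser import HTMLParser
from typing import List, TextIO


class PostTextExtractor(HTMLParser):
    """Collects the text of every <div> whose class list contains `target_class`."""

    def __init__(self, target_class: str = "post") -> None:
        super().__init__()
        self.target_class = target_class  # hypothetical class name
        self.posts: List[str] = []
        self._depth = 0  # > 0 while inside a matching <div>
        self._chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._depth:
            self._depth += 1  # a nested <div> inside the current post
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self._depth = 1
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                # Join the text pieces and collapse runs of whitespace.
                self.posts.append(" ".join(" ".join(self._chunks).split()))

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)


def write_posts_csv(posts: List[str], topic_url: str, out: TextIO) -> None:
    """Writes one CSV row per post: topic URL, position in the topic, text."""
    writer = csv.writer(out)
    writer.writerow(["topic_url", "post_index", "text"])
    for index, text in enumerate(posts):
        writer.writerow([topic_url, index, text])


if __name__ == "__main__":
    sample = '<div class="post"><p>Hello <b>forum</b></p></div>'
    extractor = PostTextExtractor()
    extractor.feed(sample)
    buffer = io.StringIO()
    write_posts_csv(extractor.posts, "https://forum.example.com/t/1", buffer)
    print(buffer.getvalue())
```

To save to disk instead, pass a file opened with `open("posts.csv", "w", newline="", encoding="utf-8")` (file name hypothetical); `newline=""` is what the `csv` module documentation recommends.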
Goals for next week:
- Clean and preprocess the retrieved data
- Apply a BERT model to the data
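A first cleaning pass over the CSV could be as simple as the sketch below. It assumes a `text` column (a hypothetical name) and treats two kinds of problems: repeated posts, detected on lower-cased, whitespace-normalized text, and empty rows, taken here as a stand-in for scraping errors. The first occurrence of each post is kept so the original order survives. `clean_rows` is an illustrative name, not part of any library.

```python
import csv
import io
import re
from typing import Dict, Iterable, List


def clean_rows(rows: Iterable[Dict[str, str]], text_field: str = "text") -> List[Dict[str, str]]:
    """Normalizes whitespace, drops empty texts and removes repeated posts."""
    seen = set()
    cleaned: List[Dict[str, str]] = []
    for row in rows:
        text = re.sub(r"\s+", " ", row.get(text_field) or "").strip()
        if not text:
            continue  # empty text, e.g. a post the scraper failed to capture
        key = text.lower()  # case-insensitive duplicate check
        if key in seen:
            continue  # repeated post, e.g. the same page scraped twice
        seen.add(key)
        cleaned.append({**row, text_field: text})
    return cleaned


if __name__ == "__main__":
    raw = "topic_url,post_index,text\nu,0,Hello   world\nu,1,hello world\n"
    for row in clean_rows(csv.DictReader(io.StringIO(raw))):
        print(row)
```

With a real file this would be `clean_rows(csv.DictReader(f))` on a file opened with `newline=""`. Further preprocessing for BERT (tokenization, truncation to the model's maximum length) would follow this step.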
The first task was to experiment with web scraping, which I did using BeautifulSoup. The webinar given by Ms. Maleeha was extremely helpful at this stage, as it introduced me to the main methods of web scraping. The main issue during implementation was that the code could not scroll through the entire webpage, so part of the page's content was not captured. A webinar by the Technical Lead, Anubhav, on using Selenium was extremely helpful for solving this. The implementation and data collection took several attempts, but a considerable amount of data was eventually retrieved, organized, and saved in a CSV file for later use. This dataset was then cleaned to remove repetitions and errors. Last week focused on learning methods for transforming the text in the data into a concise representation in order to run machine learning models (such as BERT) on it.
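A common way to handle the scrolling problem with Selenium is to scroll to the bottom repeatedly until the page height stops growing. The sketch below shows that pattern under stated assumptions: `scroll_to_bottom`, its `pause` and `max_rounds` parameters are hypothetical names, and `driver` is expected to be a Selenium WebDriver, of which only the `execute_script` method is used.

```python
import time
from typing import Any


def scroll_to_bottom(driver: Any, pause: float = 2.0, max_rounds: int = 30) -> int:
    """Scrolls an infinite-scroll page until its height stops growing.

    Returns the number of scroll rounds performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the page time to load the next batch of posts
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return rounds  # nothing new was loaded, so we reached the end
        last_height = new_height
    return max_rounds  # safety cap for pages that keep growing
```

In use, one would create a driver (for example `webdriver.Chrome()` from `selenium`), call `driver.get(topic_url)`, run `scroll_to_bottom(driver)`, and then hand `driver.page_source` to BeautifulSoup for parsing. The `max_rounds` cap keeps the loop from running forever on very long threads, and `pause` may need tuning for slow pages.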