Mb6496 - Machine Learning Pathway

mb6496 · June 17, 2020, 1:50am

Overview of the things learned:

Technical Area: Natural language processing.
- Learned how to do web scraping and web crawling
- Saved data from Discourse forums
Tools: Python, BeautifulSoup
Soft skills: problem-solving

The three achievements for this week include:

Implemented web scraping and web crawling using Python
Used BeautifulSoup to scrape data
Collected and organized the data from Discourse forums in a CSV file
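
For reference, here is a minimal sketch of the scrape-and-save step. It is not the exact code I ran: the sample HTML, the `title` class name, the column names, and the helper names `extract_topics` and `topics_to_csv` are all assumptions made for illustration, and real forum markup will differ.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for a fetched forum page.
SAMPLE_HTML = """
<table>
  <tr><td><a class="title" href="/t/welcome/1">Welcome</a></td></tr>
  <tr><td><a class="title" href="/t/week-1-notes/2">Week 1 notes</a></td></tr>
</table>
"""


def extract_topics(html):
    """Return one {'title', 'url'} dict per topic link found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        {"title": link.get_text(strip=True), "url": link["href"]}
        for link in soup.find_all("a", class_="title")
    ]


def topics_to_csv(topics, out):
    """Write the topic rows to a file-like object as CSV with a header row."""
    writer = csv.DictWriter(out, fieldnames=["title", "url"])
    writer.writeheader()
    writer.writerows(topics)


topics = extract_topics(SAMPLE_HTML)
# An in-memory buffer keeps the sketch self-contained; a real run would
# pass open("topics.csv", "w", newline="") instead.
buffer = io.StringIO()
topics_to_csv(topics, buffer)
```

Keeping extraction and writing in separate functions makes it easy to swap the sample string for page source obtained some other way.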

Goals for next week:

Clean and preprocess the retrieved data
Apply the BERT model to the data

The first task was to experiment with web scraping, which I did using BeautifulSoup. The webinar offered by Ms. Maleeha was extremely helpful at this stage, as it acquainted me with the methods of web scraping. During implementation, the main issue was that the code could not scroll through the entire webpage. A webinar by the Technical Lead, Anubhav, on the use of Selenium was extremely helpful for this step. The code implementation and data collection took several tries, but a considerable amount of data was retrieved, organized, and saved in a CSV file for later use. This data set was then cleaned to remove repetitions and errors. Last week focused on learning methods for transforming the text in the data into a concise representation in order to run machine learning models (such as BERT) on it.
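
The cleaning step can be sketched roughly as follows. This assumes that "repetitions" means posts whose text is identical after whitespace is normalized and that "errors" means rows with empty text; the `text` and `author` column names and the `clean_rows` helper are hypothetical, not taken from my actual data set.

```python
def clean_rows(rows, text_key="text"):
    """Drop rows with empty text and rows whose text repeats an earlier row.

    Whitespace is collapsed before comparison, so posts that differ only in
    spacing or line breaks count as repeats.
    """
    seen = set()
    cleaned = []
    for row in rows:
        # Collapse runs of whitespace; treat a missing or None value as empty.
        text = " ".join((row.get(text_key) or "").split())
        if not text or text in seen:
            continue
        seen.add(text)
        cleaned.append({**row, text_key: text})
    return cleaned


# Hypothetical scraped rows: one repeat (differing only in spacing) and one empty post.
rows = [
    {"author": "a", "text": "How do I scrape a forum?"},
    {"author": "b", "text": "How do I  scrape a forum?\n"},
    {"author": "c", "text": ""},
    {"author": "d", "text": "Use Selenium to scroll."},
]
cleaned = clean_rows(rows)
```

The first occurrence of each post is kept, so the original order of the data is preserved.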