As part of task 1, we were required to explore multiple forums similar to STEM-Away that host STEM-related topics, categories, posts, and tags. The forum I worked on was Discourse Meta. We then presented our observations in a report. After that, we scraped the StackOverflow forum, as it was found to be more relevant. After generating the CSV, we went through the data preprocessing stage. We're now exploring the BERT model.
Overview of things learned -
- Technical -
- Web scraping using BeautifulSoup
- Data mining
- Data cleaning
- Tools used -
- Asana (for project management)
- Jupyter (for Python coding)
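For readers unfamiliar with BeautifulSoup, the kind of parsing involved can be sketched roughly as follows. This is only a minimal illustration, not the code we actually ran: the HTML snippet, the class names (`post`, `title`, `tag`), and the `parse_posts` helper are all assumptions made up for the example, and real forum pages use different markup.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a forum listing page; the class
# names are assumptions, not the real StackOverflow or Discourse HTML.
SAMPLE_HTML = """
<div class="post">
  <a class="title" href="/q/1">How do I split a dataset?</a>
  <span class="tag">data-science</span>
  <span class="tag">python</span>
</div>
<div class="post">
  <a class="title" href="/q/2">What is tokenization?</a>
  <span class="tag">nlp</span>
</div>
"""


def parse_posts(html):
    """Return one dict per post with its title, link, and tags."""
    soup = BeautifulSoup(html, "html.parser")
    posts = []
    for post in soup.select("div.post"):
        link = post.select_one("a.title")
        posts.append({
            "title": link.get_text(strip=True),
            "url": link["href"],
            "tags": [t.get_text(strip=True) for t in post.select("span.tag")],
        })
    return posts


posts = parse_posts(SAMPLE_HTML)
```

In a real scraper the HTML would come from an HTTP response rather than a string, and the selectors would have to match the actual page structure.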
- Soft skills -
- Interacted with people from diverse backgrounds and levels of expertise. Collaborated with other teams and learned new things from their work.
Achievement highlights -
- Got familiar with Asana
- Honed web scraping and data preprocessing skills
- Connected with the leads and teammates and made new friends
- Meetings attended -
- Attended a group meeting
- Attended all team meetings
- Tasks done -
- Prepared a report along with my group, stating the pros and cons of using Discourse Meta for web scraping.
- Scraped the StackOverflow forum for one tag, data science, which had around 5.5k posts.
- Presented a data analysis report on the CSV obtained from scraping 13 categories of StackOverflow. The report listed the anomalies that needed to be addressed during the data preprocessing stage.
- Cleaned the dataset.
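As a rough idea of what cleaning a scraped CSV can look like, here is a small sketch using only the Python standard library. The anomalies it handles (leftover HTML tags, encoded entities, empty bodies, duplicate rows) are assumed for illustration and are not the list from our report; the column names, sample rows, and the `clean_text` / `clean_rows` helpers are likewise hypothetical.

```python
import csv
import html
import io
import re

# Hypothetical scraped rows; column names and values are made up.
RAW_CSV = """title,body
Split a dataset,<p>Use <code>train_test_split</code> &amp; shuffle.</p>
Split a dataset,<p>Use <code>train_test_split</code> &amp; shuffle.</p>
Empty post,
Tokenization,<p>Break   text into tokens.</p>
"""

TAG_RE = re.compile(r"<[^>]+>")


def clean_text(text):
    """Strip HTML tags, decode entities, and collapse whitespace."""
    text = TAG_RE.sub(" ", text)
    text = html.unescape(text)
    return re.sub(r"\s+", " ", text).strip()


def clean_rows(raw_csv):
    """Clean each body, then drop rows that are empty or exact duplicates."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        body = clean_text(row["body"] or "")
        key = (row["title"], body)
        if not body or key in seen:
            continue
        seen.add(key)
        cleaned.append({"title": row["title"], "body": body})
    return cleaned


rows = clean_rows(RAW_CSV)
```

On a dataset of several thousand posts the same steps are usually done with a dataframe library, but the logic stays the same.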
- Goals for the upcoming week -
- Implementing the BERT model