Overview of things learned:
- Technical Area: Over the course of the 5 weeks, I learned how to use Beautiful Soup to scrape data from multiple forums and store it in a file. I also learned how to pre-process the data I collected. I learned about the BERT model and how it works, and then how to use the transformers library to train a BERT model for a classification task.
- Tools: We learned how to use GitHub, Asana, and Slack for communication and collaboration.
- Soft skills: I learned how to communicate with my teammates and help them with their difficulties, and how to manage my time. I also learned to reach out to teammates and leads whenever I have difficulties or questions.
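As an illustration of the Beautiful Soup scraping step, here is a minimal sketch. The HTML snippet, class names, and selectors below are made up for the example; the real forum pages would be fetched over HTTP (e.g. with requests) and have their own markup.

```python
from bs4 import BeautifulSoup

# Made-up forum-like HTML; a real run would download each page first.
html = """
<div class="topic-list">
  <div class="post"><h2 class="title">How to install?</h2><p class="body">Use pip.</p></div>
  <div class="post"><h2 class="title">Login issue</h2><p class="body">Clear cookies.</p></div>
</div>
"""

def extract_posts(html_text):
    """Return a list of (title, body) tuples, one per post."""
    soup = BeautifulSoup(html_text, "html.parser")
    posts = []
    for post in soup.select("div.post"):
        title = post.select_one("h2.title").get_text(strip=True)
        body = post.select_one("p.body").get_text(strip=True)
        posts.append((title, body))
    return posts

print(extract_posts(html))
```

Each extracted (title, body) pair could then be appended to a list and written to a CSV file, which is the "store it in a file" step described above.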
Three achievements so far:
- Our team successfully scraped more than 5000 posts from a Discourse website and preprocessed them.
- I gained an understanding of how BERT models work and learned more about NLP.
- I successfully trained a BERT model to classify forum posts into categories.
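Fine-tuning BERT itself requires the transformers library and substantial compute, so as a self-contained illustration of the same post-classification task, here is a tiny scikit-learn baseline. The posts, labels, and category names are all invented for the example; this is not the BERT model described above, just a sketch of the task setup.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labeled posts; the real data set is the ~5000 scraped forum posts.
posts = [
    "pip install fails with an error",
    "cannot install the package",
    "how do I reset my password",
    "forgot password and cannot log in",
]
labels = ["install", "install", "account", "account"]

# TF-IDF features feeding a logistic-regression classifier.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(posts, labels)

print(clf.predict(["error while installing the package"]))
```

A BERT version would replace the TF-IDF features with the model's contextual embeddings, which is what makes it stronger on subtler category distinctions.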
List of meetings I have joined so far:
- All team meetings, including the general team meetings, web scraping workshops, and data processing meetings.
- Git Webinar
- STEMCast Webinar
- Python Webinar
- NLP Webinar 3
- Q and A Webinar
Goals of the upcoming weeks:
- Deploy the model
- Web Scraping: Scrape data from 5000 posts using Selenium and store the data in a data frame.
- Data Cleaning: Clean data with different methods using the nltk module.
- Calculate TF-IDF: Calculate TF-IDF statistics using the sklearn module.
- Work as a team for web scraping and data-preprocessing tasks.
- Learn about NLP and BERT from the transformers library documentation.
- BERT: Train BERT Model to classify posts into categories in a forum.
Problems faced and how I solved them:
- I had trouble understanding how to use BERT in the transformers library. Reading the library's documentation helped me understand it much better.
- My first BERT model's accuracy was not very high. When I decreased the number of classes, the accuracy went up. I think the model needs more fine-tuning.