Overview of things learned:
- Technical Area: Practiced various data cleaning and pre-processing techniques. Worked more with pandas DataFrames and CSV files and applied them in our assignments. Through the BERT webinars, learned the significance of the BERT model and how it is applied. Through the NLP webinars and team meetings, also learned the significance of NLP models and how they are applied using BERT.
- Tools: Kept communication channels open with my team through Slack, and kept up with our tasks using Asana and GitHub.
- Soft skills: Applied communication skills with my sub-team to plan the tasks needed to reach our goals. Used time-management techniques to give myself and my team adequate time to work on our respective tasks.
Achievements so far:
- Our team successfully scraped more than 5000 posts from a Discourse website.
- Cleaned and pre-processed the data by removing tags and stopwords, then applying tokenization and lemmatization.
- Learned more about the BERT model and NLP, and how they may be applied to our project.
- Applied BERT to a smaller sample of our overall collected data.
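
The cleaning steps listed above (tag removal, tokenization, stopword removal, lemmatization) can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the exact code our team ran: the tiny `STOPWORDS` set, the regular expressions, and the `clean_post` helper are all assumptions made for the example, and lemmatization is left as a pluggable function that defaults to doing nothing.

```python
import re

# Tiny illustrative stopword set (an assumption for this sketch);
# nltk.corpus.stopwords would normally supply a much fuller list.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "it", "this"}

TAG_RE = re.compile(r"<[^>]+>")       # matches HTML-style tags such as <p> or </b>
TOKEN_RE = re.compile(r"[a-z0-9']+")  # very simple word tokenizer

def clean_post(html, lemmatize=lambda token: token):
    """Strip tags, lowercase, tokenize, drop stopwords, then lemmatize each token."""
    text = TAG_RE.sub(" ", html)              # 1. remove tags
    tokens = TOKEN_RE.findall(text.lower())   # 2. tokenize
    kept = [t for t in tokens if t not in STOPWORDS]  # 3. remove stopwords
    return [lemmatize(t) for t in kept]       # 4. lemmatize (identity by default)

sample = "<p>This is the <b>best</b> way to install Python packages</p>"
cleaned = clean_post(sample)
print(cleaned)
```

With nltk installed and its WordNet data downloaded, `nltk.stem.WordNetLemmatizer().lemmatize` could be passed as the `lemmatize` argument in place of the default.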
List of meetings I have joined so far:
- Weekly team meetings including all the general team meetings, web scraping workshop meetings, and data processing meetings.
- BERT Webinar(s) by Industry Lead
- Git Webinars
- Intro to Python Webinar(s)
- STEMCast Webinar(s)
Goals of the upcoming week:
- Successfully apply the BERT model to our full data set (5k+ posts).
Tasks done:
- Web Scraping: Scraped 5000+ posts from the forum using Selenium, stored the data in a pandas DataFrame, and then converted it to a CSV file.
- Data Cleaning: Cleaned the data with several methods using the nltk module.
- Learned more about how BERT works by re-watching the webinars and doing online research.
- Applied the BERT model to a smaller sample of the data collected from our selected forum.
- Communicated with the team at each step along the way.
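
The DataFrame-to-CSV step from the web scraping task can be sketched as below. This is only an illustration under stated assumptions: the two records and their field names (`title`, `body`, `replies`) are hypothetical stand-ins for scraped posts, and the CSV is written to an in-memory buffer rather than a real file so the round trip can be checked.

```python
import io

import pandas as pd

# Hypothetical records standing in for scraped posts; field names are assumptions.
posts = [
    {"title": "Install error", "body": "pip fails on Windows", "replies": 3},
    {"title": "Feature request", "body": "Add dark mode, please", "replies": 0},
]

# One row per post, one column per field.
df = pd.DataFrame(posts)

# Write CSV text to an in-memory buffer; passing a file path instead works the same way.
buffer = io.StringIO()
df.to_csv(buffer, index=False)  # index=False leaves out the row-number column

# Read the CSV back to confirm the rows and columns survived the round trip.
buffer.seek(0)
restored = pd.read_csv(buffer)
print(restored)
```

Reading the file back right after writing it is a cheap way to catch problems such as unquoted commas in post text before the data moves on to cleaning.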
Problems faced and how they were solved:
- Was still a bit unclear on how BERT should be implemented with our data, but re-watching the webinars gave me a clearer view of how to go about it.