Self Assessment 8/4
Technical Area:
- Web Scraping using BeautifulSoup and Selenium
- Implemented TF-IDF on scraped data
- Understood word embeddings and BERT
Tools:
- Jupyter Notebook
- GitHub
- G Suite
- Slack
Soft skills:
- Communicated during meetings to ask leads questions
- Communicated with support group to organize meetings for presentations and to ask questions
Three achievements:
- Implemented web scraping on student forum (collected data on 4000+ posts)
- Implemented TF-IDF on the scraped data
- Understood word embeddings and BERT
List of meetings/trainings attended:
7/21 Team Meeting, 7/28 Team Meeting, 7/31 Team Meeting, 8/4 Team Meeting, attended support group meetings as well
Tasks Completed:
- Web scraping on student forum
- Applied TF-IDF to the scraped data
- Visualized TF-IDF results using matplotlib
Goals for upcoming week:
- Try to implement word embeddings (I did not get a chance to do this by myself)
- Implementation of BERT
- Ask support group more questions when I am confused
Detailed Statement of Tasks Done:
In the first week, we worked on web scraping, specifically scraping posts from a student forum. We scraped 4000+ posts by modifying the code explained during the first team meeting. Also during the first week, we presented a high-level overview of BERT to understand its underlying mechanisms.
In the second week, I implemented TF-IDF on the scraped data, adapting code from office hours and other online resources to handle the large amount of data we were analyzing. From the TF-IDF results, I created a histogram and a bar plot using matplotlib. I shared these results during a presentation in today’s team meeting.
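For reference, the TF-IDF step described above can be illustrated with a minimal sketch. This is not the office-hours code. It assumes simple whitespace tokenization and the unsmoothed formula tf x log(N / df), and the function name tf_idf and the sample posts are made-up placeholders rather than scraped data.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per document (hypothetical helper)."""
    # Assumption: lowercase + whitespace split stands in for real tokenization.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            # Term frequency (share of the document) times inverse document frequency.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


# Placeholder posts, invented for illustration only.
posts = [
    "exam schedule for the final exam",
    "housing options near campus",
    "final project deadline and exam tips",
]
scores = tf_idf(posts)
```

Terms that appear in every document score zero under this formula, which is why library implementations usually add smoothing. The resulting per-post dictionaries could then be passed to matplotlib for a bar plot of the top-scoring terms.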