I was the Task Lead for ML Team 5. It was a good experience getting to know the members of Team 5 and leading them. I deepened my hands-on experience with machine learning algorithms and applied them to a real-world problem.
I learned the basics of NLP, including BERT and TF-IDF, as well as web scraping with a web crawler. I built a recommender system that uses BERT to recommend the top 5 posts, and a forum classifier that uses TF-IDF to classify a post.
I also learned a lot about team management and effective team communication.
Highlights from the Internship:
- Created a web crawler to scrape pages from a Discourse forum called Home assist.
- Developed a BERT recommendation system that returns the top 5 posts most similar to a selected post.
- Developed a machine learning text classifier to predict which forum a submission belongs to.
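The recommendation step in the second highlight can be sketched as follows. This is a minimal illustration under stated assumptions, not the project code: the small toy vectors stand in for real BERT embeddings, and the names cosine_similarity, top_k_similar, and toy_embeddings are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat it as dissimilar
    return dot / (norm_a * norm_b)


def top_k_similar(embeddings, selected, k=5):
    """Return indices of the k posts most similar to embeddings[selected]."""
    query = embeddings[selected]
    scored = [
        (cosine_similarity(query, vec), idx)
        for idx, vec in enumerate(embeddings)
        if idx != selected  # never recommend the selected post itself
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]


# Toy 3-dimensional vectors standing in for real BERT embeddings (assumption).
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
    [0.0, 0.0, 1.0],
]

print(top_k_similar(toy_embeddings, selected=0, k=2))  # prints [1, 3]
```

With real embeddings the vectors would have hundreds of dimensions, but the ranking logic stays the same: score every other post against the selected one and keep the k highest scores.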
I held team meetings twice a day and attended webinars by an industry mentor on using Git and on gaining an in-depth understanding of Natural Language Processing.
Even though I was a lead, I still learned a lot. I gained a deeper understanding of Natural Language Processing concepts such as BERT, TF-IDF (term frequency-inverse document frequency), and tokenization. I learned how to scrape web data by building a web crawler in Python with the Beautiful Soup library, which extracts the link and the post from a given webpage and saves them in a pickle file. I built a post recommender by creating BERT embeddings and using cosine similarity to find the most similar posts. I also led a sub-group of 3 students to create a forum classifier that uses TF-IDF to predict which forum a post comes from. We compared different machine learning algorithms (Random Forest Classifier, Multinomial Naive Bayes, and Linear SVC). We got approximately 97% accuracy with Linear SVC, 93% with Multinomial Naive Bayes, and 91% with Random Forest Classifier.
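To make the TF-IDF idea concrete, here is a small sketch of the textbook weighting: a term's frequency within a post, multiplied by the log of how rare the term is across all posts. It is an illustration only, not the classifier we built. The sample posts, the whitespace tokenizer, and the function names tokenize and tf_idf are assumptions, and library implementations usually apply smoothed and normalized variants of this formula.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def tf_idf(documents):
    """Return one {term: weight} dict per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            # term frequency * inverse document frequency
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical forum posts, invented for illustration.
posts = [
    "the sensor is offline",
    "the automation failed",
    "the add-on crashed",
]

weights = tf_idf(posts)
for term, weight in sorted(weights[0].items()):
    print(f"{term}: {weight:.3f}")
```

A word such as "the" that appears in every post gets a weight of zero, while a word such as "sensor" that appears in only one post gets a positive weight. These per-post weight vectors are the kind of features a classifier such as Linear SVC can be trained on.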