Milestones accomplished on the Machine Learning summer project.
Technical Gains: I learned a great deal about natural language processing and deep learning models. I implemented both TF-IDF vectors and BERT embeddings as input features for a deep neural network that solves a multilabel classification problem. I also gained experience with Python web scraping libraries, which I used to collect forum data and build datasets for model training.
Tools: Keras, TensorFlow, BeautifulSoup, scikit-learn, matplotlib, Google Colab, GitHub.
Soft-Skills: I practiced professional collaboration by working with team members to integrate our individual code files into the final recommender system. I also learned to set realistic weekly targets so that I made consistent progress throughout the internship.
Tasks Done
- Vectorized text data using BERT and TF-IDF
I followed two separate approaches, BERT embeddings and TF-IDF, to transform text data from the StackExchange forum into inputs for my deep learning model. TF-IDF delivered higher accuracy, so I pursued it as my final approach.
- Implemented the deep learning model using TensorFlow
Using the TF-IDF vectors, I designed and trained a deep learning model on the text of StackExchange forum posts. It achieved a categorical accuracy of approximately 70% on the test data for tag prediction.
- Implemented an active learning function for the model
I added this feature to retrain the model continuously on the text content of new posts on the STEM-Away forum.
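
To illustrate the vectorization task above, here is a minimal sketch of how forum posts can be turned into TF-IDF feature vectors and how their tags can be encoded as multi-hot targets for multilabel classification. It is not the project's actual code. The toy posts, the tag names, the helper functions, and the smoothed IDF formula are all assumptions chosen for illustration, and it uses only the Python standard library. In practice scikit-learn's TfidfVectorizer and MultiLabelBinarizer cover the same steps.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_idf(documents):
    """Compute a smoothed inverse document frequency for every term."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(tokenize(doc)))  # count each term once per document
    # Assumed smoothed variant: idf = ln((1 + N) / (1 + df)) + 1
    return {
        term: math.log((1 + n_docs) / (1 + df)) + 1.0
        for term, df in doc_freq.items()
    }


def tfidf_vector(text, idf, vocabulary):
    """Return an L2-normalised TF-IDF vector ordered by `vocabulary`."""
    counts = Counter(tokenize(text))
    weights = [counts[term] * idf[term] for term in vocabulary]
    norm = math.sqrt(sum(w * w for w in weights))
    return [w / norm for w in weights] if norm > 0 else weights


def multi_hot(tags, all_tags):
    """Encode a set of tags as a 0/1 vector, one position per known tag."""
    return [1 if tag in tags else 0 for tag in all_tags]


# Hypothetical toy posts and tags, for illustration only.
posts = [
    "How do I train a neural network in Keras?",
    "Scraping forum posts with Python",
    "Keras model accuracy is low on text data",
]
post_tags = [{"keras", "deep-learning"}, {"python", "scraping"}, {"keras", "nlp"}]

idf = fit_idf(posts)
vocabulary = sorted(idf)
all_tags = sorted(set().union(*post_tags))

features = [tfidf_vector(p, idf, vocabulary) for p in posts]  # model inputs
targets = [multi_hot(t, all_tags) for t in post_tags]         # model outputs
```

Each row of `features` is one post and each row of `targets` marks which tags apply to it, which is the input/output shape a multilabel network with one sigmoid output per tag would expect. Terms that appear in many posts, such as "keras" here, receive a lower IDF weight than terms that appear in only one.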