Milestones accomplished on the Machine Learning summer project.
Technical Gains: I learned a great deal about natural language processing and deep learning models. I used both TF-IDF features and BERT embeddings to train a deep neural network for a multilabel classification problem. I also gained experience with Python web scraping libraries, which I used to scrape forum data and build datasets for model training.
Tools: Keras, TensorFlow, BeautifulSoup, scikit-learn, matplotlib, Google Colab, GitHub.
Soft-Skills: I practiced professional collaboration by working with team members to integrate our individual code files into the final recommender system. I also learned to set realistic weekly targets so that I made consistent progress throughout the internship.
Vectorized text data using BERT and TF-IDF
I tried two separate approaches, BERT embeddings and TF-IDF, to transform text data from the StackExchange forum into input for my deep learning model. TF-IDF delivered higher accuracy, so I pursued it as my final approach.
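For readers unfamiliar with the TF-IDF step, here is a minimal sketch of how posts and their tags might be turned into model inputs with scikit-learn. The sample posts, the tags, and the `max_features` value are made up for illustration and are not the project's actual data or settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical sample posts and tags; the real dataset was scraped from StackExchange.
posts = [
    "How do I train a neural network in Keras?",
    "What is the derivative of a sigmoid function?",
    "Keras model overfits on a small dataset",
]
tags = [["keras", "neural-network"], ["calculus"], ["keras", "overfitting"]]

# Turn each post into a sparse vector of TF-IDF term weights.
# max_features=5000 is an assumed vocabulary cap, not a value from the project.
vectorizer = TfidfVectorizer(lowercase=True, stop_words="english", max_features=5000)
X = vectorizer.fit_transform(posts)  # shape: (n_posts, n_terms)

# Turn each tag list into a 0/1 row with one column per distinct tag.
binarizer = MultiLabelBinarizer()
Y = binarizer.fit_transform(tags)    # shape: (n_posts, n_tags)

print(X.shape, Y.shape)
print(list(binarizer.classes_))
```

The same fitted `vectorizer` and `binarizer` would need to be reused at prediction time so that new posts map onto the same columns.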
Implemented the deep learning model using TensorFlow
Using the TF-IDF features, I designed and trained a deep learning model on text data from StackExchange forum posts. I achieved a categorical accuracy of approximately 70% on test data for tag prediction.
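A model of this kind could look roughly like the sketch below. The layer sizes, dropout rate, loss, and the helper name `build_tag_model` are assumptions for illustration and not the architecture I actually trained; the random arrays only stand in for the TF-IDF matrix and tag matrix.

```python
import numpy as np
import tensorflow as tf


def build_tag_model(n_features, n_tags):
    """Hypothetical feed-forward tag classifier; all sizes are illustrative."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(n_features,)),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dropout(0.3),
        # One sigmoid unit per tag, so several tags can be active at once.
        tf.keras.layers.Dense(n_tags, activation="sigmoid"),
    ])
    model.compile(
        optimizer="adam",
        loss="binary_crossentropy",
        metrics=[tf.keras.metrics.CategoricalAccuracy()],
    )
    return model


# Random stand-in data: 32 "posts", 100 TF-IDF features, 5 tags.
rng = np.random.default_rng(0)
X = rng.random((32, 100)).astype("float32")
Y = (rng.random((32, 5)) > 0.7).astype("float32")

model = build_tag_model(n_features=100, n_tags=5)
model.fit(X, Y, epochs=1, batch_size=8, verbose=0)
probs = model.predict(X[:2], verbose=0)  # per-tag probabilities for two posts
print(probs.shape)
```

Note that categorical accuracy only checks whether the highest-scoring tag matches, so for posts with several tags it is a fairly loose measure.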
Implemented an active learning function for the model
I added this feature so that the model is retrained continuously on the text content of new posts on the STEM-Away forum.
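One simple way to organize this kind of continuous retraining is to buffer new posts and retrain once enough have arrived. The sketch below shows that pattern only; the class name `RetrainBuffer`, the `batch_size` threshold, and the callback are hypothetical and do not reproduce the project's actual function.

```python
from typing import Callable, List


class RetrainBuffer:
    """Collects new posts and calls a retrain function once a batch is full."""

    def __init__(self, retrain_fn: Callable[[List[str], List[List[str]]], None],
                 batch_size: int = 50):
        self.retrain_fn = retrain_fn
        self.batch_size = batch_size
        self.texts: List[str] = []
        self.tags: List[List[str]] = []

    def add_post(self, text: str, tags: List[str]) -> bool:
        """Store one post; return True if this call triggered a retrain."""
        self.texts.append(text)
        self.tags.append(tags)
        if len(self.texts) < self.batch_size:
            return False
        # Hand copies to the callback, then empty the buffer for the next batch.
        self.retrain_fn(list(self.texts), list(self.tags))
        self.texts.clear()
        self.tags.clear()
        return True


# Stand-in callback that just records batch sizes; in practice it would
# vectorize the texts and call model.fit on them.
batches_seen = []
buffer = RetrainBuffer(lambda texts, tags: batches_seen.append(len(texts)), batch_size=2)

first = buffer.add_post("How do I tune a learning rate?", ["deep-learning"])
second = buffer.add_post("Best way to scrape a forum?", ["web-scraping"])
print(first, second, batches_seen)
```

Retraining in batches rather than on every single post keeps the cost of each update predictable, at the price of the model lagging slightly behind the newest posts.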