Sahmad11 - Machine Learning Pathway

Milestones accomplished on the Machine Learning summer project.

Technical Gains: I learned a great deal about natural language processing and deep learning models. I implemented both TF-IDF features and BERT embeddings to train a deep neural network for a multilabel classification problem. I also gained experience with Python web scraping libraries, which I used to scrape forum data and build datasets for model training.

Tools: Keras, TensorFlow, BeautifulSoup, scikit-learn, matplotlib, Google Colab, GitHub.

Soft-Skills: I practiced professional collaboration by working with team members to integrate our individual code files into the final recommender system. I also learned to set realistic weekly targets so that I made consistent progress throughout the internship.

Tasks Done

  • Vectorized text data using BERT and TF-IDF
    I followed two separate approaches, BERT embeddings and TF-IDF, to transform text data from the StackExchange forum for training my deep learning model. TF-IDF delivered higher accuracy, so I pursued it as my final approach.

  • Implemented the deep learning model using TensorFlow
    Using the TF-IDF vectors, I designed and trained a deep learning model on the text of StackExchange forum posts. I achieved a categorical accuracy of approximately 70% on test data for tag prediction.

  • Implemented an active learning function for the model
    I added this feature so that the model is retrained continuously on the text content of new posts on the STEM-Away forum.
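
The data preparation behind the tasks above can be illustrated with a minimal sketch. This is not the project's actual code: it uses only the Python standard library rather than scikit-learn or TensorFlow, the toy posts and tags are hypothetical, and the weighting formula (smoothed IDF with L2 normalisation) is one common TF-IDF variant chosen here as an assumption. It shows how forum text might be turned into TF-IDF vectors and how a post's tags might be encoded as a multi-hot target for multilabel classification.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def fit_tfidf(docs):
    """Learn a sorted vocabulary and IDF weights from a list of strings."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for doc in tokenized for tok in set(doc))
    # Smoothed IDF (assumed variant): ln((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    return vocab, idf


def transform_tfidf(doc, vocab, idf):
    """Map one string to an L2-normalised TF-IDF vector over the vocabulary."""
    counts = Counter(tokenize(doc))
    vec = [counts[t] * idf[t] for t in vocab]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def multi_hot(tags, all_tags):
    """Encode a post's tags as a 0/1 vector with one slot per known tag."""
    return [1 if t in tags else 0 for t in all_tags]


# Hypothetical toy data standing in for scraped forum posts and their tags.
posts = [
    "how to train a neural network in keras",
    "scrape a forum page with beautifulsoup",
    "train a classifier on scraped forum text",
]
post_tags = [
    {"deep-learning"},
    {"web-scraping"},
    {"deep-learning", "web-scraping"},
]
all_tags = sorted(set().union(*post_tags))

vocab, idf = fit_tfidf(posts)
X = [transform_tfidf(p, vocab, idf) for p in posts]  # model inputs
Y = [multi_hot(t, all_tags) for t in post_tags]      # multilabel targets
```

In a setup like this, each row of X would feed the input layer of the network and each row of Y would be its target, with one output unit per tag so that a post can carry several tags at once.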