Avk331 - Machine Learning Pathway

NAME: Ananya Veerendra Kumar (avk331)

TEAM: Machine Learning July Team 6

The things I learned during this internship:

Technical Areas:

  • Web Scraping
  • Web driving (browser automation)
  • Data pre-processing and augmentation
  • Natural Language Processing Algorithms
    • Bag of Words
    • TF-IDF Vectorization
    • BERT
    • DistilBERT
  • Machine Learning Classification Models
    • Decision Trees
    • Random Forest Classifier
    • k-NN
    • Logistic Regression

Tools Used:

  • Jupyter Notebook (Python)
  • GitHub
  • Google Colab
  • Slack
  • Asana (Project Management)


I learned to scrape a large number of posts from the Stack Overflow forum using the BeautifulSoup and Selenium libraries, extracting the text and metadata of each post. I cleaned the data by removing stop words and applying tokenization, lemmatization, and stemming. I then used a BERT model as a general-purpose text feature extractor and a k-NN classification model to find topics similar to a given query. Together, these steps formed a topic recommendation system that suggests similar topics.
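
The retrieval step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: to stay self-contained it swaps the BERT embeddings for plain TF-IDF vectors and uses a brute-force cosine-similarity nearest-neighbour search. The stop-word list, the sample titles, and the function names (tokenize, vectorize, recommend) are all hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; a real pipeline would use a fuller one).
STOP_WORDS = {"a", "an", "the", "to", "in", "of", "how", "is", "for", "and", "with", "on"}

# Made-up post titles standing in for the scraped data.
TITLES = [
    "How to parse HTML with BeautifulSoup in Python",
    "Selenium WebDriver click button not working",
    "Remove stop words from text using NLTK",
    "Fine-tuning BERT for text classification",
]


def tokenize(text):
    """Lowercase the text, split on non-alphanumeric characters, drop stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]


def vectorize(tokens, idf):
    """Build a sparse TF-IDF vector (term -> weight), ignoring terms unseen in the corpus."""
    counts = Counter(tokens)
    return {term: count * idf[term] for term, count in counts.items() if term in idf}


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector per document, plus the IDF table."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(docs)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF, so that no term ends up with a zero weight.
    idf = {term: math.log((1 + n_docs) / (1 + df)) + 1 for term, df in doc_freq.items()}
    return [vectorize(tokens, idf) for tokens in tokenized], idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(query, titles, k=2):
    """Return the k titles whose vectors are closest to the query vector."""
    vectors, idf = tfidf_vectors(titles)
    query_vec = vectorize(tokenize(query), idf)
    ranked = sorted(range(len(titles)),
                    key=lambda i: cosine(query_vec, vectors[i]),
                    reverse=True)
    return [titles[i] for i in ranked[:k]]


if __name__ == "__main__":
    for title in recommend("bert text classification", TITLES, k=2):
        print(title)
```

Replacing vectorize with a function that returns BERT sentence embeddings would leave the nearest-neighbour search unchanged, which is the main reason for keeping the feature extractor and the retrieval step separate.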

Meetings Attended:
Attended all team meetings