NAME: Ananya Veerendra Kumar (avk331)
TEAM: Machine Learning July Team 6
The things I learned during this internship:
Technical Areas:
- Web Scraping
- Browser automation (web driving with Selenium)
- Data pre-processing and augmentation
- Natural Language Processing Algorithms
  - Bag of Words
  - TF-IDF Vectorization
  - DistilBERT
- Machine Learning Classification Models
  - Decision Trees
  - Random Forest Classifier
  - k-NN
  - Logistic Regression
Tools Used:
- Jupyter Notebook (Python)
- GitHub
- Google Colab
- Slack
- Asana (Project Management)
I learned to scrape large numbers of posts from the Stack Overflow forum using the BeautifulSoup and Selenium libraries, extracting the text and metadata of each post. I cleaned the data by removing stop words and applying tokenization, lemmatization, and stemming. I used a BERT model as a general-purpose text feature extractor, then used a k-NN classification model to find topics similar to a given query. Together, these steps formed a topic recommendation system that suggests similar topics.
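The cleaning and recommendation steps described above can be illustrated with a minimal sketch. This is not the project code: it assumes TF-IDF vectors in place of the BERT features so that it runs with the Python standard library only, it uses a brute-force cosine-similarity nearest-neighbour search, and every name in it (STOP_WORDS, TITLES, tokenize, recommend) as well as the example titles is hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, very small stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "the", "to", "in", "of", "how", "do", "i",
              "is", "with", "and", "for"}

# Made-up post titles standing in for the scraped data.
TITLES = [
    "How to parse HTML with BeautifulSoup in Python",
    "Selenium WebDriver click button not working",
    "Merge two pandas DataFrames on a column",
    "Scrape a table from a web page using BeautifulSoup",
]


def tokenize(text):
    """Lowercase the text, split on non-alphanumerics, drop stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower())
            if t not in STOP_WORDS]


def build_idf(token_lists):
    """Smoothed inverse document frequency for every term in the corpus."""
    n_docs = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    return {term: math.log((1 + n_docs) / (1 + df)) + 1.0
            for term, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    """Sparse, L2-normalised TF-IDF vector; terms unseen in the corpus are ignored."""
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(u, v):
    """Cosine similarity of two L2-normalised sparse vectors (a dot product)."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def recommend(query, titles, k=2):
    """Return up to k titles most similar to the query, best match first."""
    token_lists = [tokenize(t) for t in titles]
    idf = build_idf(token_lists)
    doc_vecs = [tfidf_vector(tokens, idf) for tokens in token_lists]
    q_vec = tfidf_vector(tokenize(query), idf)
    scored = sorted(((cosine(q_vec, d), title)
                     for d, title in zip(doc_vecs, titles)), reverse=True)
    # Titles with zero similarity share no terms with the query; skip them.
    return [title for score, title in scored[:k] if score > 0]


if __name__ == "__main__":
    print(recommend("beautifulsoup parse html page", TITLES, k=2))
```

With these example titles, the query returns the two BeautifulSoup-related titles, with the HTML-parsing one ranked first. Swapping in sentence embeddings from a BERT-style model would only change how the vectors are built; the nearest-neighbour step would stay the same.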
Meetings Attended:
Attended all team meetings