jdutta
2
Self-Assessment for Module 2
Technical Area:
- Reviewed Python, followed Module 2 tutorials
- Explored the PyTorch forum website
- Learned how to extract data from a website
- Scraped data from the PyTorch forum (storing attributes in a CSV file)
- Performed basic data cleaning and EDA
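
The extract-and-store step described above can be sketched roughly as follows. This is a minimal illustration, not the scraper actually used: it swaps in Python's standard-library `html.parser` for Selenium and beautifulsoup4 so it runs without a browser, and the HTML snippet, the class names (`topic`, `title`, `replies`), and the helper names are all hypothetical.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical markup: a real forum page uses different tags and class names.
SAMPLE_HTML = """
<div class="topic"><a class="title" href="/t/1">CUDA out of memory</a><span class="replies">12</span></div>
<div class="topic"><a class="title" href="/t/2">DataLoader is slow</a><span class="replies">3</span></div>
"""


class TopicParser(HTMLParser):
    """Collects one dict per topic with its title, link, and reply count."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._field = None  # attribute that the next text node belongs to

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        css_class = attrs.get("class")
        if tag == "div" and css_class == "topic":
            # A new topic container starts a new row.
            self.rows.append({"title": "", "link": "", "replies": ""})
        elif tag == "a" and css_class == "title" and self.rows:
            self.rows[-1]["link"] = attrs.get("href", "")
            self._field = "title"
        elif tag == "span" and css_class == "replies" and self.rows:
            self._field = "replies"

    def handle_data(self, data):
        # Only keep text that sits inside a tag we are tracking.
        if self._field and self.rows:
            self.rows[-1][self._field] += data.strip()

    def handle_endtag(self, tag):
        if tag in ("a", "span"):
            self._field = None


def topics_to_csv(html):
    """Parse the HTML and return the extracted attributes as CSV text."""
    parser = TopicParser()
    parser.feed(html)
    parser.close()
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["title", "link", "replies"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(parser.rows)
    return buffer.getvalue()


print(topics_to_csv(SAMPLE_HTML))
```

In a real run the HTML would come from the browser driver's page source, and the CSV would be written to a file rather than an in-memory buffer.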
Tools:
- Selenium/Firefox Webdriver (geckodriver)
- Visual Studio Code
- pandas, beautifulsoup4
- Python
- HTML
- GitHub
- Jira
Soft Skills:
- Communication with team on Discord
- Attended team meetings and office hours
Achievements:
- Learned how to inspect elements of webpages
- Learned how data can be extracted from a website
- Learned about storing data attributes into a CSV file
- Scraped thousands of posts from the PyTorch forums (cleaning and analysis of the data are in progress)
jdutta
3
Self-Assessment for Module 3
Technical Area:
- Followed Module 3 tutorials
- Performed basic data cleaning and EDA on the data
- Trained basic ML models (Naive Bayes Classifier, Linear Support Vector Machine, Logistic Regression, and Decision Tree), analyzing their accuracy
- Trained ensemble ML models (LogisticRegression, RandomForest, and XGBoost)
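
To make the train-then-measure-accuracy loop above concrete, here is a small sketch of a multinomial Naive Bayes text classifier written in plain Python. It is not the code from the module, which presumably relied on library implementations; the function names, the whitespace tokenizer, and the toy posts and labels are all invented for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(texts, labels):
    """Count documents per class and words per class on whitespace tokens."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for text, label in zip(texts, labels):
        word_counts[label].update(text.lower().split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log-posterior score."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        # Log prior for the class.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # words never seen in training carry no evidence
            # Laplace (add-one) smoothing avoids log(0) for unseen pairs.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, texts, labels):
    """Fraction of texts whose predicted label matches the true label."""
    correct = sum(predict(model, t) == y for t, y in zip(texts, labels))
    return correct / len(labels)


# Toy data, invented for this example only.
train_texts = [
    "cuda out of memory error",
    "gpu memory leak during training",
    "dataloader workers are slow",
    "slow data loading with many workers",
]
train_labels = ["memory", "memory", "data", "data"]
test_texts = ["memory error on gpu", "dataloader is slow"]
test_labels = ["memory", "data"]

model = train_naive_bayes(train_texts, train_labels)
print(accuracy(model, test_texts, test_labels))
```

Evaluating on texts that were held out of training, as done here, is what makes the accuracy figure meaningful when comparing models.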
Tools:
- Jupyter Notebook
- Python
- pandas, matplotlib, numpy
- GitHub
- Jira
Soft Skills:
- Communication with team on Discord
- Attended team meetings
Achievements:
- Performed data cleaning and some basic visualizations on the data scraped from the PyTorch forums
- Learned the basics of pandas DataFrames
- Learned how to train basic and ensemble ML models
jdutta
4
Self-Assessment for Module 4
Technical Area:
- Followed Module 4 tutorials
- Watched videos from Module 1 about NLP concepts
- Learned how to train BERT, RoBERTa, DistilBERT, and XLNet models using the Simple Transformers library
- Learned how to combine an advanced model (BERT) with a simple model (Logistic Regression)
- Learned the concepts of building and dockerizing a web application
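
The notes above do not say how the advanced model (BERT) and the simple model (Logistic Regression) were combined. One common option is soft voting, where the two models' class probabilities are averaged. The sketch below assumes that approach; the function names, the weight parameter, and the probability values are hypothetical.

```python
def combine_probabilities(advanced_probs, simple_probs, advanced_weight=0.5):
    """Weighted average of two models' class-probability vectors."""
    if len(advanced_probs) != len(simple_probs):
        raise ValueError("both models must score the same set of classes")
    if not 0.0 <= advanced_weight <= 1.0:
        raise ValueError("advanced_weight must be between 0 and 1")
    simple_weight = 1.0 - advanced_weight
    return [
        advanced_weight * a + simple_weight * s
        for a, s in zip(advanced_probs, simple_probs)
    ]


def predict_class(probs):
    """Index of the class with the highest combined probability."""
    return max(range(len(probs)), key=probs.__getitem__)


# Invented probabilities for a single post over two classes.
bert_probs = [0.40, 0.60]    # stand-in for the transformer's output
logreg_probs = [0.90, 0.10]  # stand-in for the Logistic Regression output

combined = combine_probabilities(bert_probs, logreg_probs)
print(combined, predict_class(combined))
```

With equal weights, the more confident model decides the outcome when the two disagree; raising `advanced_weight` shifts the decision toward the transformer.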
Tools:
- Jupyter Notebooks (local)
- Google Colab (for running advanced models on a GPU)
- Python
- Python libraries: pandas, scikit-learn, Simple Transformers, Tokenizers, re, tarfile
- Discord
- STEM-Away Platform
Soft Skills:
- Delivered two presentations (including final presentation)
- Communication with team on Discord
- Attended team meetings
Achievements:
- Learned how to train advanced ML models
- Learned how to build and dockerize a web application