Deepikarana - Machine Learning (Level 1) Pathway

deepikarana · July 6, 2021, 3:45pm

Module 1 Self Assessment

Week 1 and 2: 28th June 2021

Overview of Things Learned:

Technical Area:

I have past experience in machine learning, so I used the resources to brush up on the concepts.
For the NLP part specifically, I went through the mentors' webinars in addition to the links.
- I can comfortably use the ML project workflow
This module helped me revise concepts such as word embeddings and logistic regression in more depth.

Tools:

Learned the basics of parsing HTML text using Beautiful Soup and Selenium WebDriver
I have experience with Google Colab and Jupyter Notebook, but the data is huge, so I will use Google Colab for the later modules.
Learned how to use the Python library spaCy, which was new to me.
The sentence-transformers and Transformers libraries helped me use BERT encodings in Python.
I know the basics of PyTorch and TensorFlow 2.0, but revising them again really helped.

Soft Skills:

As the Project Lead, I met with the other leads and we discussed the plan for our team and how to go about it
- Set up the Discord server for team communication
Tried to have a conversation with all the participants, not just the leads.
Planned the first meeting with the participants.
Will host this meeting and explain the modules, self-assessments, and deadlines

Achievements Highlights:

Gained knowledge of more Python libraries
Successfully scraped some web pages and got data from them
Became familiar with how NLP is used today and how to apply it in projects

Meetings attended

Leads-only meeting with 3 fellow leads, where we planned the strategy for the upcoming week and a rough outline of how things would go
ML Level 1 meeting with Sara and a fellow Project Lead, where I got insights on how we can progress as a team and had my queries resolved
Team Meeting ML Level 1 with all the participants, including an icebreaker session
Team Meeting 2 with the complete team, where I described Module 2

Goals for the upcoming week:

Module 2 resources and tutorials to be started
Choosing the forum to be worked upon as a team
Attending 5-10 minute scrum meetings to check in on team progress.

Tasks Completed:

Successfully scraped a webpage for data
- Using Beautiful Soup
- Using Selenium, which I found a bit difficult as it was my first time
Used Transformers in Python to classify text as either negative or positive
- Trained a logistic regression model as the classifier
Hosted the 2 weekly meetings with the fellow leads and introduced the participants to the modules
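
The Beautiful Soup step above can be sketched roughly as follows. This is a minimal illustration, not the code from the project: the HTML snippet, the class names (`topic`, `title`, `date`) and the helper `extract_topics` are all made up for the example, and a real page would be downloaded first (for instance with Requests or Selenium) and would use its own markup.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded page; the class names
# on a real site will differ.
SAMPLE_HTML = """
<html><body>
  <div class="topic">
    <a class="title" href="/t/first-topic/1">First topic</a>
    <span class="date">2021-06-28</span>
  </div>
  <div class="topic">
    <a class="title" href="/t/second-topic/2">Second <b>topic</b></a>
    <span class="date">2021-07-01</span>
  </div>
</body></html>
"""


def extract_topics(html):
    """Return one dict per topic block with its title, link and date."""
    soup = BeautifulSoup(html, "html.parser")
    topics = []
    for block in soup.find_all("div", class_="topic"):
        link = block.find("a", class_="title")
        date = block.find("span", class_="date")
        topics.append({
            # A space separator keeps words apart when the title has nested tags.
            "title": link.get_text(" ", strip=True),
            "url": link["href"],
            "date": date.get_text(strip=True),
        })
    return topics


if __name__ == "__main__":
    for topic in extract_topics(SAMPLE_HTML):
        print(topic)
```

Parsing a fixed string like this is a convenient way to get the selectors right before pointing the same function at live pages.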

deepikarana · July 23, 2021, 11:11am

Module 2 Self Assessment

Week 3 and 4: 12th July 2021

Overview of Things Learned:

Technical Area:

Familiarized myself with fundamentals of EDA
Learned how to navigate HTML and extract certain components of the website
Got more practice with scraping: extracting text and dates
Performed basic data cleaning and analysis
I chose the PyTorch forum as the site to scrape.
I scraped the data from this forum using the Beautiful Soup and Selenium libraries and stored it in a CSV file.
I used different data cleaning and EDA techniques to explore the scraped data.
Understood the logic behind recommender systems and ML algorithms

Tools:

Beautiful Soup, Selenium WebDriver, NumPy, pandas, Matplotlib, scikit-learn, WordCloud, NLTK, spaCy, CSV, Requests
Git / GitHub
STEM-Away Platform, Discord, Jira
Jupyter Notebook, Google Colaboratory

Soft Skills:

Communication with participants and with leads of other teams
Attended the office hours and updated Sara on the team’s progress, our future goals, and deadlines
Will host the next meeting and explain Module 3

Achievement highlights

Developed a good understanding of the web crawler.
Experimented with Beautiful Soup on HTML structure in the Colab environment. Learned how to perform EDA and data cleaning before the machine learning process.
Successfully scraped data from the PyTorch forum using the Beautiful Soup and Selenium libraries
Learned the basics of how to scrape a website and ways data can be formatted

Meetings attended

Attended the Lead Help session to provide a status update on my team
ML Level 1 meeting with Sara and a fellow Project Lead
Team Meeting 3 with the complete team
Team Meeting 4 which was a Game Night

Goals for the Upcoming Week

Experiment with the dataset with basic machine learning models to see the classification results.
Module 3 resources and tutorials to be started

Tasks Completed

Set up the Git repo for the team and invited all the participants to contribute.
Got thoroughly familiar with the Discourse platform. Chose the forum I will be working on, i.e. the PyTorch forum
Learned how to write the data crawled from the website to a CSV file.
Scraped the messages from one of the boards into a CSV file and removed the HTML tags
Completed basic data analysis and visualization
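
Removing the HTML tags and writing the cleaned messages to CSV could look something like the sketch below. It uses only the Python standard library rather than the exact tools used in the project, and the column names, the sample rows and the helpers `strip_tags` and `write_posts` are assumptions made for the example.

```python
import csv
import io
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(fragment):
    """Return the visible text of an HTML fragment with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()  # flush any text still buffered by the parser
    return " ".join("".join(parser.parts).split())


def write_posts(posts, handle):
    """Write (date, category, html_body) tuples as rows of cleaned CSV."""
    writer = csv.writer(handle)
    writer.writerow(["date", "category", "text"])
    for date, category, body in posts:
        writer.writerow([date, category, strip_tags(body)])


if __name__ == "__main__":
    # Made-up rows for illustration only.
    posts = [
        ("2021-07-12", "vision", "<p>How do I <b>resize</b> a tensor?</p>"),
        ("2021-07-13", "nlp", "<p>Tokenizer &amp; padding question</p>"),
    ]
    # Swap the buffer for open("posts.csv", "w", newline="") to write a file.
    buffer = io.StringIO()
    write_posts(posts, buffer)
    print(buffer.getvalue())
```

Letting the `csv` module do the quoting matters here, because forum posts routinely contain commas, quotes and line breaks.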

Module 2 code submission - ml-session2-team5/DeepikaRana at Module2 · mentorchains/ml-session2-team5 · GitHub

deepikarana · August 7, 2021, 8:01am

Module 3 Self Assessment

Week 5 and 6: 26th July 2021- 6th August 2021

Overview of Things Learned:

Technical Area:

I learned about some basic machine learning models: Naive Bayes, Linear SVM, Logistic Regression, Decision Tree, Random Forest, XGBoost, and LightGBM
I also used cross-validation with the Linear SVM, Random Forest, XGBoost, and LightGBM models to generate the results.
Built the machine learning model and pipeline, and used doc2vec and TF-IDF embeddings with 3 machine learning models (Random Forest, XGBoost, and Logistic Regression), having found that these 3 models gave better accuracy.
Ran the TF-IDF text embedding with different models, achieved higher accuracy with Logistic Regression, and pushed the results to the GitHub repo
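
A TF-IDF plus Logistic Regression pipeline with cross-validation can be sketched with scikit-learn as below. The toy posts, the two labels, the fold count and the helper names are assumptions for illustration; the real dataset, categories and settings used in the project are not shown here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Made-up posts and categories, for illustration only.
texts = [
    "how to resize an image tensor",
    "convolution layer output shape",
    "image augmentation with transforms",
    "pretrained resnet for image classification",
    "tokenizer padding for sentences",
    "lstm for text generation",
    "word embedding lookup is slow",
    "attention mask in text transformer",
]
labels = ["vision"] * 4 + ["nlp"] * 4


def build_classifier():
    """TF-IDF features feeding a logistic regression classifier."""
    return make_pipeline(
        TfidfVectorizer(lowercase=True, stop_words="english"),
        LogisticRegression(max_iter=1000),
    )


def cross_validated_accuracy(texts, labels, folds=2):
    """Return (mean accuracy, per-fold accuracies) over stratified folds."""
    scores = cross_val_score(
        build_classifier(), texts, labels, cv=folds, scoring="accuracy"
    )
    return scores.mean(), scores


if __name__ == "__main__":
    mean_acc, scores = cross_validated_accuracy(texts, labels)
    print("per-fold accuracy:", scores)
    print("mean accuracy:", mean_acc)
```

Keeping the vectorizer inside the pipeline means it is refit on the training part of each fold, so vocabulary from the held-out posts does not leak into training. Swapping `LogisticRegression` for another estimator leaves the rest unchanged.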

Tools:

NumPy, pandas, Matplotlib, scikit-learn, WordCloud, NLTK, spaCy, CSV, Git / GitHub, Google Colaboratory

Soft Skills:

Communication with participants and fellow leads to plan the team presentation
Attended a meeting with the mentor Anubhav and asked queries about data imbalance
Will co-host the presentation this coming Tuesday, so planned for that.
Prepared the basic PPT for the ML Level 1 team presentation

Achievement highlights

I experimented on the dataset with basic machine learning models (Naive Bayes, Linear SVM, Logistic Regression, Decision Tree, Random Forest, XGBoost, LightGBM) to see the classification results.
I used TF-IDF and doc2vec embeddings with 3 machine learning models (Random Forest, XGBoost, and Logistic Regression) to see the classification results.
Studied the BERT neural architecture: read papers and understood its theoretical foundations.
Successfully learned how to combine text embeddings with machine learning models.
Learned deep learning frameworks and applied them to this classifier project.

Meetings attended

Attended the Mentor’s meeting ML Level 1 with Anubhav to discuss and resolve queries regarding the final project
Team Meeting 5 with the complete team to discuss Module 3
Team Meeting 6 for discussing the results of our models
Team Meeting 7 for planning of the team presentation.

Goals for the Upcoming Week

Experiment with the dataset to achieve higher accuracy and try more classification models
Deploy the PyTorch forum classification system as a web application.
Module 4 resources and tutorials to be started

Tasks Done

Reduced data imbalance across categories by restricting the classifier to the top 15 categories
Achieved an accuracy of 64% using the Linear SVM without any hyperparameter tuning
Applied basic modeling and advanced embedding methods, and pushed the corresponding notebooks to the team repo.
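
Restricting the data to the top 15 categories can be sketched as below, assuming "top" means the most frequent categories and that each example is a `(text, category)` pair. The helper name `keep_top_categories` and the sample rows are made up for the example.

```python
from collections import Counter


def keep_top_categories(rows, top_n=15):
    """Keep only rows whose category is among the top_n most frequent.

    rows is a list of (text, category) pairs.
    """
    counts = Counter(category for _, category in rows)
    kept = {category for category, _ in counts.most_common(top_n)}
    return [(text, category) for text, category in rows if category in kept]


if __name__ == "__main__":
    # Made-up rows for illustration only.
    rows = (
        [("post about vision", "vision")] * 5
        + [("post about nlp", "nlp")] * 3
        + [("post about mobile", "mobile")]
    )
    filtered = keep_top_categories(rows, top_n=2)
    print(Counter(category for _, category in filtered))
```

Dropping rare categories narrows the task to classes that have enough examples. It does not rebalance the categories that remain, so class weights or resampling may still be worth trying.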

Module 3 Code Submission - ml-session2-team5/DeepikaRana at Module3 · mentorchains/ml-session2-team5 · GitHub

deepikarana · August 19, 2021, 7:03pm

Module 4 Self Assessment

Week 7 and 8: 7th August 2021- 20th August 2021

Overview of Things Learned

I learned how to train BERT, XLNet, RoBERTa, and DistilBERT models using the Simple Transformers library.
I learned how to combine an advanced model like BERT with a simple ML model like Logistic Regression.
Studied the BERT neural architecture: read papers and understood its theoretical foundations.
Read articles on the optimised versions of BERT to find out which would be most suitable for our classifier for the PyTorch forum.

Tools:

Simple Transformers, tokenizers==0.9.4, sklearn, tarfile, HTML, CSS, Flask API, Docker
Google Colaboratory

Soft Skills:

Delivered the ML Level 1 Team 5 presentation with the complete team
Improved the earlier presentation based on the feedback from Debaleena and Mentors.
Communication with participants and fellow leads to plan the team's final presentation
Will co-host the final presentation on Friday, so planned for that.

Achievement highlights

Studied the BERT neural architecture: read papers and understood its theoretical foundations.
Read articles on the optimised versions of BERT to find out which would be most suitable for our classifier for the PyTorch forum.
I successfully trained BERT, RoBERTa, XLNet, and DistilBERT models using the Simple Transformers library.

Meetings attended

ML Level 1 Presentation for our Team 5
Team Meeting 8 with the complete team to discuss Module 4
Team Meeting 9 for planning the team's final presentation.

Goals for the Upcoming Week

Present the work and the final presentation to the mentors with the team

Tasks Done

Compared the accuracy, evaluation loss, F1 score, and MCC of the 4 advanced models.
Reduced data imbalance across categories by restricting the classifier to the top 15 categories
Achieved the highest accuracy, 80.5%, using XLNet
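
For reference, the metrics used in the comparison can be written out by hand as in the sketch below, so that the definitions are visible. The macro averaging for F1 is an assumption, since the averaging actually used is not stated, and the labels in the example are made up. In practice `sklearn.metrics` provides `accuracy_score`, `f1_score` and `matthews_corrcoef` for the same purpose.

```python
import math
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def mcc(y_true, y_pred):
    """Multiclass Matthews correlation coefficient (0.0 if undefined)."""
    n = len(y_true)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    classes = set(true_counts) | set(pred_counts)
    cov_tp = correct * n - sum(true_counts[c] * pred_counts[c] for c in classes)
    cov_tt = n * n - sum(true_counts[c] ** 2 for c in classes)
    cov_pp = n * n - sum(pred_counts[c] ** 2 for c in classes)
    denom = math.sqrt(cov_tt * cov_pp)
    return cov_tp / denom if denom else 0.0


if __name__ == "__main__":
    # Made-up labels for illustration only.
    y_true = ["vision", "vision", "nlp", "nlp"]
    y_pred = ["vision", "nlp", "nlp", "nlp"]
    print("accuracy:", accuracy(y_true, y_pred))
    print("macro F1:", macro_f1(y_true, y_pred))
    print("MCC:", mcc(y_true, y_pred))
```

On imbalanced categories, accuracy alone can look good while rare classes are mostly missed, which is why F1 and MCC are useful to report next to it.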

Module 4 Code Submission - ml-session2-team5/DeepikaRana at Module4 · mentorchains/ml-session2-team5 · GitHub