Deepikarana - Machine Learning (Level 1) Pathway

Module 1 Self Assessment

Week 1 and 2: 28th June 2021

Overview of Things Learned:

Technical Area:

  • I have past experience in machine learning, so I brushed up on the concepts again using the module resources.
  • For the NLP part specifically, I went through the mentors' webinars in addition to the linked resources.
    • I can comfortably use the ML project workflow
  • Since I have prior experience in ML, this module helped me revise concepts such as Word Embedding and Logistic Regression in more depth

Tools:

  • Learned the basics of parsing HTML text using Beautiful Soup and Selenium WebDriver
  • I have experience with Google Colab and Jupyter Notebook, but since the data is large, I will use Google Colab for the later modules
  • Learned how to use the Python library spaCy, which was new to me.
  • The sentence_transformers and transformers libraries helped me use BERT encodings in Python
  • I know the basics of PyTorch and TensorFlow 2.0, but revising them again really helped.

Soft Skills:

  • As the Project Lead, I met with the other leads, and we discussed the plan for our team and how to go about it
    • Set up the Discord server for team communication
  • Tried to talk with all the participants as well, not just the leads.
  • Planned the first meeting with the participants.
  • Going to host the meeting and explain the modules, self-assessments, and deadlines

Achievements Highlights:

  • Gained knowledge of more Python libraries
  • Successfully scraped some web pages and got data from them
  • Got familiar with how NLP is used today and how to apply it in projects

Meetings attended

  1. Leads-only meeting with 3 fellow leads, where we planned the strategy for the upcoming week and a rough outline of how things would go
  2. ML Level 1 meeting with Sara and my fellow Project Lead, where I got insights on how we can progress as a team and had my queries resolved.
  3. ML Level 1 team meeting with all the participants, including an icebreaker session
  4. Team Meeting 2 with the complete team, where I described Module 2

Goals for the upcoming week:

  • Module 2 resources and tutorials to be started
  • Choosing the forum to work on as a team
  • Attending 5-10 minute scrum meetings to check in on team progress.

Tasks Completed:

  • Successfully scraped a webpage for data
    • Using Beautiful Soup
    • Using Selenium, which was a bit difficult since it was my first time.
  • Used Transformers in Python to classify text as either negative or positive
    • Used a logistic regression model to train this classifier
  • Hosted the 2 weekly meetings with the fellow leads and introduced the participants to the modules

Module 2 Self Assessment

Week 3 and 4: 12th July 2021

Overview of Things Learned:

Technical Area:

  • Familiarized myself with fundamentals of EDA
  • Learned how to navigate HTML and extract certain components of the website
  • Got more practice with scraping - extract text and dates
  • Performed basic data cleaning and analysis
  • I chose the PyTorch forum to scrape data from.
  • I scraped the data from this forum using the Beautiful Soup and Selenium libraries and stored it in a CSV file.
  • I used different data cleaning and EDA techniques to explore the scraped data.
  • Understood the logic behind recommender systems and ML algorithms

Tools:

  • Beautiful Soup, Selenium WebDriver, NumPy, Pandas, Matplotlib, Scikit-Learn, Wordcloud, NLTK, spaCy, CSV, Requests
  • Git / GitHub
  • STEM-Away Platform, Discord, Jira
  • Jupyter Notebook, Google Colaboratory

Soft Skills:

  • Communicated with participants and the leads of other teams
  • Attended the Office Hours and updated Sara on the team’s progress, future goals, and deadlines
  • Going to host the meeting and explain Module 3

Achievement highlights

  • Developed a good understanding of the web crawler.
  • Experimented with Beautiful Soup on HTML structure in the Colab environment. Learned how to perform EDA and data cleaning before the machine learning process.
  • Successfully scraped the data from the PyTorch forum using the Beautiful Soup and Selenium libraries
  • Learned the basics of how to scrape a website and ways data can be formatted

Meetings attended

  1. Attended the Lead Help session to provide a status update on my team
  2. ML Level 1 - Sara along with fellow Project Lead
  3. Team Meeting 3 with the complete team
  4. Team Meeting 4 which was a Game Night

Goals for the Upcoming Week

  • Experiment with the dataset with basic machine learning models to see the classification results.
  • Module 3 resources and tutorials to be started

Tasks Completed

  • Set up the Git repo for the team and invited all the participants to contribute.
  • Got thoroughly familiar with the Discourse platform. Chose the forum I will be working on, i.e. the PyTorch forum
  • Learned how to write the data crawled from the website to a CSV file.
  • Scraped the messages from one of the boards into a CSV and removed the HTML tags
  • Completed basic data analysis and visualization
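
The "scrape into a CSV and remove the HTML tags" step above can be sketched roughly as follows. This is only an illustration, not the code from the Module 2 submission: it swaps in Python's built-in `html.parser` for Beautiful Soup so that it runs without extra installs, and the helper names (`TagStripper`, `strip_tags`, `posts_to_csv`) and the sample posts are made up for the example.

```python
import csv
import io
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Collects only the text nodes of an HTML fragment, dropping the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(html_text):
    """Return the text of an HTML fragment with whitespace collapsed."""
    stripper = TagStripper()
    stripper.feed(html_text)
    stripper.close()
    return " ".join("".join(stripper.parts).split())


def posts_to_csv(posts, out_file):
    """Write (title, html_body) pairs as CSV rows with the tags removed."""
    writer = csv.writer(out_file)
    writer.writerow(["title", "text"])
    for title, html_body in posts:
        writer.writerow([title, strip_tags(html_body)])


# Hypothetical sample posts standing in for scraped forum messages.
sample_posts = [
    ("Install issue", "<p>I get an <b>error</b> on import.</p>"),
    ("Tensor shapes", "<div>How do I <code>reshape</code> a tensor?</div>"),
]

# An in-memory buffer is used here; a real run would open a .csv file instead.
buffer = io.StringIO()
posts_to_csv(sample_posts, buffer)
csv_text = buffer.getvalue()
```

With Beautiful Soup, the tag-stripping helper would typically be replaced by a call to `get_text()` on the parsed fragment.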

Module 2 code submission - ml-session2-team5/DeepikaRana at Module2 · mentorchains/ml-session2-team5 · GitHub

Module 3 Self Assessment

Week 5 and 6: 26th July 2021 - 6th August 2021

Overview of Things Learned:

Technical Area:

  • I learned about some basic machine learning models such as Naive Bayes, Linear SVM, Logistic Regression, Decision Tree, Random Forest, XGBoost, and LightGBM

  • I also incorporated cross-validation with the Linear SVM, Random Forest, XGBoost, and LightGBM models to generate the results.

  • Built the machine learning model and pipeline, and used doc2vec and TF-IDF embeddings with 3 machine learning models (Random Forest, XGBoost, and Logistic Regression), since I found that these 3 models give better accuracy.

  • Ran the TF-IDF text embedding with different models, achieved higher accuracy with Logistic Regression, and pushed the results to the GitHub repo
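
As a rough illustration of what a TF-IDF embedding computes before it is handed to a classifier, here is a minimal pure-Python sketch. It is not the notebook code from this module (which would normally rely on a library vectorizer); the function name `tfidf_vectors`, the whitespace tokenisation, the smoothed IDF variant, and the sample posts are all assumptions made for the example.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return (vocabulary, vectors): one L2-normalised TF-IDF vector per document."""
    # Assumption: simple lowercase whitespace tokenisation.
    tokenised = [doc.lower().split() for doc in documents]
    vocabulary = sorted({token for tokens in tokenised for token in tokens})
    n_docs = len(tokenised)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(token for tokens in tokenised for token in set(tokens))

    # One common smoothed inverse document frequency variant.
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocabulary}

    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        raw = [counts[t] * idf[t] for t in vocabulary]
        norm = math.sqrt(sum(v * v for v in raw)) or 1.0
        vectors.append([v / norm for v in raw])
    return vocabulary, vectors


# Hypothetical forum-style posts.
docs = [
    "cuda out of memory error",
    "memory leak in dataloader",
    "cuda install error",
]
vocabulary, vectors = tfidf_vectors(docs)
```

Terms that occur in fewer documents receive a larger weight, so in the first post "out" ends up weighted higher than "cuda", which also appears in the third post.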

Tools:

  • NumPy, Pandas, Matplotlib, Scikit-Learn, Wordcloud, NLTK, spaCy, CSV, Git / GitHub, Google Colaboratory

Soft Skills:

  • Communicated with participants and fellow leads to plan the team presentation
  • Attended a meeting with the mentor Anubhav and asked questions about data imbalance
  • Going to co-host the presentation this coming Tuesday, so I planned for that.
  • Prepared the basic ppt presentation for the ML Level 1 Team presentation

Achievement highlights

  • I experimented on the dataset with basic machine learning models (Naive Bayes, Linear SVM, Logistic Regression, Decision Tree, Random Forest, XGBoost, LightGBM) to see the classification results.

  • I used TF-IDF and doc2vec embeddings with 3 machine learning models (Random Forest, XGBoost, and Logistic Regression) to see the classification results.

  • Learned how the BERT neural architecture works by reading papers and understanding its theoretical foundations.

  • Successfully learned how to perform text embedding along with machine learning models.

  • Learned deep learning frameworks and applied them to this classifier project.

Meetings attended

  1. Attended the Mentor’s meeting ML Level 1 with Anubhav to discuss and resolve queries regarding the final project
  2. Team Meeting 5 with the complete team to discuss Module 3
  3. Team Meeting 6 for discussing the results of our models
  4. Team Meeting 7 for planning of the team presentation.

Goals for the Upcoming Week

  • Experiment with the dataset to achieve higher accuracy and try more classification models
  • Deploy the Pytorch Forum Classification System as a Web Application.
  • Module 4 resources and tutorials to be started

Tasks Done

  • Reduced the data imbalance across categories by using only the top 15 categories for the classifier system
  • Achieved an accuracy of 64% using the Linear SVM without any hyperparameter tuning
  • Applied basic modeling and advanced embedding methods, and pushed the corresponding notebooks to the team repo.
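
The "top 15 categories" step above might look something like the sketch below. It assumes that "top" means the categories with the most posts, which the notes do not state explicitly; the function name `keep_top_categories` and the toy records are hypothetical.

```python
from collections import Counter


def keep_top_categories(records, top_n=15):
    """Keep only the records whose category is among the top_n most frequent."""
    counts = Counter(category for _, category in records)
    top = {category for category, _ in counts.most_common(top_n)}
    return [(text, category) for text, category in records if category in top]


# Hypothetical toy data: (post text, category) pairs.
records = [
    ("post a", "vision"), ("post b", "vision"), ("post c", "vision"),
    ("post d", "nlp"), ("post e", "nlp"),
    ("post f", "mobile"),
]

# top_n=2 keeps the example small; the project used 15.
filtered = keep_top_categories(records, top_n=2)
```

Dropping the rare categories leaves the classifier with fewer, better-populated classes, at the cost of not being able to predict the discarded ones.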

Module 3 Code Submission - ml-session2-team5/DeepikaRana at Module3 · mentorchains/ml-session2-team5 · GitHub

Module 4 Self Assessment

Week 7 and 8: 7th August 2021 - 20th August 2021

Overview of Things Learned

  • I learned how to train BERT, XLNet, RoBERTa, and DistilBERT models using the Simple Transformers library.
  • I learned how to combine an advanced model like BERT with a simple ML model like Logistic Regression.
  • Learned how the BERT neural architecture works by reading papers and understanding its theoretical foundations.
  • Read articles on optimised versions of BERT to find out which would be most suitable for our PyTorch forum classifier.

Tools:

  • Simple Transformers, tokenizers==0.9.4, sklearn, tarfile, HTML, CSS, Flask API, Docker
  • Google Colaboratory

Soft Skills:

  • Presented the team presentation for ML Level 1 Team 5 with the complete team
  • Improved the earlier presentation based on the feedback from Debaleena and Mentors.
  • Communicated with participants and fellow leads to plan the team's final presentation
  • Going to co-host the final presentation on Friday, so I planned for that.

Achievement highlights

  • Learned how the BERT neural architecture works by reading papers and understanding its theoretical foundations.
  • Read articles on optimised versions of BERT to find out which would be most suitable for our PyTorch forum classifier.
  • I successfully trained BERT, RoBERTa, XLNet, and DistilBERT models using the Simple Transformers library.

Meetings attended

  1. ML Level 1 Presentation for our Team 5
  2. Team Meeting 8 with the complete team to discuss Module 4
  3. Team Meeting 9 for planning of the team final presentation.

Goals for the Upcoming Week

  • Present the work and the final presentation to the mentors with the team

Tasks Done

  • Compared the accuracy, evaluation loss, F1 score, and MCC of the 4 advanced models.
  • Reduced the data imbalance across categories by using only the top 15 categories for the classifier system
  • Achieved the highest accuracy, 80.5%, using XLNet
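
For readers unfamiliar with the metrics compared above, the sketch below shows how accuracy, F1, and the multiclass Matthews correlation coefficient (MCC) can be computed from true and predicted labels. It is a plain-Python illustration, not the evaluation code from the submission; the notes do not say how F1 was averaged, so macro averaging is an assumption, and the labels are toy values rather than project results.

```python
import math
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (assumed averaging)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def mcc(y_true, y_pred):
    """Multiclass Matthews correlation coefficient from label counts."""
    n = len(y_true)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    labels = set(true_counts) | set(pred_counts)
    cov_tp = correct * n - sum(true_counts[k] * pred_counts[k] for k in labels)
    cov_pp = n * n - sum(c * c for c in pred_counts.values())
    cov_tt = n * n - sum(c * c for c in true_counts.values())
    denom = math.sqrt(cov_pp * cov_tt)
    return cov_tp / denom if denom else 0.0


# Toy labels for illustration only.
y_true = ["a", "a", "b", "b", "c", "c"]
y_pred = ["a", "a", "b", "c", "c", "c"]
scores = {
    "accuracy": accuracy(y_true, y_pred),
    "macro_f1": macro_f1(y_true, y_pred),
    "mcc": mcc(y_true, y_pred),
}
```

Unlike accuracy, MCC takes the full distribution of true and predicted classes into account, which makes it a useful companion metric when the categories are imbalanced.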

Module 4 Code Submission - ml-session2-team5/DeepikaRana at Module4 · mentorchains/ml-session2-team5 · GitHub