Concise overview of things learned:
• Set up the Google Colab environment
• Learned the basics of Machine Learning
• Learned the definition, classification, and pros & cons of different recommender systems
• Learned Web Scraping
• Learned the basics of NLP
• Learned how to train a Logistic Regression model & the intuition behind it
Tools & libraries used: Beautiful Soup, Selenium WebDriver, Scrapy, Git, GitHub, Trello
Learned about Project Management & improved my Google searching skills to cope with bugs.
- Created site maps to design the path to take in order to scrape the desired data
- Learned to use the Beautiful Soup library &, after inspecting the HTML pages of the DiscourseHub Community forums, scraped data from their different tags.
- Learned to use Selenium WebDriver & incorporated it with the Beautiful Soup library.
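The scraping workflow above boils down to parsing a page's HTML and pulling text out of selected tags. As a dependency-free sketch of that idea, here is the same pattern with Python's built-in html.parser (standing in for Beautiful Soup, which wraps a parser like this one; the HTML snippet and class names are made up for illustration, not taken from the real forum):

```python
from html.parser import HTMLParser

# Hypothetical forum-page snippet standing in for a real DiscourseHub page
HTML = """
<div class="topic-list">
  <a class="title" href="/t/1">How to install?</a>
  <a class="title" href="/t/2">Login issues</a>
</div>
"""

class TopicScraper(HTMLParser):
    """Collects the text of every <a class="title"> tag."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "a" and ("class", "title") in attrs:
            self.in_title = True

    def handle_data(self, data):
        if self.in_title and data.strip():
            self.titles.append(data.strip())

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_title = False

scraper = TopicScraper()
scraper.feed(HTML)
print(scraper.titles)  # ['How to install?', 'Login issues']
```

With Beautiful Soup the same extraction would be a one-liner over `soup.find_all("a", class_="title")`; Selenium only adds the step of fetching the rendered page source first.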
- Trained a simple logistic regression model on a social network ads dataset. It predicts whether people of a certain age & income will buy a car or not.
- Familiarised myself with git commands: clone, commit, merge, push, etc.
Detailed Statement of Tasks Completed:
- Content-Based Filtering: recommends similar items based on previous actions or feedback (likes, ratings, etc.)
Measures of similarity: cosine similarity, dot product, Euclidean distance
The problem can be framed in 2 ways:
1) Classification: predict 'like' or 'dislike'; here we use metrics like accuracy, precision or recall
2) Regression: predict the rating a user would give; here we use the MSE (mean squared error) metric. To judge the relevance of a new recommendation model, we need to test it in real conditions.
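The three similarity measures listed above can be written out directly. A minimal sketch in plain Python; the item vectors are made-up illustrations (e.g. genre scores), not data from the project:

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b: 1 = same direction, 0 = orthogonal."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean_distance(a, b):
    """Straight-line distance between a and b."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical item-feature vectors
item_a = [1.0, 0.0, 2.0]
item_b = [2.0, 0.0, 4.0]   # same direction as item_a, larger magnitude
item_c = [0.0, 3.0, 0.0]   # orthogonal to item_a

print(cosine_similarity(item_a, item_b))   # 1.0  (identical direction)
print(cosine_similarity(item_a, item_c))   # 0.0  (no overlap)
print(euclidean_distance(item_a, item_b))  # ~2.236
```

Note the difference the example exposes: cosine similarity ignores magnitude (item_a and item_b score a perfect 1.0), while Euclidean distance does not, which is why the choice of measure matters for content-based recommenders.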
• Collaborative Filtering: 2 types
1) Model based: a model is defined on user-item interactions, where user and item representations are learned from the interaction matrix
2) Memory based: no model is defined; it relies on similarities between users or items in terms of observed interactions
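The memory-based variant can be sketched in a few lines: score the items a user has not seen by the similarity-weighted votes of other users. The interaction matrix below is a made-up toy, not project data:

```python
import math

# Hypothetical user-item interaction matrix (rows: users, cols: items; 1 = liked)
R = [
    [1, 1, 0, 0],  # user 0
    [1, 1, 1, 0],  # user 1 (tastes overlap with user 0)
    [0, 0, 1, 1],  # user 2
]

def cosine(a, b):
    d = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return d / (na * nb) if na and nb else 0.0

def recommend(user, R):
    """Memory-based CF: no model is trained; unseen items are scored by
    how similar the users who liked them are to the target user."""
    scores = {}
    for other, row in enumerate(R):
        if other == user:
            continue
        sim = cosine(R[user], row)
        for item, seen in enumerate(row):
            if seen and not R[user][item]:
                scores[item] = scores.get(item, 0.0) + sim
    return max(scores, key=scores.get) if scores else None

print(recommend(0, R))  # 2 -- the item liked by the most similar user
```

A model-based method would instead learn low-dimensional user and item vectors from R (e.g. by matrix factorisation) rather than comparing raw rows at query time.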
• Learned the recommendation algorithm & the step-by-step tasks behind a Content-Based recommendation system.
• K-Nearest Neighbours: an algorithm to find the K nearest neighbours of an input point in n-dimensional space, based on a distance metric.
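The K-Nearest Neighbours idea above fits in a short sketch: compute the distance metric to every labelled point, keep the k closest, and take a majority vote. The 2-D points and labels are invented for illustration:

```python
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train, query, k=3):
    """Majority vote among the k nearest labelled points.
    train: list of (point, label) pairs."""
    neighbours = sorted(train, key=lambda pl: euclidean(pl[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical labelled points in 2-D space
train = [((1, 1), "buy"), ((2, 1), "buy"), ((1, 2), "buy"),
         ((8, 8), "skip"), ((9, 8), "skip"), ((8, 9), "skip")]

print(knn_predict(train, (2, 2), k=3))  # buy
print(knn_predict(train, (8, 7), k=3))  # skip
```

In a recommender, the "points" would be item or user feature vectors, and the neighbours' items become the candidates to recommend.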
• Basics of NLP:
Because of the problems of representing language as input to ML models, vanilla Neural Networks came first. But even if we supply linguistic features, we don't get the deeper context surrounding the individual words or tokens, because these networks don't take sequential information into account. So RNNs (Recurrent Neural Networks) came. But there, words can only be read in one direction, so LSTMs (Long Short-Term Memory networks) came.
Learned the Attention model & its uses. Learned BERT & its training process:
1) masking 2) Next Sentence Prediction: not useful for sentiment analysis
1. Importing the libraries (numpy, pandas, matplotlib)
2. Importing the dataset (csv file)
3. Splitting the dataset into the Training set and Test set
4. Feature Scaling
5. Training the Logistic Regression model on the Training set
6. Predicting a new result
7. Predicting the Test set results
8. Making the Confusion Matrix
9. Visualising the Training set results
10. Visualising the Test set results
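The steps above (minus the csv import and the plotting) can be sketched end-to-end in numpy. A randomly generated toy dataset stands in for the social network ads csv, and the model is fitted by plain gradient descent rather than scikit-learn, so the mechanics of each step are visible:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the ads dataset: [age, salary] -> purchased (0/1)
X = np.vstack([rng.normal([30, 40000], [5, 8000], size=(50, 2)),
               rng.normal([50, 90000], [5, 8000], size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

# Step 3: split into Training set and Test set
idx = rng.permutation(len(X))
train, test = idx[:80], idx[80:]

# Step 4: feature scaling (standardisation), fitted on the training set only
mu, sigma = X[train].mean(axis=0), X[train].std(axis=0)
Xs = (X - mu) / sigma

# Step 5: train logistic regression by gradient descent on the cross-entropy loss
w, b = np.zeros(2), 0.0
for _ in range(2000):
    p = 1 / (1 + np.exp(-(Xs[train] @ w + b)))        # sigmoid probabilities
    w -= 0.1 * (Xs[train].T @ (p - y[train]) / len(train))
    b -= 0.1 * (p - y[train]).mean()

# Steps 7-8: predict the Test set and build the Confusion Matrix
pred = (1 / (1 + np.exp(-(Xs[test] @ w + b))) >= 0.5).astype(int)
cm = np.zeros((2, 2), dtype=int)          # rows: true class, cols: predicted
for t, pr in zip(y[test], pred):
    cm[t, pr] += 1
print(cm)
```

Step 6 (predicting a new result) is the same sigmoid applied to a single scaled point; the visualisation steps would plot the decision boundary with matplotlib, which is omitted here.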