**Concise overview of things learned:**

**Technical area:**

• Set up the Google Colab environment

• Learned the basics of machine learning

• Learned the definition, classification, and pros & cons of different recommender systems

• Learned web scraping

• Learned the basics of NLP

• Learned how to train a logistic regression model & the intuition behind it

• **Tools:**

Beautiful Soup, Selenium WebDriver, Scrapy, Git, GitHub, Trello

• **Soft skills:**

Learned about project management & improved my Google searching skills to cope with bugs.

**Achievement highlights:**

- Created site maps to design the path to take in order to scrape the desired data
- Learned to use the Beautiful Soup library &, after inspecting the HTML pages of the DiscourseHub community forums, scraped data from their different tags.
- Learned to use Selenium WebDriver & incorporated it with the Beautiful Soup library.
- Trained a simple logistic regression model on social network ads data. It predicts whether people with a certain age & income will buy a car or not.
- Familiarised myself with git commands: git clone, commit, merge, push, etc.
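The tag-inspection step above can be sketched as follows. This is a minimal illustration assuming the `bs4` package; the inline HTML snippet and the `topic-title` class are stand-ins for a real page fetched from the DiscourseHub forums, not the actual markup.

```python
# Minimal Beautiful Soup sketch: the HTML string below is an invented
# stand-in for a forum page; real pages would be fetched with requests
# or Selenium WebDriver first.
from bs4 import BeautifulSoup

html = """
<html><body>
  <a class="topic-title" href="/t/1">Install help</a>
  <a class="topic-title" href="/t/2">Plugin question</a>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
# After inspecting the page structure, pull data out of the matching tags
titles = [a.get_text() for a in soup.find_all("a", class_="topic-title")]
print(titles)  # → ['Install help', 'Plugin question']
```

With Selenium, the same parsing step applies after `driver.page_source` replaces the hard-coded string.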

**Detailed Statement of Tasks Completed:**

**Recommendation Systems:**

- Content-Based Filtering: recommends similar items based on previous actions or feedback (likes, ratings, etc.)

Measures of similarity: cosine similarity, dot product, Euclidean distance

The problem can be solved in 2 ways:

1) Classification: predict 'like' or 'dislike'; here we use metrics like accuracy, precision, or recall

2) Regression: predict the rating given by the user; here we use the MSE (mean squared error) metric. To judge the relevance of a new recommendation model, we need to test it in real conditions.
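The three similarity measures above can be written in a few lines of plain Python; the two item vectors here are made up purely for illustration.

```python
import math

def dot(a, b):
    # Dot product: large when vectors point in a similar direction
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    # Cosine similarity: dot product normalised by vector lengths, in [-1, 1]
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean(a, b):
    # Euclidean distance: straight-line distance between the two points
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy feature vectors for two items (e.g. genre indicators)
item_a = [1.0, 0.0, 1.0]
item_b = [1.0, 1.0, 0.0]
print(dot(item_a, item_b))                 # → 1.0
print(round(cosine(item_a, item_b), 3))    # → 0.5
```

Note the difference in direction: higher cosine/dot product means more similar, while higher Euclidean distance means less similar.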

• Collaborative Filtering: 2 types

1) Model based: a model is defined on top of the user-item interactions, where user and item representations are learned from the interaction matrix

2) Memory based: no model is defined; it depends on similarities between users or items in terms of their observed interactions
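The memory-based idea can be sketched with a tiny made-up interaction matrix: find the most similar other user, then recommend what they liked. The users, items, and the overlap-count similarity here are illustrative choices, not a reference implementation.

```python
# Toy user-item interaction matrix: 1 = liked, 0 = not liked (invented data)
ratings = {
    "alice": {"item1": 1, "item2": 1, "item3": 0},
    "bob":   {"item1": 1, "item2": 1, "item3": 1},
    "carol": {"item1": 0, "item2": 0, "item3": 1},
}

def similarity(u, v):
    # Simple overlap measure: number of items both users liked
    return sum(1 for i in u if u[i] and v.get(i))

def recommend(user):
    # Memory-based step 1: find the most similar other user
    others = [n for n in ratings if n != user]
    best = max(others, key=lambda n: similarity(ratings[user], ratings[n]))
    # Step 2: recommend items the neighbour liked that the user hasn't seen
    return [i for i, r in ratings[best].items() if r and not ratings[user][i]]

print(recommend("alice"))  # → ['item3']  (bob is most similar; he liked item3)
```

A real system would use cosine similarity on the full interaction matrix and aggregate over several neighbours rather than just the single closest one.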

• Learned the recommendation algorithm & the step-by-step tasks behind a content-based recommendation system.

• K-Nearest Neighbours: an algorithm that finds the K nearest neighbours of an input point in n-dimensional space based on a distance metric.
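For classification, KNN takes a majority vote among those K neighbours. A minimal sketch with Euclidean distance and invented 2-D training points:

```python
import math
from collections import Counter

def euclidean(a, b):
    # Distance metric in n-dimensional space
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train, query, k=3):
    # Sort training points by distance to the query, keep the k closest,
    # and return the majority label among them
    neighbours = sorted(train, key=lambda p: euclidean(p[0], query))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

# Toy labelled points: two clusters in 2-D space
train = [((1, 1), "like"), ((1, 2), "like"), ((2, 1), "like"),
         ((8, 8), "dislike"), ((9, 8), "dislike")]
print(knn_predict(train, (1.5, 1.5), k=3))  # → 'like'
```

KNN has no training phase; all the work happens at query time, which is why it is called a lazy learner.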

• Basics of NLP:

Because of the problems of representing language as ML input, the vanilla neural network came first. But even if we supply linguistic features, we don't get the deeper context surrounding the individual words or tokens, because these networks don't take sequential information into account. So the RNN (Recurrent Neural Network) came. But there, words can only be read in one direction. So the LSTM (Long Short-Term Memory network) came.

Learned the attention model & its uses. Learned BERT & its training process:

1) Masking 2) Next sentence prediction: not useful for sentiment analysis

Logistic Regression:

Steps:

1. Importing the libraries (numpy, pandas, matplotlib)

2. Importing the dataset (csv file)

3. Splitting the dataset into the training set and test set

4. Feature scaling
5. Training the logistic regression model on the training set
6. Predicting a new result
7. Predicting the test set results
8. Making the confusion matrix
9. Visualising the training set results
10. Visualising the test set results
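The steps above (minus the plots) can be sketched with scikit-learn. A small synthetic (age, salary) dataset stands in for the social network ads csv file, and the labelling rule is invented purely so the example is self-contained.

```python
# Minimal sketch of the logistic regression workflow, assuming scikit-learn.
# Synthetic data replaces the csv import from step 2.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

# Toy (age, salary) features with an invented linear buy/no-buy rule
rng = np.random.default_rng(0)
X = rng.uniform([18, 15000], [60, 150000], size=(200, 2))
y = (X[:, 0] + X[:, 1] / 3000 > 60).astype(int)

# Step 3: split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Step 4: feature scaling (fit on train only, then apply to test)
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Step 5: train the logistic regression model
clf = LogisticRegression(random_state=0).fit(X_train, y_train)

# Steps 6-8: predict the test set and build the confusion matrix
y_pred = clf.predict(X_test)
print(confusion_matrix(y_test, y_pred))
```

Scaling is fitted on the training set only so that no information from the test set leaks into preprocessing.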