Concise Overview of Things Learned
Technical Area:
- I chose the Codecademy forum as my data source for scraping.
- I scraped the forum data using the Beautiful Soup and Selenium libraries and stored it in a CSV file.
- I applied various data cleaning and exploratory data analysis (EDA) techniques to explore the scraped data.
Tools: Beautiful Soup, Selenium WebDriver, NumPy, Pandas, Matplotlib, Scikit-learn, WordCloud, NLTK, spaCy, TextBlob, GitHub
Soft skills: I improved my Google searching skills to cope with bugs.
- Successfully scraped data from the Codecademy forum using the Beautiful Soup and Selenium libraries.
- Successfully performed exploratory data analysis on the scraped data.
- Successfully pushed the project files to a GitHub repository.
Detailed Statements of Task:
- I was unable to scrape the comments on a page directly because they load as you scroll, so I used Selenium to scroll the page and load them first.
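A minimal sketch of the scrolling step, assuming the forum lazy-loads comments as the page grows (the function name, pause length, and round limit are my own choices, not from the original project):

```python
import time

def scroll_to_bottom(driver, pause=2.0, max_rounds=50):
    """Scroll a Selenium driver until the page height stops growing,
    so that lazy-loaded comments are rendered before scraping."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give new comments time to load
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # no new content appeared; we have reached the bottom
        last_height = new_height
    return last_height
```

After the call returns, `driver.page_source` contains the fully loaded page and can be handed to Beautiful Soup.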
- I lowercased all words; removed digits and words containing digits; stripped punctuation and extra spaces; removed common and rare words; and lemmatized the remaining words.
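The cleaning steps above can be sketched roughly as follows; lemmatization is left as an optional callable (e.g. NLTK's `WordNetLemmatizer().lemmatize`) so the rest of the pipeline runs without corpus downloads, and the threshold defaults are illustrative:

```python
import re
import string
from collections import Counter

def clean_text(text, lemmatize=None):
    """Lowercase, drop digits and words containing digits,
    strip punctuation, and collapse extra whitespace."""
    text = text.lower()
    text = re.sub(r"\w*\d\w*", " ", text)  # digits and words containing digits
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = re.sub(r"\s+", " ", text).strip().split()
    if lemmatize:
        tokens = [lemmatize(t) for t in tokens]
    return " ".join(tokens)

def drop_common_and_rare(docs, n_common=10, min_count=2):
    """Remove the n_common most frequent words and any word
    appearing fewer than min_count times across all documents."""
    counts = Counter(w for d in docs for w in d.split())
    common = {w for w, _ in counts.most_common(n_common)}
    return [" ".join(w for w in d.split()
                     if w not in common and counts[w] >= min_count)
            for d in docs]
```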
- I created a document-term matrix using Scikit-learn's CountVectorizer to find the top words of each category.
- I generated a word cloud of the top words for each category.
- I used TextBlob to check the sentiment polarity of each category.
- The driver path "C:\Program Files (x86)\chromedriver.exe" worked well in my local Jupyter notebook, but when I tried to use it in Google Colaboratory it failed with the error "Chromedriver not in path".
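This happens because Colab runs on a Linux VM with no local Chrome or chromedriver, so a Windows path cannot resolve there. A commonly used recipe (environment setup only; package names and Colab internals change over time, so treat this as a hedged sketch) installs a Linux chromedriver and runs Chrome headless:

```python
import subprocess

# Install a Linux chromedriver inside the Colab VM (assumed package name).
subprocess.run(["apt-get", "install", "-y", "chromium-chromedriver"], check=True)

from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument("--headless")    # Colab has no display
options.add_argument("--no-sandbox")  # required in Colab's container
driver = webdriver.Chrome(options=options)  # driver is found via PATH, not a hard-coded path
```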