Things Learned
Technical Area
Gained a better understanding of the Beautiful Soup documentation by working with it closely. Got a better understanding of GitHub through the videos. Also learned to use Python to train a logistic regression model that classifies text as positive or negative sentiment.
Text editor: I use Atom. If this is a problem, please let me know.
Installed Scrapy, but didn’t do much with it. Also installed Selenium/chromedriver (didn’t do much with it either) and nltk.
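As an illustration of the kind of Beautiful Soup usage described here, the sketch below parses a page into a soup object and collects the text of each post into a Python list. It is a minimal example under stated assumptions: the HTML snippet, the `post` class name, and the `extract_post_texts` helper are made up for illustration and do not reflect the markup of any real forum page.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for a forum thread (assumption: each post
# lives in a <div class="post">; a real page's markup will differ).
SAMPLE_HTML = """
<html><body>
  <div class="post"><p>First listing of the year!</p></div>
  <div class="post"><p>Lovely work.  <b>Well done</b> everyone.</p></div>
  <div class="sidebar">Not a post</div>
</body></html>
"""


def extract_post_texts(html, post_class="post"):
    """Return the visible text of each post as a list of strings."""
    soup = BeautifulSoup(html, "html.parser")
    posts = soup.find_all("div", class_=post_class)
    # get_text joins the text fragments of a post; split/join then
    # collapses any leftover runs of whitespace.
    return [" ".join(p.get_text(" ", strip=True).split()) for p in posts]


post_texts = extract_post_texts(SAMPLE_HTML)
for i, text in enumerate(post_texts, start=1):
    print(f"Post {i}: {text}")
```

With the texts in a plain list, each entry can be passed on to later NLP steps (tokenizing, counting words, and so on) one post at a time.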
Soft Skills
The best soft skill I gained was learning how to combine different resources across the web to learn a specific skill I wanted (in this case, using logistic regression to classify positive/negative sentiment). The process of learning this was quite helpful. From the Kunal Sing web video, I got a broader overview of ML: content-based vs. collaborative filtering, unsupervised vs. supervised learning, etc. From the Sara EL_ATEIF web video, I got a better understanding of a specific application of ML through the example of sorting IMDB movies, as well as a general understanding of NLP. Finally, I watched the Maleeha Imran video for more specific knowledge of crawling and scraping and the differences between the two, and looked at the specific code shown there.
3 Achievement highlights
- Was able to download Beautiful Soup and become familiar with the documentation. Also watched the videos under the resources tab to obtain the necessary background information. The web videos were very helpful.
- Successfully scraped from this website: Folksy 365 - Daily Listing Challenge Thread January 2021 - Showcase - Folksy Forums. The page is a discussion thread where people can post replies. I was able to use a soup object to represent the page.
- I was able to take the text of each post and print it out in a clean, readable way. I also stored the text of each post as an entry in a Python list, so I now have a list of the text of every post. From here, I can run a number of different NLP operations on the words in each post.
- I also downloaded movie review data, wrote Python code, and used logistic regression to classify the reviews as positive or negative sentiment. The accuracy was not great (78% or so), but I used a very basic and imprecise way of doing logistic regression, and I will continue to work on ways to improve it. My main focus was just getting my feet wet in ML. Still, this gave me a good refresher on writing Python and taught me about ML/logistic regression and nltk (which I imagine could be helpful over the summer).
- Prepared my workspace by installing all the minimum technical requirements (e.g., Beautiful Soup). Successfully installed Selenium and chromedriver as well.
- Watched the webinars and learned a lot (I explained what I learned in more detail above).
- I familiarized myself with scraping using the link mentioned above. I was able to extract just the text from each post, but I didn’t do much to process it. I mostly just played around with scraping. Tell me if you would like anything in particular with regards to the processing.
- Read some basics about PyTorch, but I plan to study it more. Also, as mentioned above, wrote Python to train a logistic regression model. I have some questions about improving the model, but I think I will find the answers by learning by doing and searching online.
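To make the logistic regression step concrete, here is a minimal from-scratch sketch of the idea: bag-of-words counts fed through a sigmoid, trained with stochastic gradient descent. It is an illustration only, not the code behind the 78% figure above. The tiny dataset, the function names, and the `epochs`/`lr` values are all made up, and it uses plain whitespace tokenization rather than nltk or the real movie review data.

```python
import math
from collections import Counter

# Tiny made-up dataset (1 = positive, 0 = negative); an assumption for
# illustration, not the movie review data mentioned above.
TRAIN = [
    ("a wonderful and moving film", 1),
    ("great acting and a great story", 1),
    ("i loved every minute", 1),
    ("a boring and terrible film", 0),
    ("awful acting and a dull story", 0),
    ("i hated every minute", 0),
]


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for a real tokenizer)."""
    return text.lower().split()


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, text):
    """Probability that `text` is positive under the current model."""
    counts = Counter(tokenize(text))
    z = bias + sum(weights.get(word, 0.0) * n for word, n in counts.items())
    return sigmoid(z)


def train(examples, epochs=200, lr=0.5):
    """Fit per-word weights and a bias with stochastic gradient descent."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for text, label in examples:
            # Gradient of the log loss w.r.t. z is (prediction - label).
            error = predict_proba(weights, bias, text) - label
            for word, n in Counter(tokenize(text)).items():
                weights[word] = weights.get(word, 0.0) - lr * error * n
            bias -= lr * error
    return weights, bias


weights, bias = train(TRAIN)
for review in ["a wonderful story", "a terrible dull film"]:
    p = predict_proba(weights, bias, review)
    print(f"{review!r}: P(positive) = {p:.2f}")
```

Likely places to look for improvement in a setup like this are the features (better tokenization, removing stop words, TF-IDF weighting) and adding regularization, rather than the training loop itself.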