Thank you @maleeha_imran for an amazing webinar! We will have the recording up shortly. Meanwhile, here are a few screenshots for those of you who are super keen to try out the steps demonstrated in the webinar!
- What is Web Crawling? Gathering data from a website
- Python Basics - installation, variables, understanding HTML structure
- Intro to web requests/responses - urllib (Python library for web requests)
- Building a basic crawler - Scrapy (a free, open-source Python web crawling framework), BeautifulSoup (pulling data out of HTML/XML files)
- Crawling Tutorial
First-pass editing is done. The recording will be uploaded to the STEMCasts library shortly.
The primary ways to get data:
What are you allowed to crawl? Honor the rules in the site's robots.txt file.
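For anyone who wants to check those rules from code, here is a minimal sketch using `urllib.robotparser` from the Python standard library. The sample rules, the `my-crawler` user agent name, and the `example.com` URLs are made up for illustration and are not from the webinar.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used here so the example runs offline.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt):
    """Return a RobotFileParser loaded with the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS)

# "my-crawler" is an assumed user agent name for this sketch.
print(parser.can_fetch("my-crawler", "https://example.com/index.html"))
print(parser.can_fetch("my-crawler", "https://example.com/private/data.html"))
```

Against a live site you would typically call `set_url()` with the address of the site's robots.txt and then `read()`, instead of passing the text in by hand.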
Start with sites created specifically to help you learn. Example: quotes.toscrape.com
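As a starting point, a page from a practice site can be downloaded with `urllib` alone. This is a rough sketch, not the code from the webinar: the helper name `fetch_html`, the `webinar-demo-crawler` user agent string, and the 10-second timeout are all assumptions.

```python
from urllib.request import Request, urlopen


def fetch_html(url, timeout=10.0):
    """Download a page and return its body decoded as text."""
    # Identify the crawler; the user agent string is a made-up example.
    request = Request(url, headers={"User-Agent": "webinar-demo-crawler"})
    with urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


# Example call (needs network access):
# html = fetch_html("https://quotes.toscrape.com/")
# print(html[:200])
```

The returned string is the raw HTML, which can then be handed to a parser such as BeautifulSoup.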
BeautifulSoup code to sort and print tags from quotes.toscrape.com:
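A rough sketch of how that might look (not the code from the webinar). It assumes each tag on the page is an `<a class="tag">` link, and it runs on a small inline HTML sample written for this example rather than on the live site. The helper name `sorted_tags` is hypothetical, and the `bs4` package has to be installed.

```python
from bs4 import BeautifulSoup

# Hand-written sample that mimics the assumed markup of the quotes page.
SAMPLE_HTML = """
<div class="quote">
  <span class="text">"A sample quote."</span>
  <div class="tags">
    <a class="tag" href="/tag/world/">world</a>
    <a class="tag" href="/tag/change/">change</a>
  </div>
</div>
<div class="quote">
  <span class="text">"Another sample quote."</span>
  <div class="tags">
    <a class="tag" href="/tag/change/">change</a>
    <a class="tag" href="/tag/books/">books</a>
  </div>
</div>
"""


def sorted_tags(html):
    """Return the unique tag names found in the HTML, sorted alphabetically."""
    soup = BeautifulSoup(html, "html.parser")
    # Collect the text of every <a class="tag"> link; a set drops duplicates.
    names = {link.get_text(strip=True) for link in soup.find_all("a", class_="tag")}
    return sorted(names)


for name in sorted_tags(SAMPLE_HTML):
    print(name)
```

To try it on the real page, pass in the HTML downloaded from quotes.toscrape.com instead of `SAMPLE_HTML`.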
Tool built specifically for scraping: