Learned to understand the structure of a web page, inspect web elements to see how data is distributed among different HTML tags, make HTTP requests, and parse HTML responses. Worked with the Beautiful Soup and Scrapy libraries to build web scrapers and crawlers for extracting data from Discourse forums. Also learned to work with text data and studied the basics of attention-based models, transformer networks, and BERT.
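A minimal sketch of the request-and-parse workflow described above; the HTML snippet, the `a.title` selector, and the function names are illustrative assumptions, not the exact code I used:

```python
import requests
from bs4 import BeautifulSoup

def extract_topic_titles(html):
    """Parse a forum page and return the text of its topic links."""
    soup = BeautifulSoup(html, "html.parser")
    # Assumes topic titles live in <a class="title"> tags (true on many Discourse themes).
    return [a.get_text(strip=True) for a in soup.select("a.title")]

def fetch_topic_titles(url):
    """Request a page over HTTP and hand the body to the parser."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx
    return extract_topic_titles(response.text)

# Tiny offline demo on a hand-written snippet.
sample = '<html><body><a class="title" href="/t/hello/1">Hello</a></body></html>'
print(extract_topic_titles(sample))  # ['Hello']
```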
Learned the basics of version control with Git and its features within VS Code, and the basics of coding in PyTorch.
Learned effective ways of communicating an idea, collaborating with people from varied backgrounds, considering different opinions, and reaching common ground.
- Built a web crawler to traverse the pages of a Discourse forum of my choice (Choice Community Forum).
- Scraped data from various web pages and stored it as pickle files.
- Submitted all bi-weekly reports on time.
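The scrape-and-store step from the bullets above can be sketched like this; the record fields and file name are hypothetical placeholders for the actual scraped structure:

```python
import os
import pickle
import tempfile

def save_page(records, path):
    """Serialize a list of scraped records to a pickle file."""
    with open(path, "wb") as f:
        pickle.dump(records, f)

def load_page(path):
    """Read the records back for later processing."""
    with open(path, "rb") as f:
        return pickle.load(f)

# Hypothetical records scraped from one forum page.
records = [{"topic_id": 1, "title": "Hello", "posts": ["First post"]}]
path = os.path.join(tempfile.gettempdir(), "page_1.pkl")
save_page(records, path)
assert load_page(path) == records  # round-trips losslessly
```

Pickle keeps the nested Python structure intact, which makes it convenient for handing the data straight to the next processing stage.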
Attended Colin’s webinar on Git and the bi-weekly team meetings.
Goals for the upcoming week
Working with pre-trained BERT models and developing a basic recommendation system.
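A toy sketch of the planned recommendation step, assuming topic embeddings (e.g. from a pre-trained BERT model) are already computed; the titles and vectors below are made up for illustration:

```python
import numpy as np

def recommend(query_vec, topic_vecs, topic_titles, k=2):
    """Rank topics by cosine similarity to a query embedding."""
    q = query_vec / np.linalg.norm(query_vec)
    t = topic_vecs / np.linalg.norm(topic_vecs, axis=1, keepdims=True)
    scores = t @ q                      # cosine similarity of each topic to the query
    order = np.argsort(scores)[::-1][:k]  # indices of the k most similar topics
    return [topic_titles[i] for i in order]

titles = ["Install help", "Plugin dev", "Theme styling"]
vecs = np.array([[1.0, 0.0], [0.0, 1.0], [0.1, 0.9]])  # stand-in 2-d "embeddings"
print(recommend(np.array([0.0, 1.0]), vecs, titles))   # ['Plugin dev', 'Theme styling']
```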
Scraped the main page of the forum to get the slugs and ids of its topic pages. Crawled through the forum via the URLs obtained and scraped the data in as refined a format as possible, using a scraper developed in Python with the Beautiful Soup library. Guidance from the project lead on inspecting web elements was helpful in developing the scraper and crawler. Also explored Git functionality; Colin’s webinar was instrumental in understanding Git and the importance of version control.
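The slug-and-id extraction described above might look like this in outline; the URL pattern is modeled on Discourse's `/t/<slug>/<id>` topic links, and the sample HTML and base URL are assumptions:

```python
import re
from bs4 import BeautifulSoup

# Discourse topic links follow the /t/<slug>/<id> pattern.
TOPIC_LINK = re.compile(r"/t/(?P<slug>[^/]+)/(?P<id>\d+)")

def extract_topic_urls(html, base_url):
    """Pull (slug, id) pairs from a forum index page and build crawlable URLs."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for a in soup.find_all("a", href=True):
        m = TOPIC_LINK.search(a["href"])
        if m:
            urls.append(f"{base_url}/t/{m['slug']}/{m['id']}")
    return urls

index = '<a href="/t/welcome/5">Welcome</a><a href="/about">About</a>'
print(extract_topic_urls(index, "https://forum.example.com"))
# ['https://forum.example.com/t/welcome/5']
```

The crawler can then feed each returned URL back into the scraper, visiting one topic page at a time.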
I was very comfortable with the pace and didn’t face any challenges other than minor issues while working with web pages.