==> Technical Areas worked on:
- I chose the Revolut Discourse community forum to practice what I learned from recent scraping tutorials.
- Used the BeautifulSoup, csv, and requests libraries to scrape the forum and save the data to a CSV file.
- Inspired by the Google course project shown in one of the tutorials, I tried to categorize similar issues in the issues sub-forum based on the frequency of words in the issue title or description.
- Currently, I'm working on how to train a model on that data (still in progress).
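To make the word-frequency idea concrete, here is a minimal sketch of how issue titles could be grouped by their most common words. The sample titles, the stop-word list, and the helper names (`tokenize`, `word_counts`, `group_by_keyword`) are all made up for illustration; this is not the actual forum data or the exact method from the tutorial.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real project would use a fuller one.
STOP_WORDS = {"the", "a", "an", "to", "my", "is", "in", "on", "of", "and", "i", "for", "not"}


def tokenize(text):
    """Lowercase the text and return its alphabetic words, minus stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def word_counts(titles):
    """Count how often each word appears across all titles."""
    counts = Counter()
    for title in titles:
        counts.update(tokenize(title))
    return counts


def group_by_keyword(titles, keywords):
    """Assign each title to the first keyword it contains, else 'other'."""
    groups = {kw: [] for kw in keywords}
    groups["other"] = []
    for title in titles:
        tokens = set(tokenize(title))
        match = next((kw for kw in keywords if kw in tokens), "other")
        groups[match].append(title)
    return groups


# Made-up issue titles standing in for scraped data.
titles = [
    "Card payment declined abroad",
    "Top-up not showing in my account",
    "Card not delivered",
    "Payment pending for three days",
]

counts = word_counts(titles)
# Use the two most frequent words as category labels.
top_keywords = [word for word, _ in counts.most_common(2)]
groups = group_by_keyword(titles, top_keywords)
```

Picking the first matching keyword keeps the sketch simple, but a title that mentions two frequent words only lands in one group, which is one reason a trained model may do better later.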
==> Tools and Libraries:
- BeautifulSoup: Learned how to use the library to scrape data from a web page.
- requests: Learned how to send requests and process the status and content of the response.
- Scrapy: Learned how to use the framework to crawl multiple web pages and sub-pages.
- Python: Refreshed my memory of Python.
- Jira: We use Jira to divide the work among team members.
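As a rough illustration of how requests, BeautifulSoup, and csv fit together, the sketch below fetches a page, pulls out topic titles and links, and writes them as CSV. The `topic-link` class and the sample HTML are assumptions for the example, not the forum's real markup, and the function names are hypothetical. The parsing step runs on an inline HTML string so the example works without network access.

```python
import csv
import io

import requests
from bs4 import BeautifulSoup


def fetch_html(url):
    """Download a page and return its HTML, raising on HTTP error statuses."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # raises requests.HTTPError for 4xx/5xx
    return response.text


def parse_topics(html):
    """Extract (title, link) pairs; the 'topic-link' class is hypothetical."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        (a.get_text(strip=True), a["href"])
        for a in soup.find_all("a", class_="topic-link")
    ]


def write_csv(rows, handle):
    """Write a header row followed by the scraped rows."""
    writer = csv.writer(handle)
    writer.writerow(["title", "link"])
    writer.writerows(rows)


# Inline sample standing in for a downloaded page.
SAMPLE_HTML = """
<ul>
  <li><a class="topic-link" href="/t/card-declined/1">Card declined</a></li>
  <li><a class="topic-link" href="/t/top-up-missing/2">Top-up missing</a></li>
</ul>
"""

rows = parse_topics(SAMPLE_HTML)
buffer = io.StringIO()  # swap for open("topics.csv", "w", newline="") to write a file
write_csv(rows, buffer)
```

For a live page, `parse_topics(fetch_html(url))` would replace the inline sample, with the selector adjusted to whatever the page actually uses.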
==> Soft Skills:
- Becoming more familiar with the new platforms we use, including STEM-Away, Discord, Jira, GitHub, and Jupyter Notebooks.
- Getting to know the other members of the team and trying to establish a productive way of communicating.
==> Achievement Highlights:
- Learning how to scrape data from the Internet.
- Getting familiar with important tools such as Python libraries, GitHub, and Jupyter Notebooks.
- Learning data pre-processing and how to clean raw data so that it is ready to be fed to the model.
==> Tasks Completed:
- Scraped data from the Revolut Discourse forum.
- Tried different data-cleaning techniques using Pandas and NumPy.
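For reference, a few common Pandas cleaning steps might look like the sketch below. The column names (`title`, `views`) and the sample rows are invented for the example, and the specific steps are only an assumption about what typical cleaning involves, not a record of exactly what I ran.

```python
import pandas as pd

# Made-up rows imitating messy scraped output.
raw = pd.DataFrame({
    "title": ["  Card declined ", "Card declined", None, "Top-up MISSING!"],
    "views": ["12", "12", "5", "n/a"],
})


def clean(df):
    """Return a cleaned copy: normalized titles, numeric views, no duplicates."""
    out = df.dropna(subset=["title"]).copy()  # drop rows with no title
    out["title"] = (
        out["title"]
        .str.strip()                                   # trim surrounding whitespace
        .str.lower()                                   # normalize case
        .str.replace(r"[^a-z0-9\s-]", "", regex=True)  # drop punctuation except hyphens
    )
    # Non-numeric values such as "n/a" become NaN instead of raising.
    out["views"] = pd.to_numeric(out["views"], errors="coerce")
    # Titles that are identical after normalization count as duplicates.
    return out.drop_duplicates(subset=["title"]).reset_index(drop=True)


cleaned = clean(raw)
```

Normalizing the text before dropping duplicates matters here: the first two rows only become identical once whitespace and case are cleaned up.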
==> Future Goals:
- Learn how to apply ML models to the data.