Week 1
Technical Area
- Refamiliarized with the pandas library, mainly DataFrame and associated utilities
- Refamiliarized with the regular expression library
- Learned the Beautiful Soup, Pubmed_parser, and Requests libraries
Tools
- Jupyter Notebook (via Google Colab)
- Magic commands (like %time and %pip)
- Beautiful Soup library
- Pubmed_parser library
- Regular expression library
- Requests library
Soft Skills
Facilitated international team meetings and answered general questions
Achievements Highlights
- Successfully finished the web crawler and scraped all data
- Reviewed the research paper and attached code
- Scraped all Medline data, but the run was not time efficient (~7 hours to parse and process)
Upcoming Goals
- Data cleaning
- Feature Engineering
- Stanford parser
- Dependency matrix
Tasks
Used the web crawler to scrape the required data from the Medline database website and created a CSV file to hold the abstract data.
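
The CSV step of this task could be sketched roughly as follows. This is a minimal illustration using only the standard csv and re modules, not the actual crawler code: the column names (pmid, title, abstract), the helper names clean_text and write_abstracts, and the sample records are all hypothetical placeholders, since the report does not describe the file layout.

```python
import csv
import io
import re

# Hypothetical column names; the report does not specify the CSV layout.
FIELDNAMES = ["pmid", "title", "abstract"]


def clean_text(text):
    """Collapse runs of whitespace (including newlines) into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def write_abstracts(records, handle):
    """Write scraped records (dicts) to an open text handle as CSV rows."""
    writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
    writer.writeheader()
    for record in records:
        # Missing fields become empty strings rather than raising an error.
        writer.writerow({key: clean_text(record.get(key, "")) for key in FIELDNAMES})


# Placeholder records standing in for scraped pages (not real Medline data).
sample_records = [
    {"pmid": "0000001", "title": "Example  title", "abstract": "First line.\nSecond line."},
    {"pmid": "0000002", "title": "Another title", "abstract": "Contains, a comma."},
]

# An in-memory buffer keeps the sketch self-contained; a real run would use a file.
buffer = io.StringIO()
write_abstracts(sample_records, buffer)
csv_text = buffer.getvalue()
```

To write to disk instead, the same function can be given a file opened with open(path, "w", newline="", encoding="utf-8"), where path is whatever output file name is chosen. Normalizing whitespace before writing keeps each abstract on a single CSV row, which may simplify the later data cleaning step.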