Concise overview of things learned:
- Read the 2015 paper in depth to understand the request and the deliverable.
- Learned that text from a specific field needs a field-specific parser to extract the relevant information. In this case, we used Pubmed Parser to parse the abstracts from the MEDLINE data.
- Tools used: requests library, pubmed_parser, pandas.
- Did icebreakers and got to know the team. Learned each member's capacity and strengths so that we could split tasks efficiently.
- Learned the importance of breaking a task down into smaller parts.
- Learned to use tools such as a journal club to build a comprehensive understanding of the task.
- Successfully split the “10 prompts” among the team and got everyone on the same page after the journal club.
- Presented and explained my understanding of part of the 2015 paper.
- Installed and used Pubmed Parser to extract abstracts from the MEDLINE data.
Detailed Statement of Tasks Completed:
- To work toward the end goal, we started by exploring a single file from the MEDLINE database. We used the requests package to download one file, used Pubmed Parser to extract the abstracts from it, and converted the result into a pandas DataFrame for cleaning. In the cleaning stage, we simply dropped empty rows and removed other irrelevant information.
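The steps above can be sketched roughly as follows. This is a minimal illustration, not our exact script: the function names, the choice of columns to keep (`pmid`, `title`, `abstract`), and the rule "a row is empty if its abstract is empty" are assumptions made for the example. `parse_medline_xml` is the pubmed_parser function for MEDLINE XML files; the URL of the file to download is left as a parameter.

```python
import pandas as pd


def download_file(url, dest_path):
    """Download one file with requests and save it to dest_path."""
    import requests  # imported here so the cleaning step can run without it

    response = requests.get(url, timeout=60)
    response.raise_for_status()  # stop early on HTTP errors
    with open(dest_path, "wb") as handle:
        handle.write(response.content)
    return dest_path


def parse_abstracts(medline_path):
    """Parse a MEDLINE XML file (.xml or .xml.gz) into a DataFrame."""
    import pubmed_parser as pp

    # parse_medline_xml returns a list of dicts, one per article
    records = pp.parse_medline_xml(medline_path)
    return pd.DataFrame(records)


def clean_abstracts(df, keep_columns=("pmid", "title", "abstract")):
    """Keep only the columns of interest and drop rows with an empty abstract.

    Assumption: the frame has an "abstract" column, and a row counts as
    empty when that column is missing or blank.
    """
    columns = [c for c in keep_columns if c in df.columns]
    out = df[columns].copy()
    out["abstract"] = out["abstract"].fillna("").astype(str).str.strip()
    out = out[out["abstract"] != ""]
    return out.reset_index(drop=True)
```

A run would chain the three helpers: `clean_abstracts(parse_abstracts(download_file(url, "medline_sample.xml.gz")))`, where `url` points at one MEDLINE `.xml.gz` file and the local file name is arbitrary.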
Goals for the upcoming week:
- Filter the data by known drug and gene names, then parse the dependency paths using the Stanford Parser.
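One possible shape for the filtering step is sketched below, assuming the cleaned data is a pandas DataFrame with an `abstract` column and that the drug and gene names are available as plain Python lists. The function names and the whole-word, case-insensitive matching rule are assumptions for illustration only; the dependency parsing step is not covered here.

```python
import re

import pandas as pd


def build_pattern(names):
    """Compile a case-insensitive, whole-word regex matching any of the names."""
    # Longer names first so that they win over shorter names they contain.
    escaped = sorted((re.escape(n) for n in names), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def filter_by_drug_and_gene(df, drug_names, gene_names, text_column="abstract"):
    """Keep rows whose text mentions at least one drug AND at least one gene."""
    drug_re = build_pattern(drug_names)
    gene_re = build_pattern(gene_names)
    text = df[text_column].fillna("").astype(str)
    has_drug = text.map(lambda s: bool(drug_re.search(s))).astype(bool)
    has_gene = text.map(lambda s: bool(gene_re.search(s))).astype(bool)
    return df[has_drug & has_gene].reset_index(drop=True)


# Illustrative usage with made-up rows and a single drug/gene pair.
sample = pd.DataFrame({
    "abstract": [
        "Warfarin response depends on CYP2C9 genotype.",
        "Warfarin dosing in adults.",
        "CYP2C9 variants were sequenced.",
    ]
})
matches = filter_by_drug_and_gene(sample, ["warfarin"], ["CYP2C9"])
```

Here only the first row would be kept, since it is the only one mentioning both a drug and a gene from the lists.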