Things learned from Module 1:
- Technical Area
Web scraping of XML files.
Learned more about what distributional semantics is and why it can be useful.
- Tools
BeautifulSoup
pandas
Jupyter Notebook
Spacetime - see each other’s schedules even when we are in very different time zones
- Soft Skills
Learned the concept of journal club and what I need to pay attention to when reading a research paper.
Communicating and working with team members from around the world.
- Achievements
Read through the paper and understood the overall workflow and the significance of what we will be doing.
Used BeautifulSoup to access the Medline abstract data we need for future tasks.
Learned about the various kinds of information that dependency parsing can show, and tried Stanford Parser on simple sentences.
- Goals for upcoming week
Get the list of all drug/target names, then combine it with Stanford Parser output to build the dependency matrix.
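
A rough sketch of how the matrix could be assembled with pandas. This rests on assumptions: that each sentence has already been reduced to a (drug, target, dependency path) triple, and that the matrix counts how often each path occurs for each drug/target pair. The names and path strings below are made-up placeholders, not Stanford Parser output.

```python
import pandas as pd

# Hypothetical, hand-written triples standing in for parser output:
# (drug name, target name, dependency path between them).
observations = [
    ("drug_A", "target_X", "nsubj|inhibits|dobj"),
    ("drug_A", "target_X", "nsubj|inhibits|dobj"),
    ("drug_A", "target_X", "nsubj|binds|nmod"),
    ("drug_B", "target_Y", "nsubj|binds|nmod"),
]

triples = pd.DataFrame(observations, columns=["drug", "target", "path"])

# Rows are drug/target pairs, columns are dependency paths,
# and each cell counts how many times that path was seen for that pair.
dependency_matrix = pd.crosstab(
    [triples["drug"], triples["target"]], triples["path"]
)

print(dependency_matrix)
```

Pairs that never occur with a given path get a count of 0, so the matrix is usually sparse once the real list of names is used.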
- Task done
Scraped one file from Medline, then parsed and saved the abstracts in it. Plan to focus on just this one file for now to simplify the data preprocessing.
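
A minimal sketch of the parsing step, for reference. It assumes the file follows the usual PubMed/Medline XML layout (`PubmedArticle`, `PMID`, `AbstractText`) and that `lxml` is installed so BeautifulSoup can use its `"xml"` parser. The inline sample and the `extract_abstracts` helper are placeholders for illustration, not the actual file or script.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Tiny made-up sample in the assumed Medline layout; the real input
# would be the text of the downloaded file.
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID Version="1">1</PMID>
      <Article>
        <ArticleTitle>Placeholder title A</ArticleTitle>
        <Abstract>
          <AbstractText Label="BACKGROUND">First part.</AbstractText>
          <AbstractText Label="RESULTS">Second part.</AbstractText>
        </Abstract>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <PMID Version="1">2</PMID>
      <Article>
        <ArticleTitle>Placeholder title B</ArticleTitle>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>
"""


def extract_abstracts(xml_text):
    """Return a DataFrame with one row per article that has an abstract."""
    soup = BeautifulSoup(xml_text, "xml")  # "xml" keeps tag names case-sensitive
    rows = []
    for article in soup.find_all("PubmedArticle"):
        pmid = article.find("PMID")
        parts = article.find_all("AbstractText")
        if pmid is None or not parts:
            continue  # skip records without an abstract
        # Structured abstracts have several AbstractText sections; join them.
        text = " ".join(p.get_text(" ", strip=True) for p in parts)
        rows.append({"pmid": pmid.get_text(strip=True), "abstract": text})
    return pd.DataFrame(rows, columns=["pmid", "abstract"])


abstracts = extract_abstracts(SAMPLE_XML)
print(abstracts)
```

To save the result, something like `abstracts.to_csv("abstracts.csv", index=False)` would work; the output filename here is arbitrary.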