I was introduced to the idea of a journal club: a discussion of a scientific paper in which each participant is responsible for analyzing the paper from a different perspective. My team and I held our journal club, and it was very informative and engaging.
Soft Skills:
I was able to lead 3 meetings this week and get the team members to participate and share their ideas confidently.
I made sure to appreciate the efforts of the team members, since people tend to do better when appreciated.
I tried to keep everyone engaged and encouraged team spirit, so whenever we had a lot of tasks to do, I delegated some of them to the team members.
I tried to handle all the technical issues and report them as needed. The onboarding process is usually not easy.
I have learned how to use Pubmed Parser to parse the Medline XML files into Python data frames in preparation for further preprocessing.
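A minimal sketch of this parsing step, written with Python's standard `xml.etree.ElementTree` rather than Pubmed Parser so that it runs without extra packages. The element names follow the Medline XML layout; the sample document, its PMIDs, and the helper names `parse_medline` and `with_abstracts` are made up for illustration and are not the code we used.

```python
import xml.etree.ElementTree as ET

# Made-up two-record sample in the Medline XML layout (not real publications).
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>1</PMID>
      <Article>
        <ArticleTitle>Example title one</ArticleTitle>
        <Abstract><AbstractText>Drug A inhibits gene B.</AbstractText></Abstract>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>2</PMID>
      <Article>
        <ArticleTitle>Example title two</ArticleTitle>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def parse_medline(xml_text):
    """Return one dict per PubmedArticle with its PMID, title and abstract."""
    root = ET.fromstring(xml_text)
    records = []
    for article in root.iter("PubmedArticle"):
        pmid = article.findtext("MedlineCitation/PMID", default="")
        title = article.findtext("MedlineCitation/Article/ArticleTitle", default="")
        # An abstract can be split over several AbstractText elements.
        parts = ["".join(node.itertext()).strip() for node in article.iter("AbstractText")]
        abstract = " ".join(p for p in parts if p)
        records.append({"pmid": pmid, "title": title, "abstract": abstract})
    return records


def with_abstracts(records):
    """Keep only the records whose abstract is not empty."""
    return [r for r in records if r["abstract"]]


records = parse_medline(SAMPLE_XML)
kept = with_abstracts(records)
print(len(records), "parsed,", len(kept), "with an abstract")
```

A list of dicts like `records` is also the kind of structure that can be loaded into a data frame for the later preprocessing steps.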
Soft skills:
The importance of task delegation among team members. When we split the work into simple tasks and divided them among us, we were able to get things done quickly and efficiently.
Three Achievement Highlights:
Being able to learn new skills quickly
Being able to work with data without prior knowledge of its theoretical details
Being able to hold weekly meetings for my team and get everyone to participate and share their progress
Goals for The Upcoming Week:
Extract entities from the sentences using the Stanford parser
Tasks Done:
Learning about the format of Medline publications
Learning to use Pubmed Parser
Writing optimized and generic code to perform the parsing
I have filtered the publications to keep only those that contain abstracts, because I only need to process the abstracts
I have learned to use string matching to extract the sentences that contain drug-gene pairs, which are then passed as input to the Stanford parser
I have learned how to use the Stanford parser to extract the dependency paths of the drug-gene pairs.
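The string-matching step above can be sketched roughly as follows. This is only an illustration under my own assumptions: the sentence splitter is a naive regular expression, the drug and gene lists are tiny hand-written examples, and the function names (`split_sentences`, `find_pairs`, `extract_pair_sentences`) are hypothetical. It does not call the Stanford parser; it only selects the sentences that would be sent to it.

```python
import re
from itertools import product


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_pairs(sentence, drugs, genes):
    """Return every (drug, gene) combination whose two names occur in the sentence."""
    def present(term):
        # Whole-word, case-insensitive match.
        pattern = r"\b" + re.escape(term) + r"\b"
        return re.search(pattern, sentence, flags=re.IGNORECASE) is not None

    found_drugs = [d for d in drugs if present(d)]
    found_genes = [g for g in genes if present(g)]
    return list(product(found_drugs, found_genes))


def extract_pair_sentences(abstract, drugs, genes):
    """Collect (drug, gene, sentence) for each sentence mentioning both."""
    hits = []
    for sentence in split_sentences(abstract):
        for drug, gene in find_pairs(sentence, drugs, genes):
            hits.append((drug, gene, sentence))
    return hits


# Hand-written example text and name lists, for illustration only.
abstract = (
    "Warfarin is metabolized by CYP2C9. "
    "The study enrolled 40 patients. "
    "Aspirin was also given."
)
hits = extract_pair_sentences(abstract, ["warfarin", "aspirin"], ["CYP2C9"])
for drug, gene, sentence in hits:
    print(drug, gene, "|", sentence)
```

Only the first sentence mentions both a drug and a gene, so it is the only one kept for dependency parsing.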
Soft skills:
It’s okay to ask for help when you are stuck on something. This is the value of working in teams, especially with people from different backgrounds
Three Achievement Highlights:
Collaboration with other teams
Still maintaining the team spirit
Documenting progress
Goals for The Upcoming Week:
Building a pipeline to process the entire database in parallel using Dask
I have been introduced to Dask, a library for parallel processing of data
We used Dask bags to create a pipeline that takes the data files in chunks and parses them, so we do not have to hold all the data in RAM
I learned how to use an AWS cluster and how to run Jupyter notebooks on it, and we transferred everything to the AWS cloud
I used the parallel processing pipeline to extract the final dependency matrix, which consists of rows of drug-gene pairs and columns of dependency paths
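As a rough picture of what the dependency matrix looks like, here is a small plain-Python sketch (no Dask, so it is not the parallel pipeline itself). It assumes that each observation is a drug-gene pair together with a dependency-path string and that each cell stores how many times that path was seen for that pair; the cell definition, the path labels, and the name `build_matrix` are my assumptions for illustration.

```python
from collections import Counter


def build_matrix(observations):
    """Build a pair-by-path count matrix from ((drug, gene), path) observations."""
    counts = Counter(observations)
    pairs = sorted({pair for pair, _ in counts})
    paths = sorted({path for _, path in counts})
    # A Counter returns 0 for combinations that were never observed.
    matrix = [[counts[(pair, path)] for path in paths] for pair in pairs]
    return pairs, paths, matrix


# Illustrative observations; the path labels are invented placeholders.
observations = [
    (("warfarin", "CYP2C9"), "path_metabolized_by"),
    (("warfarin", "CYP2C9"), "path_metabolized_by"),
    (("warfarin", "CYP2C9"), "path_inhibits"),
    (("aspirin", "PTGS1"), "path_inhibits"),
]

pairs, paths, matrix = build_matrix(observations)
for pair, row in zip(pairs, matrix):
    print(pair, dict(zip(paths, row)))
```

In the real data the matrix is very sparse, because most drug-gene pairs appear with only a few of the possible dependency paths.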
Soft skills:
I got used to my team members and we have friendly relationships now, which is very important for motivation and encouragement
Three Achievement Highlights:
Learned about a new library (Dask)
Learned about AWS clusters
A solid team bond
Goals for The Upcoming Week:
Passing the processed data into the EBC algorithm, the core machine learning part of the project
I have gone through the paper thoroughly to understand the theoretical concepts of the Ensemble Biclustering for Classification (EBC) algorithm
I have learned that the algorithm has two steps (unsupervised, supervised) and decided to dedicate this week to the first one, i.e. the unsupervised step
I have used the ITCC algorithm to co-cluster the drug-gene pairs and ran it 100 times. The result is a co-occurrence matrix whose rows and columns are drug-gene pairs and whose values record how often two pairs were assigned to the same cluster
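The co-occurrence counting can be sketched as below. Note that this does not implement ITCC: the function `random_clustering` is a hypothetical stand-in that assigns cluster labels at random, used only so the example runs on its own. The part that matches the description above is the loop that repeats the clustering and counts, for every two pairs, how many runs put them in the same cluster.

```python
import random


def random_clustering(n_items, n_clusters, rng):
    """Stand-in for one ITCC run (assumption): one random cluster label per item."""
    return [rng.randrange(n_clusters) for _ in range(n_items)]


def cooccurrence(n_items, n_runs, n_clusters, seed=0):
    """Count how often each two items share a cluster over repeated runs."""
    rng = random.Random(seed)
    matrix = [[0] * n_items for _ in range(n_items)]
    for _ in range(n_runs):
        labels = random_clustering(n_items, n_clusters, rng)
        for i in range(n_items):
            for j in range(n_items):
                if labels[i] == labels[j]:
                    matrix[i][j] += 1
    return matrix


# Five items stand in for five drug-gene pairs; 100 runs as in the report.
matrix = cooccurrence(n_items=5, n_runs=100, n_clusters=2)
for row in matrix:
    print(row)
```

Each diagonal entry equals the number of runs, since a pair always shares a cluster with itself, and the matrix is symmetric.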
Soft skills:
I experienced the importance of sharing knowledge between team members, and of having people with different experience levels help each other
I also learned the importance of documenting work, both for team members and for external audiences