Overview of things that I learned
Technical Area:
- Learned the core concepts of machine learning and several machine learning algorithms by watching the lecture videos
- Learned the fundamental ideas of NLP and related topics such as distributional semantics, linguistic problems, and EBC
- Learned about web scraping and data mining
Tools:
- Natural Language Processing (NLP)
- Bioinformatics
- EBC
- Medline
Soft Skills:
- Became better prepared for machine learning and NLP work as a whole
- Became more familiar with how machine learning is applied in the field of bioinformatics
- Medline: learned how the Medline database is organized and how to extract data from it.
- EBC: learned what Ensemble Biclustering for Classification (EBC) and hierarchical clustering algorithms are
Achievements and tasks:
- Learned the concepts of machine learning and web scraping
- Set up a virtual environment
- Became more familiar with machine learning, NLP, bioinformatics, and the biomedical field
- Read the research paper and completed the journal tasks based on it
Module-1 Overview:
Technical skills:
- Parsed the raw Medline data with the PubMed parser
- Learned how to use the Stanford Parser.
- Read and understood the assigned scientific papers
- Learned the foundations of dependency parsing
- Ran the dependency parser from Java
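To illustrate what parsing raw Medline data involves, here is a minimal sketch that reads MEDLINE-style XML with Python's standard library. It is not the PubMed parser used above: the sample records, the function name `parse_medline_xml_string`, and the output field names are assumptions made only for this example.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written sample in the MEDLINE/PubMed XML layout (not real data).
SAMPLE_XML = """
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>1</PMID>
      <Article>
        <ArticleTitle>Example title one</ArticleTitle>
        <Abstract>
          <AbstractText>Drug A inhibits gene B.</AbstractText>
        </Abstract>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>2</PMID>
      <Article>
        <ArticleTitle>Example title two</ArticleTitle>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>
"""


def parse_medline_xml_string(xml_text):
    """Return one dict per article with its PMID, title, and abstract text."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for article in root.iter("PubmedArticle"):
        pmid = article.findtext("MedlineCitation/PMID", default="")
        title = article.findtext("MedlineCitation/Article/ArticleTitle", default="")
        # An abstract may be split into several AbstractText sections.
        parts = [
            "".join(node.itertext()).strip()
            for node in article.iter("AbstractText")
        ]
        records.append({"pmid": pmid, "title": title, "abstract": " ".join(parts)})
    return records


records = parse_medline_xml_string(SAMPLE_XML)
# Keep only the records that actually carry an abstract.
with_abstract = [r for r in records if r["abstract"]]
```

Records without an `Abstract` element come back with an empty abstract string, so they are easy to drop before any sentence-level processing.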
Tools/Libraries:
- Java: installed it and used it to parse the .txt file.
- VS Code: installed it and started learning how to use it.
- Stanford dependency parser
- Jython 2.7.2: installed successfully.
Soft Skills:
- Natural Language Processing (NLP)
- Working toward an understanding of dependency parsing
- Getting more familiar with VS Code and with debugging a Python file in it
- Gained a basic understanding of the parsed database
- Attended all the teamwork sessions and discussed the work.
- Virtual collaboration: actively participated in the training/Q&A sessions held by Colin.
Achievement Highlights:
- Learned how dependency parsing works and the foundations of the neural transition-based parser.
- Finished reading the Stanford Parser manual to understand the grammatical relations between words and the different output formats and styles.
Tasks Completed:
- Completed Medline database parsing.
- Became familiar with big data analysis
- Parsed the database using the Stanford parser in Java
Goals for The Upcoming Week:
- Feed the output of the PubMed parser into the Stanford parser and integrate the result with EBC.
Module-2 Overview:
Technical skills:
- Built the sparse dependency matrix using the Stanford Parser
- Used spaCy, a widely used NLP library, for dependency parsing.
- Learned more about sparse matrices in machine learning algorithms and how they can be used in dependency parsing.
- Learned how Stanford NLP works and how it can be used as a pipeline that converts a string of human-language text into lists of sentences and words, generates the base forms of those words, their parts of speech, and their morphological features, and produces a syntactic dependency parse in a format designed to be parallel across more than 70 languages
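To make the sparse dependency matrix concrete, the sketch below counts how often each drug-gene pair occurs with each dependency path and stores only the nonzero cells. The pair names, the path strings, and the `build_sparse_matrix` helper are hypothetical, and plain Python dictionaries stand in for whatever matrix library was actually used in this module.

```python
from collections import defaultdict

# Hypothetical observations: (drug-gene pair, dependency path) as they might
# come out of parsed sentences. All names and paths are placeholders.
observations = [
    (("drugA", "GENE1"), "nsubj-inhibits-obj"),
    (("drugA", "GENE1"), "nsubj-inhibits-obj"),
    (("drugA", "GENE1"), "nsubj-binds-obj"),
    (("drugB", "GENE2"), "nsubj-inhibits-obj"),
]


def build_sparse_matrix(observations):
    """Build a dictionary-of-keys sparse count matrix.

    Rows are drug-gene pairs and columns are dependency paths. Only cells
    with a nonzero count are stored, keyed by (row_index, column_index).
    """
    row_index = {}
    col_index = {}
    cells = defaultdict(int)
    for pair, path in observations:
        # Assign the next free index the first time a pair or path is seen.
        r = row_index.setdefault(pair, len(row_index))
        c = col_index.setdefault(path, len(col_index))
        cells[(r, c)] += 1
    return row_index, col_index, dict(cells)


row_index, col_index, cells = build_sparse_matrix(observations)
```

With real data most pairs occur with only a handful of the possible paths, so storing just the nonzero cells keeps memory use proportional to the number of observed combinations rather than to rows times columns.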
Soft skills:
- Stanford NLP
- Dependency Parsing
Achievement Highlights:
- Collaboration with other teams
- Documenting progress
Goals for The Upcoming Week:
- Extract the research papers that contain abstracts, then extract the drug-gene pairs together with their dependency paths
Tasks Done:
- Sparse Matrix
- Dependency parsing
Module-3 Overview:
Technical skills:
- Filtered the Medline publications with the PubMed parser, keeping only those that contain abstracts
- Learned to use string matching to extract the sentences that contain drug-gene pairs, which are then passed to the Stanford parser
- Extracted drug-gene pairs using DrugBank (for drug names) and PharmGKB (for gene names)
- Learned how to use the Stanford parser to extract the dependency paths of the drug-gene pairs.
- Biclustered the dependency matrix using the Ensemble Biclustering algorithm.
- Constructed a graph from an arbitrary number of data files in Dask
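The string-matching step above can be sketched as follows: split an abstract into sentences and keep every sentence that mentions at least one drug and at least one gene. The two small sets stand in for name lists taken from DrugBank and PharmGKB and contain placeholder entries only; the helper names and the naive sentence splitter are assumptions for this example.

```python
import re
from itertools import product

# Hypothetical mini-lexicons standing in for names taken from DrugBank
# and PharmGKB; the entries are placeholders, not real drugs or genes.
DRUGS = {"drugalpha", "drugbeta"}
GENES = {"GENE1", "GENE2"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_terms(sentence, lexicon):
    """Return the lexicon terms that occur in the sentence as whole words."""
    found = []
    for term in sorted(lexicon):
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, sentence, flags=re.IGNORECASE):
            found.append(term)
    return found


def extract_pair_sentences(text):
    """Return a list of (drug, gene, sentence) for each co-occurring pair."""
    results = []
    for sentence in split_sentences(text):
        drugs = find_terms(sentence, DRUGS)
        genes = find_terms(sentence, GENES)
        for drug, gene in product(drugs, genes):
            results.append((drug, gene, sentence))
    return results


abstract = (
    "Drugalpha inhibits GENE1 in liver cells. "
    "GENE2 expression was unchanged. "
    "Both drugbeta and drugalpha interact with GENE2."
)
pairs = extract_pair_sentences(abstract)
```

Whole-word, case-insensitive matching avoids hits inside longer words, but it cannot resolve synonyms or ambiguous gene symbols; those would need a richer lexicon.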
Soft skills:
Achievement Highlights:
- Collaboration with other teams
- Documenting progress
Goals for The Upcoming Week:
- Compute a final set of clusters and visualize via dendrograms
Tasks Done:
- Data filtering
- String matching
- Dependency path extraction
- Biclustered the dependency matrix
- Constructed a graph
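The dependency path extraction listed above amounts to finding the shortest path between the drug token and the gene token in the sentence's dependency tree. The sketch below does this with a breadth-first search over a hand-written parse; the sentence, the head indices, the relation labels, and the `dependency_path` helper are all assumptions for illustration, since a real parse would come from the Stanford parser.

```python
from collections import deque

# Hand-written parse of "DrugA strongly inhibits GENE1" (an assumption for
# illustration). Each token is (text, index of head token, relation to head);
# the root token has head -1.
tokens = [
    ("DrugA", 2, "nsubj"),
    ("strongly", 2, "advmod"),
    ("inhibits", -1, "root"),
    ("GENE1", 2, "obj"),
]


def dependency_path(tokens, start, end):
    """Return the token indices on the shortest tree path from start to end."""
    # Build an undirected adjacency list from the head pointers.
    neighbors = {i: [] for i in range(len(tokens))}
    for i, (_, head, _) in enumerate(tokens):
        if head >= 0:
            neighbors[i].append(head)
            neighbors[head].append(i)

    # Breadth-first search, remembering each node's predecessor.
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == end:
            break
        for nxt in neighbors[node]:
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)

    if end not in previous:
        return []  # no path; should not happen in a well-formed tree

    # Walk back from end to start, then reverse the result.
    path = []
    node = end
    while node is not None:
        path.append(node)
        node = previous[node]
    return path[::-1]


path = dependency_path(tokens, 0, 3)
path_words = [tokens[i][0] for i in path]
```

Here the path runs from the drug through the verb to the gene and skips the adverb, which is why dependency paths make compact features for describing how a drug and a gene are related in a sentence.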