Duranilsu - Machine Learning (Level 1) Pathway

Modules 1 and 2

Technical Skills

Learned about web scraping and web crawling, and got familiar with libraries for obtaining data. Used Scrapy to build web spiders. Learned to extract data from HTML and reformat it to fit my needs.
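
As a reminder of what extracting and reformatting data from HTML looks like, here is a minimal sketch. It uses Python's built-in html.parser instead of Scrapy or Beautiful Soup so that it runs without extra installs. The HTML snippet and the names SAMPLE_HTML, LinkCollector and extract_links are made up for illustration and are not from the module code.

```python
from html.parser import HTMLParser

# Hypothetical HTML snippet standing in for a scraped page.
SAMPLE_HTML = """
<ul>
  <li><a href="/courses/ml">Machine Learning</a></li>
  <li><a href="/courses/ds">Data Science</a></li>
</ul>
"""


class LinkCollector(HTMLParser):
    """Collects one {"title", "url"} record per <a href=...> tag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text pieces seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a link with an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = "".join(self._text).strip()
            self.links.append({"title": title, "url": self._href})
            self._href = None


def extract_links(html):
    """Reformat raw HTML into a list of dictionaries."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


links = extract_links(SAMPLE_HTML)
```

With Beautiful Soup the same idea is usually shorter, but the shape of the result (a list of records ready for further processing) stays the same.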

Tools

Trello, Scrapy, Beautiful Soup, Selenium

Soft Skills

Joined Trello to keep track of my tasks and stay in sync with my teammates, and prepared a GitHub repo for further work.

Achievements

Set up my environment for data mining with web scraping by following the module tutorials (they were super helpful!), and prepared my GitHub environment with Sara’s repo.

Future Considerations

I would like to get more comfortable completing web scraping tasks by practicing the technical skills I learned during the first two modules.

Module 3:

Technical Skills:

Revisited web scraping skills and observed how collected data can be used with different classification models. Saved data to CSV files for later analysis.
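
A minimal sketch of saving scraped records to a CSV file and reading them back, using Python's built-in csv module. The sample rows, the column names and the helper names save_rows and load_rows are assumptions for illustration, and the file is written to a temporary folder.

```python
import csv
import os
import tempfile

# Hypothetical records, shaped like the output of a scraper.
rows = [
    {"title": "Machine Learning", "url": "/courses/ml"},
    {"title": "Data Science", "url": "/courses/ds"},
]


def save_rows(path, rows):
    """Write a list of dictionaries to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["title", "url"])
        writer.writeheader()
        writer.writerows(rows)


def load_rows(path):
    """Read the CSV file back into a list of dictionaries."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


csv_path = os.path.join(tempfile.mkdtemp(), "scraped.csv")
save_rows(csv_path, rows)
loaded = load_rows(csv_path)
```

Opening the file with newline="" is what the csv module expects; it avoids blank lines between rows on Windows.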

Tools:

Colab, Beautiful Soup, Selenium, NumPy

Soft skills:

Staying in contact with team members and communicating with team leads about schedule changes.

Highlights:

I was able to save data to CSV files. I learned about classification models for later use.

Achievements:

Cleaned the data, looked into ML terminology, and developed a better understanding of ML.

Further considerations:

I need to work on categorizing my findings so that they better feed the ML algorithm and the recommender.

Module 4:

Technical Skills:

Understood the principles of pretrained models such as BERT. Used the code provided to test its capabilities through Simple Transformers.

Tools:

Pandas, NumPy, Jupyter Notebook, Selenium

Soft skills:

Watching tutorials to better understand classification models, and practicing our final presentation!

Highlights:

I successfully built some classification models using logistic regression and decision trees, and then improved on them using BERT.

Achievements:

Compared different classification methods and tested them against a BERT model. Reduced the data imbalance. Reached up to 85% accuracy!
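
A minimal sketch of one common way to reduce data imbalance, random oversampling, together with a plain accuracy function. This is an assumption for illustration: these notes do not record which balancing method was actually used, and the sample data and the names oversample and accuracy are made up.

```python
import random
from collections import Counter


def oversample(samples, labels, seed=0):
    """Duplicate random examples of smaller classes until every class
    has as many examples as the largest one."""
    rng = random.Random(seed)
    by_label = {}
    for x, y in zip(samples, labels):
        by_label.setdefault(y, []).append(x)

    target = max(len(xs) for xs in by_label.values())
    out_x, out_y = [], []
    for y, xs in by_label.items():
        # Draw extra copies (with replacement) only for classes below target.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical imbalanced data: four "pos" examples, one "neg".
texts = ["a", "b", "c", "d", "e"]
labels = ["pos", "pos", "pos", "pos", "neg"]

bal_texts, bal_labels = oversample(texts, labels)
counts = Counter(bal_labels)
```

Oversampling should only be applied to the training split. If duplicated examples leak into the test split, the reported accuracy will look better than it really is.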

Further considerations:

Use cross-validation to get a more reliable estimate of model accuracy. Look into Flask to turn this into a web app.
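
A minimal sketch of how k-fold cross-validation splits the data, written in plain Python so it runs without extra installs. The names k_fold_indices, cross_validate and score_fold are hypothetical; score_fold stands for any function that trains a model on the training indices and returns its score on the test indices.

```python
def k_fold_indices(n_samples, k):
    """Split the indices 0..n_samples-1 into k (train, test) pairs.
    No shuffling is done here; shuffle the data first if it is ordered."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    folds = []
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        folds.append((train_idx, test_idx))
        start += size
    return folds


def cross_validate(score_fold, n_samples, k=5):
    """Average the score returned by score_fold over all k folds."""
    scores = [score_fold(train, test)
              for train, test in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


folds = k_fold_indices(10, 5)
```

Every example ends up in the test set exactly once, so the averaged score depends less on one lucky or unlucky split. In practice a library routine such as scikit-learn's cross_val_score would likely replace this hand-written version.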