Xianbo_Gao - Machine Learning Pathway

Xianbo_Gao · June 16, 2020, 10:01pm

My first self-assessment on 16.06.2020

What I Learned:
1. Technical area

Learned how to do web scraping. Since I had never done web scraping before, I needed to learn it first so that I could help people with the different packages they were using.

2. Tools

3. Soft-skills

Leadership skills: how to organize and split tasks, communicate with people, run a meeting and plan what we should achieve each week.

Three achievement highlights:

In coding, wrote a notebook with a web crawling example and a notebook containing basic data analysis code for beginners, for tutorial purposes. Combined our group’s code and generated the clean data for the next step.
Taught people and helped them with technical problems.
Set up working environments, including GitHub and Slack.

Meetings/training sections:

Tasks I did:

In coding, wrote a notebook with a web crawling example and a notebook containing basic data analysis code for beginners, for tutorial purposes. Gave a brief tutorial on how to do web scraping and how to use GitHub. Wrote my own web scraping code and combined all the participants’ code into one notebook: chose the best-performing code for each task, tried to keep everyone’s data preprocessing ideas, fixed the bugs and cleaned up the code. Used the combined code to generate a CSV file with columns for both the original data and the pre-processed data.
Set up working environments, including GitHub and Slack.
Gave instructions on how to do tasks, and helped people with technical problems such as coding, which packages and methods to use, GitHub and software installation.
Discussed with other leads and made a plan each week, ran a meeting and communicated with group members.
Held office hours 3-4 times per week to answer questions.
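
To illustrate the kind of output described above (a CSV with one column for the original text and one for the pre-processed text), here is a minimal sketch using only the Python standard library. It is not the group’s notebook: the page layout (titles in `<h2>` tags), the sample HTML, the `TitleParser` class and the `preprocess` rules are all assumptions made up for this example.

```python
import csv
import io
import re
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text of every <h2> element (hypothetical page layout)."""

    def __init__(self):
        super().__init__()
        self._in_h2 = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_h2 = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_h2 = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append to the current title.
        if self._in_h2:
            self.titles[-1] += data


def preprocess(text):
    """Lowercase, replace non-alphanumeric characters, collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return " ".join(text.split())


def scrape_to_csv(html, out):
    """Write one row per scraped title: original text and pre-processed text."""
    parser = TitleParser()
    parser.feed(html)
    writer = csv.writer(out)
    writer.writerow(["original", "preprocessed"])
    for title in parser.titles:
        writer.writerow([title.strip(), preprocess(title)])


# Made-up sample page; a real scraper would download the HTML first.
SAMPLE_HTML = (
    "<html><body>"
    "<h2>How do I install Pandas?</h2>"
    "<h2>BERT: fine-tuning tips!</h2>"
    "</body></html>"
)

buffer = io.StringIO()
scrape_to_csv(SAMPLE_HTML, buffer)
print(buffer.getvalue())
```

Keeping the original column next to the pre-processed one makes it easy for each group member to try a different preprocessing idea later without scraping again.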

Goal for next week:

Xianbo_Gao · June 28, 2020, 6:56pm

My first self-assessment on 16.06.2020

What I Learned:
1. Technical area

2. Tools

3. Soft-skills

Leadership skills: how to organize and split tasks, communicate with people, run a meeting and plan what we should achieve each week.

Three achievement highlights:

In coding, wrote a notebook with a BERT example.
Taught people and helped them with technical problems, checked their code and gave advice on how to improve it and what the next step should be.
Gave a lecture on BERT and made sure the attendees understood how to use BERT.

Meetings/training sections:
Weekly Monday team meetings, team lead meetings
Weekly lead meetings (ML pathway)
Weekly other pathway meetings (UX, FS leads)
Webinars: webinars for text-classification

Tasks I did:

In coding, wrote a notebook with a BERT example and uploaded it to GitHub. Gave a brief lecture on how to use BERT. Wrote my own BERT code. Checked everyone’s code and worked on combining the different methods.
Gave instructions on how to do tasks, and helped people with technical problems such as coding, which ML tools to use and how to improve the model.
Discussed with other leads and made a plan each week, ran a meeting and communicated with group members.
Held office hours 3-4 times per week to answer questions.

Goal for next week:

Xianbo_Gao · July 26, 2020, 9:39am

My final assessment

What I Learned:
1. Technical area

2. Tools

3. Soft-skills

Leadership skills: how to organize and split tasks, communicate with people, run a meeting and negotiate with people to make decisions.

Three achievement highlights:

Updated the dataset with more information, including content and stem in string format, and increased its size to 5000 samples.
Reviewed others’ results and found ways to improve the accuracy, such as using titles as input, which worked.
Prepared and gave the presentation.

Meetings/training sections:

Tasks I did:

I updated the dataset with more information, including content and stem in string format, and increased its size to 5000 samples. I also tried to combine tf-idf with BERT and used tf-idf for classification, but the accuracy was worse than using BERT only. The result was highly biased: nearly all the predictions were the same. I went through others’ BERT code and helped them with coding. Of all the code, Bardon’s performed the best. I focused on his code, ran some tests and gave him two pieces of advice, which helped him increase the accuracy to more than 60%.
Held office hours 3-4 times per week to answer questions. Before the presentation week I also held extra office hours to help with BERT implementation.
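
A quick way to spot the kind of bias mentioned above, where nearly all predictions are the same, is to compare a model’s accuracy with a majority-class baseline and to look at how concentrated the predictions are. The sketch below uses only the Python standard library. The labels and predictions are invented for illustration and are not the project’s data or results, and the helper names are hypothetical.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def majority_baseline(y_true):
    """Accuracy obtained by always predicting the most common true label."""
    return Counter(y_true).most_common(1)[0][1] / len(y_true)


def top_prediction_share(y_pred):
    """Share of all predictions taken by the most frequently predicted label."""
    return Counter(y_pred).most_common(1)[0][1] / len(y_pred)


# Invented example: the classifier predicts "ml" almost every time.
y_true = ["ml", "ml", "ux", "web", "ml", "ux", "web", "ml"]
y_pred = ["ml", "ml", "ml", "ml", "ml", "ml", "ml", "ux"]

print("accuracy:", accuracy(y_true, y_pred))
print("majority baseline:", majority_baseline(y_true))
print("top prediction share:", top_prediction_share(y_pred))
```

If the accuracy is at or below the majority baseline while one label takes most of the predictions, the classifier has probably collapsed onto a single class rather than learned from the features.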