Xianbo_Gao - Machine Learning Pathway

My first self-assessment on 16.06.2020

What I Learned:
1. Technical area

  • Learned how to do web scraping. Since I had never done it before, I had to learn it first so that I could help people with the different packages they were using.

2. Tools

  • How to use Zoom, the StemAway platform, and Asana

3. Soft-skills

  • Leadership skills: how to organize and split tasks, communicate with people, run a meeting, and plan what we should achieve each week.

Three achievement highlights:

  • In coding, wrote a notebook with a web-crawling example and a notebook containing basic data analysis code for beginners, both for tutorial purposes. Combined our group’s code and generated the clean data for the next step.
  • Taught people and helped them with technical problems.
  • Set up working environments, including GitHub and Slack.

Meetings/training sessions:

  • Weekly Monday team meetings, team lead meetings
  • Weekly lead meetings (ML pathway)
  • Weekly other pathway meetings (UX, FS leads)
  • Webinars: ML, Asana, industry mentors for text-classification & Git

Tasks I did:

  • In coding, wrote a notebook with a web-crawling example and a notebook containing basic data analysis code for beginners, both for tutorial purposes. Gave a brief tutorial on how to do web scraping and how to use GitHub. Wrote my own web-scraping code and combined all the participants’ code into one notebook: chose the best-performing code for each task, tried to keep everyone’s data pre-processing ideas, fixed the bugs, and cleaned up the code. Used the combined code to generate a CSV file with columns for both the original and the pre-processed data.
  • Set up working environments, including GitHub and Slack.
  • Gave instructions on how to do tasks, and helped people with technical problems such as coding, which package and method to use, GitHub, and software installation.
  • Discussed with other leads and made a plan each week, ran a meeting, and communicated with group members.
  • Held office hours 3-4 times per week to answer questions.
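
As an illustration of the last step in the first task above (a CSV file holding both the original and the pre-processed text), here is a minimal sketch. It is not the team’s actual notebook: the `preprocess` rules, the column names, and the sample posts are all assumptions made up for this example, and it writes to an in-memory buffer instead of a real file.

```python
import csv
import io
import re


def preprocess(text):
    """Hypothetical cleaning rules: strip HTML tags, lowercase,
    drop punctuation, and collapse repeated whitespace."""
    text = re.sub(r"<[^>]+>", " ", text)               # remove HTML tags left by scraping
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())   # keep only letters, digits, spaces
    return re.sub(r"\s+", " ", text).strip()           # collapse whitespace


def write_clean_csv(posts, out_file):
    """Write one row per post: the original text next to its cleaned version."""
    writer = csv.writer(out_file)
    writer.writerow(["original", "preprocessed"])      # assumed column names
    for post in posts:
        writer.writerow([post, preprocess(post)])


# Made-up sample posts standing in for scraped forum data.
posts = ["<p>How do I install   BeautifulSoup?</p>", "Git push FAILED!!"]
buffer = io.StringIO()                                  # stands in for open("clean.csv", "w", newline="")
write_clean_csv(posts, buffer)
print(buffer.getvalue())
```

Keeping the original column next to the cleaned one makes it easy to check later whether a pre-processing rule removed something useful.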

Goal for next week:

  • Get a brief text analysis result with our team.

My second self-assessment

What I Learned:
1. Technical area

  • Learned how to use BERT.

2. Tools

  • Already covered in the first self-assessment

3. Soft-skills

  • Leadership skills: how to organize and split tasks, communicate with people, run a meeting, and plan what we should achieve each week.

Three achievement highlights:

  • In coding, wrote a notebook with a BERT example

  • Taught people and helped them with technical problems, checked their code, and gave advice on how to improve it and what the next step should be

  • Gave a lecture on BERT and made sure the attendees understood how to use BERT

Meetings/training sessions:

  • Weekly Monday team meetings, team lead meetings

  • Weekly lead meetings (ML pathway)

  • Weekly other pathway meetings (UX, FS leads)

  • Webinars: text classification

Tasks I did:

  • In coding, wrote a notebook with a BERT example and uploaded it to GitHub. Gave a brief lecture on how to use BERT. Wrote my own BERT code. Checked everyone’s code and worked on combining the different methods.
  • Gave instructions on how to do tasks, and helped people with technical problems such as coding, which ML tools to use, and how to improve the model.
  • Discussed with other leads and made a plan each week, ran a meeting, and communicated with group members.
  • Held office hours 3-4 times per week to answer questions.

Goal for next week:

  • Get a final text analysis result with good accuracy with our team.

My final assessment

What I Learned:
1. Technical area

  • Gained a better understanding of BERT and recommendation systems

2. Tools

  • Learned how to use Colab

3. Soft-skills

  • Leadership skills: how to organize and split tasks, communicate with people, run a meeting, and negotiate with people to make decisions

Three achievement highlights:

  • Updated the dataset with more information, including content and stem in string format, and increased its size to 5000 samples
  • Reviewed others’ results and found ways to improve the accuracy, such as using titles as input, which worked.
  • Prepared and gave the presentation

Meetings/training sessions:

  • Weekly Monday team meetings, team lead meetings
  • Weekly lead meetings (ML pathway)
  • Training from Colin

Tasks I did:

  • I updated the dataset with more information, including content and stem in string format, and increased its size to 5000 samples. I also tried to combine tf-idf with BERT and used tf-idf for classification, but the accuracy was worse than with BERT only. The result was highly biased, and nearly all the predictions were the same. I went through others’ BERT code and helped them with coding. Of all the code, Bardon’s performed the best. I focused on his code, ran some tests, and gave him two suggestions that helped him increase the accuracy to more than 60%.
  • Held office hours 3-4 times per week to answer questions. Before the presentation week I also held extra office hours to help with BERT implementation.
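
The biased result described above, where nearly all predictions are the same, can be spotted with a quick check alongside the accuracy. The sketch below is only an illustration with made-up labels and predictions, not our real data or model output; the helper names `majority_share` and `accuracy` are hypothetical.

```python
from collections import Counter


def majority_share(predictions):
    """Return the most common predicted label and the fraction of
    predictions it accounts for."""
    counts = Counter(predictions)
    label, count = counts.most_common(1)[0]
    return label, count / len(predictions)


def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Made-up example: a classifier that predicts "a" almost every time.
labels = ["a", "b", "c", "a", "b", "c", "a", "a"]
predictions = ["a"] * 7 + ["b"]

label, share = majority_share(predictions)
print(f"most common prediction: {label} ({share:.0%} of outputs)")
print(f"accuracy: {accuracy(predictions, labels):.1%}")
```

If one label covers almost all of the outputs, the model has most likely collapsed onto the majority class, and the accuracy number on its own says little about how well it separates the categories.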