Dhruv_Kumar - Machine Learning Pathway

Self-Assessment 6/16

Things learned
Some things that I have learned are more advanced uses of Python, including iterating over large sets of data, using dictionaries, and working with pickle files. In addition, I learned about embeddings, neural networks, RNNs, how words can be converted into vectors, and uses for BERT. Through this process, I learned how to use VSCode, GitHub, Colab, and Python libraries such as Beautiful Soup. Some soft skills that I learned were in communication, particularly how important it is to keep messages clear and concise.

Highlights:

  1. I was able to download VSCode, download the libraries, and open up a JSON file that contained the questions that needed to be parsed.
  2. I wrote an algorithm that went through thousands of submissions in a way that was more efficient than how it was originally written.
  3. I wrote code that was able to extract the text from a question on a Discourse forum and save it to a pickle file.

Meetings/Training
I attended the meetings every Monday and Friday to report any issues that I was having as well as to learn about new concepts. I watched the recorded video from the industry leaders about GitHub.

Goals for upcoming week
I am working on creating a recommendation system. Each member of the group is able to put their own creative twist to this recommendation system. I am building a recommendation system that will output 5 questions that are most similar to the question that was asked.

Tasks Done

  • Joined Slack and GitHub for Stem-Away ML Team 5.
  • Made Colab notebooks on Google account for scraping.
  • Chose the Discourse forum that I would like to work with (breadtopia).
  • Wrote Python code to create URLs with submissions and extracted the individual submissions into a dictionary. The main outline was written by team leads, but I wrote the function to parse through the submissions.
  • Wrote Python code to extract text from each submission. There was an issue, but I resolved it myself by implementing a sleep function, which got rid of the error I was originally receiving. The main outline was written by team leads.
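
The sleep-based fix and the pickle output mentioned above could look roughly like the sketch below. It is an illustration only, not the code written for the project: `fetch_all`, `save_pickle`, `load_pickle`, the `fetch` callable, and the delay and retry values are all hypothetical names and settings chosen for the example.

```python
import pickle
import time


def fetch_all(urls, fetch, delay_seconds=1.0, max_retries=3):
    """Call fetch(url) for every URL, pausing between requests.

    `fetch` is a hypothetical callable that returns the submission text
    for one URL and raises an exception when the request fails.
    """
    results = {}
    for url in urls:
        for attempt in range(1, max_retries + 1):
            try:
                results[url] = fetch(url)
                break
            except Exception:
                if attempt == max_retries:
                    results[url] = None  # give up on this URL, keep going
                else:
                    # Wait a little longer after each failed attempt.
                    time.sleep(delay_seconds * attempt)
        # Pause so consecutive requests do not hit the server back to back.
        time.sleep(delay_seconds)
    return results


def save_pickle(obj, path):
    """Write any picklable object to disk."""
    with open(path, "wb") as handle:
        pickle.dump(obj, handle)


def load_pickle(path):
    """Read back an object written by save_pickle."""
    with open(path, "rb") as handle:
        return pickle.load(handle)
```

Passing the fetcher in as a function keeps the throttling logic separate from the scraping itself, so the same loop works whether the text comes from a JSON endpoint or from parsed HTML.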

Self-Assessment 6/23

Things I learned
I learned more about how to use Colab and how to import files. I worked with JSON files and learned how to make them more readable. I was also exposed to Google Cloud Platform and some parts of Google Data Studio.

Highlights:

  1. I made a simple recommendation system that returns the top 5 questions most similar to the current question.
  2. I wrote code that extracts the metadata from each of the submissions into a list that can be analyzed.
  3. I used GitHub and made a branch with my code and a pickle file for the output.
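
For illustration, a top-5 recommender of the kind described in the first highlight can be sketched with plain word counts and cosine similarity. This is a simplified stand-in that assumes no particular embedding; the function names (`tokenize`, `cosine_similarity`, `recommend`) are hypothetical and this is not the code that was submitted.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(counts_a, counts_b):
    """Cosine similarity between two word-count vectors stored as Counters."""
    shared = set(counts_a) & set(counts_b)
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(query, questions, top_k=5):
    """Return the top_k questions most similar to the query, best first."""
    query_counts = Counter(tokenize(query))
    scored = [
        (cosine_similarity(query_counts, Counter(tokenize(q))), q)
        for q in questions
    ]
    # Highest score first; the sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [q for _, q in scored[:top_k]]
```

Swapping the word counts for TF-IDF weights or BERT embeddings would change only how each question is turned into a vector; the ranking step stays the same.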

Meetings/Training
I attended the group meetings. When I was not able to attend a meeting, I made sure to follow up with my team lead (Eren) so that I could get a good idea of what was discussed during the meeting.

Goals for upcoming week
I am working with my subgroup to visualize the metadata that was extracted from the submissions on the Bank of New Zealand forum. To do this, we will use Google Data Studio and make a report that could be given to the owner of the website (Bank of New Zealand).

Tasks Done

  • Joined new GitHub for ML Team 5
  • Wrote code for extracting the metadata from the submissions in Python and Colab. Some of the code was written by the mentors.
  • Generated a pickle file, which contains the metadata for the submissions
  • Created a new branch on the GitHub repository and uploaded my code and pickle file
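
As a rough sketch of the metadata step, the snippet below flattens a list of topic dictionaries into rows that can be analyzed or pickled. The field names in `FIELDS` and the helper names are assumptions made for the example; the real export and the project code may differ.

```python
import pickle

# Field names assumed for illustration; the real export may use others.
FIELDS = ("id", "title", "views", "like_count", "posts_count", "created_at")


def extract_metadata(topics, fields=FIELDS):
    """Turn a list of topic dicts into a list of flat metadata rows.

    Missing fields become None so every row has the same keys.
    """
    return [{field: topic.get(field) for field in fields} for topic in topics]


def save_metadata(rows, path):
    """Pickle the extracted rows so they can be shared or reloaded later."""
    with open(path, "wb") as handle:
        pickle.dump(rows, handle)
```

Giving every row the same keys makes the list easy to load into a spreadsheet or a reporting tool afterwards.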

Self-Assessment 6/30

Things I learned
I learned how to use Google Data Studio in a more in-depth manner and how to combine different sources of information. I also learned how to do more technical writing so that ideas can be clearly communicated to industry leaders. I also learned about the different types of machine learning, how certain ML algorithms work, and their strengths and weaknesses.

Highlights

  1. Completed a couple of figures to show the performance of the Bank of New Zealand forum over time
  2. Wrote a couple of paragraphs with an analysis of the forum and what could be done to increase the user engagement with the forum
  3. Worked on the classification of posts and implemented a Support Vector Machine (SVM) algorithm to classify posts.

Meetings/Training

I attended the meetings on Monday and Friday to discuss topics with all team members. I attended multiple other meetings with my subgroup on Friday evening, Sunday, Monday evening, and Wednesday.

Goals for upcoming week

I am working with my subgroup to create a classification algorithm that would give an inputted post a category in the forum. This requires the algorithm to be trained on a set of training data from the forum and then checked for accuracy.

Tasks Done

  • Made 5 figures of the performance of the Bank of New Zealand forum with various metrics such as views, number of posts, likes, etc.
  • Wrote analysis report about how the Bank could improve engagement with its users
  • Implemented SVM machine learning algorithm and used training data from forum for fitting. I received some help from my mentors.
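
A minimal sketch of fitting an SVM on forum posts and checking its accuracy is shown below, assuming scikit-learn is available. The TF-IDF features, the linear kernel (`LinearSVC`), the 25% test split, and the function name `train_and_score` are assumptions for the example, not a description of the model the subgroup actually trained.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def train_and_score(posts, categories, test_size=0.25, seed=0):
    """Fit TF-IDF + linear SVM on a training split; return (model, accuracy)."""
    train_x, test_x, train_y, test_y = train_test_split(
        posts,
        categories,
        test_size=test_size,
        random_state=seed,
        stratify=categories,  # keep the category mix similar in both splits
    )
    model = make_pipeline(TfidfVectorizer(), LinearSVC())
    model.fit(train_x, train_y)
    # Accuracy is measured only on posts the model did not see in training.
    accuracy = accuracy_score(test_y, model.predict(test_x))
    return model, accuracy
```

After training, `model.predict(["text of a new post"])` returns the predicted category for a new post.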

Self-Assessment 7/3

My overall experience at STEM-Away has been great for the last couple of weeks, and I have learned so much about Machine Learning as well as how to work with industry leaders and college students. I gained exposure to and hands-on experience with Machine Learning algorithms and applied them to real-world problems.

Things I learned:
I learned about the PCA and RandomForestClassifier Machine Learning algorithms. I learned the differences between the encodings produced by BERT and TF-IDF. I also learned how to separate data into training and testing sets and how to balance the classes in the data (undersampling and oversampling).
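
Undersampling and oversampling can be illustrated with a small standard-library sketch. The function names and the fixed random seed are hypothetical choices for the example, not the approach used on the forum data.

```python
import random
from collections import defaultdict


def group_by_label(samples, labels):
    """Collect the samples that belong to each label."""
    groups = defaultdict(list)
    for sample, label in zip(samples, labels):
        groups[label].append(sample)
    return groups


def undersample(samples, labels, seed=0):
    """Randomly drop samples so every class matches the smallest class."""
    rng = random.Random(seed)
    groups = group_by_label(samples, labels)
    target = min(len(items) for items in groups.values())
    balanced = []
    for label, items in groups.items():
        balanced.extend((s, label) for s in rng.sample(items, target))
    return balanced


def oversample(samples, labels, seed=0):
    """Randomly repeat samples so every class matches the largest class."""
    rng = random.Random(seed)
    groups = group_by_label(samples, labels)
    target = max(len(items) for items in groups.values())
    balanced = []
    for label, items in groups.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        balanced.extend((s, label) for s in items + extra)
    return balanced
```

Resampling is normally applied to the training split only, so that the test split still reflects the real class balance.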

Highlights:

  1. Implemented PCA and RandomForestClassifier on the onehack forum.
  2. Made plots to analyze the performance of the PCA technique with different dimension reductions and depths for the RandomForestClassifier.

Meetings/Training:
I attended meetings with the group and privately with subgroup members as needed.

Goals for upcoming week:
My subgroup and I are creating a Google Slides presentation about our work thus far so that we can present it to the rest of our group. We are also trying to think of ways that we can combine the ideas from our three subgroups into one product.

Tasks Done:

  • Implemented PCA and RandomForestClassifier
  • Made figure for PCA with different dimension reductions
  • Found the best-performing PCA configuration for the onehack forum and got an accuracy of 73%
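
The search over PCA dimensions and tree depths can be sketched as follows, assuming scikit-learn. The data here is synthetic (generated with `make_classification`) and stands in for the forum features, so the scores it produces say nothing about the 73% result; `sweep` and the candidate values are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline


def sweep(features, labels, component_counts, depths, seed=0):
    """Return {(n_components, max_depth): test accuracy} for each combination."""
    train_x, test_x, train_y, test_y = train_test_split(
        features, labels, test_size=0.25, random_state=seed
    )
    scores = {}
    for n_components in component_counts:
        for depth in depths:
            model = make_pipeline(
                PCA(n_components=n_components, random_state=seed),
                RandomForestClassifier(max_depth=depth, random_state=seed),
            )
            model.fit(train_x, train_y)
            scores[(n_components, depth)] = model.score(test_x, test_y)
    return scores


# Synthetic stand-in for the forum's post features (not real project data).
features, labels = make_classification(
    n_samples=200, n_features=20, n_informative=5, random_state=0
)
scores = sweep(features, labels, component_counts=[2, 5, 10], depths=[3, 6])
best = max(scores, key=scores.get)  # (n_components, max_depth) with top accuracy
```

Plotting `scores` against the number of components, one line per depth, gives the kind of figure described in the highlights.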

Self-Assessment 7/17 - 3-week extension

I have completed the normal 5-week program and am currently in the 3-week extension of the ML pathway.
I am working on the presentation with the team members who have decided to stay for the extension.
The part of the presentation that I am responsible for is the performance of the Bank of New Zealand Forum. I summarized our analysis into a couple of bullet points to make it easier to understand.
I also included figures/graphs that we made from Google Data Studio.
I presented my portion of the presentation and explained why we chose to do some of our projects (such as automatically finding which category a post should go in).