Arturo_Alvarado - Machine Learning Pathway

Overview of things learned:

Technical Area:

  • Various NLP models including BERT and bag of words
  • Learned how to scrape the web using BeautifulSoup and Selenium
  • Gained experience inspecting HTML elements in different Discourse forums

Tools:

  • BeautifulSoup, Selenium, BERT

Soft Skills:

  • Remote Collaboration
  • Setting independent goals

Three Achievements:

  1. Understanding how to use BeautifulSoup for web scraping. It allowed us to create soup objects from which we can extract only the information we want from our Discourse forum.
  2. Fixed an issue where I was only getting a limited number of elements when using BeautifulSoup alone. The problem was that I wasn’t loading all of the HTML elements on the page. Using Selenium to load them allowed me to scrape a much larger number of data points.
  3. A minor achievement was getting Python up and running on my computer. I was getting compilation errors for simple commands. Through the command line I was able to update the packages in use, which allowed me to continue with my tasks.
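
A minimal sketch of the soup-object idea from the first two achievements, assuming a simplified HTML snippet. The markup, the class names (`topic-list-item`, `title`, `category`), and the `extract_topics` helper are hypothetical and do not reflect the Atom forum's real page structure. When Selenium is used, the string passed in would typically be `driver.page_source`, read after the page has finished loading the additional topics.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a forum topic list.
SAMPLE_HTML = """
<table>
  <tr class="topic-list-item">
    <td><a class="title" href="/t/first-topic/1">First topic</a></td>
    <td><span class="category">Support</span></td>
  </tr>
  <tr class="topic-list-item">
    <td><a class="title" href="/t/second-topic/2">Second topic</a></td>
    <td><span class="category">Packages</span></td>
  </tr>
</table>
"""


def extract_topics(html):
    """Return one dict per topic row found in the given HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    topics = []
    # Keep only the rows we care about and ignore the rest of the page.
    for row in soup.find_all("tr", class_="topic-list-item"):
        link = row.find("a", class_="title")
        category = row.find("span", class_="category")
        topics.append({
            "title": link.get_text(strip=True),
            "url": link["href"],
            "category": category.get_text(strip=True),
        })
    return topics


if __name__ == "__main__":
    for topic in extract_topics(SAMPLE_HTML):
        print(topic)
```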

Meetings/Training Attended

I have been able to attend every team meeting. I have also attended the useful webinars provided by some of the other team leaders.

Goals for Upcoming Week

Finish my BERT implementation. I will also attempt to implement a different model in order to compare the accuracy of the two.

Tasks Done

  • Watched each of the training webinars to better prepare myself for the project and its technical requirements.
  • Our task for the first week was to select a forum and scrape key information from it. Our team selected the Atom Discourse Forum as our subject of data collection. I was able to use BeautifulSoup and Selenium to collect a large quantity of data from the Atom Discourse Forum and separate it into a CSV file. The resources that Anubhav and Saad posted were really helpful in fully understanding how BeautifulSoup is utilized.
  • Our task for the second week was to go through a handful of NLP implementations that would help us increase the accuracy of our recommender system. I’m still working on getting BERT fully operational. The resources that Anubhav has posted have really helped.
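
The step of saving scraped data to a CSV file can be sketched as follows. This is only an illustration using Python's standard `csv` module; the column names in `FIELDNAMES` and the helper names `write_topics_csv` and `read_topics_csv` are assumptions, not the layout our team actually used.

```python
import csv
import io

# Hypothetical column layout for the scraped forum data.
FIELDNAMES = ["category", "topic", "content"]


def write_topics_csv(rows, handle):
    """Write a list of dicts to an open text handle as CSV with a header."""
    writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


def read_topics_csv(handle):
    """Read the CSV back into a list of dicts keyed by the header row."""
    return list(csv.DictReader(handle))


if __name__ == "__main__":
    rows = [
        {"category": "Support", "topic": "Install, then update",
         "content": "How do I update a package?"},
    ]
    # An in-memory buffer stands in for a real file here. With a real file,
    # open it with newline="" as the csv module documentation recommends.
    buffer = io.StringIO()
    write_topics_csv(rows, buffer)
    buffer.seek(0)
    print(read_topics_csv(buffer))
```

The `csv` module quotes fields that contain commas, which matters for forum text because titles and post bodies often contain them.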

Overview of things learned:

Technical Area:

  • Additional research on algorithms such as BERT and TF-IDF
  • Additional practice using BeautifulSoup and Selenium
  • Implemented a program that can detect similarities between a given input and different forum threads.
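
One way to picture the similarity program in the last bullet is a small TF-IDF and cosine-similarity sketch written with only the standard library. This is an illustration under assumptions, not the program I wrote: the function names, the tokenizer, the smoothed IDF formula, and the sample threads are all hypothetical, and the BERT-based version would replace the TF-IDF vectors with sentence embeddings.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    # Smoothed IDF so that no term gets a zero or negative weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    return [
        {t: count * idf[t] for t, count in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_threads(query, threads):
    """Score every thread against the query, most similar first."""
    # Simplification: the query is included when computing IDF weights.
    vectors = tfidf_vectors(threads + [query])
    query_vec, thread_vecs = vectors[-1], vectors[:-1]
    scored = [(cosine(query_vec, vec), thread)
              for vec, thread in zip(thread_vecs, threads)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    threads = [
        "How to install a package in Atom",
        "Editor crashes on startup",
        "Change the syntax theme colors",
    ]
    for score, thread in rank_threads("atom package install fails", threads):
        print(round(score, 3), thread)
```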

Tools:

  • Sentence Transformers, nltk

Soft Skills:

  • Setting independent goals
  • Presentation of data

Three Achievements:

  1. I was able to take the data I scraped from the Atom forum and implement a BERT model to detect similarity in forum topics.
  2. As a team we were able to collect 14,000 data points stemming from 4 categories and 76 different subcategories. From this we were able to extract data-topics and topic-contents.
  3. As a team we have implemented an SVM model and a random forest model.

Meetings/Training Attended

I have been able to attend every team meeting. I have also attended the 3 NLP webinars that walked us through the theory and implementation of BERT, as well as every subgroup meeting, where we are working on the Codecademy forum.

Goals for Upcoming Week

Work on presenting the results we have gathered from the Codecademy forum. Enhance our scraped data with additional augmentation techniques.

Tasks Done

  • Finished implementing the BERT model as well as the TF-IDF model
  • Switched forums from Udacity to Codecademy because of a login issue that was preventing us from successfully scraping data.
  • Took the data we scraped from the Codecademy forum and created a CSV file so that our team can continue with our various classifications.

Final Assessment

Technical Area:

  • Web scraping and data mining
  • Extracted contextualized word embeddings
  • Document classification
  • Implementation of Classification models
  • Data Augmentation

Tools:

  • Beautiful Soup
  • Selenium
  • Natural Language Toolkit
  • BERT Libraries
  • Sentence Transformers

Soft Skills:

  • Gained collaboration experience in a remote environment
  • Accomplished weekly tasks in a timely manner
  • Learned to evaluate my own progress and write a report on it

Three Achievements:

  1. I was able to complete the weekly tasks in the time provided. For our first task of scraping the Atom forum, I needed to ask for some guidance on gathering a larger data set. I read the resources provided and managed to scrape a larger quantity of data than in my first attempt.
  2. I was able to apply the BERT model to the first data set I collected. The webinars hosted by Colon gave me a clear understanding of the NLP framework and how we can use its bidirectional characteristics for our own tasks. I ended up with a very effective language model.
  3. My subgroup and I were able to scrape approximately 14,000 data points from the Codecademy forum. We separated our data into categories, subcategories, topics, and replies. Then we used the DistilBERT model to transform the data and obtain its hidden state embeddings.

Meetings/Training Attended

During the 5-week session, I was able to attend all webinars hosted by stemaway as well as all team and subteam meetings.

Tasks Done

  • Gathered 14,000 data points from the Codecademy forum
  • Generated a CSV file that was parsed into 4 different elements: categories, subcategories, topics, and replies
  • Took this data and applied the DistilBERT model to transform the data and obtain its hidden state embeddings
  • Implemented classification algorithms such as support vector machines, random forest, and a feed-forward neural network
  • Implemented various data augmentation techniques to improve the accuracy of our algorithms
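
The report does not name the specific augmentation techniques we used, so the sketch below is only an assumed illustration of two simple text augmentations (random word swap and random word deletion). The function names, the default parameters, and the sample sentence are all hypothetical.

```python
import random


def random_swap(words, rng, n_swaps=1):
    """Return a copy of words with n_swaps random pairs of positions swapped."""
    words = list(words)
    if len(words) < 2:
        return words
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words


def random_deletion(words, rng, p=0.1):
    """Drop each word with probability p, always keeping at least one word."""
    if not words:
        return []
    kept = [word for word in words if rng.random() >= p]
    return kept if kept else [rng.choice(words)]


def augment(text, seed=0, copies=2):
    """Produce 2 * copies variants of text: one swap and one deletion per copy."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    words = text.split()
    variants = []
    for _ in range(copies):
        variants.append(" ".join(random_swap(words, rng)))
        variants.append(" ".join(random_deletion(words, rng)))
    return variants


if __name__ == "__main__":
    for variant in augment("how do I install a package in atom", seed=1):
        print(variant)
```

Each variant keeps the label of the original text, so augmented rows can be appended to the training set to give the classifiers more examples per category.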