Besart - Machine Learning (Level 1) Pathway

Technical Area:

  • Acquired an overview of machine learning fundamentals through STEM-Away content material.
  • Learned about the different measures of similarity, including cosine, dot product, and Euclidean distance.
  • Learned about the linguistic difficulties inherent to NLP systems, such as metaphorical language, multiple subjects, and slang.
  • Learned about Root Mean Squared Error as a means of evaluating network test results.
  • Understood the pitfalls of vanilla neural networks applied to NLP tasks.
  • Began the process of scraping data from Discourse community forums using BeautifulSoup and Selenium.
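
The similarity measures and the error metric listed above can be sketched in a few lines. This is a minimal, library-free illustration and not the pathway's own material; the function names are chosen here for clarity.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    norm_a = math.sqrt(dot(a, a))
    norm_b = math.sqrt(dot(b, b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot(a, b) / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line distance between a and b."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rmse(y_true, y_pred):
    """Root Mean Squared Error between targets and predictions."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))
```

Cosine similarity ignores vector length, while the dot product and Euclidean distance do not, which is why the three can rank the same pair of documents differently.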

Tools:

  • Git
  • GitHub
  • Python
  • PyCharm
  • BeautifulSoup
  • Selenium

Soft Skills:

  • Initiated meeting icebreaker as a means of facilitating team communication.
  • Established communication channels between co-leads.
  • Set up daily scrum meeting availability with co-leads.
  • Addressed questions and concerns in a timely manner through Discord.
  • Built a rapport with co-leads.

Achievement Highlights:

  • Scraped multiple forums ahead of the next module to practice the necessary tools.
  • Gained a stronger understanding of machine learning fundamentals via supplementary third-party tutorials.
  • Fostered an environment of communication within meetings.

Tasks Completed:

  • Established a team GitHub repository.
  • Watched the STEM-Casts pertaining to machine learning and NLP processes.
  • Attended project management meetings, lead meetings, and team meetings.

Goals:

  • Better incentivize meeting attendance.
  • Gain familiarity with data analysis libraries and methods.
  • Delve more into NLP content to derive a thorough understanding.



Technical Area:

  • Learned how to utilize Selenium and BeautifulSoup in tandem to scrape a site.
  • Read up on methods of data visualization such as a word cloud and bigrams.
  • Explored several methods of data analysis, including frequency and bag-of-words.
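
As a rough illustration of the frequency and bag-of-words analysis mentioned above, the sketch below counts tokens across posts and builds one count vector per post. The tokenizer and the function names are assumptions made for this example; the actual analysis may have relied on Pandas or another library.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())

def word_frequencies(posts):
    """Count how often each token appears across all posts."""
    counts = Counter()
    for post in posts:
        counts.update(tokenize(post))
    return counts

def bag_of_words(posts):
    """Return a sorted vocabulary and one count vector per post."""
    vocabulary = sorted({tok for post in posts for tok in tokenize(post)})
    index = {word: i for i, word in enumerate(vocabulary)}
    vectors = []
    for post in posts:
        vec = [0] * len(vocabulary)
        for tok in tokenize(post):
            vec[index[tok]] += 1
        vectors.append(vec)
    return vocabulary, vectors
```

Each vector has one slot per vocabulary word, so word order is discarded and only counts survive.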

Tools:

  • PyCharm
  • Git
  • GitHub
  • Ubuntu
  • Selenium
  • BeautifulSoup
  • Pandas
  • NumPy

Soft Skills:

  • Engaged with teammates during icebreakers by joining follow-up discussions.
  • Attentively engaged with co-lead about sub-team structure.
  • Periodically asked if anyone needed clarification during the meeting.

Achievement Highlights:

  • Provided a tutorial on BeautifulSoup, Selenium, and GitHub during the weekly meeting.
  • Met with co-leads to formulate team suggestions.
  • Resolved permissions issues with GitHub.

Tasks Completed:

  • Finished scraping the team-selected forum.
  • Allocated sub-team with co-leads.
  • Invited team to repository.

Goals:

  • Finish visualizations of the scraped data.
  • Get ahead on learning BERT fundamentals.
  • Track sub-team progress.


Technical Area:

  • Learned to navigate various modes of data exploration using several Python libraries.
  • Grew accustomed to Pandas DataFrame objects and tinkered with extra features.
  • Produced bigrams, trigrams, and word clouds as a means of visualizing the data.
  • Explored word-embedding options in an effort to evaluate their pros and cons.
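
The bigrams and trigrams mentioned above are counts of adjacent token sequences. A minimal sketch, assuming simple whitespace tokenization (nltk or TextBlob, both listed under Tools, offer their own n-gram helpers):

```python
from collections import Counter

def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return list(zip(*(tokens[i:] for i in range(n))))

def top_ngrams(posts, n, k):
    """Count n-grams post by post so they never span two posts."""
    counts = Counter()
    for post in posts:
        counts.update(ngrams(post.lower().split(), n))
    return counts.most_common(k)
```

Calling top_ngrams(posts, 2, 20) gives the 20 most frequent bigrams, which can then be plotted as a bar chart.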

Tools:

  • PyCharm
  • Git
  • GitHub
  • Ubuntu
  • Selenium
  • BeautifulSoup
  • Pandas
  • NumPy
  • Seaborn
  • TextBlob
  • WordCloud
  • nltk
  • matplotlib

Soft Skills:

  • Facilitated the icebreaker during the meeting.
  • Proactively met and discussed with leads via Discord and Google Meet.
  • Engaged with direct messages and questions asked on Discord from member interns.

Achievement Highlights:

  • Pushed the scraper, EDA, CSV file, and visualizations to GitHub.
  • Met with co-leads to re-formulate team dynamic and approach to work allocation.
  • Explored extra machine learning resources as a means of discerning effective strategies for recommender system training.

Tasks Completed:

  • Successfully completed module deliverables.
  • Added functionality to the scraper class that lets the user scrape either the entire forum or up to a maximum limit.
  • Updated GitHub README file to function as a repository guide and reminder of common Git commands.
  • Attended and helped facilitate scrum meetings.
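
The "entire forum or up to a maximum limit" option described above could look roughly like the sketch below. The class name ForumScraper, the max_posts parameter, and the injected fetch_page callable are all hypothetical; the real scraper used Selenium and BeautifulSoup and its interface is not shown in this log.

```python
class ForumScraper:
    """Hypothetical scraper: fetch_page(n) returns the posts on page n, or [] past the end."""

    def __init__(self, fetch_page):
        self.fetch_page = fetch_page

    def scrape(self, max_posts=None):
        """Collect posts page by page; max_posts=None means the whole forum."""
        posts = []
        page = 0
        while max_posts is None or len(posts) < max_posts:
            batch = self.fetch_page(page)
            if not batch:  # no more pages
                break
            posts.extend(batch)
            page += 1
        # Trim any overshoot from the last page fetched.
        return posts if max_posts is None else posts[:max_posts]


def fake_fetch(page):
    """Stand-in for a real page fetch, used only to exercise the loop."""
    pages = [["p1", "p2"], ["p3", "p4"], ["p5"]]
    return pages[page] if page < len(pages) else []
```

Passing the fetch function in keeps the limit logic testable without a browser session.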

Goals:

  • Determine which form of word embeddings best suits the scraped data.
  • Practice modeling and training fundamentals on scraped data.
  • Prepare progress deck alongside the rest of the team.


Technical Area:

  • Digested content pertaining to recommender system concepts.
  • Split data into training and test sets and attempted to train a model on that data using a simple transformer.
  • Navigated Colab and Jupyter Notebooks in an effort to discern pros and cons of both environments for the project at hand.
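
Splitting data into training and test sets, as described above, amounts to a seeded shuffle followed by a cut. The helper below is a library-free sketch with an assumed name and an assumed 80/20 default; scikit-learn's train_test_split does the same job and is the more likely tool given the Tools list.

```python
import random

def split_train_test(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows with a fixed seed, then cut off the test portion."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]
```

Fixing the seed makes the split reproducible, so accuracy scores from different models are compared on the same test rows.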

Tools:

  • PyCharm
  • Colab
  • Git
  • GitHub
  • Ubuntu
  • Pandas
  • NumPy
  • Transformers
  • scikit-learn
  • PyTorch
  • BERT

Soft Skills:

  • Facilitated the icebreaker during the meeting.
  • Proactively met and discussed with leads via Discord and Google Meet.
  • Maintained communication on Discord.

Achievement Highlights:

  • Delved into materials in an effort to clarify machine learning concepts as issues arose.
  • Established additional meetings with leads to readjust project problem statement and pipeline goals.
  • Attended session 1 presentation to gain insights on potential future goals and obstacles.

Tasks Completed:

  • Calculated cosine similarity between posts.
  • Prepared bag-of-words and TF-IDF vectorizations ahead of training.
  • Attended and helped facilitate scrum meetings.
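
One way to get from TF-IDF vectors to cosine similarity between posts is sketched below. It assumes posts are already tokenized, uses a smoothed IDF, and scales each vector to unit length so that a plain dot product equals the cosine. The function names are illustrative; the original work most likely used scikit-learn.

```python
import math
from collections import Counter

def tfidf_vectors(tokenized_posts):
    """Return a sorted vocabulary and one unit-length TF-IDF vector per post."""
    n_docs = len(tokenized_posts)
    doc_freq = Counter()
    for tokens in tokenized_posts:
        doc_freq.update(set(tokens))  # count each word once per post
    vocabulary = sorted(doc_freq)
    # Smoothed inverse document frequency: rarer words get larger weights.
    idf = {w: math.log((1 + n_docs) / (1 + doc_freq[w])) + 1 for w in vocabulary}
    vectors = []
    for tokens in tokenized_posts:
        tf = Counter(tokens)
        vec = [tf[w] * idf[w] for w in vocabulary]
        norm = math.sqrt(sum(x * x for x in vec))
        vectors.append([x / norm for x in vec] if norm else vec)
    return vocabulary, vectors

def similarity_matrix(vectors):
    """Pairwise dot products; equal to cosine similarity for unit-length vectors."""
    return [[sum(x * y for x, y in zip(a, b)) for b in vectors] for a in vectors]
```

Posts that share no vocabulary score exactly 0, and identical posts score 1.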

Goals:

  • Determine the issue causing Colab and Jupyter to terminate when attempting to train the model.
  • Utilize cosine similarity as basis to train a simple recommender.
  • Train additional classifiers.
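
A recommender built on cosine similarity can be as simple as ranking the other posts by their similarity to a query post. The sketch below assumes a precomputed square similarity matrix (one row per post); the function name and the tie-breaking rule are choices made for this example, not the project's actual design.

```python
def recommend(similarity, query_index, k=3):
    """Return the indices of the k posts most similar to the query post."""
    candidates = [
        (score, i)
        for i, score in enumerate(similarity[query_index])
        if i != query_index  # never recommend the post itself
    ]
    # Highest score first; ties go to the lower index for determinism.
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in candidates[:k]]
```

Strictly speaking nothing is trained here: the ranking is a nearest-neighbour lookup over the similarity scores.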


Technical Area:

  • Trained several models to find the best performer, including naive Bayes, random forest, decision tree, logistic regression, and LinearSVC.
  • Switched between various word embedding methods to improve accuracy scoring.
  • Learned about various forms of hyperparameter tuning and applied some minor changes to the models.
  • Artificially rebalanced select categories.
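
The log does not say how the categories were rebalanced. One common option, shown here purely as an assumption, is random oversampling: duplicating rows from smaller categories until every category matches the largest one. The function name and the label_of argument are hypothetical.

```python
import random
from collections import defaultdict

def oversample(rows, label_of, seed=0):
    """Duplicate random rows from smaller categories until all match the largest."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in rows:
        by_label[label_of(row)].append(row)
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        # Top up with random duplicates drawn from the same category.
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    rng.shuffle(balanced)
    return balanced
```

Oversampling is usually applied to the training split only, so duplicated rows cannot leak into the test set and inflate accuracy.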

Tools:

  • PyCharm
  • Colab
  • Git
  • GitHub
  • Ubuntu
  • Pandas
  • NumPy
  • Transformers
  • PyTorch
  • Jupyter Notebooks
  • Seaborn
  • matplotlib
  • scikit-learn

Soft Skills:

  • Assisted members directly via personal messaging channels on an as-needed basis.
  • Held multiple meetings with leads to determine course of action pertaining to team communication efforts.
  • Maintained communication on Discord and through scrum meetings.

Achievement Highlights:

  • Worked with co-leads to produce a problem statement suited to the fluid nature of the forum selected for scraping.
  • Established sub-teams to divide focus on improvements regarding the recommender system and the classifier training efforts, alongside co-leads.
  • Made improvements to the data cleaning process, resulting in a team-wide CSV file that served to unify and streamline training efforts.
  • Grew accustomed to the BERT model structure and weighed its accuracy against scikit-learn algorithms.

Tasks Completed:

  • Developed a rudimentary recommender system.
  • Trained a wide range of models and tinkered with parameters to induce accuracy improvements.
  • Created and tweaked confusion matrix heatmaps.
  • Pushed team-wide CSV to GitHub.
  • Attended and helped facilitate scrum meetings.
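
The numbers behind a confusion matrix heatmap are just a table of counts. A minimal sketch, with rows as true categories and columns as predicted categories; the function name mirrors scikit-learn's, but this is a standalone illustration.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count predictions: matrix[i][j] is how often labels[i] was predicted as labels[j]."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, guess in zip(y_true, y_pred):
        matrix[index[truth]][index[guess]] += 1
    return matrix
```

The resulting nested list can be passed to a plotting library such as Seaborn to draw the heatmap.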

Goals:

  • Gauge team progress.
  • Continue to modify parameters in an effort to improve accuracy scores.
  • Identify whether accuracy scores cohere with confusion matrix output and adjust accordingly.
  • Potentially add more artificial data to assist with certain categories if accuracy stagnates.
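
Checking whether accuracy agrees with the confusion matrix usually means comparing overall accuracy with per-category recall, since a high overall score can hide a category the model almost never gets right. A small sketch, assuming a matrix with true categories as rows and predicted categories as columns:

```python
def accuracy(matrix):
    """Share of all predictions that fall on the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0

def per_class_recall(matrix):
    """For each true category, the share of its rows predicted correctly."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]
```

For an invented matrix such as [[95, 5], [8, 2]], accuracy is about 0.88 while recall for the second category is only 0.2, the kind of gap that would justify adding data for that category.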


Technical Area:

  • Performed hyperparameter tuning on trained models.
  • Completed recommender.
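
The tuning method is not named in this log. As one possibility, an exhaustive grid search tries every combination of candidate values and keeps the best-scoring one. In the sketch below, grid_search, score_fn, and the toy scoring function are all hypothetical; in practice score_fn would train a model and return its validation accuracy.

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Try every parameter combination and return the best one with its score."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    """Stand-in for validation accuracy; peaks at C=1.0, max_iter=200."""
    return -(params["C"] - 1.0) ** 2 - (params["max_iter"] - 200) ** 2 / 1e4
```

The cost grows with the product of the list lengths, so grids are normally kept small or replaced with random search.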

Tools:

  • PyCharm
  • Colab
  • Git
  • GitHub
  • Ubuntu
  • Pandas
  • NumPy
  • Transformers
  • PyTorch
  • Jupyter Notebooks
  • Seaborn
  • matplotlib
  • scikit-learn

Soft Skills:

  • Assisted members directly via personal messaging channels on an as-needed basis.
  • Held multiple meetings with leads to determine course of action pertaining to team communication efforts.
  • Maintained communication on Discord and through scrum meetings.

Achievement Highlights:

  • Trained and tuned BERT and RoBERTa classification models.
  • Achieved marked improvements in accuracy relative to former attempted models.

Tasks Completed:

  • Adjusted recommender system using a suitable CSV.
  • Trained several advanced classification models.
  • Produced confusion matrices for each model.
  • Attended and helped facilitate scrum meetings.

Goals:

  • Continue to modify parameters in an effort to improve accuracy scores.
  • Tinker with Flask app.
  • Complete final presentation.