St3939 - Machine Learning Pathway

st3939 · September 20, 2020, 10:13pm

Saad Tariq Self Assessment

Project / Product Management areas covered

Product Technical Design Development
Agile Framework (Scrum)
Stakeholder Management
Product Requirement Gathering
Task Management and delegation
Parallel work with other engineering teams

Technical Areas Involved In / Covered:

Web driving
Web scraping
Data preprocessing
Natural Language Processing Algorithms
- TF-IDF
- BERT
Machine Learning; Multi-Label Classification Models
- Feed-Forward Neural Network
- Logistic Regression
NLP annotation tool characteristics
Batch based active learning architecture
Tag-recommendation methods

Leadership skills:

Team building: Team building was a focus of my project plan from day one. We opened our meetings with informal small talk about how everyone was doing and how their weeks had gone. This interaction let us get to know each other better: what we were passionate about and how each of us works and thinks. We also ran a few exercises whose outcomes showed teammates what they had in common. One technique I used was to show that I cared about everyone's personal and professional development, walking them through exactly how this project would benefit their futures. This led to an immediate improvement in commitment to the project and the team!
Knowing the Stakeholders: Much of my time went into analysing the expectations of the stakeholders, i.e. the owner of the forums, project track leads, mentors, and end users of the platform. I spent the first few weeks getting to know each stakeholder and building a better relationship with them. This improved my communication while gathering and synthesising requirements for the product, which I made sure aligned with and exceeded the expectations of all stakeholders!
Flexibility and Durability: Some of the challenges the team faced during this session were:
* Time zone differences and tight scheduling conflicts among team members
* Conflicts of opinion
* The range of educational backgrounds, which varied by major as well as class year (high school to PhD)

Flexibility and durability had to be built as a team value, so that people were available to work across time zones and accepted that small sacrifices in sleep and personal leisure time are sometimes needed to accommodate others and work towards a common cause (completing a successful project). Conflicts of opinion were resolved through activities showing that everyone shared the same final goal, so team members aligned themselves with the goal rather than with individual opinions. We also had to agree on a common standard of work that all participants could reach. This was necessary so that the project's tasks still challenged everyone on the team and provided a learning experience, while each person's deliverables remained compatible with the work of their teammates.

Team synergy: Team synergy was necessary to complete the project within the given time period. To achieve it, everyone had to understand the exact part they played in the wider scope of the project. This gave everyone a sense of contribution and made clear that each individual was a linchpin of the project. As a result, participants communicated more and engaged actively with the final goal, which led to team synergy and higher motivation.
Strategic thinking & Decision Making: Everyone's strengths had to be assessed so that sub-teams could be formed and work divided among the participants to stay ahead of the project timeline. Critical decisions also had to be made, specifically: which of the 3 models tried and implemented by the team was best suited, what sort of tag-recommendation method to use, and how to set up the active learning loop within the product architecture, keeping the final product's requirements and implementation costs in mind.
Presentation Skills: I had to improve my presentation skills significantly during the project, as I was regularly presenting developments to the stakeholders. I had to learn how to present technical material to business audiences and convince them of the product's success. This improvement came as a by-product of meetings with our mentor, who regularly guided us on best practices for presenting to larger non-technical audiences.

Soft Skills Used:

Team Collaboration
Digital Communication
Remote work
Online teaching (using iPad + Pencil + screen sharing to make virtual whiteboards)
Brainstorming in large groups and synthesising ideas

Achievements:

Successfully conducted 12 team meetings and at least 2 one-on-one meetings with every participant
Designed 3 models that exceeded the model accuracy requirement of >70%
Developed the full product framework, combining a tag recommender model, a tag annotator, and an active learning loop
Built a product with basic functionality for benchmarking with Stack Exchange and STEM-Away data
Built a product that replicated the tagging behaviour of Stack Exchange, with the built-in ability to increasingly replicate STEM-Away behaviour over time as more data accumulates
Utilized the tagging behaviour on the Stack Exchange forum to predict and annotate tags for unlabelled STEM-Away data
Observed increased accuracy, precision, and recall for the product's tagging predictions after implementing active learning with the tag annotator.
Presented the final product to all stakeholders and exceeded expectations.
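
For readers unfamiliar with the batch-based active learning loop mentioned above, here is a minimal sketch of how one round might pick posts to send to the tag annotator. The selection rule is an assumption (plain uncertainty sampling); the post does not say which strategy the team actually used. The function names, post ids, and probabilities are all hypothetical.

```python
from typing import Dict, List


def post_uncertainty(tag_probs: Dict[str, float]) -> float:
    """Score a post by its least certain tag probability.

    A probability of 0.5 scores 1.0 (maximally unsure); 0.0 or 1.0 scores 0.0.
    Assumes the post has at least one tag probability.
    """
    return max(1.0 - 2.0 * abs(p - 0.5) for p in tag_probs.values())


def select_batch(predictions: Dict[str, Dict[str, float]], batch_size: int) -> List[str]:
    """Return the ids of the `batch_size` most uncertain posts, most uncertain first."""
    ranked = sorted(
        predictions,
        key=lambda post_id: post_uncertainty(predictions[post_id]),
        reverse=True,
    )
    return ranked[:batch_size]


# Hypothetical per-tag probabilities from a multi-label model on unlabelled posts.
predictions = {
    "post-1": {"python": 0.97, "nlp": 0.05},
    "post-2": {"python": 0.55, "nlp": 0.48},
    "post-3": {"python": 0.10, "nlp": 0.80},
}

batch = select_batch(predictions, batch_size=2)
print(batch)
```

In a full loop, the selected posts would be labelled in the annotator, added to the training set, and the model retrained before the next batch is chosen.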

Tools Used:

Python: NumPy, NLTK, Gensim, scikit-learn, pandas, Beautiful Soup, Selenium, Scrapy, transformers, sentence-transformers, stop words, Porter stemmer
Trello
VS Code
Git
Slack
Google Colab
Notion, used to post online resources
Google Suite

Goals for future work

Integrate user-fed tags and expand the tag vocabulary beyond the 25-tag set that we chose.
Analyze the model results on special cases, e.g., the effect on the model when the same post is assigned different tags by two users during tag annotation.
Integrate team one's approach so that a decision can be made between high-confidence and low-confidence tagging from user-fed data.
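
As a rough illustration of the high-confidence versus low-confidence split in the last goal, the sketch below routes each predicted tag by probability. This is not team one's approach, which is not described in this post; the function name, the two thresholds, and the example probabilities are hypothetical placeholders.

```python
from typing import Dict, List, Tuple


def route_tags(
    tag_probs: Dict[str, float],
    high: float = 0.8,  # hypothetical cut-off for accepting a tag automatically
    low: float = 0.5,   # hypothetical cut-off below which a tag is dropped
) -> Tuple[List[str], List[str]]:
    """Split predicted tags into auto-accepted tags and tags needing human review."""
    accepted = sorted(tag for tag, p in tag_probs.items() if p >= high)
    needs_review = sorted(tag for tag, p in tag_probs.items() if low <= p < high)
    return accepted, needs_review


# Hypothetical model output for a single post.
accepted, needs_review = route_tags({"python": 0.92, "pandas": 0.63, "java": 0.04})
print(accepted, needs_review)
```

Tags in the review list could be shown to users in the annotator, which also gives a natural place to study cases where two users disagree on the same post.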