Uzoman-22 - Machine Learning Pathway

Tues 6/16 Assessment
Technical Area Learned: Learned how to web scrape in Python, how to use Python from the command line/terminal, and how to use Jupyter Notebook
Tools: Specific tools used were the BeautifulSoup package; also learned how to use the web developer tools in web browsers
Soft Skills: Communication, maintaining deadlines, general team building

The first achievement I had was learning how to understand web scraping. The second achievement was trying to get the contents from web scraping into a pandas DataFrame. The third achievement was successfully completing a Jupyter Notebook, as I had never really done one before.

Goals for me are to update my DataFrame to include the contents of each post, as right now it only contains topics, tags, and URLs. I would like to do some data cleaning and maybe EDA if time allows, and then learn about NLP and apply it to the dataset.

The main task I did was web scraping. I first had to watch and read a lot to understand the general process of web scraping. There were many places to go to learn, and each had a different style, but I eventually settled on mainly using the Python requests package and Beautiful Soup. I was able to get the topic, URL, and tags from the website my team chose to scrape, but as I stated, I would like to try to get more early on in the week. I then created a pandas DataFrame out of the columns.
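
A minimal sketch of this kind of scrape, not the actual notebook code. The HTML snippet, the class names (topic, title, tag), and the function name parse_topics are all assumptions made for illustration; the real site's markup will differ.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a forum listing page.
# In a live run the HTML would come from something like
# requests.get(page_url).text, where page_url is the page being scraped.
SAMPLE_HTML = """
<div class="topic">
  <a class="title" href="/t/first-topic/1">First topic</a>
  <span class="tag">python</span>
  <span class="tag">scraping</span>
</div>
<div class="topic">
  <a class="title" href="/t/second-topic/2">Second topic</a>
  <span class="tag">pandas</span>
</div>
"""


def parse_topics(html):
    """Return one dict per topic block with its title, link, and tags."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for block in soup.find_all("div", class_="topic"):
        link = block.find("a", class_="title")
        tags = [t.get_text(strip=True) for t in block.find_all("span", class_="tag")]
        rows.append(
            {"topic": link.get_text(strip=True), "url": link["href"], "tags": tags}
        )
    return rows


# Build the DataFrame with the three columns mentioned above.
topics_df = pd.DataFrame(parse_topics(SAMPLE_HTML), columns=["topic", "url", "tags"])
```

The browser's developer tools are the usual way to find which tags and classes to pass to find_all for a given site.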

Meetings attended:
6/8 Weekly Meeting
6/15 Weekly Meeting
I also attended many preliminary meetings before teams were assigned

Fri 6/26 Assessment
Technical Area Learned: Learned more about web scraping
Tools: The specific tool used was the BeautifulSoup package
Soft Skills: Working from behind a deadline

Main achievement was to finish web scraping.

Goals for me are to finish data cleaning and explore text classification.

The main task I did was to try to get the text from each post. At first I tried to separate the replies and the main post into different columns, but I could not do this, so I had to combine all the text into one column. I don’t know if this will backfire, because I’m not completely sure of the text classification process.
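
A minimal sketch of combining every message on a thread page into one text value. The markup, the class name post, and the function name thread_text are assumptions for illustration only, not the real site's structure.

```python
from bs4 import BeautifulSoup

# Hypothetical thread page: each message (main post or reply) in its own div.
THREAD_HTML = """
<div class="post"><p>How do I start?</p></div>
<div class="post"><p>Try the docs.</p><p>They are short.</p></div>
"""


def thread_text(html):
    """Join the text of every post block on the page into a single string."""
    soup = BeautifulSoup(html, "html.parser")
    # get_text(" ", strip=True) flattens nested tags and trims whitespace.
    parts = [p.get_text(" ", strip=True) for p in soup.find_all("div", class_="post")]
    return " ".join(parts)


combined = thread_text(THREAD_HTML)
```

If the site happens to list the main post first, keeping the parts list instead of only the joined string would leave room to split the main post from the replies later.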

Meetings attended:
6/22 Weekly Meeting
6/29 Weekly Meeting

Final Assessment
Technical Area Learned: Clustering techniques and regex
Tools: The specific tool used was sklearn
Soft skills used: Reading and understanding other people’s code

My main achievement was trying to learn the recommendation part, as I didn’t have enough time to get through text classification. There are many approaches to recommendation, such as K nearest neighbors, neural nets, K means, principal component analysis, and hierarchical clustering.
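
A minimal sketch of one of the techniques listed, K means, using sklearn. The four sample texts, the choice of TF-IDF features, and the cluster count of 2 are assumptions for illustration; they are not taken from the project's dataset or code.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical post texts; two about pandas, two about neural networks.
posts = [
    "python pandas dataframe",
    "pandas dataframe python cleaning",
    "neural network training",
    "training neural network layers",
]

# Turn each text into a TF-IDF vector so posts can be compared numerically.
vectors = TfidfVectorizer().fit_transform(posts)

# Group the posts into 2 clusters; random_state makes the run repeatable.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(vectors)
```

Posts that share a label could then be suggested alongside each other, which is one simple way clustering can support recommendation.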

Meetings attended:
6/30 Meeting