Concise overview of things learned. Break it up into Technical Area, Tools, Soft Skills
- Technical area:
1. Definition of machine learning and natural language processing
2. A basic understanding of project workflow
3. Some machine learning algorithms (Naive Bayes, Linear SVM, Logistic Regression)
4. Able to perform web scraping with BeautifulSoup and Selenium
5. Able to perform data cleaning and EDA
- Tools:
1. Basic usage of Git and GitHub (commands such as push, pull, and rebase)
2. BeautifulSoup and Selenium libraries
3. Became familiar with CSV files and data manipulation with pandas
4. Medium is a good source for technical hacking articles
- Soft skills:
1. Better at searching for help on the internet
2. Improved communication and collaboration skills
Three achievement highlights
- Successfully scraped Flowster Discussion Forum by collaborating with teammates
- Started a blog about the project on my new Medium account
- Successfully implemented Linear SVM
List of meetings/training attended, including social team events
- Every one of them
Goals for the upcoming week.
- Be able to understand and use Linear SVM to achieve an accuracy above 75%
- Try to understand the other basic models as well
- Learn complicated models such as BERT
- Understand the code I implement
Detailed statement of tasks done. State each task, hurdles faced if any and how you solved the hurdle. You need to clearly mark whether the hurdles were solved with the help of training webinars, some help from project leads or significant help from project leads.
- We scraped the information we needed from the discussion forum (posts, comments, likes, and views) so that we can eventually classify the posts into the correct categories or even build a recommendation system. The main problem we encountered was that BeautifulSoup alone is not able to scrape all the information we need, because extra content only loads when the page is scrolled manually. We therefore used BeautifulSoup in conjunction with Selenium, which can scroll the pages automatically, to solve this problem. We found this solution mostly by doing web research ourselves and collaborating among team members.
- We are now trying out various basic models for post classification. Our main struggle is that the data needs to be cleaned better. We are approaching this issue through trial and error, with a bit of guidance from our team lead (for example, to avoid deleting words).
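
For the scrolling hurdle in the scraping task, here is a minimal sketch of how Selenium and BeautifulSoup can be combined. It is not our exact project code: the function names, the pause length, and the use of Chrome are assumptions made for illustration. The idea is to keep scrolling until the page height stops growing, then hand the loaded HTML to BeautifulSoup.

```python
import time


def scroll_until_loaded(driver, pause_seconds=1.0, max_rounds=50):
    """Scroll to the bottom repeatedly until the page height stops growing.

    `driver` is a Selenium WebDriver. Returns the number of scrolls performed.
    The defaults (1 second pause, 50 rounds) are illustrative assumptions.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause_seconds)  # give lazily loaded content time to appear
        rounds += 1
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded, so we reached the end
        last_height = new_height
    return rounds


def fetch_full_page(url, pause_seconds=1.0):
    """Open a page in Chrome, scroll it fully, and return the parsed HTML."""
    # Imported here so the scroll helper can be used and tested on its own.
    from bs4 import BeautifulSoup
    from selenium import webdriver

    driver = webdriver.Chrome()  # assumes Chrome and a matching driver are available
    try:
        driver.get(url)
        scroll_until_loaded(driver, pause_seconds)
        return BeautifulSoup(driver.page_source, "html.parser")
    finally:
        driver.quit()
```

Calling `fetch_full_page` with a forum page URL would return a BeautifulSoup object that can be searched with `find_all` as usual. The tag and class names to search for depend on the forum's HTML and are not shown here.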
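
For the data cleaning hurdle in the classification task, the sketch below shows one way to clean a post while following the "avoid deleting words" advice. It is only an illustration: the function name `clean_post` is made up, and the kinds of noise it handles (HTML tags, URLs, punctuation, extra whitespace) are assumptions about what the scraped posts contain. Every ordinary word, including common stop words, is kept.

```python
import re

# Patterns for noise we assume may appear in scraped posts.
_TAG_RE = re.compile(r"<[^>]+>")          # leftover HTML tags
_URL_RE = re.compile(r"https?://\S+")     # links
_NON_WORD_RE = re.compile(r"[^a-z0-9'\s]")  # punctuation and symbols
_SPACE_RE = re.compile(r"\s+")            # runs of whitespace


def clean_post(text):
    """Normalize a post without dropping any ordinary words."""
    text = text.lower()
    text = _TAG_RE.sub(" ", text)       # strip tags first so URLs next to tags split cleanly
    text = _URL_RE.sub(" ", text)       # remove links
    text = _NON_WORD_RE.sub(" ", text)  # replace punctuation with spaces, keep apostrophes
    return _SPACE_RE.sub(" ", text).strip()
```

For example, `clean_post("Don't   delete\nwords")` returns `"don't delete words"`. The cleaned text can then be passed to whichever vectorizer and model is being tried.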