16/06/2020: Post Week 2 Assessment
Concise Description of Things Learned:
Technical Areas:
- Web Scraping using BeautifulSoup and Selenium
- Using Python libraries like Pandas, Matplotlib, Scikit-Learn, and other language processing libraries
- Using GitHub and basic Git commands
- Conducting basic pre-processing and exploratory analysis using Python
- A few ML models, and the basic structure and methodology of ML models
- Data Manipulation/Processing Techniques
Soft Skills:
- Ability to look at data critically, interpret it, and draw inferences from it
- Learning to learn (That’s continuously happening!)
- Communicating my results better
- Importance of every bit of data, and the necessity of making wise decisions
Meetings Attended:
All meetings except 1 (watched the recorded session for it)
Achievement Highlights:
- After Week 2, I have a much deeper understanding of web scraping. I had done it before, but I did not realise how much there was still to learn. The challenges we faced while scraping Flowster-forums greatly improved my understanding.
- I am better informed about how data should be processed, manipulated and analysed to get the best possible results: every piece of data counts and nothing can be taken for granted. I understood the importance of data pre-processing and how, if it is not done properly, it can lead to inaccurate ML models.
- Working on Doc2Vec currently
Goals for the upcoming week:
By the end of Week 3, I aim to understand the various approaches that can be applied to the same data, and how to choose the model with the highest accuracy.
A detailed statement of tasks done:
Week 1:
Successfully scraped Flowster-Forums
Tasks: Using Selenium and BeautifulSoup to scrape Flowster-forums with my team.
Challenges: We could not scrape all of the data at once, because bs4 only parses the HTML it is given and cannot scroll the page to load more content. Another issue was extracting deeply nested data such as the number of views, which sat much deeper in the HTML tag tree.
Solution: We resolved these through extensive Google searching and help from our mentor and team lead.
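A common way around the scrolling limitation is to let Selenium scroll the page until nothing new loads, and only then pass the HTML to bs4. The sketch below is an illustration under assumptions, not the code our team actually used: the function name scroll_to_bottom and its parameters are hypothetical, and it assumes the page grows in height whenever new posts load.

```python
import time


def scroll_to_bottom(driver, pause_seconds=1.0, max_rounds=50):
    """Scroll until the page height stops growing; return the number of scrolls.

    `driver` is assumed to be a Selenium WebDriver (anything with an
    `execute_script` method works). `max_rounds` is a safety cap.
    """
    get_height = "return document.body.scrollHeight"
    last_height = driver.execute_script(get_height)
    rounds = 0
    while rounds < max_rounds:
        # Jump to the current bottom of the page to trigger loading.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause_seconds)  # give newly requested posts time to load
        rounds += 1
        new_height = driver.execute_script(get_height)
        if new_height == last_height:
            break  # height did not change, so nothing new was loaded
        last_height = new_height
    return rounds
```

Once the loop finishes, driver.page_source holds the fully loaded HTML, which can then be parsed with BeautifulSoup to reach the nested tags.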
Week 2:
Exploratory Data Analysis and Data Cleaning
Tasks: Using basic Python libraries to clean the data, explore it, and infer correlations.
Difficulties: When I tried to come up with ways to pre-process the data, my ideas were limited to what had been discussed in the meeting. I searched for more advanced techniques, but it turns out that the right approach depends on the kind of data we have.
Solution: I am still working out better ways to clean my data. Later in Week 3, I will be working on Doc2Vec, which will benefit from cleaned and processed data.
Overall, it has been a wonderful experience working with great mentors and enthusiastic team members. Looking forward to learning a lot more in the remaining half of this journey!
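As a starting point for cleaning scraped forum text, a few generic steps can be sketched with the standard library alone. This is only a minimal example under assumptions: the function name clean_post and the specific steps chosen (dropping tags, URLs and punctuation, lowercasing) are illustrative, and the right steps depend on the data. The output is a list of tokens, which is the kind of input Doc2Vec implementations such as gensim's typically take.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")          # leftover HTML tags
URL_RE = re.compile(r"https?://\S+")     # links inside posts
NON_WORD_RE = re.compile(r"[^a-z0-9\s]")  # punctuation and symbols


def clean_post(raw_text):
    """Return a list of lowercase tokens from one scraped post."""
    text = TAG_RE.sub(" ", raw_text)  # drop tags but keep their text
    text = html.unescape(text)        # turn entities like &amp; into &
    text = URL_RE.sub(" ", text)      # remove URLs
    text = NON_WORD_RE.sub(" ", text.lower())
    return text.split()               # split on whitespace into tokens
```

For example, clean_post("<p>Hello &amp; welcome!</p>") gives ["hello", "welcome"].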
Thanks a lot to Sara and Rohit for being great mentors and leaders!