Written on Sunday 28th June 2020
A concise overview of things learned, broken up into Technical Area, Tools, and Soft Skills.
Technical Area
- Learned web scraping through a DataCamp course
- Revised and updated my knowledge of the Transformer, ELMo, OpenAI GPT, and BERT neural network architectures
- Got a better understanding of how:
    - BERT embeddings can be used to train another model
    - BERT performs multi-class classification
- Grasped the difference between multi-class classification and multi-label classification
- Tested data augmentation techniques in NLP
Tools
- Performed web scraping using the Scrapy library
- Advanced usage of GitHub (such as issuing a pull request, reviewing it, resolving conflicts, and closing a PR)
- Improved my collaborative Git skills and became comfortable using Git commands in a team setting
- Learned to better use VS Code extensions to visualize how I interact with Git commands
Soft Skills
- Improved how I deal with project-management issues
- Upgraded my communication and collaboration skills
- Sharpened my mindset: focusing on high-level goals instead of solely on the details of the solution
Three achievement highlights
- Learned how to perform web scraping using Scrapy
- Improved my Git collaboration skills
- Grasped the details of BERT's architecture and how it works
- Successfully implemented a text augmentation technique and improved model accuracy with it
- Handled project-management issues diplomatically
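Using BERT embeddings to train another model, as highlighted above, can be sketched with the Hugging Face `transformers` library. This is a minimal illustration, not the project's actual pipeline: the model name, example texts, labels, and the downstream logistic regression are all assumptions.

```python
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

# Illustrative placeholders, not the project's real data
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

texts = ["great product", "terrible support"]
labels = [1, 0]

with torch.no_grad():
    enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    out = bert(**enc)
    # Use the [CLS] token's final hidden state as a sentence embedding
    features = out.last_hidden_state[:, 0, :].numpy()

# Train a separate, lightweight classifier on the frozen BERT features
clf = LogisticRegression().fit(features, labels)
```

Freezing BERT and fitting a small classifier on top is the cheapest variant; fine-tuning BERT end to end usually does better but needs far more compute.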
List of meetings/training attended including social team events
- Every one of them
- I also attended all of Colin’s meetings but missed the last three webinars
Goals for the upcoming week:
- Train BERT as a classifier on the merged data
- Improve BERT’s accuracy through fine-tuning
- Gather my team’s work and prepare a presentation to showcase our results
A detailed statement of tasks done.
State each task, hurdles faced if any, and how you solved the hurdle. You need to clearly mark whether the hurdles were solved with the help of training webinars, some help from project leads, or significant help from project leads.
- All of the hurdles I faced came from Git commands. For example, to use the mentorchains GitHub account I had to add it to my laptop's SSH configuration, which took me a long time because I had trouble generating a second id_rsa key and retrieving its public key. I resolved this by combining the explanation in a freeCodeCamp blog post with GitHub's official documentation.
I faced other issues while preparing the GitHub tutorial for my team and resolved them by reading through StackOverflow community answers.
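The second-identity fix can be sketched as a second SSH key plus a host alias in `~/.ssh/config`. The key file name, the alias, and the repository path below are illustrative, not the exact values used:

```
# Generate a second key pair (the file name is an example):
#   ssh-keygen -t rsa -f ~/.ssh/id_rsa_mentorchains
# Print the public key and add it to the GitHub account:
#   cat ~/.ssh/id_rsa_mentorchains.pub

# ~/.ssh/config -- route the alias to the second key
Host github-mentorchains
    HostName github.com
    User git
    IdentityFile ~/.ssh/id_rsa_mentorchains

# Clone through the alias so the right key is picked:
#   git clone git@github-mentorchains:mentorchains/<repo>.git
```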
- One of the challenges we faced with one of the forums (the Flowster forum) was the lack of data. I did a quick Google search on data augmentation and preprocessing techniques suited to this case, asked some friends experienced in NLP how they usually handle this challenge, and shared their insights with my team.
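One simple augmentation technique from the family surveyed, random word swap (one of the "easy data augmentation" operations), can be sketched in plain Python. The sentence and the fixed seed are made-up example values:

```python
import random

def random_swap(sentence, n_swaps=1, seed=0):
    # Randomly swap pairs of words to create a perturbed copy of the
    # sentence while keeping its vocabulary intact -- a cheap way to
    # grow a small training set.
    rng = random.Random(seed)
    words = sentence.split()
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return " ".join(words)

augmented = random_swap("the forum has very little training data")
```

Augmented copies like this are added alongside the originals so the classifier sees more (slightly noisier) examples per class.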
- Another challenge we faced this week (week 4) is that the BERT model doesn't train on Colab: our sequence lengths go far beyond the predefined maximum (default = 512, ours ≈ 3000), and the batch size is too large for the RAM Colab provides. I suggested that we use the default sequence length (512) and train on a TPU, since TPUs can usually handle more data and are faster than GPUs.
I will work on finding the best batch size and sequence length this upcoming week.
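The memory blow-up described above is largely the quadratic cost of self-attention: each layer and head stores a seq_len × seq_len score matrix. A rough back-of-the-envelope check (12 layers and 12 heads are the BERT-base defaults; this counts only attention score entries, ignoring activations and weights):

```python
def attention_score_elems(seq_len, num_layers=12, num_heads=12):
    # One seq_len x seq_len attention score matrix per head per layer,
    # so memory in these matrices grows quadratically with seq_len.
    return num_layers * num_heads * seq_len ** 2

# ~3000-token sequences need roughly 34x the attention-score memory
# of 512-token sequences, which is why truncating to 512 helps.
ratio = attention_score_elems(3000) / attention_score_elems(512)
```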