- Planned the overall project workflow and delegated tasks across the 5 weeks.
- Applied web scraping and data mining expertise to the discussion forums using the Beautiful Soup and Selenium libraries.
- Performed data cleaning and EDA on mined data for both forums.
- Helped the team merge the CSV files from the two forums for the final BERT deep learning classification.
- Implemented neural networks for classification, achieving an accuracy of 78% by the end.
- Planned overall task delegation and provided technical support for programming, web scraping, and machine learning.
- Led the data mining through web scraping and later worked on feature engineering from the mined data.
- Ran the TF-IDF text embedding with logistic regression and published the results on GitHub.
- Provided resources for data mining.
- Jupyter Notebook
- Google Colab
- Helped organize the hectic schedule of project goals, broke them down into manageable tasks for the team, and learned while working through the process.
- Learned how to delegate tasks and optimize the team’s resources.
- Learned how to perform EDA and data cleaning before the machine learning process.
- Learned how the BERT neural architecture works by reading papers and studying its theoretical foundations.
- Learned how to combine text embeddings with machine learning models.
- Learned deep learning frameworks and applied them to this project.
List of meetings:
Attended all meetings in the first 3 weeks, missed a few in the 4th week, and attended almost none in the 5th week due to my hospitalization.
Couldn’t help with the presentation due to my illness.
Detailed statement of tasks:
Initiated project goals and planned out a schedule for the next 5 weeks including dates, task assignments, and other goals for the internship.
Set deadlines on Asana and gave the team a webinar on the project process for data mining and storage of data.
Helped the team scrape the forums and gave them resources for web scraping.
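As a rough illustration of the scraping step described above, the sketch below parses forum posts out of an HTML page with Beautiful Soup. The markup, the class names (`post`, `author`, `body`), and the helper `extract_posts` are all hypothetical, since the actual forums and their page structure are not described here. For pages rendered with JavaScript, the HTML string would typically come from Selenium instead.

```python
from bs4 import BeautifulSoup

# Hypothetical forum markup; a real forum uses its own tags and class names.
SAMPLE_HTML = """
<div class="post"><span class="author">alice</span><p class="body">First post text.</p></div>
<div class="post"><span class="author">bob</span><p class="body">  Second post text. </p></div>
"""


def extract_posts(html):
    """Return a list of {'author', 'text'} dicts, one per post found in html."""
    soup = BeautifulSoup(html, "html.parser")
    posts = []
    for node in soup.select("div.post"):
        author = node.select_one("span.author")
        body = node.select_one("p.body")
        # Skip posts that are missing either field instead of failing.
        if author is None or body is None:
            continue
        posts.append(
            {
                "author": author.get_text(strip=True),
                "text": body.get_text(strip=True),
            }
        )
    return posts


posts = extract_posts(SAMPLE_HTML)
```

Rows in this shape can be written to a CSV file per forum, which keeps the later cleaning and merging steps simple.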
Weeks 2 and 3:
Assigned machine learning models and text embeddings to teams, noted down all key points from the presentations, and summarized them so the team understood the models to be used in weeks 4 and 5.
Laid the foundation for all data cleaning and preparation needed for the classification task.
Worked on the TF-IDF model along with logistic regression, improving overall accuracy.
Worked on the final presentation; however, couldn’t present or work further in week 5 due to hospitalization.
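For reference, the TF-IDF and logistic regression approach mentioned above can be sketched with scikit-learn as follows. This is only a minimal illustration: the texts and the label names (`help_request`, `other`) are made up, and the project's real data, classes, and hyperparameters are not reproduced here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy data invented for illustration; not taken from the scraped forums.
texts = [
    "how do I reset my password",
    "cannot log in to my account please help",
    "need help with installation error",
    "great weather today",
    "I watched a nice movie last night",
    "sharing my holiday photos",
]
labels = [
    "help_request",
    "help_request",
    "help_request",
    "other",
    "other",
    "other",
]

# TF-IDF turns each text into a weighted bag-of-words vector;
# logistic regression is then trained on those vectors.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(texts, labels)

predictions = model.predict(["please help me log in"])
```

Wrapping both steps in one pipeline means the vectorizer is fitted only on the training texts, which avoids leaking vocabulary statistics from held-out data when the model is evaluated.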