Project Pathfinding: Initial Definition

Overview

The Career Statistics Tool lets users view information about different careers in a clear format. Most of this information will be aggregations such as average salary and locations. The tool will be integrated with the core of the platform, the STEM-Away® Forums.

To enable conversation between users, each career or industry will receive its own forum topic. The top post will present this information as graphs, maps, and tables that are updated on a regular basis. Users can then comment on the topic, starting a conversation about the career and sharing anecdotes.


Specifications from Project Stage 1 (Project Pathfinding)

We will be using Google Cloud for storage and computation. The forum posts are created using Discourse, an open-source forum platform.

Initial set of tasks identified

Note: Several tasks were modified based on new findings during the project execution

Phase 1- API to Cloud SQL

The first step is to use one or more job APIs to populate a single relational (SQL) database that stores the information. The following will need to be accomplished for this part to be a success:
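
A minimal sketch of the core step, loading postings from a job API into one relational table, might look like the following. It uses Python's built-in sqlite3 as a local stand-in for Cloud SQL, and the record fields (id, title, location, salary), the table name job_postings, and the sample data are all assumptions for illustration, not the schema of any particular job API.

```python
import sqlite3

# Hypothetical shape of records returned by a job API (assumed fields).
SAMPLE_POSTINGS = [
    {"id": "a1", "title": "Data Analyst", "location": "Austin, TX", "salary": 72000},
    {"id": "a2", "title": "Data Analyst", "location": "Denver, CO", "salary": None},
    {"id": "a1", "title": "Data Analyst", "location": "Austin, TX", "salary": 72000},  # duplicate
]

# Hypothetical table layout; one row per posting.
SCHEMA = """
CREATE TABLE IF NOT EXISTS job_postings (
    posting_id TEXT PRIMARY KEY,
    title      TEXT NOT NULL,
    location   TEXT,
    salary     REAL
)
"""


def load_postings(conn, postings):
    """Insert postings, skipping ids already stored; return the number of rows added."""
    conn.execute(SCHEMA)
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO job_postings VALUES (:id, :title, :location, :salary)",
        postings,
    )
    conn.commit()
    return conn.total_changes - before


def average_salary(conn, title):
    """Average salary for one job title; postings without a salary are ignored by AVG."""
    row = conn.execute(
        "SELECT AVG(salary) FROM job_postings WHERE title = ?", (title,)
    ).fetchone()
    return row[0]
```

Keying the table on the posting id means the loader can be re-run against the same API results without creating duplicate rows.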

Phase 2- Use Cloud Dataflow to Move to BigQuery

This step uses Cloud Dataflow to move the data from Cloud SQL to BigQuery. This can all be done in the Google Cloud Console UI, so it should not be very difficult.

There are also other tools that can do this, such as Alooma:

Phase 3- Visualization

This stage visualizes the data stored in the database. It should produce a script that can be run every night to generate new graphs, tables, and maps.

  • To quickly explore the data options, try DataStudio. In BigQuery, if you click on the database (the one imported from Cloud SQL), there is an option at the top right of the details section to proceed to DataStudio. Use DataStudio to figure out which graphs, tables, and maps are best to show on the landing page and on each of the Discourse pages.
  • For the long run, we need a script. We can open a connection to BigQuery in Python, query the BigQuery database, and obtain the results as a pandas DataFrame, which is well suited to visualization in Python.
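
The scripted approach in the last bullet could be sketched roughly as follows, assuming the google-cloud-bigquery client library (with pandas installed) and application default credentials. The table name passed in and the column names title, location, and salary are hypothetical; the real schema depends on what Phase 1 stores.

```python
def build_salary_query(table):
    """Return parameterized SQL: average salary per location for one career.

    `table` is a fully qualified BigQuery table name (hypothetical here);
    the career itself is passed separately as the @career query parameter.
    """
    return (
        "SELECT location, AVG(salary) AS avg_salary "
        f"FROM `{table}` "
        "WHERE title = @career "
        "GROUP BY location ORDER BY avg_salary DESC"
    )


def fetch_career_stats(table, career):
    """Run the query in BigQuery and return the result as a pandas DataFrame."""
    # Imported lazily so build_salary_query works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()  # assumes application default credentials
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("career", "STRING", career)
        ]
    )
    query_job = client.query(build_salary_query(table), job_config=job_config)
    return query_job.to_dataframe()
```

Passing the career as a query parameter rather than formatting it into the SQL string keeps the same query reusable for every forum page.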

Phase 4- Save Images on Cloud Storage

Phase 5- Front-end

Here we want to neatly display the images (graphs, tables, and maps) on the landing page and the forum pages. Some specific features:

  • We want the landing page to change around 3 times each day → Rotate the images of the day.
  • Have a specific starting set of employments (or fields, depending on how we decide to divide the forum pages) that have forum pages immediately.
  • If a user searches for one that does not exist, they should be able to create one easily. This shouldn’t be too hard if we have a generalized method for creating forum pages (i.e., the same sorts of graphs and layout, where the only real change in the database query is the name of the field/employment).
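
Two of the features above lend themselves to a short sketch: rotating the landing-page image three times a day, and generating a forum page from a single template. The function names, the title format, and the image URLs below are hypothetical. The payload keys (title, raw, category) are intended to match the fields Discourse accepts when creating a topic, but that should be checked against the Discourse API documentation before use.

```python
from datetime import datetime

ROTATIONS_PER_DAY = 3


def image_for_slot(images, now):
    """Pick the landing-page image for the current 8-hour slot.

    Slots are numbered continuously across days, so the selection advances
    by one image per slot and wraps around the list.
    """
    slot_hours = 24 // ROTATIONS_PER_DAY
    slot = now.toordinal() * ROTATIONS_PER_DAY + now.hour // slot_hours
    return images[slot % len(images)]


def topic_payload(career, category_id, image_urls):
    """Build the body for a new forum topic; only the career name varies."""
    lines = [f"Statistics for {career}, refreshed on a regular basis.", ""]
    # One Markdown image line per chart, map, or table image.
    lines += [f"![{career}]({url})" for url in image_urls]
    return {
        "title": f"Career statistics: {career}",
        "raw": "\n".join(lines),
        "category": category_id,
    }
```

For example, image_for_slot(images, datetime.now()) returns the same image for any time within one 8-hour window, and topic_payload("Data Analyst", 7, urls) produces a payload that differs from any other career's only in the name and image links.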
