We started by exploring data from four sources: BLS, CareerOneStop, Indeed, and Glassdoor.
We concentrated on the first two, since acquiring API keys from Indeed and Glassdoor would have taken longer than our project timeline allowed. We found out that CareerOneStop works directly off BLS. We decided to start with BLS for full flexibility and to keep CareerOneStop as a backup in case the BLS API scripting turned out to be too complex.
API scripting to acquire data from BLS is not easy. It was extremely challenging to work out what was needed from the BLS API documentation on the BLS website. Our first breakthrough was discovering the article "Extracting programmer employment data from BLS". It does an excellent job of explaining the required steps and is a must-read for anyone trying to extract data from BLS.
The starting point for data extraction is the BLS Time Series Overview File. Some of the BLS Time Series directories, such as the occupation employment directory, are easy to understand thanks to an excellent readme. Others, such as the ln directory, have no readme files and are much more difficult to follow. We faced further challenges due to:
- Unequal lengths of apicodes
- Different fields for apicodes of the same length
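One way to cope with both problems is to key the field layout on the survey prefix rather than on the code length. The sketch below is only an illustration, not the script we used: the `LAYOUTS` table and the `parse_series_id` helper are hypothetical names, and the field widths shown for the `LN` and `OE` prefixes are our reading of the BLS series ID formats, so they should be checked against the readme for each directory before being relied on.

```python
# Hypothetical fixed-width layouts, keyed by the two-letter survey prefix.
# Each entry is a list of (field name, width in characters).
# ASSUMPTION: these widths reflect our reading of the BLS series ID formats;
# verify them against the BLS documentation for each survey.
LAYOUTS = {
    "LN": [("survey", 2), ("seasonal", 1), ("series_code", 8)],
    "OE": [("survey", 2), ("seasonal", 1), ("area_type", 1), ("area", 7),
           ("industry", 6), ("occupation", 6), ("data_type", 2)],
}


def parse_series_id(series_id):
    """Split a series ID into named fields using the layout for its prefix."""
    prefix = series_id[:2]
    layout = LAYOUTS.get(prefix)
    if layout is None:
        raise ValueError(f"no layout defined for prefix {prefix!r}")

    # Codes from different surveys have different lengths, so check the
    # length against the layout instead of assuming one fixed size.
    expected_length = sum(width for _, width in layout)
    if len(series_id) != expected_length:
        raise ValueError(
            f"{series_id!r} has length {len(series_id)}, "
            f"expected {expected_length} for prefix {prefix!r}"
        )

    fields = {}
    position = 0
    for name, width in layout:
        fields[name] = series_id[position:position + width]
        position += width
    return fields
```

Calling `parse_series_id("LNS14000000")` splits the code into its survey, seasonal, and series code parts. Because the lookup is by prefix, two surveys whose codes happen to have the same length can still carry different fields.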
Details of the steps executed:
After gaining an understanding of the BLS categories, I selected the following as the categories of interest for CST.
Dataset from BLS.pdf (153.9 KB)
Next, I created the URLs for the JSON responses. We are not sharing the file publicly since it has our API key.
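For readers who want to reproduce this step without our file, here is a minimal sketch of how such URLs could be built while keeping the key out of the source. It assumes the BLS Public Data API v2 single-series GET endpoint and its `registrationkey`, `startyear`, and `endyear` query parameters as we understand them from the BLS documentation. The `build_series_url` helper and the `BLS_API_KEY` environment variable are hypothetical names chosen for illustration.

```python
import os
from urllib.parse import urlencode

# ASSUMPTION: base URL of the BLS Public Data API v2 single-series endpoint.
BASE_URL = "https://api.bls.gov/publicAPI/v2/timeseries/data/"


def build_series_url(series_id, api_key=None, start_year=None, end_year=None):
    """Return the URL for the JSON response of one BLS series."""
    params = {}
    if api_key:
        params["registrationkey"] = api_key
    if start_year is not None:
        params["startyear"] = str(start_year)
    if end_year is not None:
        params["endyear"] = str(end_year)

    query = urlencode(params)
    return BASE_URL + series_id + ("?" + query if query else "")


if __name__ == "__main__":
    # Read the key from the environment (hypothetical variable name) so it
    # never lands in a shared file.
    key = os.environ.get("BLS_API_KEY")
    for series in ["LNS14000000"]:
        url = build_series_url(series, api_key=key, start_year=2015, end_year=2020)
        # Print only the part before the query string so the key stays hidden.
        print(url.split("?")[0])
```

Reading the key from the environment at run time means the script itself can be shared even though the generated URL list cannot.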