In part 1 of this article, I mentioned that this job system needed three things:

  • The user’s data request
  • A way to convert that request into a SQL query
  • A way to convert that SQL query into a downloadable file

Structure

Structurally, this works out to several linked dependencies:

The site frontend is responsible for generating the job requests, but before it does that it needs to check whether a cached version is available. Because the export data is dynamic, we use a filename suffixed with a hash; several dates and pieces of metadata from the underlying info make up that hash. Code calculates the filename, and if that filename already exists in our content storage network, we just return it immediately.
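As a rough illustration of that cache check, here is a minimal Python sketch. The function names, the SHA-256 choice, the 16-character truncation, the `.csv` extension, and the idea of hashing the SQL together with a metadata dictionary are all assumptions made for the example; the real service may combine its inputs differently.

```python
import hashlib
import json
from typing import Optional, Set


def export_filename(base_name: str, sql: str, metadata: dict) -> str:
    """Build a deterministic filename suffixed with a hash of its inputs.

    Hypothetical helper: the inputs and hash format are assumptions.
    """
    # sort_keys keeps the serialized form independent of dict key order,
    # so the same request always yields the same hash.
    payload = json.dumps({"sql": sql, "metadata": metadata}, sort_keys=True)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]
    return f"{base_name}-{digest}.csv"


def find_cached_export(filename: str, stored_files: Set[str]) -> Optional[str]:
    """Return the filename if it is already in storage, otherwise None.

    stored_files stands in for a lookup against the content storage network.
    """
    return filename if filename in stored_files else None
```

If any of the dates or metadata change, the hash changes with them, so a stale export is simply never found and a fresh job gets requested instead.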

Both the filename calculation and the export code itself use the generated SQL statement. Whether the user’s request is simple or complex, a service converts that request into a SQL statement. An entire article series could be devoted to that particular service alone.

The export job service needs both the calculated filename and the SQL statement in order to properly export the data and store it. It runs on its own out in the cluster, so it needs to be able to pick that information up at runtime.

Putting it together

Regular code libraries could work for something like this, but if I needed to change something about, for example, the cache hash generation, I’d be stuck updating at least the site frontend and all the job services. That could cause disruptions, depending on my own speed and on deployment delays. Independently deployed microservices communicating over TCP were the best (and most fun) route here. We needed only two of them:

  1. A consistent filename generating service, which depends on #2
  2. A SQL statement generating service

The SQL statement generating service has some additional shared microservice dependencies of its own. The frontend and job service communicate with the two services above to ensure they always stay in sync with each other.
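The dependency chain can be sketched in Python with in-process stand-ins for the two services. Everything here is hypothetical: the class names, the request shape, the toy SQL builder, and the hash format are assumptions for illustration, and the real services talk over TCP rather than through direct method calls.

```python
import hashlib
from dataclasses import dataclass
from typing import Tuple


class SqlStatementService:
    """Stand-in for service #2: turns a data request into a SQL statement."""

    def build_sql(self, request: dict) -> str:
        # Toy query builder for illustration only; a real one must validate
        # and escape identifiers rather than interpolate them directly.
        columns = ", ".join(sorted(request["columns"]))
        return f"SELECT {columns} FROM {request['table']}"


@dataclass
class FilenameService:
    """Stand-in for service #1: depends on service #2 for the SQL."""

    sql_service: SqlStatementService

    def filename_for(self, request: dict) -> str:
        sql = self.sql_service.build_sql(request)
        digest = hashlib.sha256(sql.encode("utf-8")).hexdigest()[:16]
        return f"export-{digest}.csv"


def frontend_lookup(request: dict, filenames: FilenameService) -> str:
    """The frontend only needs the filename to check for a cached export."""
    return filenames.filename_for(request)


def job_plan(request: dict, sql_service: SqlStatementService,
             filenames: FilenameService) -> Tuple[str, str]:
    """The job service needs both the SQL to run and the name to store under."""
    return sql_service.build_sql(request), filenames.filename_for(request)
```

Because both callers go through the same filename service, a change to the hash logic lands in one place and the frontend and job service cannot drift apart.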

This creates a nice, easy-to-plot web of dependencies that are easier to update individually than imported libraries. Additionally, tracking down exactly which module is having a problem is significantly easier when paired with good error monitoring.