30 September, 2015

Taskcluster Component Loader

Taskcluster is the new platform for building Automation at Mozilla.  One of the coolest design decisions is that it's composed of a bunch of limited scope, interchangeable services that have well defined and enforced apis.  Examples of services are the Queue, Scheduler, Provisioner and Index.  In practice, the server-side components roughly map to a Heroku app.  Each app can have one or more web worker processes and zero or more background workers.

Since we're building our services with the same base libraries we end up having a lot of duplicated glue code.  During a set of meetings in Berlin, Jonas and I were lamenting about how much copied, pasted and modified boilerplate was in our projects.

Between the API definition file and the command line to launch a program invariably sits a bin/server.js file for each service.  This script basically loads up our config system, loads our Azure Entity library, loads a Pulse publisher, a JSON Schema validator and a Taskcluster-base App.  Each background worker has its own bin/something.js which basically has a very similar loop.  Services with unit tests have a test/helper.js file which initializes the various components for testing.  Furthermore, we might have things initialize inside of a given before() or beforeEach().

The problem with having so much boiler plate is twofold.  First, each time we modify one services's boilerplate, we are now adding maintenance complexity and risk because of that subtle difference to the other services.  We'd eventually end up with hundreds of glue files which do roughly the same thing, but accomplish it complete differently depending on which services it's in.  The second problem is that within a single project, we might load the same component ten ways in ten places, including in tests.  Having a single codepath that we can test ensures that we're always initializing the components properly.

During a little downtime between sessions, Jonas and I came up with the idea to have a standard component loading system for taskcluster services.  Being able to rapidly iterate and discuss in person made the design go very smoothly and in the end, we were able to design something we were both happy with in about an hour or so.

The design we took is to have two 'directories' of components.  One is the project wide set of components which has all the logic about how to build the complex things like validators and entities.  These components can optionally have dependencies.  In order to support different values for different environments, we force the main directory to declare which 'virtual dependencies' it requires.  They are declared as a list of strings.  The second level of component directory is where these 'virtual dependencies' have their value.

Both Virtual and Concrete dependencies can either be 'flat' values or objects.  If a dependency is a string, number, function, Promise or an object without a create property, we just give that exact value back as a resolved Promise.  If the component is an object with a create property, we initialize the dependencies specified by the 'requires' list property, pass those values as properties on an object to the function at the 'create' property.  The value of that function's return is stored as a resolved promise.  Components can only depend on other components non-flat dependencies.

Using code is a good way to show how this loader works:

// lib/components.js

let loader = require('taskcluster-base').loader;
let fakeEntityLibrary = require('fake');

module.exports = loader({
  fakeEntity: {
    requires: ['connectionString'],
    setup: async deps => {
      let conStr = await deps.connectionString;
      return fakeEntityLibrary.create(conStr);
    },
  },
},
['connectionString'],
);
 
In this file, we're building a really simple component directory which only contains a contrived 'fakeEntity'.  This component depends on having a connection string to fully configure.  Since we want to use this code in production, development and testing, we don't want to bake configuration into this file, so we force the thing using this to itself give us a way to configure what the connection string.

// bin/server.js
let config = require('taskcluster-base').config('development');
let loader = require('../lib/components.js');

let load = loader({
  connectionString: config.entity.connectionString,
});

let configuredFakeEntity = await load('fakeEntity')
 
In this file, we're providing a simple directory that satisifies the 'virtual' dependencies we know that need to be fulfilled before initializing can happen.

Since we're creating a dependency tree, we want to avoid having cyclic dependencies.  I've implemented a cycle checker which ensures that you cannot configure a cyclical dependency.  It doesn't rely on the call stack being exceeded from infinite recursion either!

This is far from being the only thing that we figured out improvements for during this chat.  Two other problems that we were able to talk through were splitting out taskcluster-base and having a background worker framework.

Currently, taskcluster-base is a monolithic library.  If you want our Entities at version 0.8.4, you must take our config at 0.8.4 and our rest system at 0.8.4.  This is great because it forces services to move all together.  This is also awful because sometimes we might need a new stats library but can't afford the time to upgrade a bunch of Entities.  It also means that if someone wants to hack on our stats module that they'll need to learn how to get our Entities unit tests to work to get a passing test run on their stats change.

Our plan here is to make taskcluster-base a 'meta-package' which depends on a set of taskcluster components that we support working together.  Each of the libraries (entities, stats, config, api) will be split out into their own packages using git filter-branch to maintain history.  This is just a bit of simple leg work of ensuring that the splitting out goes smooth.

The other thing we decided on was a standardized background looping framework.  A lot of background workers follow the pattern "do this thing, wait one minte, do this thing again".  Instead of each service implementing this its own special way for each background worker, what we'd really like is to have a library which does all the looping magic itself.  We can even have nice things like a watch dog timer to ensure that the loop doesn't stick.

Once the PR has landed for the loader, I'm going to be converting the provisioner to use this new loader.  This is a part of a new effort to make Taskcluster components easy to implement.  Once a bunch of these improvements have landed, I intend to write up a couple blog posts on how you can write your own Taskcluster service.