12 October, 2015

Splitting out taskcluster-base into component libraries

The intended audience of this post is people who either work with taskcluster-base now or are interested in implementing taskcluster services in the future.

Taskcluster server-side components are currently built using the suite of libraries in the taskcluster-base npm package. This package is many things: config parsing, data persistence, statistics, JSON schema validation, pulse publishers, a REST API framework and some other useful tools. Having these all in a single package means that each time a contributor wants to hack on one part of our platform, she'll have to figure out how to install and run all of our dependencies. This is annoying when it means waiting for a libxml.so build, but just about impossible for contributors who aren't on the Taskcluster platform team. You need Azure, Influx and AWS accounts to be able to run the full test suite. You also might hit confusing errors in a part of the library you're not even touching.

Additionally, we are starting to get to the point where some services must upgrade one part of taskcluster-base without upgrading other parts. This is generally frowned upon, but sometimes we just need to put a bandaid on a broken system that's being turned off soon. We deal with this currently by exporting base.Entity and base.LegacyEntity. I'd much rather we just export a single base.Entity and have people who need to keep using the old Entity library use taskcluster-lib-legacyentity directly.

We're working on fixing this! The structure of taskcluster-base is really primed and ready to be split up since it's already a bunch of independent libraries that just so happen to be collocated. The new component loader that landed was the first library to be included in taskcluster-base this way and I converted our configs and stats libraries last week.

The naming convention that we've settled on is that taskcluster libraries will be prefixed with taskcluster-lib-X. This means we have taskcluster-lib-config and taskcluster-lib-stats. We'll continue to name services taskcluster-Y, like taskcluster-auth or taskcluster-confabulator. The best way to get the current supported set of taskcluster libraries is still going to be to install the taskcluster-base npm module.

Some of our libraries are quite large and have a lot of history in them. I didn't really want to just create a new repository, copy in the files we care about and destroy the history. Instead, I wrote a simple and ugly tool (https://github.com/jhford/taskcluster-base-split) which does the pedestrian tasks involved in this split-up by filtering out irrelevant history for each project, moving files around and doing some preliminary cleanup work on the new library.

This tooling gets us 90% of the way to a split-out repository, but as always, a human is required to take it the last step of the way. Imports need to be fixed, dependencies must be verified and tests need to be fixed. I'm also taking this opportunity to implement babel-transpiling support in as many libraries as I can. We use babel everywhere in our application code, so it'll be nice to have it available in our platform libraries as well. I'm using the babel-runtime package instead of requiring the direct use of babel. The code produced by our babel setup is tested using the node 0.12 binary without any wrappers at all.

Having different libraries will introduce the risk of our projects having version number hell. We're still going to have a taskcluster-base npm package. This package will simply be a package.json file which specifies the supported versions of the taskcluster-lib-* packages we ship as a release and an index.js file which imports and re-exports the libraries that we provide. If we have two libraries that have codependent changes, we can land new versions in those repositories and use taskcluster-base as the synchronizing mechanism.
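As a rough sketch, the meta-package could look something like this (the package names and version numbers here are illustrative, not the real release set):

```json
{
  "name": "taskcluster-base",
  "version": "1.0.0",
  "main": "index.js",
  "dependencies": {
    "taskcluster-lib-config": "^1.0.0",
    "taskcluster-lib-stats": "^1.0.0",
    "taskcluster-lib-loader": "^1.0.0"
  }
}
```

The accompanying index.js would simply require each of these and re-export them under the names services already use, so existing `require('taskcluster-base')` call sites keep working.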

A couple of open questions that I'd love to get input on are how we should share package.json snippets and babel configurations. We mostly have a solution for eslint, but we'd love to be able to share as much as possible in our .babelrc configuration files. If you have a good idea for how we can do that, please get in touch!

One of the goals in doing this is to make taskcluster components easier to write. We'd love to see components written by other teams use our framework, since we know it's well tested to work with Taskcluster. It also makes it easier for the Taskcluster team to advise on design and maintenance concerns.

Once a few key changes have landed, I will write a series of blog posts explaining how core taskcluster services are structured.

30 September, 2015

Taskcluster Component Loader

Taskcluster is the new platform for building automation at Mozilla.  One of the coolest design decisions is that it's composed of a bunch of limited-scope, interchangeable services that have well defined and enforced APIs.  Examples of services are the Queue, Scheduler, Provisioner and Index.  In practice, each server-side component roughly maps to a Heroku app.  Each app can have one or more web worker processes and zero or more background workers.

Since we're building our services with the same base libraries we end up having a lot of duplicated glue code.  During a set of meetings in Berlin, Jonas and I were lamenting about how much copied, pasted and modified boilerplate was in our projects.

Between the API definition file and the command line to launch a program invariably sits a bin/server.js file for each service.  This script basically loads up our config system, loads our Azure Entity library, loads a Pulse publisher, a JSON Schema validator and a Taskcluster-base App.  Each background worker has its own bin/something.js which basically has a very similar loop.  Services with unit tests have a test/helper.js file which initializes the various components for testing.  Furthermore, we might have things initialize inside of a given before() or beforeEach().

The problem with having so much boilerplate is twofold.  First, each time we modify one service's boilerplate, we add maintenance complexity and risk because of that subtle difference from the other services.  We'd eventually end up with hundreds of glue files which do roughly the same thing, but accomplish it completely differently depending on which service they're in.  The second problem is that within a single project, we might load the same component ten ways in ten places, including in tests.  Having a single codepath that we can test ensures that we're always initializing the components properly.

During a little downtime between sessions, Jonas and I came up with the idea to have a standard component loading system for taskcluster services.  Being able to rapidly iterate and discuss in person made the design go very smoothly and in the end, we were able to design something we were both happy with in about an hour or so.

The design we took is to have two 'directories' of components.  One is the project wide set of components which has all the logic about how to build the complex things like validators and entities.  These components can optionally have dependencies.  In order to support different values for different environments, we force the main directory to declare which 'virtual dependencies' it requires.  They are declared as a list of strings.  The second level of component directory is where these 'virtual dependencies' have their value.

Both virtual and concrete dependencies can either be 'flat' values or objects.  If a dependency is a string, number, function, Promise or an object without a setup property, we just give that exact value back as a resolved Promise.  If the component is an object with a setup property, we initialize the dependencies specified by its 'requires' list, pass those values as properties on an object to the function at the 'setup' property, and store that function's return value as a resolved Promise.  Only non-flat components can depend on other components.

Using code is a good way to show how this loader works:

// lib/components.js

let loader = require('taskcluster-base').loader;
let fakeEntityLibrary = require('fake');

module.exports = loader({
  fakeEntity: {
    requires: ['connectionString'],
    setup: async deps => {
      let conStr = await deps.connectionString;
      return fakeEntityLibrary.create(conStr);
    },
  },
},
['connectionString'],
);
 
In this file, we're building a really simple component directory which only contains a contrived 'fakeEntity'.  This component needs a connection string to be fully configured.  Since we want to use this code in production, development and testing, we don't want to bake configuration into this file, so we force whatever uses this directory to tell us what the connection string is.

// bin/server.js
let config = require('taskcluster-base').config('development');
let loader = require('../lib/components.js');

let load = loader({
  connectionString: config.entity.connectionString,
});

let configuredFakeEntity = await load('fakeEntity');
 
In this file, we're providing a simple directory that satisfies the 'virtual' dependencies that we know need to be fulfilled before initialization can happen.
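The resolution rule described earlier can be sketched in a few lines.  This is an illustrative toy under my reading of the semantics above, not the real loader:

```javascript
// Toy resolver for a component directory (illustrative only).
function resolve(directory, name, cache = {}) {
  if (cache[name]) {
    return cache[name];
  }
  let component = directory[name];
  // Flat values: strings, numbers, functions, Promises, or objects
  // without a setup property come back as a resolved Promise.
  if (!component || typeof component !== 'object' || !component.setup) {
    return cache[name] = Promise.resolve(component);
  }
  // Non-flat components: collect each required dependency as a Promise,
  // call setup with them, and cache the resulting Promise.
  let deps = {};
  (component.requires || []).forEach(req => {
    deps[req] = resolve(directory, req, cache);
  });
  return cache[name] = Promise.resolve(component.setup(deps));
}

// A directory mirroring the fakeEntity example above.
let directory = {
  connectionString: 'azure://example',
  fakeEntity: {
    requires: ['connectionString'],
    setup: async deps => 'entity:' + await deps.connectionString,
  },
};

resolve(directory, 'fakeEntity')
  .then(entity => console.log(entity)); // logs 'entity:azure://example'
```

Caching each component as a Promise means two components requiring the same dependency share a single initialization, which is exactly the "one codepath" property we're after.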

Since we're creating a dependency tree, we want to avoid having cyclic dependencies.  I've implemented a cycle checker which ensures that you cannot configure a cyclical dependency.  It doesn't rely on the call stack being exceeded by infinite recursion either!
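One way to detect cycles without relying on the call stack is to walk the requires graph with an explicit stack of paths.  This is a sketch of the idea, not the actual implementation:

```javascript
// Walk a component directory's `requires` edges iteratively; return a
// path demonstrating a cycle, or null if the directory is acyclic.
function findCycle(directory) {
  for (let start of Object.keys(directory)) {
    // Explicit stack of paths replaces the call stack entirely.
    let stack = [[start]];
    while (stack.length > 0) {
      let path = stack.pop();
      let component = directory[path[path.length - 1]];
      let requires = (component && component.requires) || [];
      for (let dep of requires) {
        if (path.indexOf(dep) !== -1) {
          return path.concat(dep); // revisiting a node on this path: cycle
        }
        stack.push(path.concat(dep));
      }
    }
  }
  return null;
}

let ok = {a: {requires: ['b']}, b: {}};
let bad = {a: {requires: ['b']}, b: {requires: ['a']}};
console.log(findCycle(ok));  // null
console.log(findCycle(bad)); // [ 'a', 'b', 'a' ]
```

Returning the offending path, rather than just a boolean, makes for a far friendlier error message when someone misconfigures a directory.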

This is far from being the only thing that we figured out improvements for during this chat.  Two other problems that we were able to talk through were splitting out taskcluster-base and having a background worker framework.

Currently, taskcluster-base is a monolithic library.  If you want our Entities at version 0.8.4, you must take our config at 0.8.4 and our rest system at 0.8.4.  This is great because it forces services to move all together.  This is also awful because sometimes we might need a new stats library but can't afford the time to upgrade a bunch of Entities.  It also means that if someone wants to hack on our stats module that they'll need to learn how to get our Entities unit tests to work to get a passing test run on their stats change.

Our plan here is to make taskcluster-base a 'meta-package' which depends on a set of taskcluster components that we support working together.  Each of the libraries (entities, stats, config, api) will be split out into its own package using git filter-branch to maintain history.  This is just a bit of simple leg work to ensure that the splitting out goes smoothly.

The other thing we decided on was a standardized background looping framework.  A lot of background workers follow the pattern "do this thing, wait one minute, do this thing again".  Instead of each service implementing this in its own special way for each background worker, what we'd really like is to have a library which does all the looping magic itself.  We can even have nice things like a watchdog timer to ensure that the loop doesn't stick.
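As a sketch of what such a library's core could look like (the API here is made up for illustration, not a real package):

```javascript
// Hypothetical looping helper: run a handler, sleep one interval, repeat,
// and use a watchdog timer to report iterations that run too long.
function loop(handler, options) {
  let stopped = false;
  let run = async () => {
    while (!stopped) {
      // Watchdog: fires only if a single iteration exceeds maxRuntime.
      let watchdog = setTimeout(options.onStuck, options.maxRuntime);
      try {
        await handler();
      } finally {
        clearTimeout(watchdog);
      }
      // Wait one interval before doing the thing again.
      await new Promise(resolve => setTimeout(resolve, options.interval));
    }
  };
  run();
  return {stop: () => { stopped = true; }};
}

// Example: a trivial worker that runs every 100ms until stopped.
let count = 0;
let handle = loop(async () => { count += 1; }, {
  interval: 100,
  maxRuntime: 1000,
  onStuck: () => console.error('loop is stuck!'),
});
setTimeout(() => handle.stop(), 550);
```

A real version would also want error handling around the handler and a way to report each iteration's duration to stats, but the loop-plus-watchdog core is the part every background worker currently reimplements by hand.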

Once the PR has landed for the loader, I'm going to be converting the provisioner to use this new loader.  This is a part of a new effort to make Taskcluster components easy to implement.  Once a bunch of these improvements have landed, I intend to write up a couple blog posts on how you can write your own Taskcluster service.