12 October, 2015

Splitting out taskcluster-base into component libraries

The intended audience of this post is people who either work with taskcluster-base now or are interested in implementing taskcluster services in the future.

Taskcluster serverside components are currently built using the suite of libraries in the taskcluster-base npm package. This package is many things: config parsing, data persistence, statistics, json schema validators, pulse publishers, a rest api framework and some other useful tools. Having these all in one single package means that each time a contributor wants to hack on one part of our platform, she'll have to figure out how to install and run all of our dependencies. This is annoying when it's waiting for a libxml.so library build, but just about impossible for contributors who aren't on the Taskcluster platform team. You need ​Azure, Influx and AWS accounts to be able to run the full test suite. You also might experience confusing errors in a part of the library you're not even touching.

Additionally, we are starting to get to the point where some services must upgrade one part of taskcluster-base without using other parts. This is generally frowned upon, but sometimes we just need to put a bandaid on a broken system that's being turned off soon. We deal with this currently by exporting base.Entity and base.LegacyEntity. I'd much rather we just export a single base.Entity and have people who need to keep using the old Entity library use taskcluster-lib-legacyentity directly

We're working on fixing this! The structure of taskcluster-base is really primed and ready to be split up since it's already a bunch of independent libraries that just so happen to be collocated. The new component loader that landed was the first library to be included in taskcluster-base this way and I converted our configs and stats libraries last week.

The naming convention that we've settled on is that taskcluster libraries will be prefix with taskcluster-lib-X. This means we have taskcluster-lib-config, taskcluster-lib-stats. We'll continue to name services as taskcluster-Y, like taskcluster-auth or taskcluster-confabulator.  The best way to get the current supported set of taskcluster libraries is still going to be to install the taskcluster-base npm module.

Some of our libraries are quiet large and have a lot of history in them. I didn't really want to just create a new repository and copy in the files we care about and destroy the history. Instead, I wrote a simple and ugly tool (https://github.com/jhford/taskcluster-base-split) which does the pedestrian tasks involved in this split up by filtering out irrelevant history for each project, moving files around and doing some preliminary cleanup work on the new library.

This tooling gets us 90% of the way to a split out repository, but as always, a human is required to take it the last step of the way. Imports need to be fixed, dependencies must be verified and tests need to be fixed. I'm also taking this opportunity to implement babel-transpiling support in as many libraries as I can. We use babel everywhere in our application code, so it'll be nice to have it available in our platform libraries as well. I'm using the babel-runtime package instead of requiring the direct use of babel. The code produced by our babel setup is tested in tested using the node 0.12 binary without any wrappers at all.

Having different libraries will introduce the risk of our projects having version number hell. We're still going to have a taskcluster-base npm package. This package will simply be a package.json file which specifies the supported versions of the taskcluster-lib-* packages we ship as a release and an index.js file which imports and re-exports the libraries that we provide. If we have two libraries that have codependent changes, we can land new versions in those repositories and use taskcluster-base as the synchronizing mechanism.

A couple of open questions that I'd love to get input on are how we should share package.json snippets and babel configurations. We mostly have a solution for eslint, but we'd love to be able to share as much as possible in our .babelrc configuration files. If you have a good idea for how we can do that, please get in touch!

One of the goals in doing this is to make writing taskcluster components easier to write. We'd love to see components written by other teams use our framework since we know it's tested to work with Taskcluster well. It also makes it easier for the task cluster team to advise on design and maintenance concerns.

Once a few key changes have landed, I will write a series of blog posts explaining how core taskcluster services are structured.