Ground Control to Major TOML: Why Buildpacks Use a Most Peculiar Format

YAML files dominate configuration in the cloud native ecosystem. They’re used by Kubernetes, Helm, Tekton, and many other projects to define custom configuration and workflows. But YAML has its oddities, which is why the Cloud Native Buildpacks project chose TOML as its primary configuration format.

TOML is a minimal configuration file format that's easy to read because of its simple semantics. You can learn more about TOML from the official documentation, but a simple buildpack TOML file looks like this:

api = "0.2"

[buildpack]
id = "heroku/maven"
version = "1.0"
name = "Maven"

Unlike YAML, TOML doesn’t rely on significant whitespace with difficult-to-read indentation. TOML is designed to be human readable, which is why it favors simple structures. It’s also easy for machines to read and write; you can even append to a TOML file without reading it first, which makes it a great data interchange format. But data interchange and machine readability aren’t the main drivers for using TOML in the Buildpacks project; humans are.

Put Your Helmet On

The first time you use Buildpacks, you probably won’t need to write a TOML file. Buildpacks are designed to get out of your way, and disappear into the details. That’s why there’s no need for large configuration files like a Helm values.yaml or a Kubernetes pod configuration.

Buildpacks favor convention over configuration, and therefore don’t require complex customizations to tweak the inner workings of their tooling. Instead, Buildpacks detect what to do based on the contents of an application, which means configuration is usually limited to simple properties that are defined by a human.

Buildpacks also favor infrastructure as imperative code (rather than declarative). Buildpacks themselves are functions that run against an application, and are best implemented in higher-level languages, which can use libraries and testing.

All of these properties lend themselves to a simple configuration format and schema that doesn’t define complex structures. But that doesn’t mean the decision to use TOML was simple.

Can You Hear Me, Major TOML?

There are many other formats the Buildpacks project could have used besides YAML or TOML, and the Buildpacks core team considered several of them in the early days of the project.

JSON has simple syntax and semantics that are great for data interchange, but it doesn’t make a great human-readable format, in part because it doesn’t allow for comments. Buildpacks use JSON for machine-readable config, like the OCI image metadata. But it shouldn’t be used for anything a human writes.

XML has incredibly powerful properties including schema validation, transformation tools, and rich semantics. It’s great for markup (like HTML), but it's far too heavy a format for what Buildpacks require.

In the end, the Buildpacks project was comfortable choosing TOML because there was solid prior art (even though the format is somewhat obscure). In the cloud native ecosystem, the containerd project uses TOML. Additionally, many language ecosystem tools like Cargo (for Rust) and Poetry (for Python) use TOML to configure application dependencies.

Commencing Countdown, Engines On

The main disadvantage of TOML is its lack of ubiquity. Tools that parse and query TOML files (something comparable to jq) aren’t readily available, and the format can still be jarring to new users even though it’s fairly simple.

Every trend has to start somewhere, and the Cloud Native Buildpacks project is happy to be one of the projects stepping through the door.

If you want to learn more or have any questions around Cloud Native Buildpacks, we will be hosting a Live AMA at Hackernoon on July 28th at 2pm PDT. See you there!

Originally published: July 22, 2020
