November 13, 2012 by Matthew Manning
Buildpacks are an extremely powerful tool for specifying the ecosystem of tools and dependencies packaged with your Heroku application and controlling the way the application is built from code to a deployed app.
In the post announcing the release of buildpacks we illustrated this point, explaining how buildpacks provide the mechanism by which Heroku supports a variety of languages and frameworks, not just Ruby and Rails. We also briefly covered some of the end-user customizations that can be achieved with custom buildpacks, such as adding binary support and modifying the build process.
Today we'll examine the basic structure of buildpacks and study some example customizations to better understand how they can be used to extend the capabilities of the Heroku defaults.
At its core, a buildpack is a collection of three Bash scripts stored in the bin directory. These scripts are called detect, compile, and release. We'll take a quick look at how each of these scripts contributes to supporting a specific language or framework.
An excellent skeleton buildpack with all of these minimal components is Ryan Smith's null-buildpack. If you're creating a buildpack from scratch, forking null-buildpack is a good place to start.
The bin/detect script is important to Heroku's default buildpacks. When an app is deployed, the detect script is used to figure out which buildpack is appropriate for the project.
Most buildpacks detect frameworks by searching for certain config files. For example, the Ruby buildpack looks for a Gemfile, the Node.js buildpack looks for package.json, and the Python buildpack looks for requirements.txt.
If a buildpack matches, it returns an exit code of 0 and prints the language/framework name to STDOUT. The aforementioned null-buildpack shows what an absolutely minimal detect script would look like.
In the case of custom buildpacks you'll be specifying the buildpack directly, so detection isn't as important, and a minimal detect script is usually sufficient.
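As a rough illustration of the detection contract described above, here is a minimal sketch of the logic a bin/detect script might contain. It is not the null-buildpack's or the Ruby buildpack's actual script; the function name `detect` is invented for this example, and it assumes the build directory is passed as the first argument.

```shell
# Sketch of bin/detect logic (illustrative only).
# Assumption: the first argument is the directory holding the app's source.
detect() {
  if [ -f "$1/Gemfile" ]; then
    echo "Ruby"   # the language/framework name goes to STDOUT
    return 0      # exit code 0 means "this buildpack matches"
  fi
  return 1        # any non-zero code means "no match"
}

# A real bin/detect script would finish with something like:
#   detect "$1"; exit $?
```

Wrapping the check in a function keeps the sketch easy to try locally against a scratch directory before wiring it into a buildpack.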
The bin/compile script is where most of the magic happens. This script takes two arguments, BUILD_DIR and CACHE_DIR.
BUILD_DIR gives you a handle for the root directory of the app, so you can read and write files into the slug. This is where binaries are installed, Heroku-specific config files are written, dependencies are resolved and installed, and static files are built.
CACHE_DIR gives you a location to persist build artifacts between deployments.
We'll take a closer look at modifying the slug and caching build artifacts in the examples below.
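To make the roles of the two arguments concrete, here is a minimal sketch of compile logic that writes into the slug and reuses a cached artifact on later builds. The function name `compile`, the `vendor` directory, and the `artifact.txt` file are placeholders invented for this example; the `echo` into the cache stands in for whatever expensive build step a real buildpack would perform.

```shell
# Sketch of bin/compile logic (illustrative only).
# $1 = BUILD_DIR (root of the app, becomes the slug)
# $2 = CACHE_DIR (persists between deployments)
compile() {
  build_dir="$1"
  cache_dir="$2"
  mkdir -p "$cache_dir" "$build_dir/vendor"

  if [ -f "$cache_dir/artifact.txt" ]; then
    echo "Reusing cached artifact"
  else
    echo "Building artifact"
    # Placeholder for an expensive step such as compiling a dependency.
    echo "built" > "$cache_dir/artifact.txt"
  fi

  # Anything copied under BUILD_DIR ships with the deployed app.
  cp "$cache_dir/artifact.txt" "$build_dir/vendor/artifact.txt"
}

# A real bin/compile script would finish with something like:
#   compile "$1" "$2"
```

Running the function twice against the same cache directory shows the second build skipping the expensive step.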
While the bin/compile script modifies the slug, the bin/release script modifies the release. Instead of modifying files, this script returns a YAML-formatted hash to define any default config variables, add-ons, or default process types needed by the buildpack.
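A minimal sketch of what such a script might print is shown here. The key names `config_vars` and `default_process_types` follow the buildpack API as I understand it; the `LANG` value and the `bin/server` command are placeholders invented for this example, not defaults from any real buildpack.

```shell
# Sketch of bin/release logic (illustrative only).
# Prints a YAML hash on STDOUT; nothing is written to disk.
release() {
  cat <<EOF
---
config_vars:
  LANG: en_US.UTF-8
default_process_types:
  web: bin/server
EOF
}

release
```

Values given here act as defaults, so a buildpack can supply a sensible web process without requiring every app to declare one.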
Now that we understand the basic structure of buildpack scripts and their roles, let's take a look at some example hacks.
The ability to install binaries is critical for most language support. Richard Schneeman's mruby buildpack article over on RubySource provides an excellent example of installing custom binaries to support a new language.
The first step for getting new binaries onto Heroku is building them in such a way that they can be run on Heroku dynos (64-bit Linux virtual machines). It turns out that building these binaries directly on Heroku is the easiest method, since the operating system of a Heroku dyno contains common Linux development tools.
Heroku user Jonathan Hoyt discovered these tools early on and blogged about the process of building xpdf for Heroku using:
heroku run bash to boot a new dyno and get a bash session on it
curl to download the source
make to build the project
scp to copy the build artifacts to a local machine
Although you could certainly copy Jon's procedure, Heroku now provides a tool called Vulcan to make this process much easier. The vulcan create command deploys a custom app to Heroku under your account. Once the app is created, you use the vulcan build command to upload source, build the project, and download the results all in one step.
Once Vulcan has completed and the build artifacts have been downloaded, you'll need to host them somewhere on the web so that Heroku's build servers will be able to download them. Heroku's default language binaries are stored on Amazon S3, for example. Make sure the location you use is publicly readable.
Next you need to make the buildpack copy the binary files down into your project. This is done in the buildpack's bin/compile script. We can again refer to the mruby buildpack for a straightforward example of how the files are copied down. The steps used are as follows:
Change directories into the build directory. (This directory will be the root of any apps deployed with this buildpack.)
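The general shape of this step can be sketched as follows. This is not the mruby buildpack's code: the function names, the `vendor` directory, and the assumption that the binaries are hosted as a gzipped tarball are all choices made for this example, and the URL in the usage comment is a placeholder.

```shell
# Sketch: unpack prebuilt binaries into the slug (illustrative only).

# install_binaries BUILD_DIR
# Reads a gzipped tarball on STDIN and unpacks it under BUILD_DIR/vendor.
install_binaries() {
  (
    cd "$1" || exit 1        # work from the root of the app
    mkdir -p vendor
    tar -xzf - -C vendor     # "-f -" reads the archive from STDIN
  )
}

# fetch_binaries URL BUILD_DIR
# Downloads the tarball from a publicly readable URL and unpacks it.
fetch_binaries() {
  curl -fsSL "$1" | install_binaries "$2"
}

# In bin/compile this might be called as (placeholder URL):
#   fetch_binaries "https://example.com/mytool.tar.gz" "$1"
```

Separating the download from the unpacking makes it possible to exercise the unpacking step locally with a tarball you built yourself.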
Finally, you'll need to add the location of your binaries to the PATH environment variable so that they can be called from anywhere. As discussed earlier, default environment variables are defined by the YAML string returned by the bin/release script. Here you can see PATH being set for the mruby binaries.
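The mruby buildpack's own script is not reproduced here. As a generic sketch of the idea, the function below emits a PATH default that puts a vendored bin directory first. The function name, the `vendor/mytool/bin` path, and the assumption that the app lives under `/app` at runtime are all illustrative and should be checked against your own setup.

```shell
# Sketch: set a default PATH from bin/release (illustrative only).
# $1 = directory holding the binaries, relative to the app root.
# Assumption: the app is mounted at /app on the dyno.
release_path() {
  cat <<EOF
---
config_vars:
  PATH: /app/$1:/usr/local/bin:/usr/bin:/bin
EOF
}

release_path "vendor/mytool/bin"
```

Keeping the system directories after the vendored one means standard tools remain reachable alongside the new binaries.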
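The default Ruby buildpack provides support for the Rails asset pipeline by running the assets:precompile task during the build. Because every asset is recompiled on each deploy, this step can be slow.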
To address this slowness, Nathan Broadbent released the turbo-sprockets-rails3 gem. TurboSprockets only compiles assets whose source files have changed, making the asset compilation step much faster after the initial run. This is a nifty enhancement, but it depends on the ability to cache assets between builds. This is where the CACHE_DIR argument to the compile script comes in handy.
(NOTE: The Ruby buildpack is a bit different in that it's not totally Bash. The bin/compile script invokes a Ruby script. The Ruby code provides some convenience methods for manipulating files in and out of the build cache.)
Nathan forked and extended the default Ruby buildpack to take advantage of turbo-sprockets-rails3's abilities. This buildpack modifies the default behavior by loading cached files from CACHE_DIR/public/assets into BUILD_DIR/public/assets. The assets:precompile task then runs as usual, but since turbo-sprockets-rails3 is installed, any unmodified assets won't be rebuilt. The script then runs a custom asset expiration task, storing assets back to CACHE_DIR/public/assets if it's successful, and clearing CACHE_DIR/public/assets if it fails.
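The restore, build, then store-or-clear flow can be sketched in Bash as follows. Nathan's buildpack implements this in Ruby, so this is a translation of the idea rather than his code; the function name `build_assets` is invented here, and the build command is passed in as arguments so the sketch does not assume any particular rake invocation.

```shell
# Sketch: cache public/assets between builds (illustrative only).
# Usage: build_assets BUILD_DIR CACHE_DIR COMMAND [ARGS...]
build_assets() {
  build_dir="$1"; cache_dir="$2"; shift 2
  assets="$build_dir/public/assets"
  cached="$cache_dir/public/assets"

  # 1. Restore assets from the previous build, if there are any.
  mkdir -p "$assets"
  if [ -d "$cached" ]; then
    cp -R "$cached/." "$assets/"
  fi

  # 2. Run the build command from the root of the app.
  if (cd "$build_dir" && "$@"); then
    # 3a. Success: replace the cache with the fresh assets.
    rm -rf "$cached"
    mkdir -p "$cached"
    cp -R "$assets/." "$cached/"
  else
    # 3b. Failure: clear the cache so a bad state is not reused.
    rm -rf "$cached"
    return 1
  fi
}

# In bin/compile this might be called as (command is an assumption):
#   build_assets "$1" "$2" bundle exec rake assets:precompile
```

Clearing the cache on failure trades one slow rebuild for the guarantee that a half-finished compile never seeds the next deploy.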
Many popular languages have several competing frameworks for web apps; Rails and Sinatra for Ruby, and Django and Pylons for Python, are some well-known examples. Usually, these frameworks are contained within a few libraries, so if you want to support a new framework it makes sense to modify an existing language buildpack instead of starting from scratch. James Ward took this approach when he wanted to support Revel, a web framework for the Go programming language.
Next, he modified the bin/detect script to look for a Revel-specific file and announce that it's a Revel buildpack.
Then he added some functionality to the end of the bin/compile script to fetch and build Revel.
A new line in the bin/release script defines a default web process in the Procfile for running Revel.
I hope this tour into hacking buildpacks has been informative, and you've gained some insight into how buildpacks work and how they can be extended to meet your needs. Whether you want to host apps in a new language, or tweak the tools for an existing one, buildpacks are a step towards always answering "yes" to the question, "Does it run on Heroku?"
For more reference information, please check out the buildpack articles available in our Dev Center.