Deployment That Just Works

Last week I talked a bit about why instant deployment matters. A few people have since commented that it’s not instant deployment that matters to them, but rather deployment that just works every time.

Of course, what we’re really talking about is both. Part of achieving deployment that just works is decreasing complexity and removing steps – each a point of possible failure. We are working toward deployment that’s both instant and completely reliable, because we think those things are tightly linked.

We’ve rolled out some new content today explaining more about how our platform works, including some more detailed architectural information. We’re hoping it will continue to build a better picture of the vision we’re striving for. I want to take some time here to explain some of the background behind this architecture.

Instant deployment that just works is, of course, a tall order. Achieving it takes four requisites:

1. A Sharp Focus on Web Apps

Provisioning is different for different types of apps – a video transcoding farm is very different from a web app. The only way to make the process instant is to know what type of thing you’re dealing with ahead of time. This is achieved by only handling one type of thing (Heroku, for example, only handles web apps, and even then, only those written in the Ruby language).

2. Standardization of App Structure

Any time you want to plug and play, most of the work is centered around standardization. Because all lamps have the same power plug and voltage requirements, you can move them around freely and plug them in anywhere. Similarly, all apps need to be self-contained and have the same structure and environmental requirements.

A software framework like Ruby on Rails (or even just Rack) gets us most of the structure we need, but a lot of work remains in self-containment and standardization of environmental requirements.

3. A Dynamic Platform

Aside from standardized plugs and voltage, the reason you can plug a lamp in anywhere is that hidden behind the walls, there’s a complex system of wiring, circuits, breakers, and transformers, which distribute power to the lamps evenly and only when needed, and prevent short circuits and other safety hazards. If one lamp has a short, it won’t destroy the other lamps or your home.

Similarly, we need a platform architecture that can move apps around independently, start and stop them instantly, provide the standardized environment (library dependencies, databases, caching, etc.), distribute compute power evenly, and prevent one greedy or malfunctioning app from damaging others.

4. On-Demand Provisioning of Underlying Infrastructure Resources

Lastly, no matter how many lamps you plug in (within reason), you won’t run out of power. The enormous capacity of electricity producers, combined with a sophisticated grid of distribution and storage technologies, ensures that power is available right when you need it. This whole system works because you don’t have to plug in a lamp and then wait for someone to turn on another generator somewhere.

This last requirement is non-trivial, and has historically been a major barrier to instant provisioning/deployment. Thankfully it’s available now, in the form of utility computing from services like Amazon’s EC2. When compute power is needed, you can get it within seconds or minutes. It’s still not instant, but it’s close enough that an efficient dynamic platform can hide this delay by maintaining standby capacity (the same way batteries and capacitors hide the delay of a generator starting up on the electrical grid).
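To make the standby idea concrete, here is a minimal Ruby sketch. It is not how our platform is implemented; the StandbyPool class, its target size, and the provisioner block are all hypothetical names invented for illustration. The point is only that requests are served from warm instances while the slow provisioning step happens out of band.

```ruby
# Hypothetical sketch: keep warm instances on standby so callers never
# wait for a slow provisioning step. Names here are illustrative only.
class StandbyPool
  attr_reader :ready

  # target: number of idle instances to keep warm (assumed value)
  # provisioner: block that returns a new instance (slow in real life)
  def initialize(target:, &provisioner)
    @target = target
    @provisioner = provisioner
    @ready = []
    replenish
  end

  # Hand out a warm instance immediately; provision on demand only if
  # the pool has been drained.
  def acquire
    @ready.shift || @provisioner.call
  end

  # Top the pool back up to the target size. A real platform would do
  # this in the background; here it is called explicitly for simplicity.
  def replenish
    @ready << @provisioner.call while @ready.size < @target
  end
end

counter = 0
pool = StandbyPool.new(target: 2) { counter += 1; "instance-#{counter}" }
first = pool.acquire   # served from standby, no provisioning delay
pool.replenish         # refill the pool behind the scenes
```

The trade-off is the usual one for any buffer: a larger target hides longer provisioning delays but costs more idle capacity.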

This is the real value of utility computing: it has the power to enable truly instant provisioning and deployment, by providing one of the four requisites.

Heroku’s Architecture

We discovered the four requisites above as we built our platform, so our current architecture is designed specifically to address these issues. Item 1 is embodied by our decision to specialize in Ruby-based web apps, and item 4 is simply available to us today for the first time.

Items 2 and 3, however, deserve some explanation:

2. Standardization of App Structure

The overall structure of our platform is designed to standardize the web stack, vastly simplifying the deployment process and removing a lot of decisions you might have to make when designing and deploying an app.

Should we use memcached? It’s already included. What about static content? Don’t worry about it – our high-performance HTTP cache will handle it automatically. How many app servers should we run, and how many machines do we need for them? Just start with a few and change it instantly at any time, and never even think about how many servers they’re running on.

We’ve also settled on a standard app stack we call a dyno. The way an app retrieves its configuration from the environment has also been standardized.
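As a rough illustration of what reading configuration from the environment can look like, here is a small Ruby sketch. The variable names (DATABASE_URL, CACHE_SERVERS) and the load_config helper are assumptions made for this example, not a list of what the platform actually sets; RACK_ENV is the conventional Rack variable.

```ruby
# Hypothetical sketch: build an app's settings from environment variables.
# The variable names below are assumptions, not the platform's real list.
def load_config(env = ENV)
  {
    # Required: fetch raises KeyError if the variable is missing.
    database_url: env.fetch("DATABASE_URL"),
    # Optional: comma-separated list, empty when unset.
    cache_servers: env.fetch("CACHE_SERVERS", "").split(","),
    # Optional: falls back to a default.
    rack_env: env.fetch("RACK_ENV", "development")
  }
end

# Passing a plain Hash instead of ENV makes the function easy to try out.
config = load_config({
  "DATABASE_URL" => "postgres://localhost/myapp",
  "CACHE_SERVERS" => "10.0.0.1:11211,10.0.0.2:11211"
})
```

Because the app only ever asks the environment, the same code can run unchanged wherever the platform decides to place it.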

3. A Dynamic Platform

We’ve done a huge amount of work on this over the last year, and our efforts have been focussed by the 25,000 apps now running on Heroku. The routing mesh gives us the ability to easily move apps around, and to scale our resources independently of scaling an individual app.

The dyno grid lets us scale an app up, as well as distribute power evenly and route around problems.
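One simple way to picture "distribute evenly and route around problems" is round-robin selection that skips unhealthy targets. The Ruby sketch below is only that picture, not the actual routing mesh or dyno grid; the Dyno struct, the Grid class, and the healthy flag are hypothetical.

```ruby
# Hypothetical sketch: round-robin over dynos, skipping unhealthy ones.
Dyno = Struct.new(:name, :healthy)

class Grid
  def initialize(dynos)
    @dynos = dynos
    @cursor = 0
  end

  # Return the next healthy dyno in round-robin order, or nil if none are up.
  def next_dyno
    @dynos.size.times do
      dyno = @dynos[@cursor]
      @cursor = (@cursor + 1) % @dynos.size
      return dyno if dyno.healthy
    end
    nil
  end
end

dynos = [
  Dyno.new("dyno-1", true),
  Dyno.new("dyno-2", false), # simulated failure
  Dyno.new("dyno-3", true)
]
grid = Grid.new(dynos)
picked = 4.times.map { grid.next_dyno.name }
# picked alternates between dyno-1 and dyno-3; dyno-2 never receives work
```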

This architecture lets you deploy via Git by simply pushing your code to your app’s Heroku repository. We’re actually “compiling” your app now as it’s pushed: performing integrity checks and verifying that it can start.
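To give a flavor of what a push-time integrity check could look like, here is a hedged Ruby sketch that rejects a push if any Ruby file fails to parse. This is an assumption about one possible check, not a description of our actual compile step; the function names are hypothetical. It relies on the interpreter's real -c flag, which checks syntax without running the file.

```ruby
require "open3"
require "rbconfig"
require "tmpdir"

# Hypothetical pre-deploy check: list the Ruby files that fail to parse.
def files_with_syntax_errors(app_dir)
  Dir.glob(File.join(app_dir, "**", "*.rb")).sort.reject do |path|
    # `ruby -c` parses the file without executing it and exits
    # non-zero when it finds a syntax error.
    _output, status = Open3.capture2e(RbConfig.ruby, "-c", path)
    status.success?
  end
end

# Accept the push only when every Ruby file parses.
def accept_push?(app_dir)
  files_with_syntax_errors(app_dir).empty?
end

# Try it on a throwaway directory with one good and one broken file.
broken = Dir.mktmpdir do |dir|
  File.write(File.join(dir, "good.rb"), "puts 'hello'\n")
  File.write(File.join(dir, "bad.rb"), "def oops(\n")
  files_with_syntax_errors(dir).map { |path| File.basename(path) }
end
```

A syntax check is only the cheapest layer; verifying that an app can actually start requires booting it, which this sketch does not attempt.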

So did we get there? Have we achieved instant deployment? Well, we think we’re getting pretty damn close.
