Deployment That Just Works

Last week I talked a bit about why instant deployment matters. A few people have since commented that it’s not instant deployment that matters to them, but rather deployment that just works every time.

Of course, what we’re really talking about is both. Part of achieving deployment that just works is decreasing complexity and removing steps – each a point of possible failure. We are working toward deployment that’s both instant and completely reliable, because we think those things are tightly linked.

We’ve rolled out some new content today explaining more about how our platform works, including some more detailed architectural information. We’re hoping it will continue to build a better picture of the vision we’re striving for. I want to take some time here to explain some of the background behind this architecture.

Instant deployment that just works is, of course, a tall order. To achieve it, there are several requisites:

1. A Sharp Focus on Web Apps

Provisioning is different for different types of apps – a video transcoding farm is very different from a web app. The only way to make the process instant is to know what type of thing you’re dealing with ahead of time. This is achieved by only handling one type of thing (Heroku, for example, only handles web apps, and even then, only those written in the Ruby language).

2. Standardization of App Structure

Any time you want to plug and play, most of the work is centered around standardization. Because all lamps have the same power plug and voltage requirements, you can move them around freely and plug them in anywhere. Similarly, all apps need to be self-contained and have the same structure and environmental requirements.

A software framework like Ruby on Rails (or even just Rack) gets us most of the structure we need, but a lot of work remains in self-containment and standardization of environmental requirements.
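To make the standardization point concrete, here’s a minimal sketch of the Rack calling convention – the contract is just a method signature, so no actual Rack gem is needed to illustrate it:

```ruby
# A minimal Rack-style application: any object that responds to #call,
# takes an environment hash, and returns [status, headers, body].
# This uniform interface is what lets a platform treat every app the
# same way, regardless of what the app does internally.
app = proc do |env|
  [200, { 'Content-Type' => 'text/plain' }, ["Hello from #{env['PATH_INFO']}"]]
end

# Any Rack-compatible server invokes every app identically:
status, headers, body = app.call('PATH_INFO' => '/')
```

Because every app exposes this same “plug,” the platform can route requests to it, restart it, or move it to another machine without knowing anything about its internals.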

3. A Dynamic Platform

Aside from standardized plugs and voltage, the reason you can plug a lamp in anywhere is that hidden behind the walls, there’s a complex system of wiring, circuits, breakers, and transformers, which distribute power to the lamps evenly and only when needed, and prevent short circuits and other safety hazards. If one lamp has a short, it won’t destroy the other lamps or your home.

Similarly, we need a platform architecture that can move apps around independently, start and stop them instantly, provide the standardized environment (library dependencies, databases, caching, etc.), distribute compute power evenly, and prevent one greedy or malfunctioning app from damaging others.

4. On-Demand Provisioning of Underlying Infrastructure Resources

Lastly, no matter how many lamps you plug in (within reason), you won’t run out of power. The enormous capacity of electricity producers combined with a sophisticated grid of distribution and storage technologies ensure that power is available right when you need it. This whole system works because you don’t have to plug in a lamp and then wait for someone to turn on another generator somewhere.

This last requirement is non-trivial, and has historically been a major barrier to instant provisioning/deployment. Thankfully it’s available now, in the form of utility computing from services like Amazon’s EC2. When compute power is needed, you can get it within seconds or minutes. It’s still not instant, but it’s close enough that an efficient dynamic platform can hide this delay by maintaining standby capacity (the same way batteries and capacitors hide the delay of a generator starting up on the electrical grid).
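As a rough illustration of that standby-capacity idea (the class and numbers below are hypothetical, not Heroku’s actual implementation), a platform can hide the boot delay behind a small pool of pre-provisioned instances:

```ruby
# Hypothetical sketch: a warm pool hands out pre-booted instances
# instantly, then replenishes itself afterward. The slow boot_instance
# call stands in for an EC2-style provisioning request that takes
# seconds or minutes.
class WarmPool
  def initialize(target_size)
    @target_size = target_size
    @booted = 0
    @pool = Array.new(target_size) { boot_instance } # standby capacity
  end

  # Instant from the caller's point of view: pop a ready instance,
  # then top the pool back up (asynchronously, in a real system).
  def acquire
    instance = @pool.pop || boot_instance # slow fallback if drained
    @pool.push(boot_instance) while @pool.size < @target_size
    instance
  end

  private

  # Stand-in for the slow cloud provisioning call.
  def boot_instance
    @booted += 1
    "instance-#{@booted}"
  end
end
```

The caller always gets an already-running instance; the slow provisioning happens off the critical path, exactly as batteries and capacitors buffer the grid.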

This is the real value of utility computing: it has the power to enable truly instant provisioning and deployment, by providing one of the four requisites.

Heroku’s Architecture

We discovered the 4 requisites above as we built our platform, so our current architecture is designed specifically to address these issues. Item 1 is embodied by our decision to specialize in Ruby-based web apps, and item 4 is simply available to us today for the first time.

Items 2 and 3, however, deserve some explanation:

2. Standardization of App Structure

The overall structure of our platform is designed to standardize the web stack, vastly simplifying the deployment process and removing a lot of decisions you might have to make when designing and deploying an app.

Should we use memcached? It’s already included. What about static content? Don’t worry about it – our high-performance HTTP cache handles it automatically. How many app servers should we run, and how many machines do we need for them? Just start with a few, change the number instantly at any time, and never think about how many servers they’re running on.

We’ve settled on a standard app stack we call a dyno, and we’ve standardized the way an app retrieves its configuration from the environment.
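For example (the variable names below are illustrative, not a documented contract), an app can pull everything environment-specific from ENV rather than from hand-edited config files, so the same code runs unchanged on a laptop or on the platform:

```ruby
# Illustrative sketch: all environment-specific settings come from a
# single environment hash, with sensible local defaults. The specific
# variable names here are examples, not a documented contract.
def app_config(env = ENV)
  {
    database_url:     env.fetch('DATABASE_URL', 'sqlite3://db/development.sqlite3'),
    memcache_servers: env.fetch('MEMCACHE_SERVERS', 'localhost:11211').split(','),
    web_concurrency:  Integer(env.fetch('WEB_CONCURRENCY', '1'))
  }
end
```

Because the platform injects these values at run time, moving an app between machines – or scaling it out – requires no changes to the code itself.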

3. A Dynamic Platform

We’ve done a huge amount of work on this over the last year, and our efforts have been focused by the 25,000 apps now running on Heroku. The routing mesh gives us the ability to easily move apps around, and to scale our resources independently of scaling an individual app.

The dyno grid lets us scale an app up, as well as distribute power evenly and route around problems.

This architecture lets you deploy via Git by simply pushing your code to your app’s Heroku repository. We actually “compile” your app as it’s pushed, performing integrity checks and verifying that it can start.

So did we get there? Have we achieved instant deployment? Well, we think we’re getting pretty damn close.

The Future of Deployment

Application deployment is changing. In relatively short order I’ve gone from buying hardware, to monthly hosting, to metered CPU time, and from building my open-source software manually, to package managers, to fancy config tools and recipes to pre-build whole machine images. What’s next?

The Old Way

I can deploy Rails apps in a traditional hosting environment pretty quickly. For a small app, I might make a new unix user and database on a personal Slicehost slice and do a quick code checkout. After setting up a few permissions and twiddling my Nginx config, in a matter of fifteen minutes or so my app is online. Not bad at all.

For a bigger app, it takes more time. In days of yore I’d build a server from parts or buy one of the excellent Pogo Linux servers and put it in a colo. OS install, Xen setup, guest OS install, OS package setup, security lockdown, then on to the task of all the stack setup (database, Rails, source control) specific to the application to be run.

Once you get into multiple servers, the complexity multiplies quickly. There are dozens of small decisions to make about how resources are allocated. More RAM or more CPU for the database machine? One slave database, or two? Hardware load balancer vs. multiple IPs vs. something else? All of these require detailed knowledge of hardware and software deployments, combined with a huge amount of predictive guesswork to try to foresee the quantity and type of load that the app being deployed is likely to face in the next 3, 6, or 12 months.

There’s an enterprisey word for this process: provisioning.

The New Way

Amazon’s EC2 is the vanguard of the new generation of cloud computing. Provisioning a server was formerly a phone call and days or weeks of waiting. Now it’s a REST call and 30 seconds of waiting. Awesome.

But this is a very raw resource: there are still many provisioning decisions to be made, software to set up, and then on to deployment of the app itself. Excellent services like RightScale and Engine Yard’s new offering Solo can help automate a lot of this process and minimize the management burden. So far, so good.

But what if provisioning was instantaneous, requiring no upfront decisions about resource allocation? What if you didn’t need to think at all about the server hardware or software, but only about your application code? How would this change how we build applications?

The Future

When technology breakthroughs make something smaller, or faster, or cheaper, it doesn’t just change current use; it creates whole new types of use. If app deployment is instantaneous, without having to plan for resources, allocate servers, or beg approval from the IT department, what kind of apps will we build that don’t get built today?

In the past decade we’ve seen widespread adoption of agile methodologies in the development of software. This has transformed software development from a slow, failure-prone, and sometimes downright painful process into one that is fast, fun, and fulfilling. But deployment of applications has changed hardly at all during that same time period. The way you deploy a Rails, Merb, Sinatra, or Django app today is very similar to how you deployed a Perl app in 1999.

This coming decade is going to see an agile revolution for the deployment side of the equation. The manual, guesswork-heavy methods of provisioning that we use today are soon to be superseded by methods that will make deploying an app fast, easy, and fun.

No one knows quite what that will look like yet (though at Heroku we certainly have our own opinion), but one thing is for sure: the time is ripe for a revolution in IT.
