How much better are two steps than three? Does it matter if something takes five minutes instead of twenty? When it comes to software deployment and provisioning, does instant really matter?
Recently, I was ranting on this subject to a user who had the misfortune of asking me about it in person.
“Truly instant provisioning and deployment is the ultimate goal,” I said. “Ten seconds isn’t good enough. We have to –”
“Look,” he interrupted, “I love what you guys are doing and don’t want you to stop, but why are you so obsessed with this?”
My immediate answer: because we’re obsessive people. A couple of years ago we stumbled across what we view as a glaring disconnect between the way software is developed and the way it’s provisioned and deployed. Now, like a person who’s noticed a crooked picture on the wall, we are totally fixated on setting it straight.
This was a shallow answer, though, and he wasn’t convinced:
“I mean it’s not that bad as is, is it?” he said. “It’s been improving steadily for years.”
And that’s when it hit me. While everyone is adversely affected by this growing problem, most people don’t actually see it. It has crept up on us gradually.
1996: A development team of perhaps 10 people (toting advanced computer science degrees) spends 6 months building software to laboriously defined specifications, writing their own framework, and using limited libraries and no testing harness. It then takes an IT team of, say, 3 people a couple of weeks to provision server resources, configure and install the OS and software stack, and deploy the software.
2000: A more ambitious team of 6 people (toting half-finished computer science degrees) spends 3 months building a web application to satisfy a PRD, using primitive frameworks and some integration testing. It then takes 3 people about a week (optimistically) to provision servers from IT, install the web stack, and deploy the app.
2004: A 4-person team (half of whom went to art school) spends 4 weeks writing a web app to some short and loose specs, using decent frameworks, unit and integration testing, and lots of user feedback. It then takes just 2 people about a week to provision virtual servers, install a complete web stack, and deploy the app.
2008: An agile team of 4 people (plus perhaps a scrum master) spends a week building the first complete version of their web app from just a rough user story, using advanced web frameworks, fully featured libraries, test-driven development, and sharp agile practices. It then takes just one person a few days to provision new resources from IT or a fast-moving hosting company, install the default web stack, and do the initial deploy.
Let’s look at these data points:
The bottom row is the percentage of the total project/iteration time spent on provisioning and deployment. Look at it this way:
This is shocking. Provisioning and deployment have gotten 10x faster during this period, but development has gotten 130x faster. Development teams are getting smaller and more agile, doing shorter iterations (deploying more often), and scaling their apps more quickly (provisioning more frequently). The result is a dramatic increase in the portion of time spent provisioning and deploying.
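The jump in that portion follows from the two speedups alone. Writing T_d and T_p for development and provisioning/deployment time per iteration (our notation, not part of the original figures), and assuming the 130x and 10x speedups apply uniformly, the share s and the ratio r work out as:

```latex
\begin{aligned}
s &= \frac{T_p}{T_p + T_d} = \frac{r}{1 + r}, \qquad r = \frac{T_p}{T_d} \\
r_{2008} &= \frac{T_p^{(1996)} / 10}{T_d^{(1996)} / 130} = 13\, r_{1996}
\end{aligned}
```

For a hypothetical 5% share in 1996, r goes from about 0.053 to about 0.68, so s rises to roughly 41%.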
At this rate, in less than 3 years we’ll be spending as much time deploying and provisioning as we spend developing. These numbers are based on our direct experience with medium/large company software projects. You can play around with the scenarios; even with widely different numbers the curve is about the same.
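The “at this rate” projection can be reproduced roughly. The following is a minimal Python sketch, assuming the 130x and 10x speedups accrued as smooth exponentials over the 12 years from 1996 to 2008 (our assumption, not something stated above). The function names are hypothetical, and the starting shares are illustrative inputs, not the figures from the original table:

```python
import math

# Speedups quoted for the 1996 -> 2008 period.
DEV_SPEEDUP = 130.0  # development became ~130x faster
OPS_SPEEDUP = 10.0   # provisioning and deployment became ~10x faster
YEARS = 2008 - 1996  # 12 years between the first and last data points


def ops_share(ops_time: float, dev_time: float) -> float:
    """Fraction of total time spent on provisioning and deployment."""
    return ops_time / (ops_time + dev_time)


def annual_ratio_growth(dev_speedup: float = DEV_SPEEDUP,
                        ops_speedup: float = OPS_SPEEDUP,
                        years: int = YEARS) -> float:
    """Yearly growth factor of ops_time / dev_time.

    Assumes both trends are smooth exponentials (a simplifying assumption).
    """
    return (dev_speedup / ops_speedup) ** (1.0 / years)


def years_until_parity(current_share: float, growth: float) -> float:
    """Years until ops time equals dev time (a 50% share), given today's share."""
    if not 0.0 <= current_share < 1.0:
        raise ValueError("current_share must be in [0, 1)")
    if current_share == 0.0:
        return math.inf  # zero ops time never catches up with development
    ratio = current_share / (1.0 - current_share)  # ops_time / dev_time today
    if ratio >= 1.0:
        return 0.0  # already at or past parity
    return math.log(1.0 / ratio) / math.log(growth)


if __name__ == "__main__":
    g = annual_ratio_growth()
    print(f"ops/dev ratio grows about {g:.3f}x per year")
    # Hypothetical starting shares, chosen only for illustration.
    for share in (0.10, 0.25, 0.35):
        print(f"share {share:.0%}: parity in {years_until_parity(share, g):.1f} years")
```

Under these assumptions the provisioning-to-development ratio grows by about 24% a year, so a team already spending roughly a third of its time on provisioning and deployment reaches parity within about three years (about 2.9 years at a hypothetical 35%).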
Most people don’t see this growing problem because it’s masked by the gradual improvement of the deployment and provisioning process.
Capistrano, for example, is an awesome deployment tool, which makes us feel great about the improving state of deployment tools. But these incremental improvements aren’t keeping up with agile development; they’re an investment in a race that can’t be won.
We see this playing out often now. We’ve been contacted by quite a few Fortune 500 companies lately who, after a massive agile restructuring of their software development organization, discovered they are now spending as much time on provisioning as development. All the economic benefit of agile development is consumed by provisioning – this has enormous fiscal impact.
How do we solve this problem? It doesn’t seem possible both to make provisioning and deployment faster than development and to keep them there by continuously improving at a higher rate. How do we get off this treadmill?
What if we could provision and deploy instantly? This is where the difference between “a little” and “none” comes into play. If it’s instant, the portion of time spent on it goes to zero. The development process can then improve at any speed, and deployment/provisioning will never become a barrier. Problem solved.
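The difference between “a little” and “none” can be written as a limit. Using s = T_p / (T_p + T_d) for the share of time spent on provisioning and deployment (our notation), any fixed nonzero provisioning time eventually dominates as development time shrinks, while zero provisioning time keeps the share at zero no matter how fast development gets:

```latex
\lim_{T_d \to 0^+} \frac{T_p}{T_p + T_d} = 1 \quad (T_p > 0),
\qquad
\frac{0}{0 + T_d} = 0 \quad (T_d > 0)
```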
This is, by the numbers, why instant deployment matters.
How are we actually achieving instant deployment? Over the next two weeks we’ll be posting more information on the challenges involved, and how we’ve designed Heroku’s architecture to meet them.