The New Heroku (Part 3 of 4): Visibility & Introspection

Visibility and introspection capabilities are critical for managing and debugging real-world applications. But cloud platforms are often lacking when it comes to visibility. The magical black box is great when it "just works," but not so great when your app breaks and you can't look inside the box.

Standard introspection tools used in server-based deployments — such as ssh, ps aux, top, tail -f logfile, iostat — aren't valid in a multi-tenant cloud environment. We need new tools, new approaches. Heroku's new runtime stack, Celadon Cedar, includes three powerful tools for visibility and introspection: heroku ps, heroku logs, and heroku releases. This post will examine each.

But first, a story.

The Deploy Gone Wrong (A Parable)

The FooBaz Co development team practices continuous deployment, pushing new code to their production app running on Heroku several times a day. Today, happy-go-lucky coder Ned has done a small refactoring which pulls some duplicated code out into a new library, called Utils. The unit tests are all green, and the app looks good running under Foreman on his local workstation; so Ned pushes his changes up to production.

$ git push heroku
...
-----> Heroku receiving push
...
-----> Launching... done, v74
       http://foobaz-production.herokuapp.com deployed to Heroku

Ned likes to use the introspection tools at his disposal to check the health of the app following any deploy. He starts by examining the app's running processes with heroku ps:

$ heroku ps
Process       State               Command
------------  ------------------  ------------------------------
web.1         crashed for 4s      bundle exec thin start -p $PORT -e..
web.2         crashed for 2s      bundle exec thin start -p $PORT -e..
worker.1      crashed for 3s      bundle exec rake jobs:work
worker.2      crashed for 2s      bundle exec rake jobs:work

Disaster! All the processes in the process formation have crashed. Ned has a red alert on his hands — the production site is down.

Recovering Quickly

Adrenaline pumping, Ned now uses the second introspection tool, heroku logs:

$ heroku logs --ps web.1
2011-06-19T08:35:19+00:00 heroku[web.1]: Starting process with command: `bundle exec thin start -p 38180 -e production`
2011-06-19T08:35:21+00:00 app[web.1]: /app/config/initializers/load_system.rb:12:in `<top (required)>': uninitialized constant Utils (NameError)
2011-06-19T08:35:21+00:00 app[web.1]:   from /app/vendor/bundle/ruby/1.9.1/gems/railties-3.0.7.rc1/lib/rails/engine.rb:201:in `block (2 levels) in <class:Engine>'
2011-06-19T08:35:21+00:00 app[web.1]:   from /app/vendor/bundle/ruby/1.9.1/gems/railties-3.0.7.rc1/lib/rails/engine.rb:200:in `each'
2011-06-19T08:35:21+00:00 app[web.1]:   from /app/vendor/bundle/ruby/1.9.1/gems/railties-3.0.7.rc1/lib/rails/engine.rb:200:in `block in <class:Engine>'
...

All processes crashed trying to start up, because the Utils module is missing. Ned forgot to add lib/utils.rb to Git.

Ned could press forward by adding the file and pushing again. But the wise move here is to roll back to a known good state, and then think about the fix.

Heroku's third visibility tool, releases, tracks deploys and other changes. It includes a powerful undo capability: rollback. So Ned takes a deep breath, then runs:

$ heroku rollback
-----> Rolling back to v73... done, v75

Ned checks heroku ps:

$ heroku ps
Process       State               Command
------------  ------------------  ------------------------------
web.1         up for 1s           bundle exec thin start -p $PORT -e..
web.2         up for 1s           bundle exec thin start -p $PORT -e..
worker.1      starting for 3s     bundle exec rake jobs:work
worker.1      starting for 2s     bundle exec rake jobs:work

The app's processes are successfully restarting in their previous state. Crisis averted, Ned can now take his time examining what went wrong, and how to fix it, before deploying again.

The Fix

Ned investigates locally and confirms with git status that he forgot to add the file. He adds the file to Git and commits, this time double-checking his work.

He pushes to Heroku again, then uses all three introspection techniques to confirm the newly-deployed app is healthy:

$ heroku ps
Process       State               Command
------------  ------------------  ------------------------------
web.1         up for 2s           bundle exec thin start -p $PORT -e..
web.2         up for 2s           bundle exec thin start -p $PORT -e..
worker.1      up for 1s           bundle exec rake jobs:work
worker.1      up for 2s           bundle exec rake jobs:work

$ heroku logs
2011-06-19T08:39:17+00:00 heroku[web.1]: Starting process with command: `bundle exec thin start -p 56320 -e production`
2011-06-19T08:39:19+00:00 app[web.1]: >> Using rack adapter
2011-06-19T08:39:19+00:00 app[web.1]: >> Thin web server (v1.2.11 codename Bat-Shit Crazy)
2011-06-19T08:39:19+00:00 app[web.1]: >> Maximum connections set to 1024
2011-06-19T08:39:19+00:00 app[web.1]: >> Listening on 0.0.0.0:56320, CTRL+C to stop

$ heroku releases
Rel   Change                          By                    When
----  ----------------------          ----------            ----------
v76   Deploy d706b4a                  ned@foobazco.com      1 minute ago
v75   Rollback to v73                 ned@foobazco.com      14 minutes ago
v74   Deploy 20a5742                  ned@foobazco.com      15 minutes ago
v73   Deploy df7bb82                  rick@foobazco.com     2 hours ago

Golden. Ned breathes a sigh of relief, and starts composing an email to his team about how they should really think about using a staging app as protection against this kind of problem in the future.

Now that we've seen each of these three visibility tools in action, let's look at each in more depth.

Visibility Tool #1: ps

heroku ps is a spiritual sister to the unix ps command, and a natural extension of the process model approach to running apps. But where unix's ps is for a single machine, heroku ps spans all of the app's processes on the distributed execution environment of the dyno manifold.

A clever trick here is to use the watch command for a realtime display of your app's process status:

$ watch heroku ps
Every 2.0s: heroku ps                          Sun Jun 19 01:44:55 2011

Process       State               Command
------------  ------------------  ------------------------------
web.1         up for 16h          bundle exec rackup -p $PORT -E $RA..

Leave this running in a terminal as you push code, scale the process formation, change config vars, or add add-ons, and you'll get a first-hand look at how the Heroku process manager handles your processes.

Dev Center: ps command

Visibility Tool #2: Logs

In server-based deploys, logs often exist as files on disk, which can lead to us thinking of logs as files (hence "logfiles"). A better conceptual model is:

Logs are a stream of time-ordered events aggregated from the output streams of all the app's running processes, system components, and backing services.

Heroku's Logplex routes log streams from all of these diverse sources into a single channel, providing the foundation for truly comprehensive logging. Heroku aggregates three categories of logs for your app:

App logs - Output from your app, such as: everything you'd normally expect to see in Rails' production.log, output from Thin, output from Delayed Job.
System logs - Messages about actions taken by the Heroku platform infrastructure on behalf of your app, such as: restarting a crashed process, idling or unidling a web dyno, or serving an error page due to a problem in your app.
API logs - Messages about administrative actions taken by you and other developers working on your app, such as: deploying new code, scaling the process formation, or toggling maintenance mode.

Filtering down to just API logs is a great way to see what you and other collaborators on the app have been up to:

$ heroku logs --source heroku --ps api
2011-06-18T08:21:37+00:00 heroku[api]: Set maintenance mode on by kate@foobazco.com
2011-06-18T08:21:39+00:00 heroku[api]: Config add ADMIN_PASSWORD by kate@foobazco.com
2011-06-18T08:21:39+00:00 heroku[api]: Release v4 created by kate@foobazco.com
2011-06-18T08:21:43+00:00 heroku[api]: Scale to web=4 by kate@foobazco.com
2011-06-18T08:22:01+00:00 heroku[api]: Set maintenance mode off by kate@foobazco.com

If your teammates are actively performing administrative actions on the app, running the above command with the additional argument of --tail will let you watch the events unfold as they happen. Add watch heroku ps and watch heroku releases in terminals paneled to all be visible at once, and you've got a mini-mission control for your app.

Dev Center: Logging

Visibility Tool #3: Releases

Releases are a history of changes to your app. New releases are created whenever you deploy new code, change config vars, or change add-ons. Each release stores the full state of the code and config, in addition to audit information such as who made the change and a timestamp.

This is the release history following Ned's bad deploy and subsequent recovery:

$ heroku releases
Rel   Change                          By                    When
----  ----------------------          ----------            ----------
v76   Deploy d706b4a                  ned@foobazco.com      1 minute ago
v75   Rollback to v73                 ned@foobazco.com      14 minutes ago
v74   Deploy 20a5742                  ned@foobazco.com      15 minutes ago
v73   Deploy df7bb82                  rick@foobazco.com     2 hours ago

Starting from the oldest release at the bottom:

v73 was a code deploy by a developer named Rick from earlier today. This one ran without problems for about two hours.
v74 was Ned's deploy which was missing a file.
v75 was the rollback, which Ned ran just moments after the bad deploy. This is an exact copy of v73 (more on this in a moment).
v76 was a deploy of the fixed code. Ned was able to spend some time (about ten minutes) double-checking it; the rollback saved him from needing to rush out a fix under pressure.

The top release is always the currently running one, so in this case, we see that v76 is current and was deployed just one minute ago.

More Than Just An Audit Trail

Releases aren't just a history of what happened, they are a fundamental unit in the machinery of the Heroku platform. The transaction history is the primary source of record, meaning the release is the canonical source of information about the code and config used to run your app's processes. This is one way in which the Heroku platform ensures uniformity in the version of code and config used to run all processes in the app's process formation.

Furthermore, releases are an append-only ledger, a concept taken from financial accounting and applied to transactional systems. A mistake in the ledger can't be erased, it can only be corrected by a new entry. This is why heroku rollback creates a new release, which is an exact copy the code and config for a previous release.

We're striving to make releases front-and-center in the output from any command that creates a new release. A release sequence number, such as v10, appears any time you create a new release.

Deploying new code:

$ git push heroku master
...
-----> Launching... done, v10

Setting a config var:

$ heroku config:add FOO=baz
Adding config vars:
  FOO => baz
Restarting app... done, v11

Rolling back:

$ heroku rollback
-----> Rolling back to v10... done, v12

Adding an add-on:

$ heroku addons:add memcache
-----> Adding memcache to strong-fire-938... done, v13 (free)

And as you'd expect, each of the releases above are now listed in heroku releases:

$ heroku releases
Rel   Change                          By                    When
----  ----------------------          ----------            ----------
v13   Add-on add memcache:5mb         kate@foobazco.com     16 seconds ago
v12   Rollback to v10                 kate@foobazco.com     1 minute ago
v11   Config add FOO                  kate@foobazco.com     1 minute ago
v10   Deploy 20a5742                  kate@foobazco.com     11 minutes ago

Releases and rollback are powerful tools for introspection and management. We encourage all developers using Heroku to get familiar with them.

Dev Center: Releases

The Future

Logging, releases, and ps provide a new level of visibility and introspection on the Heroku cloud application platform. But we're not done yet.

We know, for example, that long-term archival, search, and analytics capability for logs is crucial for production apps, particularly at large scale. We're working on improvements to Logplex's storage and retrieval, as well as capabilities for add-on providers to register themselves as log drains, opening up your app to the world of syslog-as-a-service providers and log analytics tools.

More generally, we're striving to increase visibility across the board — both in our current introspection tools, and by creating new tools. Visibility may initially seem like a minor concern, but as we've seen in this post, it has deep implications the Heroku cloud application platform, and for you as a developer managing your production apps.

Video Transcript