Visibility and introspection capabilities are critical for managing and debugging real-world applications. But cloud platforms are often lacking when it comes to visibility. The magical black box is great when it "just works," but not so great when your app breaks and you can't look inside the box.
Standard introspection tools used in server-based deployments — such as ssh
, ps aux
, top
, tail -f logfile
, iostat
— aren't valid in a multi-tenant cloud environment. We need new tools, new approaches. Heroku's new runtime stack, Celadon Cedar, includes three powerful tools for visibility and introspection: heroku ps
, heroku logs
, and heroku releases
. This post will examine each.
But first, a story.
The Deploy Gone Wrong (A Parable)
The FooBaz Co development team practices continuous deployment, pushing new code to their production app running on Heroku several times a day. Today, happy-go-lucky coder Ned has done a small refactoring which pulls some duplicated code out into a new library, called Utils
. The unit tests are all green, and the app looks good running under Foreman on his local workstation; so Ned pushes his changes up to production.
$ git push heroku
...
-----> Heroku receiving push
...
-----> Launching... done, v74
http://foobaz-production.herokuapp.com deployed to Heroku
Ned likes to use the introspection tools at his disposal to check the health of the app following any deploy. He starts by examining the app's running processes with heroku ps
:
$ heroku ps
Process State Command
------------ ------------------ ------------------------------
web.1 crashed for 4s bundle exec thin start -p $PORT -e..
web.2 crashed for 2s bundle exec thin start -p $PORT -e..
worker.1 crashed for 3s bundle exec rake jobs:work
worker.2 crashed for 2s bundle exec rake jobs:work
Disaster! All the processes in the process formation have crashed. Ned has a red alert on his hands — the production site is down.
Recovering Quickly
Adrenaline pumping, Ned now uses the second introspection tool, heroku logs
:
$ heroku logs --ps web.1
2011-06-19T08:35:19+00:00 heroku[web.1]: Starting process with command: `bundle exec thin start -p 38180 -e production`
2011-06-19T08:35:21+00:00 app[web.1]: /app/config/initializers/load_system.rb:12:in `<top (required)>': uninitialized constant Utils (NameError)
2011-06-19T08:35:21+00:00 app[web.1]: from /app/vendor/bundle/ruby/1.9.1/gems/railties-3.0.7.rc1/lib/rails/engine.rb:201:in `block (2 levels) in <class:Engine>'
2011-06-19T08:35:21+00:00 app[web.1]: from /app/vendor/bundle/ruby/1.9.1/gems/railties-3.0.7.rc1/lib/rails/engine.rb:200:in `each'
2011-06-19T08:35:21+00:00 app[web.1]: from /app/vendor/bundle/ruby/1.9.1/gems/railties-3.0.7.rc1/lib/rails/engine.rb:200:in `block in <class:Engine>'
...
All processes crashed trying to start up, because the Utils
module is missing. Ned forgot to add lib/utils.rb
to Git.
Ned could press forward by adding the file and pushing again. But the wise move here is to roll back to a known good state, and then think about the fix.
Heroku's third visibility tool, releases, tracks deploys and other changes. It includes a powerful undo capability: rollback. So Ned takes a deep breath, then runs:
$ heroku rollback
-----> Rolling back to v73... done, v75
Ned checks heroku ps
:
$ heroku ps
Process State Command
------------ ------------------ ------------------------------
web.1 up for 1s bundle exec thin start -p $PORT -e..
web.2 up for 1s bundle exec thin start -p $PORT -e..
worker.1 starting for 3s bundle exec rake jobs:work
worker.1 starting for 2s bundle exec rake jobs:work
The app's processes are successfully restarting in their previous state. Crisis averted, Ned can now take his time examining what went wrong, and how to fix it, before deploying again.
The Fix
Ned investigates locally and confirms with git status
that he forgot to add the file. He adds the file to Git and commits, this time double-checking his work.
He pushes to Heroku again, then uses all three introspection techniques to confirm the newly-deployed app is healthy:
$ heroku ps
Process State Command
------------ ------------------ ------------------------------
web.1 up for 2s bundle exec thin start -p $PORT -e..
web.2 up for 2s bundle exec thin start -p $PORT -e..
worker.1 up for 1s bundle exec rake jobs:work
worker.1 up for 2s bundle exec rake jobs:work
$ heroku logs
2011-06-19T08:39:17+00:00 heroku[web.1]: Starting process with command: `bundle exec thin start -p 56320 -e production`
2011-06-19T08:39:19+00:00 app[web.1]: >> Using rack adapter
2011-06-19T08:39:19+00:00 app[web.1]: >> Thin web server (v1.2.11 codename Bat-Shit Crazy)
2011-06-19T08:39:19+00:00 app[web.1]: >> Maximum connections set to 1024
2011-06-19T08:39:19+00:00 app[web.1]: >> Listening on 0.0.0.0:56320, CTRL+C to stop
$ heroku releases
Rel Change By When
---- ---------------------- ---------- ----------
v76 Deploy d706b4a ned@foobazco.com 1 minute ago
v75 Rollback to v73 ned@foobazco.com 14 minutes ago
v74 Deploy 20a5742 ned@foobazco.com 15 minutes ago
v73 Deploy df7bb82 rick@foobazco.com 2 hours ago
Golden. Ned breathes a sigh of relief, and starts composing an email to his team about how they should really think about using a staging app as protection against this kind of problem in the future.
Now that we've seen each of these three visibility tools in action, let's look at each in more depth.
Visibility Tool #1: ps
heroku ps
is a spiritual sister to the unix ps
command, and a natural extension of the process model approach to running apps. But where unix's ps
is for a single machine, heroku ps
spans all of the app's processes on the distributed execution environment of the dyno manifold.
A clever trick here is to use the watch command for a realtime display of your app's process status:
$ watch heroku ps
Every 2.0s: heroku ps Sun Jun 19 01:44:55 2011
Process State Command
------------ ------------------ ------------------------------
web.1 up for 16h bundle exec rackup -p $PORT -E $RA..
Leave this running in a terminal as you push code, scale the process formation, change config vars, or add add-ons, and you'll get a first-hand look at how the Heroku process manager handles your processes.
Dev Center: ps command
Visibility Tool #2: Logs
In server-based deploys, logs often exist as files on disk, which can lead to us thinking of logs as files (hence "logfiles"). A better conceptual model is:
Logs are a stream of time-ordered events aggregated from the output streams of all the app's running processes, system components, and backing services.
Heroku's Logplex routes log streams from all of these diverse sources into a single channel, providing the foundation for truly comprehensive logging. Heroku aggregates three categories of logs for your app:
- App logs - Output from your app, such as: everything you'd normally expect to see in Rails'
production.log
, output from Thin, output from Delayed Job. - System logs - Messages about actions taken by the Heroku platform infrastructure on behalf of your app, such as: restarting a crashed process, idling or unidling a web dyno, or serving an error page due to a problem in your app.
- API logs - Messages about administrative actions taken by you and other developers working on your app, such as: deploying new code, scaling the process formation, or toggling maintenance mode.
Filtering down to just API logs is a great way to see what you and other collaborators on the app have been up to:
$ heroku logs --source heroku --ps api
2011-06-18T08:21:37+00:00 heroku[api]: Set maintenance mode on by kate@foobazco.com
2011-06-18T08:21:39+00:00 heroku[api]: Config add ADMIN_PASSWORD by kate@foobazco.com
2011-06-18T08:21:39+00:00 heroku[api]: Release v4 created by kate@foobazco.com
2011-06-18T08:21:43+00:00 heroku[api]: Scale to web=4 by kate@foobazco.com
2011-06-18T08:22:01+00:00 heroku[api]: Set maintenance mode off by kate@foobazco.com
If your teammates are actively performing administrative actions on the app, running the above command with the additional argument of --tail
will let you watch the events unfold as they happen. Add watch heroku ps
and watch heroku releases
in terminals paneled to all be visible at once, and you've got a mini-mission control for your app.
Dev Center: Logging
Visibility Tool #3: Releases
Releases are a history of changes to your app. New releases are created whenever you deploy new code, change config vars, or change add-ons. Each release stores the full state of the code and config, in addition to audit information such as who made the change and a timestamp.
This is the release history following Ned's bad deploy and subsequent recovery:
$ heroku releases
Rel Change By When
---- ---------------------- ---------- ----------
v76 Deploy d706b4a ned@foobazco.com 1 minute ago
v75 Rollback to v73 ned@foobazco.com 14 minutes ago
v74 Deploy 20a5742 ned@foobazco.com 15 minutes ago
v73 Deploy df7bb82 rick@foobazco.com 2 hours ago
Starting from the oldest release at the bottom:
v73
was a code deploy by a developer named Rick from earlier today. This one ran without problems for about two hours.v74
was Ned's deploy which was missing a file.v75
was the rollback, which Ned ran just moments after the bad deploy. This is an exact copy ofv73
(more on this in a moment).v76
was a deploy of the fixed code. Ned was able to spend some time (about ten minutes) double-checking it; the rollback saved him from needing to rush out a fix under pressure.
The top release is always the currently running one, so in this case, we see that v76
is current and was deployed just one minute ago.
More Than Just An Audit Trail
Releases aren't just a history of what happened, they are a fundamental unit in the machinery of the Heroku platform. The transaction history is the primary source of record, meaning the release is the canonical source of information about the code and config used to run your app's processes. This is one way in which the Heroku platform ensures uniformity in the version of code and config used to run all processes in the app's process formation.
Furthermore, releases are an append-only ledger, a concept taken from financial accounting and applied to transactional systems. A mistake in the ledger can't be erased, it can only be corrected by a new entry. This is why heroku rollback
creates a new release, which is an exact copy the code and config for a previous release.
We're striving to make releases front-and-center in the output from any command that creates a new release. A release sequence number, such as v10
, appears any time you create a new release.
Deploying new code:
$ git push heroku master
...
-----> Launching... done, v10
Setting a config var:
$ heroku config:add FOO=baz
Adding config vars:
FOO => baz
Restarting app... done, v11
Rolling back:
$ heroku rollback
-----> Rolling back to v10... done, v12
Adding an add-on:
$ heroku addons:add memcache
-----> Adding memcache to strong-fire-938... done, v13 (free)
And as you'd expect, each of the releases above are now listed in heroku releases
:
$ heroku releases
Rel Change By When
---- ---------------------- ---------- ----------
v13 Add-on add memcache:5mb kate@foobazco.com 16 seconds ago
v12 Rollback to v10 kate@foobazco.com 1 minute ago
v11 Config add FOO kate@foobazco.com 1 minute ago
v10 Deploy 20a5742 kate@foobazco.com 11 minutes ago
Releases and rollback are powerful tools for introspection and management. We encourage all developers using Heroku to get familiar with them.
Dev Center: Releases
The Future
Logging, releases, and ps
provide a new level of visibility and introspection on the Heroku cloud application platform. But we're not done yet.
We know, for example, that long-term archival, search, and analytics capability for logs is crucial for production apps, particularly at large scale. We're working on improvements to Logplex's storage and retrieval, as well as capabilities for add-on providers to register themselves as log drains, opening up your app to the world of syslog-as-a-service providers and log analytics tools.
More generally, we're striving to increase visibility across the board — both in our current introspection tools, and by creating new tools. Visibility may initially seem like a minor concern, but as we've seen in this post, it has deep implications the Heroku cloud application platform, and for you as a developer managing your production apps.