Posted by Adam
Visibility and introspection capabilities are critical for managing and debugging real-world applications. But cloud platforms are often lacking when it comes to visibility. The magical black box is great when it "just works," but not so great when your app breaks and you can't look inside the box.
Standard introspection tools used in server-based deployments — such as
tail -f logfile,
iostat — aren't valid in a multi-tenant cloud environment. We need new tools, new approaches. Heroku's new runtime stack, Celadon Cedar, includes three powerful tools for visibility and introspection:
heroku logs, and
heroku releases. This post will examine each.
But first, a story.
The Deploy Gone Wrong (A Parable)
The FooBaz Co development team practices continuous deployment, pushing new code to their production app running on Heroku several times a day. Today, happy-go-lucky coder Ned has done a small refactoring which pulls some duplicated code out into a new library, called
Utils. The unit tests are all green, and the app looks good running under Foreman on his local workstation; so Ned pushes his changes up to production.
$ git push heroku ... -----> Heroku receiving push ... -----> Launching... done, v74 http://foobaz-production.herokuapp.com deployed to Heroku
Ned likes to use the introspection tools at his disposal to check the health of the app following any deploy. He starts by examining the app's running processes with
$ heroku ps Process State Command ------------ ------------------ ------------------------------ web.1 crashed for 4s bundle exec thin start -p $PORT -e.. web.2 crashed for 2s bundle exec thin start -p $PORT -e.. worker.1 crashed for 3s bundle exec rake jobs:work worker.2 crashed for 2s bundle exec rake jobs:work
Disaster! All the processes in the process formation have crashed. Ned has a red alert on his hands — the production site is down.
Adrenaline pumping, Ned now uses the second introspection tool,
$ heroku logs --ps web.1 2011-06-19T08:35:19+00:00 heroku[web.1]: Starting process with command: `bundle exec thin start -p 38180 -e production` 2011-06-19T08:35:21+00:00 app[web.1]: /app/config/initializers/load_system.rb:12:in `<top (required)>': uninitialized constant Utils (NameError) 2011-06-19T08:35:21+00:00 app[web.1]: from /app/vendor/bundle/ruby/1.9.1/gems/railties-3.0.7.rc1/lib/rails/engine.rb:201:in `block (2 levels) in <class:Engine>' 2011-06-19T08:35:21+00:00 app[web.1]: from /app/vendor/bundle/ruby/1.9.1/gems/railties-3.0.7.rc1/lib/rails/engine.rb:200:in `each' 2011-06-19T08:35:21+00:00 app[web.1]: from /app/vendor/bundle/ruby/1.9.1/gems/railties-3.0.7.rc1/lib/rails/engine.rb:200:in `block in <class:Engine>' ...
All processes crashed trying to start up, because the
Utils module is missing. Ned forgot to add
lib/utils.rb to Git.
Ned could press forward by adding the file and pushing again. But the wise move here is to roll back to a known good state, and then think about the fix.
$ heroku rollback -----> Rolling back to v73... done, v75
$ heroku ps Process State Command ------------ ------------------ ------------------------------ web.1 up for 1s bundle exec thin start -p $PORT -e.. web.2 up for 1s bundle exec thin start -p $PORT -e.. worker.1 starting for 3s bundle exec rake jobs:work worker.1 starting for 2s bundle exec rake jobs:work
The app's processes are successfully restarting in their previous state. Crisis averted, Ned can now take his time examining what went wrong, and how to fix it, before deploying again.
Ned investigates locally and confirms with
git status that he forgot to add the file. He adds the file to Git and commits, this time double-checking his work.
He pushes to Heroku again, then uses all three introspection techniques to confirm the newly-deployed app is healthy:
$ heroku ps Process State Command ------------ ------------------ ------------------------------ web.1 up for 2s bundle exec thin start -p $PORT -e.. web.2 up for 2s bundle exec thin start -p $PORT -e.. worker.1 up for 1s bundle exec rake jobs:work worker.1 up for 2s bundle exec rake jobs:work $ heroku logs 2011-06-19T08:39:17+00:00 heroku[web.1]: Starting process with command: `bundle exec thin start -p 56320 -e production` 2011-06-19T08:39:19+00:00 app[web.1]: >> Using rack adapter 2011-06-19T08:39:19+00:00 app[web.1]: >> Thin web server (v1.2.11 codename Bat-Shit Crazy) 2011-06-19T08:39:19+00:00 app[web.1]: >> Maximum connections set to 1024 2011-06-19T08:39:19+00:00 app[web.1]: >> Listening on 0.0.0.0:56320, CTRL+C to stop $ heroku releases Rel Change By When ---- ---------------------- ---------- ---------- v76 Deploy d706b4a firstname.lastname@example.org 1 minute ago v75 Rollback to v73 email@example.com 14 minutes ago v74 Deploy 20a5742 firstname.lastname@example.org 15 minutes ago v73 Deploy df7bb82 email@example.com 2 hours ago
Golden. Ned breathes a sigh of relief, and starts composing an email to his team about how they should really think about using a staging app as protection against this kind of problem in the future.
Now that we've seen each of these three visibility tools in action, let's look at each in more depth.
Visibility Tool #1: ps
heroku ps is a spiritual sister to the unix
ps command, and a natural extension of the process model approach to running apps. But where unix's
ps is for a single machine,
heroku ps spans all of the app's processes on the distributed execution environment of the dyno manifold.
A clever trick here is to use the watch command for a realtime display of your app's process status:
$ watch heroku ps Every 2.0s: heroku ps Sun Jun 19 01:44:55 2011 Process State Command ------------ ------------------ ------------------------------ web.1 up for 16h bundle exec rackup -p $PORT -E $RA..
Leave this running in a terminal as you push code, scale the process formation, change config vars, or add add-ons, and you'll get a first-hand look at how the Heroku process manager handles your processes.
Dev Center: ps command
Visibility Tool #2: Logs
In server-based deploys, logs often exist as files on disk, which can lead to us thinking of logs as files (hence "logfiles"). A better conceptual model is:
Logs are a stream of time-ordered events aggregated from the output streams of all the app's running processes, system components, and backing services.
Heroku's Logplex routes log streams from all of these diverse sources into a single channel, providing the foundation for truly comprehensive logging. Heroku aggregates three categories of logs for your app:
- App logs - Output from your app, such as: everything you'd normally expect to see in Rails'
production.log, output from Thin, output from Delayed Job.
- System logs - Messages about actions taken by the Heroku platform infrastructure on behalf of your app, such as: restarting a crashed process, idling or unidling a web dyno, or serving an error page due to a problem in your app.
- API logs - Messages about administrative actions taken by you and other developers working on your app, such as: deploying new code, scaling the process formation, or toggling maintenance mode.
Filtering down to just API logs is a great way to see what you and other collaborators on the app have been up to:
$ heroku logs --source heroku --ps api 2011-06-18T08:21:37+00:00 heroku[api]: Set maintenance mode on by firstname.lastname@example.org 2011-06-18T08:21:39+00:00 heroku[api]: Config add ADMIN_PASSWORD by email@example.com 2011-06-18T08:21:39+00:00 heroku[api]: Release v4 created by firstname.lastname@example.org 2011-06-18T08:21:43+00:00 heroku[api]: Scale to web=4 by email@example.com 2011-06-18T08:22:01+00:00 heroku[api]: Set maintenance mode off by firstname.lastname@example.org
If your teammates are actively performing administrative actions on the app, running the above command with the additional argument of
--tail will let you watch the events unfold as they happen. Add
watch heroku ps and
watch heroku releases in terminals paneled to all be visible at once, and you've got a mini-mission control for your app.
Dev Center: Logging
Visibility Tool #3: Releases
Releases are a history of changes to your app. New releases are created whenever you deploy new code, change config vars, or change add-ons. Each release stores the full state of the code and config, in addition to audit information such as who made the change and a timestamp.
This is the release history following Ned's bad deploy and subsequent recovery:
$ heroku releases Rel Change By When ---- ---------------------- ---------- ---------- v76 Deploy d706b4a email@example.com 1 minute ago v75 Rollback to v73 firstname.lastname@example.org 14 minutes ago v74 Deploy 20a5742 email@example.com 15 minutes ago v73 Deploy df7bb82 firstname.lastname@example.org 2 hours ago
Starting from the oldest release at the bottom:
v73was a code deploy by a developer named Rick from earlier today. This one ran without problems for about two hours.
v74was Ned's deploy which was missing a file.
v75was the rollback, which Ned ran just moments after the bad deploy. This is an exact copy of
v73(more on this in a moment).
v76was a deploy of the fixed code. Ned was able to spend some time (about ten minutes) double-checking it; the rollback saved him from needing to rush out a fix under pressure.
The top release is always the currently running one, so in this case, we see that
v76 is current and was deployed just one minute ago.
More Than Just An Audit Trail
Releases aren't just a history of what happened, they are a fundamental unit in the machinery of the Heroku platform. The transaction history is the primary source of record, meaning the release is the canonical source of information about the code and config used to run your app's processes. This is one way in which the Heroku platform ensures uniformity in the version of code and config used to run all processes in the app's process formation.
Furthermore, releases are an append-only ledger, a concept taken from financial accounting and applied to transactional systems. A mistake in the ledger can't be erased, it can only be corrected by a new entry. This is why
heroku rollback creates a new release, which is an exact copy the code and config for a previous release.
We're striving to make releases front-and-center in the output from any command that creates a new release. A release sequence number, such as
v10, appears any time you create a new release.
Deploying new code:
$ git push heroku master ... -----> Launching... done, v10
Setting a config var:
$ heroku config:add FOO=baz Adding config vars: FOO => baz Restarting app... done, v11
$ heroku rollback -----> Rolling back to v10... done, v12
Adding an add-on:
$ heroku addons:add memcache -----> Adding memcache to strong-fire-938... done, v13 (free)
And as you'd expect, each of the releases above are now listed in
$ heroku releases Rel Change By When ---- ---------------------- ---------- ---------- v13 Add-on add memcache:5mb email@example.com 16 seconds ago v12 Rollback to v10 firstname.lastname@example.org 1 minute ago v11 Config add FOO email@example.com 1 minute ago v10 Deploy 20a5742 firstname.lastname@example.org 11 minutes ago
Releases and rollback are powerful tools for introspection and management. We encourage all developers using Heroku to get familiar with them.
Dev Center: Releases
Logging, releases, and
ps provide a new level of visibility and introspection on the Heroku cloud application platform. But we're not done yet.
We know, for example, that long-term archival, search, and analytics capability for logs is crucial for production apps, particularly at large scale. We're working on improvements to Logplex's storage and retrieval, as well as capabilities for add-on providers to register themselves as log drains, opening up your app to the world of syslog-as-a-service providers and log analytics tools.
More generally, we're striving to increase visibility across the board — both in our current introspection tools, and by creating new tools. Visibility may initially seem like a minor concern, but as we've seen in this post, it has deep implications the Heroku cloud application platform, and for you as a developer managing your production apps.