Routing and Web Performance on Heroku: a FAQ

Hi. I'm Adam Wiggins, cofounder and CTO of Heroku.

Heroku has been my life’s work. Millions of apps depend on us, and I take that responsibility very personally.

Recently, Heroku has faced criticism from the hacker community about how our HTTP router works, and about web performance on the platform in general. I’ve read all the public discussions, and have spent a lot of time over the past month talking with our customers about this subject.

The concerns I've heard from you span past, present, and future.

The past: some customers have hit serious problems with poor web performance and insufficient visibility on their apps, and have been left very frustrated as a result. What happened here? The present: how do you know if your app is affected, and if so what should you do? And the future: what is Heroku doing about this? Is Heroku a good place to run and scale an app over the long term?

To answer these questions, we’ve written a FAQ, found below. It covers what happened, why the router works the way that it does, whether your app is affected by excessive queue time, and what the solution is.

As to the future, here’s what we’re doing. We’re ramping up hands-on migration assistance for all users running on our older stack, Bamboo, or running a non-concurrent back-end on our new stack, Cedar. (See the FAQ for why this is the fix.) We’re adding new features such as 2X dynos to make it easier to run concurrent back-ends for large Rails apps. And we're making performance and visibility a bigger area of product attention, starting with some tools we've already released in the last month.

If you have a question not answered by this FAQ, post it as a comment here, on Hacker News, or on Twitter. I’ll attempt to answer all such questions posted in the next 24 hours.

To all our customers who experienced real pain from this: we're truly sorry. After reading this FAQ, I hope you feel we're taking every reasonable step to set things right, but if not, please let us know.



Q. Is Heroku’s router broken?

A. No. While hundreds of pages could be written on this topic, we’ll address some of this in Routing technology. Summary: the current version of the router was designed to provide the optimum combination of uptime, throughput, and support for modern concurrent back-ends. It works as designed.

Q. So what’s this whole thing about then?

A. Since early 2011, high-volume Rails apps that run on Heroku and use single-threaded web servers sometimes experienced severe tail latencies and poor utilization of web back-ends (dynos). Lack of visibility into app performance, including incorrect queue time reporting prior to the New Relic update in February 2013, made diagnosing these latencies (by customers, and even by Heroku’s own support team) very difficult.

Q. What types of apps are affected?

A. Rails apps running on Thin, with six or more dynos, and serving 1k reqs/min or more are the most likely to be affected. The impact becomes more pronounced as such apps use more dynos, serve more traffic, or have large request time variances.

Q. How can I tell if my app is affected?

A. Add the free version of New Relic (heroku addons:add newrelic) and install the latest version of the newrelic_rpm gem, then watch your queue time. Average queue times above 40ms are usually indicative of a problem.

Some apps with lower request volume may be affected if they have extremely high request time variances (e.g., HTTP requests lasting 10+ seconds) or make callbacks like this OAuth example.

Q. What’s the fix?

A. Switch to a concurrent web back-end like Unicorn or Puma on JRuby, which allows the dyno to manage its own request queue and avoid blocking on long requests.

This requires that your app be on our most current stack, Cedar.

Q. Can you give me some help with this?

A. Certainly. We’ve already emailed all customers with apps running on Thin with more than six dynos with self-migration instructions, and a way to reach us for direct assistance.

If you haven’t received the email and want help making the switch, contact us for migrating to Cedar or migrating to Unicorn.

Routing technology

Q. Why does the router work the way that it does?

A. The Cedar router was built with two goals in mind: (1) to support the new world of concurrent web back-ends which have become the standard in Ruby and all other language communities; and (2) to handle the throughput and availability needs of high-traffic apps.

Read detailed documentation of Heroku’s HTTP routing.

Q. Even with concurrent web back-ends, wouldn’t a single global request queue still use web dynos more efficiently?

A. Probably, but it comes with trade-offs for availability and performance. The Heroku router favors availability, stateless horizontal scaling, and low latency through individual routing nodes. Per-app global request queues require a sacrifice on one or more of these fronts. See Kyle Kingsbury’s post on the CAP theorem implications for global request queueing.

After extensive research and experimentation, we have yet to find either a theoretical model or a practical implementation that beats the simplicity and robustness of random routing to web back-ends that can support multiple concurrent connections.

Q. So does that mean you aren’t working on improving HTTP performance?

A. Not at all. We're always looking for new ways to make HTTP requests on Heroku faster, more reliable, and more efficient. For example, we’ve been experimenting with backpressure routing for web dynos to signal to the router that they are overloaded.

You, our customers, have told us that it’s not routing algorithms you ultimately care about, but rather overall web performance. You want to serve HTTP requests as quickly as possible, for fast page loads or API calls for your users. And you want to be able to quickly and easily diagnose performance problems.

Performance and visibility are what matters, and that’s what we’ll work on. This will include ongoing improvements to dynos, the router, visibility tools, and our docs.


Q. Did the Bamboo router degrade?

A. Yes. Our older router was built and designed during the early years of Heroku to support the Aspen and later the Bamboo stack. These stacks did not support concurrent back-ends, and thus the router was designed with a per-app global request queue. This worked as designed originally, but then degraded slowly over the course of the next two years.

Q. Were the docs wrong?

A. Yes, for Bamboo. They were correct when written, but fell out of date starting in early 2011. Until February 2013, the documentation described the Bamboo router only sending one connection at a time to any given web dyno.

Q. Why didn’t you update Bamboo docs in 2011?

A. At the time, our entire product and engineering team was focused on our new product, Cedar. Being so focused on the future meant that we slipped on stewardship of our existing product.

Q. Was the "How It Works" section of the Heroku website wrong?

A. Yes. Similar to the docs, How It Works section of our website described the router as tracking which dynos were tied up by long HTTP requests. This was accurate when written, but gradually fell out of date in early 2011. Unlike the docs, we completely rewrote the homepage in June of 2011 and it no longer referenced tracking of long requests.

Q. Was the queue time metric in New Relic wrong?

A. Yes, for the same 2011—2013 period from previous questions. The metric was transmitted to the New Relic instrumentation in the app via a set of HTTP headers set by the Heroku router. The root cause was the same as the Bamboo router degradation: the code didn't change, but scaling out the router nodes caused the data to become increasingly inaccurate and eventually useless. With New Relic's help, we fixed this in February 2013 by calculating queue time using a different method.

Q. Why didn’t Heroku take action on this until Rap Genius went public?

A. We’re sorry that we didn’t take action on this based on the customer complaints via support tickets and other channels sooner. We didn’t understand the magnitude of the confusion and frustration caused by the out-of-date Bamboo docs, incorrect queue time information in New Relic, and the general lack of visibility into web performance on the platform. The huge response to the Rap Genius post showed us that this touched a nerve in our community.

The Future

Q. What are we doing to make things right from here forward?

A. We’ve been working with many of our customers to get their queue times down, get them accurate visibility into their app’s performance, and make sure their app is fast and running on the right number of dynos. So far, the results are good.

Q. What about everyone else?

A. If we haven’t been in touch yet, here’s what we’re doing for you:

  • Migration assistance: We’ll give you hands-on help migrating to a concurrent back-end, either individually or in online workshops. This includes the move to Cedar if you’re still on Bamboo. If you’re running a multi-dyno app on a non-concurrent back-end and haven’t received an email, drop us a line about Thin to Unicorn or Bamboo to Cedar.
  • 2X dynos: We’re fast-tracking the launch of 2X dynos, to provide double the memory and allow for double (or more) Unicorn concurrency for large Rails apps. This is already available in private beta in use by several hundred customers, and will be available in public beta shortly.
  • New visibility tools: We’re putting more focus on bringing you new performance visibility features, such as the log2viz dashboard, CPU and memory use logging, and HTTP request IDs. We’ll be working to do much more on this front to make sure that you can diagnose performance problems when they happen and know what to do about it.

Want something else not mentioned here? Let us know.

Helios - open source framework for mobile

Heroku has a strong tradition with open source projects. Engineers have dedicated countless hours to the projects that developers count on every day. Open Source Software is in our DNA.

Speaking personally, I’m passionate about building tools like AFNetworking and cupertino, in order to help developers build insanely great experiences for mobile devices. It’s with great pleasure that I introduce something new I’ve been working on:

Helios is an open-source framework that provides essential back-end services for iOS apps. This includes data synchronization, push notifications, in-app purchases, and passbook integration. It allows developers to get a client-server app up-and-running while seamlessly incorporating functionality as necessary.

Helios is designed for "mobile first" development. Build out great features on the device, and implement the server-side components as necessary. Pour all of your energy into crafting a great user experience, rather than getting mired down with the back-end.

Built on the Rack webserver interface, Helios can be easily added into any existing Rails or Sinatra application. Or, if you're starting with a Helios application, you can build a new Rails or Sinatra application on top of it. This means that you can develop your application using the tools and frameworks you love, and maintain flexibility with your architecture as your needs evolve.

Give it a try and let me know what you think!

Waza 2013: How Ecosystems Build Mastery

When we think of the concept of Waza (技) or "art and technique," it's easy to get caught up in the idea of individual mastery. It's true that works of art are often created by those with great skill, but acquiring that skill is neither solitary nor static. Generations of masters contribute to a canon and it is in that spirit that we built the Heroku platform and the Waza event. This year's Waza was no exception.

On February 28th, more than 900 attendees participated in Waza including Ruby founder Yukihiro "Matz" Matsumoto, Django co-creator Jacob Kaplan-Moss and Codeacademy’s Linda Liukas. True to form, we offered you a platform for experimentation and you surprised us with your contributions.

From your origami creations, to your Arduino hacks, to the technical conversations over craft beer -- you taught us that the definition of software development is ever-evolving. Thank you for allowing us to help you change lives and push boundaries. We will continue our commitment to growing the platform for you and look forward to collaborating with you in the future.

For more event highlights visit the Waza videos and photos. To learn more about Heroku, add yourself to our mailing list.

log2viz: Logs as Data for Performance Visibility


If you’re building a customer-facing web app or mobile back-end, performance is a critical part of user experience. Fast is a feature, and affects everything from conversion rates to your site’s search ranking.

The first step in performance tuning is getting visibility into the app’s web performance in production. For this, we turn to the app’s logs.

Logs as data

There are many ways to collect metrics, the most common being direct instrumentation into the app. New Relic, Librato, and Hosted Graphite are cloud services that use this approach, and there are numerous roll-your-own options like StatsD and Metrics.

Another approach is to send metrics to the logs. Beginning with the idea that logs are event streams, we can use logs for a holistic view of the app: your code, and the infrastructure that surrounds it (such as the Heroku router). Mark McGranaghan’s Logs as Data and Ryan Daigle’s 5 Steps to Better Application Logging offer an overview of the logs-as-data approach.

Put simply, logs as data means writing semi-structured data to your app's logs via STDOUT. Then the logs can be consumed by one or more services to do dashboards, long-term trending, and threshold alerting.

The benefits of logs-as-data over direct instrumentation include:

  • No additional library dependencies for your app
  • No CPU cost to your dyno by in-app instrumentation
  • Introspection capability by reading the logs directly
  • Metrics back-ends can be swapped out without changes to app code
  • Possible to split the log stream and send it to multiple back-ends, for different views and alerting on the same data

Introducing log2viz, a public experiment

log2viz is an open-source demonstration of the logs-as-data concept for Heroku apps. Log in and select one of your apps to see a live-updating dashboard of its web activity.

For example, here’s a screenshot of log2viz running against the Rubygems Bundler API (written and maintained by Terence Lee, André Arko, and Larry Marburger, and running on Heroku):


log2viz gets all of its data from the Heroku log stream — the same data you see when running heroku logs --tail at the command line. It requires no changes to your app code and works for apps written in any language and web framework, demonstrating some of the benefits of logs as data.

Also introducing: log-runtime-metrics

In order to get memory use stats for your dynos, we’ve added a new experimental feature to Heroku Labs to log CPU and memory use by the dyno: log-runtime-metrics.

To enable this for your app (and see memory stats in log2viz), type the following:

$ heroku labs:enable log-runtime-metrics -a myapp
$ heroku restart

This inserts data into your logs like this:

heroku[web.1]: measure=load_avg_5m val=0.0
heroku[web.1]: measure=memory_total val=209.64 units=MB

log2viz reads these stats and displays average and max memory use across your dynos. (Like all Labs features, this is experimental and the format may change in the future.)

Looking under the hood

log2viz is open source. Let’s look at the code:

You can deploy your own copy of log2viz on Heroku, so fork away! For example, Heroku customer Timehop has experimented with trending graphs via Rickshaw.

Logs-as-data add-ons

log2viz isn't the only way to take advantage of your log stream for visibility on Heroku today. Here are a few add-ons which consume your app's logs.

Loggly offers a web console that lets you search your log history, and graph event types over time. For example, let’s search for status=404, logged by the Heroku router whenever your app serves a page not found:

404 Status

Papertrail offers search archival and history, and can also alert on events when they pass a certain threshold. Here’s how you can set up an email alert every time your app experiences more than 10 H12 errors in a 60 second period. Search for the router log line:

H12 Errors

Click “Save Search,” then:

Other add-ons that consume logs include Treasure Data and Logentries.

You can also use non-add-on cloud services, as shown in thoughtbot's writeup on using Splunk Storm with Heroku.


Visibility is a vast and challenging problem space. The logs-as-data approach is still young, and log2viz is just an experiment to get us started. We look forward to your feedback on log2viz, log visibility via add-ons, and your own experiments on performance visibility.

Running Rails on Heroku Update

On February 16th, we published a blog post outlining five specific and immediate actions we would take to improve our Rails customers' experience with Heroku. We want to provide you with an update on where these things stand. As a reminder, here’s what we committed to do:

  1. Improve our documentation so that it accurately reflects how our service works across both Bamboo and Cedar stacks
  2. Remove incorrect and confusing metrics reported by Heroku or partner services like New Relic
  3. Add metrics that let customers determine queuing impact on application response times
  4. Provide additional tools that developers can use to augment our latency and queuing metrics
  5. Work to better support concurrent-request Rails apps on Cedar

We have resolved the first two items:

  1. Improving Documentation: We’ve updated our Dev Center docs and website to more accurately describe how routing occurs on Heroku (e.g., HTTP Routing on Bamboo and How Routing Works on the Bamboo Stack).

  2. Removing Incorrect Metrics: We’ve worked with New Relic to release an updated version of their monitoring tools, which we documented in Better Queuing Metrics With Updated New Relic Add-on.

We are working on the remaining three items in partnership with our early beta users:

  1. Adding New Metrics: We are improving the data available through our logs in two ways. First, we’re adding a transaction ID to every request log. Second, we are working on application middleware to provide additional metrics in customer logs.
  2. Providing Additional Measurement Tools: We have created a data visualization tool that analyzes log data in real-time and helps observe an app’s performance across a few key Heroku-specific metrics. This tool is currently being beta tested.
  3. Improving Cedar Support: Unicorn is now the recommended Rails app server. We are currently beta testing larger dynos to support additional unicorn processes.

Fulfilling these commitments is Heroku’s number one priority. We have dedicated engineering and product teams focused on improving the performance of Rails apps on Heroku.

We’ll detail the availability of our improvements in future blog posts. If you would like to beta test any of these new features, please drop us a note and let us know which features you are interested in testing.

Browse the blog archives, subscribe to the full-text feed, or visit the engineering blog.