Heroku XL: Focusing on Large Scale Apps

Having a web or mobile app become hugely popular is one of those "good problems" to have. But success brings its own challenge - making any architecture work at high volume often creates a unique kind of complexity. And as the Internet grows and apps become more prevalent, it's an increasingly common requirement.

The largest app on Heroku routinely exceeds 10,000 requests / second, and two of the top 50 sites on the Internet (as measured by Quantcast) - Urban Dictionary and Upworthy - run on Heroku. Across all apps, Heroku is now serving over 5 billion requests per day (or about 60,000 requests per second).

Heroku has always been guided by the goal of making it as easy as possible to build and scale apps in the cloud, and we want to extend those same benefits to the largest "XL" app developers. Working with these and other customers, we’ve seen a consistent pattern of requirements from high-scale apps. Today we are announcing a set of features and services that address those requirements and make the pattern of XL apps simpler and more easily repeatable.

Performance Dynos

In order to run and operate large scale apps, Heroku has made the most significant redesign of the dyno since it was first introduced. The result is the new Performance Dyno, launching today.

Performance Dynos are highly isolated from other dynos, providing a high and consistent quality of service. They have 12 times the memory of a 1X dyno, and significantly more compute resources. The result is that apps running on performance dynos can have faster and more consistent response times, particularly for their perc99 latencies.

The design of Performance Dynos was driven by the requests of our largest customers for how they want to deploy their apps. In our most popular languages and frameworks, the trend in deployments on Heroku is towards heavy in-dyno concurrency. Deployments with high worker utilization in Unicorn and Gunicorn, or multi-threaded environments such as Puma (with Rubinius or JRuby) or Node Cluster, benefit from vertical scaling - more cores, more I/O, and more performance.

Under the covers, Performance Dynos run in the same kind of LXC containers as 1X and 2X dynos. This means that applications can migrate to them in seconds, and they still enjoy the ease-of-use and instant scaling that you expect from Heroku. Unlike traditional dynos, the LXC container for a Performance Dyno occupies an entire virtual compute instance (as of the time of this post, it is an AWS c1.xlarge). This gives the dyno extremely high isolation from the loads of other dynos and apps running on our platform. As a result, apps using Performance Dynos can achieve consistent, predictable performance.

Performance Dynos are available immediately in Heroku's US region for $0.80 / hour and can be provisioned via the Heroku dashboard or Heroku CLI.

This chart outlines the specs for Performance Dynos using a 1X Dyno for reference:

	1X Dyno	Performance Dyno
RAM	512 MB	6 GB
Compute¹	1x - 4x ²	40x (8 CPU cores)
Price	$0.05 / hour	$0.80 / hour

¹ Overall performance will vary heavily based on app implementation.
² 1X Dyno performance will vary based on available system resources.

Measuring Resource Utilization with Runtime and Postgres Metrics

How do you know if you need Performance Dynos? The first step is to better understand the resource consumption for your app. We have a tool for doing just this. Runtime metrics emit the load and memory usage for each dyno to application log streams:

source=web.1 dyno=heroku.2808254.d97d0ea7-cf3d-411b-b453-d2943a50b456 sample#load_avg_1m=2.46 sample#load_avg_5m=1.06 sample#load_avg_15m=0.99    
source=web.1 dyno=heroku.2808254.d97d0ea7-cf3d-411b-b453-d2943a50b456 sample#memory_total=21.00MB sample#memory_rss=21.22MB sample#memory_cache=0.00MB sample#memory_swap=0.00MB sample#memory_pgpgin=348836pages sample#memory_pgpgout=343403pages
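Because these lines are plain key=value pairs, they are easy to consume from a log drain. The following is a minimal sketch, assuming only the `sample#name=value` layout shown above; the function name `parse_runtime_metrics` and the regular expression are illustrative and not part of any Heroku library.

```python
import re

# Matches "sample#name=value[unit]" pairs as they appear in the log lines above.
SAMPLE_RE = re.compile(r"sample#(\w+)=(\d+(?:\.\d+)?)(\w*)")


def parse_runtime_metrics(line):
    """Return (source, {metric: (value, unit)}) for one runtime-metrics log line."""
    source_match = re.search(r"\bsource=(\S+)", line)
    source = source_match.group(1) if source_match else None
    samples = {
        name: (float(value), unit)
        for name, value, unit in SAMPLE_RE.findall(line)
    }
    return source, samples


line = (
    "source=web.1 dyno=heroku.2808254.d97d0ea7-cf3d-411b-b453-d2943a50b456 "
    "sample#memory_total=21.00MB sample#memory_rss=21.22MB "
    "sample#memory_cache=0.00MB sample#memory_swap=0.00MB "
    "sample#memory_pgpgin=348836pages sample#memory_pgpgout=343403pages"
)
source, samples = parse_runtime_metrics(line)
print(source, samples["memory_total"])
```

Load-average samples carry no unit, so their unit field comes back as an empty string.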

In general, you can maximize the performance of an app by increasing its concurrency (threads or processes) until it is using most of its available memory, so long as its load is less than the number of CPU cores available to it. The ideal settings vary from app to app. To help you through the process of maximizing your application performance, we have created this dyno optimization guide.
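The rule of thumb above can be sketched as a small calculation. This is only an illustration: the per-worker memory figure, the 20% headroom, and the function names are assumptions chosen for the example, not recommendations from the optimization guide.

```python
def suggest_workers(dyno_memory_mb, worker_rss_mb, headroom=0.2):
    """Largest number of worker processes that fits in memory with headroom left."""
    usable_mb = dyno_memory_mb * (1 - headroom)
    return max(1, int(usable_mb // worker_rss_mb))


def load_is_healthy(load_avg_1m, cpu_cores):
    """The rule of thumb above: load should stay below the available core count."""
    return load_avg_1m < cpu_cores


# A Performance Dyno (6 GB, 8 cores) running hypothetical 250 MB workers.
print(suggest_workers(6 * 1024, 250))
# The 1-minute load from the sample log line, checked against 8 cores.
print(load_is_healthy(2.46, 8))
```

In practice you would raise concurrency in steps and watch the runtime metrics after each change, since per-worker memory usually grows under real traffic.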

In addition to runtime metrics, you can now also measure the resource utilization of your Heroku Postgres database (available for standard tiers and above). Just like runtime metrics, Postgres metrics are emitted to application log streams. Postgres metrics include:

Index hit rate
Cache hit rate
Database size
Load
Memory Usage
I/O Operations

(Load, memory, and I/O are only available on some plans; see the docs for details.)

And if you want to view trending for these resource metrics, you can do so with our partner Librato. Just install their add-on on the Nickel plan or above. Runtime metrics are currently in beta, and we appreciate any feedback you have on how we can improve them.

Supporting At-Scale Apps

When apps achieve high scale, they are being used around the clock. Companies that run these apps need vendors who can support them 24/7. To this end, we are announcing pricing for our premium support plans.

All Heroku apps include standard business-hour support that is triaged by issue severity; although we work hard to be responsive, there is no guaranteed turnaround time. If your business requires a higher support level, it is now available through our Premium Support Tier. It provides 24/7 support with a 1-hour SLA for critical tickets (most tickets are answered in less than 10 minutes).

Premium Support: 24/7 Support, 1-Hour SLA for Critical Tickets. $1,000 / mo or 20% of account spend (whichever is greater).
Technical Account Management: All of the features of Premium Support, plus technical consultation with a dedicated support engineer. Technical Account Management is $1,000 / mo in addition to the price of Premium Support.

Pricing and details for Heroku’s support tiers are available on our pricing page.

Understanding Perc99 and Tail Latencies

The basic measurement of web application performance is response time - the time between a client’s request and the app’s response. Faster response times are better.

The most common way of tracking response times is to take the average and track its trending over time. But averages only tell part of the story. Consider an app that has two methods, one of which has a response time of 10 ms, and the other with a response time of 1,000 ms. If the fast method is called 90% of the time, then the average response time for the app would be a respectable 109 ms. But this average disguises the fact that one part of the app is very fast while another is very slow.

Another approach is to measure the slowest (i.e. maximum) response times. Again, this can be ineffective, as a single slow response will skew the results. In the above example, if the app had a third method that responded in 30,000 ms that was called less than 1% of the time, the maximum response time would be 30,000 ms - clearly not representative of how most users experienced the app.

The "goldilocks" solution is to measure the 99th percentile response time (i.e. perc99). Perc99 is the response time that is slower than 99% of requests but faster than the remaining 1%. By definition, it reflects the vast majority of an app’s requests (99% of them) without being susceptible to extreme outliers. Continuing our example, the perc99 would be 1,000 ms - which is a reasonable measure of the upper end of the response time that most users on the app would experience.
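To make the comparison concrete, here is a small sketch that computes all three measures for the example app. The request mix (900 fast, 95 slow, and 5 very slow requests out of 1,000) is an assumption chosen to match the proportions described above, and the nearest-rank method is just one common way to define a percentile.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# 1,000 hypothetical requests: 90% fast, 9.5% slow, 0.5% very slow.
response_times_ms = [10] * 900 + [1_000] * 95 + [30_000] * 5

mean = sum(response_times_ms) / len(response_times_ms)
print(f"mean   = {mean:.0f} ms")
print(f"max    = {max(response_times_ms)} ms")
print(f"perc99 = {percentile(response_times_ms, 99)} ms")
```

With this mix the mean is pulled up by the rare 30,000 ms responses and the maximum reports only the single worst case, while perc99 lands on 1,000 ms.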

For more information on tail latencies, see The Tail at Scale from Research at Google.

[Figure: What is Perc99]

When a well-optimized app is under light to moderate loads, the perc99 response time is not significantly larger than the average response time (in most cases). However, as traffic reaches significant loads (over 1,000 requests per second), the perc99 response times will begin to grow much more quickly than the average.

This is precisely the problem Performance Dynos were designed to solve. They allow dynos to have more concurrency and execute more consistently, which substantially reduces perc99 response times. Our beta customers have seen perc99 response times reduced by 80-90% by switching to Performance Dynos.

Measuring perc99 response times has previously been difficult. However, by working with two of our visibility partners, it is now as easy as provisioning an add-on on your Heroku app. New Relic now measures and displays perc99 and perc95 response times as well as histograms of app performance (just make sure your app has the latest library installed). Librato also measures perc95 and perc99 response times. It uses Heroku router logs from your app, so there is no client library to install.
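As a rough illustration of how percentiles can be derived from router logs alone, the sketch below pulls the `service=` field out of each line and computes perc95 and perc99. The log lines are hypothetical and trimmed to a few fields, the helper names are invented for this example, and this is not a description of how Librato implements its measurements.

```python
import math
import re

# The router reports per-request service time as "service=<n>ms".
SERVICE_RE = re.compile(r"\bservice=(\d+)ms\b")


def service_times_ms(log_lines):
    """Extract the service time from each router log line that has one."""
    return [int(m.group(1)) for line in log_lines if (m := SERVICE_RE.search(line))]


def nearest_rank(values, pct):
    """Nearest-rank percentile of a non-empty list of values."""
    ordered = sorted(values)
    return ordered[max(math.ceil(pct * len(ordered) / 100), 1) - 1]


# 100 hypothetical router log lines, trimmed to the relevant fields.
logs = [
    f'at=info method=GET path="/" dyno=web.1 connect=1ms service={ms}ms status=200'
    for ms in [12] * 94 + [480] * 4 + [950] * 2
]
times = service_times_ms(logs)
print("perc95:", nearest_rank(times, 95), "ms")
print("perc99:", nearest_rank(times, 99), "ms")
```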

Summary

As the world’s digital consumption continues to grow, it will be increasingly common for apps to hit high scales quickly. With the tools introduced today, you can be confident that when your app is ready to scale, Heroku will scale with it.

If your app’s usage is starting to ramp up, or may be soon, you can get in touch with us to evaluate your app and discuss your scaling plans.
