Over the past decade, millions of developers have interacted with the Heroku CLI. In those 10 years, the CLI has gone through many changes. We've changed languages several times; redesigned the plugin architecture; and improved test coverage and the test framework. What follows is the story of our team's journey to build and maintain the Heroku CLI from the early days of Heroku to today.

  1. Ruby (CLI v1-v3)
  2. Go/Node (CLI v4)
  3. Go/Node (CLI v5)
  4. Pure Node (CLI v6)
  5. What's Next?

Ruby (CLI v1-v3)

Our original CLI (v1-v3) was written in Ruby and served us well for many years. Ruby is a great, expressive language for building CLIs, however, we started experiencing enough problems that...

The Heroku Connect team ran into problems with existing task scheduling libraries. Because of that, we wrote RedBeat, a Celery Beat scheduler that stores scheduled tasks and runtime metadata in Redis. We’ve also open sourced it so others can use it. Here is the story of why and how we created RedBeat.


Heroku Connect, makes heavy use of Celery to synchronize data between Salesforce and Heroku Postgres. Over time, our usage has grown, and we came to rely more and more heavily on the Beat scheduler to trigger frequent periodic tasks. For a while, everything was running smoothly, but as we grew cracks started to appear. Beat, the default Celery scheduler, began to behave...

Back on August 11, 2016, Heroku experienced increased routing latency in the EU region of the common runtime. While the official follow-up report describes what happened and what we've done to avoid this in the future, we found the root cause to be puzzling enough to require a deep dive into Linux networking.

The following is a write-up by SRE member Lex Neva (what's SRE?) and routing engineer Fred Hebert (now Heroku alumni) of an interesting Linux networking "gotcha" they discovered while working on incident 930.

The Incident

Our monitoring systems paged us about a rise in latency levels across the board in the EU region of the Common Runtime. We quickly saw that the...

As part of our commitment to security and support, we periodically upgrade the stack image, so that we can install updated package versions, address security vulnerabilities, and add new packages to the stack. Recently we had an incident during which some applications running on the Cedar-14 stack image experienced higher than normal rates of segmentation faults and other “hard” crashes for about five hours. Our engineers tracked down the cause of the error to corrupted dyno filesystems caused by a failed stack upgrade. The sequence of events leading up to this failure, and the technical details of the failure, are unique, and worth exploring.


Heroku runs application...

At Heroku, we're always working towards improving operational stability with the services we offer. As we recently launched Apache Kafka on Heroku, we've been increasingly focused on hardening Apache Kafka, as well as our automation around it. This particular improvement in stability concerns Kafka's compacted topics, which we haven't talked about before. Compacted topics are a powerful and important feature of Kafka, and as of 0.9, provide the capabilities supporting a number of important features.

Meet the Bug

The bug we had been seeing is that an internal thread that's used by Kafka to implement compacted topics (which we'll explain more of shortly) can die in...

Browse the archives for engineering or all blogs Subscribe to the RSS feed for engineering or all blogs.