Pushing an app to the cloud can feel like launching a probe into space. Once your project is thousands of miles away, you can't bang on it with a hammer or replace broken parts when there's a problem. Your debugging efforts must rely on the instrumentation, telemetry, and remote controls included with the app when it was deployed. On Heroku, we've gladly done some of that prep work for you.

Two new Heroku features, Heroku Exec and Language Runtime Metrics, improve the experience of monitoring, inspecting, and debugging production apps on the platform. With Heroku Exec, you can create secure TCP and SSH tunnels into a dyno, which facilitate SSH sessions, port forwarding, remote...


Over the past decade, millions of developers have interacted with the Heroku CLI. In those 10 years, the CLI has gone through many changes. We've changed languages several times, redesigned the plugin architecture, and improved test coverage and the test framework. What follows is the story of our team's journey to build and maintain the Heroku CLI from the early days of Heroku to today.

  1. Ruby (CLI v1-v3)
  2. Go/Node (CLI v4)
  3. Go/Node (CLI v5)
  4. Pure Node (CLI v6)
  5. What's Next?

Ruby (CLI v1-v3)

Our original CLI (v1-v3) was written in Ruby and served us well for many years. Ruby is a great, expressive language for building CLIs; however, we started experiencing enough problems that...


The Heroku Connect team ran into problems with existing task scheduling libraries. Because of that, we wrote RedBeat, a Celery Beat scheduler that stores scheduled tasks and runtime metadata in Redis. We’ve also open sourced it so others can use it. Here is the story of why and how we created RedBeat.
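Since RedBeat is open source, a rough sketch of what adopting it looks like may be useful. The app name, Redis URLs, task, and schedule entry below are hypothetical; the settings follow the package's documented `beat_scheduler` and `redbeat_redis_url` configuration.

```python
# Hypothetical Celery app configured to use RedBeat as its beat scheduler.
from celery import Celery
from celery.schedules import crontab

app = Celery("sync", broker="redis://localhost:6379/0")

app.conf.update(
    # Tell Celery Beat to use RedBeat instead of the default scheduler.
    beat_scheduler="redbeat.RedBeatScheduler",
    # RedBeat stores schedule entries and runtime metadata at this Redis URL.
    redbeat_redis_url="redis://localhost:6379/1",
    beat_schedule={
        "sync-every-minute": {
            "task": "sync.poll_salesforce",  # hypothetical task name
            "schedule": crontab(),           # run once a minute
        },
    },
)


@app.task(name="sync.poll_salesforce")
def poll_salesforce():
    # Placeholder body; the real Heroku Connect sync logic is not public.
    pass
```

With `beat_scheduler` set in the config, `celery -A sync beat` picks up RedBeat automatically; alternatively, the scheduler can be selected on the command line with `--scheduler redbeat.RedBeatScheduler`.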

Background

Heroku Connect makes heavy use of Celery to synchronize data between Salesforce and Heroku Postgres. Over time, our usage grew, and we came to rely more and more heavily on the Beat scheduler to trigger frequent periodic tasks. For a while everything ran smoothly, but as we grew, cracks started to appear. Beat, the default Celery scheduler, began to behave...


Back on August 11, 2016, Heroku experienced increased routing latency in the EU region of the Common Runtime. While the official follow-up report describes what happened and what we've done to avoid this in the future, we found the root cause puzzling enough to require a deep dive into Linux networking.

The following is a write-up by SRE member Lex Neva (what's SRE?) and routing engineer Fred Hebert (now a Heroku alumnus) of an interesting Linux networking "gotcha" they discovered while working on incident 930.

The Incident

Our monitoring systems paged us about a rise in latency levels across the board in the EU region of the Common Runtime. We quickly saw that the...


As part of our commitment to security and support, we periodically upgrade the stack image so that we can install updated package versions, address security vulnerabilities, and add new packages to the stack. Recently we had an incident during which some applications running on the Cedar-14 stack image experienced higher than normal rates of segmentation faults and other “hard” crashes for about five hours. Our engineers tracked down the cause to corrupted dyno filesystems resulting from a failed stack upgrade. The sequence of events leading up to this failure, and its technical details, are unique and worth exploring.

Background

Heroku runs application...

