Hello RedBeat: A Celery Beat Scheduler

The Heroku Connect team ran into problems with existing task scheduling libraries. Because of that, we wrote RedBeat, a Celery Beat scheduler that stores scheduled tasks and runtime metadata in Redis. We’ve also open sourced it so others can use it. Here is the story of why and how we created RedBeat.

Background

Heroku Connect makes heavy use of Celery to synchronize data between Salesforce and Heroku Postgres. Over time, our usage grew, and we came to rely more and more heavily on the Beat scheduler to trigger frequent periodic tasks. For a while, everything ran smoothly, but as we grew, cracks started to appear. Beat, the default Celery scheduler, began to behave erratically, with intermittent pauses (yellow in the chart below) and occasional hangs (red in the chart below). Hangs required manual intervention, which increased our pager burden.

[Chart: redbeat-before]

Out of the box, Beat uses a file-based persistent scheduler, which can be problematic in a cloud environment where you can’t guarantee Beat will restart with access to the same filesystem. Of course, there are ways to solve this, but they require introducing more moving parts to manage a distributed filesystem. A more immediate solution is to store the schedule in your existing SQL database, and django-celery, which we were using, makes this easy.

After digging into the code, we discovered that the hangs were due to blocked transactions in the database and the long pauses were caused by periodic saving and reloading of the schedule. We could mitigate the pauses by increasing the time between saves, but that would also increase the likelihood of losing data. In the end, it was evident that django-celery was a poor fit for our pattern of frequent schedule updates.

We were already using Redis as our Celery broker, so we decided to investigate moving the schedule into Redis as well. There is an existing celerybeatredis package, but it suffers from the same design issues as django-celery, requiring a pause and full reload to pick up changes.

So we decided to create a new package, RedBeat, which takes advantage of the inherent strengths of Redis. We’ve been running it in production for over a year and have not seen any recurrences of the problems we were having with the django-celery based scheduler.

The RedBeat Difference

How is RedBeat different? The biggest change is that the active schedule is stored in Redis rather than within the process space of the Celery Beat daemon.

Creating or modifying a task no longer requires Beat to pause and reload; we just update a key in Redis and Beat picks up the change on the next tick. A nice side effect is that it’s trivial to update the schedule from other languages. As with django-celery, we no longer need to worry about sharing a file across multiple machines to preserve metadata about when tasks were last run. Startup and shutdown times improved, since we no longer suffer load spikes from saving and reloading the entire schedule from the database. Instead, we have a steady, predictable load on Redis.
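To make the "update a key, pick it up on the next tick" idea concrete, here is a minimal sketch. It is not RedBeat's code: the InMemoryStore class stands in for a Redis client, and the key prefixes, JSON layout, and function names are all assumptions made for illustration.

```python
import json


class InMemoryStore:
    """Hypothetical stand-in for a Redis client, for illustration only."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def keys(self, prefix):
        return [k for k in self._data if k.startswith(prefix)]


# Hypothetical key prefixes; RedBeat's real key layout may differ.
SCHEDULE = "schedule:"
META = "meta:"


def save_entry(store, name, task, every_seconds):
    """Create or modify a schedule entry by writing a single key."""
    store.set(SCHEDULE + name, json.dumps({"task": task, "every": every_seconds}))


def tick(store, now):
    """Return the tasks due at `now`, reading each definition fresh from the store."""
    due = []
    for key in sorted(store.keys(SCHEDULE)):
        name = key[len(SCHEDULE):]
        entry = json.loads(store.get(key))
        last_run = store.get(META + name)
        if last_run is None or now - float(last_run) >= entry["every"]:
            due.append(entry["task"])
            # Runtime metadata lives in the store too, so a restart loses nothing.
            store.set(META + name, str(now))
    return due
```

Because each tick reads the definitions from the store, a change written by any other process (or any other language that can talk to the store) takes effect on the next tick, with no pause and no full reload.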

Finally, we added a simple lock that prevents multiple Beat daemons from running concurrently. This can sometimes be a problem for Heroku customers when they scale up from a single worker or during development.
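The post does not say how the lock is built, so the following is only a sketch of one common approach on a key-value store: an expiring key that is set only if absent, which the holder renews while it is alive. ExpiringStore, BeatLock, the key name, and the timeout are hypothetical names and values, and the store is an in-memory stand-in rather than a real Redis client.

```python
import time
import uuid


class ExpiringStore:
    """Hypothetical in-memory stand-in for a store with set-if-absent and expiry."""

    def __init__(self, clock=time.monotonic):
        self._data = {}  # key -> (value, expires_at)
        self._clock = clock

    def set_if_absent(self, key, value, ttl_seconds):
        now = self._clock()
        current = self._data.get(key)
        if current is not None and current[1] > now:
            return False  # someone else holds an unexpired key
        self._data[key] = (value, now + ttl_seconds)
        return True

    def extend(self, key, value, ttl_seconds):
        now = self._clock()
        current = self._data.get(key)
        if current is None or current[1] <= now or current[0] != value:
            return False  # key expired or is owned by another holder
        self._data[key] = (value, now + ttl_seconds)
        return True


class BeatLock:
    """Lets only one scheduler process run at a time (illustrative sketch)."""

    def __init__(self, store, key="beat:lock", ttl_seconds=30):
        self.store = store
        self.key = key
        self.ttl = ttl_seconds
        self.token = uuid.uuid4().hex  # identifies this holder

    def acquire(self):
        return self.store.set_if_absent(self.key, self.token, self.ttl)

    def renew(self):
        return self.store.extend(self.key, self.token, self.ttl)
```

A daemon would call acquire() at startup and refuse to schedule anything if it returns False, then call renew() on each tick. The expiry means a crashed daemon cannot hold the lock forever.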

After converting to RedBeat, we’ve had no scheduler-related incidents.

[Chart: redbeat-after]

Needless to say, so far we’ve been happy with RedBeat and hope others will find it useful too.

Why not take it for a spin and let us know what you think?

Originally published: May 02, 2017
