How Travis CI Uses Heroku to Scale Their Platform

Editor's note: This is a guest post from Mathias Meyer of Travis CI.

Travis CI is a continuous integration and deployment platform. It started out as a project to offer a free platform for the open source community to run their tests and builds on.

Over the past two years, Travis CI has grown, a lot. What started out with a single server running just a few hundred tests a day turned into a platform used by thousands of open source projects and hundreds of companies for their public and private projects. Travis CI is currently serving more than 62,000 active open source projects, with 32,000 daily builds, and 3,000 private projects with 11,000 daily builds.

Heroku is an important part of how we’ve scaled our platform, and this is the story of how we did it.

Evolving our Architecture

Travis CI started out as a very simple application. A web front-end accepted commit notifications from GitHub, turned them into builds, and pushed them on a queue. A background process scheduled the builds and processed the build logs and build results. It was aptly called hub. That application has run on Heroku since the beginning of Travis CI.

Right from the start, streaming logs live to the web browser was an integral part of Travis CI, making up one of the bigger datastreams that the hub has to process. It also turned into one of the things that were hardest to scale out. As we added pull request testing as one of our major features, our usage exploded, and we started seeing things break in the oddest ways, leading to a couple of outages and learnings on how we needed to improve our architecture.

An important lesson was that the big monolithic process called hub couldn't keep up with the load anymore. In retrospect, it’s amazing how much it was able to process on its own.

Apps, Apps Everywhere!

We started breaking up the hub into lots of little pieces, responsible for very specific parts of what the previous monolithic implementation used to do. Heroku allowed us to quickly iterate on breaking out new applications and deploying them as new Heroku apps. To date, a total of eight different apps now make up Travis CI.

Our applications are split by responsibility:

listener accepts notifications from GitHub and places them on a Sidekiq queue.
gatekeeper turns these requests into build skeletons in our database. It's also responsible for synchronizing user data with GitHub so we're always up-to-date with what repositories a user has access to.
hub schedules the builds and processes the build results.
tasks sends out notifications via email, Campfire, HipChat, IRC and webhooks.
logs processes chunks from the build logs coming in from the workers. We're using a mix of CloudAMQP and the RabbitMQ Bigwig add-ons. Both have provided great throughput and stability for our service.
api serves data to the web front-end and to our command-line client.

The hub is now only responsible for scheduling builds and processing their end results, a very narrow set of tasks compared to before the re-write.

With our monolithic app broken up into smaller, more efficient and scalable pieces, we could address other growth issues. Travis CI is busy running builds all day, but usage goes up as the US West Coast wakes up. To handle these spikes, and to handle our increasing demands for capacity, we needed the ability to scale up processes easily, in particular our API and our logs processing. As we solved one of our biggest scaling challenges, Heroku's dynos made it very easy for us to grow with demand - especially with the recent addition of 2x dynos with more memory (JRuby 1.7 running on JDK 1.7 has higher baseline memory requirements).

JRuby All the Way Down

Very early on we needed to make sure that our service could process lots of things in parallel. With the standard Ruby implementation putting tight restrictions on how many things we could run concurrently, JRuby and the JVM offered a path to greater parallelization.

Thanks to Heroku we were able to easily switch to JRuby very early on. In fact, we started using it around two years ago, and we were one of the earliest users of JRuby on Heroku's platform.

The JVM, JRuby and Celluloid in particular have given us the means to scale out several parts of our infrastructure much more easily than we could've done with standard Ruby. We did have a fair share of problems with it initially, in particular handling timeouts in JRuby 1.6, debugging processes during runtime, and encoding, but all of these have been solved. Fixes were available to us quickly thanks to the JRuby team and Heroku's continuous updates of the Ruby buildpacks.

The Hidden Heroku Gem: Heroku Postgres

The Heroku service that doesn't get enough attention is Heroku Postgres.

We've been running on production instances for a long time now, and thanks to the easy ways of creating followers and forks of running databases, we've been able to upgrade on the go to meet our ever increasing needs.

Travis CI is currently running on a Fugu instance, which offers 3.75GB RAM and 1TB of storage. Thanks to the follower feature, we have been able to upgrade our database as we grow without significant downtime. We throw a decent amount of writes at it, and we process up to 200 messages per second, the vast majority of which represent chunks of build logs directly stored in the database.

We keep hammering our Postgres instance with a ton of writes every minute, every hour, every day. It keeps on performing very well. However, we’re continuing work to make our setup more resilient to failure. For starters, we’re looking into moving parts of our data - log chunks in particular - to a separate Postgres instance to make sure we can serve the ever-increasing read traffic coming from our API. Splitting out log storage to a separate database will be an important step to not only reducing the load on our main database, it will also help us ensure better availability and redundancy.

We're currently running on PostgreSQL 9.0 (travis-ci.org) and 9.1 (travis-ci.com). One of the main reason we want to upgrade to 9.2 on Heroku is that monitoring and gathering metrics has improved significantly in that version. We're working on purging out data that can be stored elsewhere, build logs for example, so we can approach the upgrade eventually without incurring significant downtime.

Conclusion

All in all, we've had a great experience scaling out Travis CI on Heroku's platform. We're very thankful for their support as a sponsor of our open source platform and to have them as a customer, and we consider ourselves happy customers of the great services Heroku provides. We’re also excited to continue to support the community of people using Heroku and Travis CI together. To find out how to use Travis CI to test and deploy on Heroku, get started here.

Video Transcript