I work on Heroku’s Runtime Infrastructure team, which focuses on most of the underlying compute and containerization here at Heroku. Over the years, we’ve tuned our infrastructure in a number of ways to improve performance of customer dynos and harden security.
We recently received a support ticket from a customer inquiring about poor performance in two system calls (more commonly referred to as syscalls) their application was making frequently: clock_gettime(3) and gettimeofday(2).
In this customer’s case, they were using a transaction tracing tool to monitor the performance of their application. This tool made many such system calls to measure how long different parts of their application took to execute. Unfortunately, these two system calls were very slow for them. Every request was delayed while it waited for the time to be returned, slowing down the app for their users.
To help diagnose the problem, we first examined our existing clocksource configuration. The clocksource determines how the Linux kernel gets the current time. The kernel attempts to choose the "best" clocksource from the sources available. In our case, the kernel was defaulting to the xen clocksource, which seems reasonable at a glance since the EC2 infrastructure that powers Heroku’s Common Runtime and Private Spaces products uses the Xen hypervisor under the hood.
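On most Linux systems, the clocksource the kernel selected and the alternatives it offers can be read from sysfs. The following is a minimal sketch, assuming the standard sysfs layout is exposed to the process; the helper name `read_clocksources` is hypothetical and not part of any Heroku tooling.

```python
from pathlib import Path

# Standard sysfs location for clocksource settings on Linux (assumed to be exposed).
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksources(base=CLOCKSOURCE_DIR):
    """Return (current, available) clocksources, or (None, []) if unreadable."""
    base = Path(base)
    try:
        current = (base / "current_clocksource").read_text().strip()
        available = (base / "available_clocksource").read_text().split()
    except OSError:
        # Not Linux, or sysfs is not visible (for example in some sandboxes).
        return None, []
    return current, available


if __name__ == "__main__":
    current, available = read_clocksources()
    print("current:", current)
    print("available:", ", ".join(available))
```

On a host like the one described above, the current value would be xen, with tsc listed among the available sources if the kernel considers it usable.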
Unfortunately, the version of Xen in use does not support a particular optimization, virtual dynamic shared object (or "vDSO"), for the two system calls in question. In short, vDSO maps some kernel functionality into the current process, which allows certain operations to be performed entirely in userspace rather than having to context switch into kernelspace. Context switching between userspace and kernelspace is a relatively expensive operation in terms of CPU time. Most applications won’t see a large impact from occasional context switches, but when context switches are happening hundreds or thousands of times per web request, they can add up very quickly!
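To get a rough feel for how per-call cost adds up, a micro-benchmark like the one below can be run on a host before and after changing the clocksource. This is only a sketch: the function name `average_call_cost_ns` and the figure of 1,000 timing calls per request are illustrative assumptions, and the measurement includes Python interpreter overhead, so only the relative difference between two configurations is meaningful.

```python
import time


def average_call_cost_ns(calls=200_000):
    """Average wall time per time.clock_gettime() call, in nanoseconds."""
    clock = time.CLOCK_MONOTONIC
    start = time.perf_counter_ns()
    for _ in range(calls):
        time.clock_gettime(clock)
    elapsed = time.perf_counter_ns() - start
    return elapsed / calls


if __name__ == "__main__":
    per_call = average_call_cost_ns()
    # Hypothetical workload: 1,000 timing calls per web request.
    print(f"{per_call:.0f} ns per call, "
          f"{per_call * 1000 / 1e6:.3f} ms per 1,000 calls")
```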
Thankfully, there are often several available clocksources to choose from. The available clocksources depend on a combination of the CPU, the Linux kernel version, and the hardware virtualization software being used. Our research revealed that tsc seemed to be the most promising clocksource and would support vDSO. tsc utilizes the Time Stamp Counter to determine the System Time.
During our research, we also encountered a few other blog posts about TSC. Every source we referenced agreed that non-vDSO accelerated system calls were significantly slower, but there was some disagreement on how safe use of TSC would be. The Wikipedia article linked in the previous paragraph also lists some of these safety concerns. The two primary concerns centered around backwards clock drift that could occur due to: (1) TSC inconsistency that plagued older processors in hyper-threaded or multi-CPU configurations, and (2) freezing and unfreezing of Xen virtual machines. On the first concern, Heroku uses newer Intel CPUs for all dynos that have significantly safer TSC implementations. On the second concern, EC2 instances, which Heroku dynos use, do not utilize freezing/unfreezing today. We decided that tsc would be the best clocksource choice to support vDSO for these system calls without introducing negative side effects.
Using the excellent vdsotest tool, we were able to confirm that the tsc clocksource enabled vDSO acceleration (you can also verify your own results using strace). After our internal testing, we deployed the tsc clocksource configuration change to the Heroku Common Runtime and Private Spaces dyno fleet.
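This post does not describe the mechanism used for the fleet-wide change. As an illustration only, one common way to switch the clocksource on a running Linux host is to write the name to sysfs as root, sketched below with a hypothetical helper `set_clocksource`; a `clocksource=tsc` kernel boot parameter is another option for making the choice persist across reboots.

```python
from pathlib import Path

# Standard sysfs location for clocksource settings on Linux (assumed to be exposed).
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def set_clocksource(name, base=CLOCKSOURCE_DIR):
    """Switch the running kernel to clocksource `name` (requires root)."""
    base = Path(base)
    available = (base / "available_clocksource").read_text().split()
    if name not in available:
        raise ValueError(f"{name!r} not offered by this kernel: {available}")
    (base / "current_clocksource").write_text(name)
    # Read the value back rather than assuming the write took effect.
    return (base / "current_clocksource").read_text().strip()
```

Calling `set_clocksource("tsc")` returns the clocksource that is active afterwards, so the caller can compare it with the requested name before relying on it.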
While the customer who filed the initial support ticket that led to this change noticed the improvement, the biggest surprise for us was when other customers started inquiring about unexpected performance improvements (which we knew to be a result of this change). It’s always nice for us when our work to solve a problem for a specific customer has a significant positive impact for all customers.
We're glad to be able to make changes like this that benefit all Heroku users. Detailed diagnostic and tuning work like this may not be worth the time investment for an individual engineering team managing their own infrastructure outside of Heroku. Heroku’s scale allows us to identify unique optimization opportunities and invest time into validating and implementing tweaks like this that make apps on Heroku run faster and more reliably.