Optimizing Database Performance in Rails

Setting up a database is a relatively straightforward process (Heroku has an add-on for that), but getting it to run well in production is sometimes another matter. As your application grows and your data grows along with it, you will likely find a number of performance bottlenecks specifically related to your database, and this post aims to help you diagnose and address those issues when they arise.

As with all components of your infrastructure it’s important to have early visibility into the performance characteristics of your database. Watching this data as your application grows will give you a much better chance of spotting performance issues and regressions as they’re introduced.

I always recommend that customers install a monitoring tool immediately after setting up their application. A monitoring service can give you all sorts of insight into your app: things like response time, error tracking, and a large set of database-specific features.

Nearly all of my performance investigations start with identifying slow queries, or views that run far more queries than necessary. Most of the time these queries are the primary factor slowing down an application, so much so that it makes little sense to spend effort optimizing application code first; any gains would be minor compared to the benefits of good database management.

Heroku Postgres Standard and Premium databases provide database-related events that can be viewed on the app's logstream using heroku logs. Postgres alerts are prefixed with [postgres], while Heroku Postgres Metrics events are prefixed with [heroku-postgres]. Metrics events include things like database size, active connections, and cache hit rate, which tells you how often queries are served from memory instead of hitting disk.

Identifying Slow Queries

data.heroku.com provides insight into slow queries that may be impacting the performance of your application. Selecting the database and navigating to the Diagnose tab will display queries for each of these categories:

  • Most time consuming
  • Most frequently invoked
  • Slowest execution time
  • Slowest I/O

Expensive queries on Heroku Data Dashboard

The pg-extras Heroku CLI plugin provides additional commands for diagnosing a Postgres database. The pg:outliers command identifies the queries consuming the most total execution time. Here's an example of pg:outliers output from the database for CodeTriage.

$ heroku pg:outliers
 total_exec_time | prop_exec_time |   ncalls   |  sync_io_time   |                                                                                                                            query                                                                                                                             
-----------------+----------------+------------+-----------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 00:27:41.4331   | 32.9%          | 36,118     | 00:00:00.038514 | SELECT  "repos".* FROM "repos" INNER JOIN "repo_subscriptions" ON "repo_subscriptions"."repo_id" = "repos"."id" WHERE (repos.id not in ($2)) GROUP BY repos.id ORDER BY issues_count::float/COUNT(repo_subscriptions.repo_id) DESC LIMIT $1
 00:08:12.72007  | 9.8%           | 16         | 00:06:45.137798 | UPDATE "issues" SET "state" = $1 WHERE (state = $2 and updated_at < $3)
 00:07:55.436261 | 9.4%           | 5,921,363  | 00:03:41.310037 | SELECT  "doc_methods".* FROM "doc_methods" WHERE "doc_methods"."repo_id" = $1 AND "doc_methods"."name" = $2 AND "doc_methods"."path" = $3 ORDER BY "doc_methods"."id" ASC LIMIT $4
 00:05:33.412632 | 6.6%           | 14,157,660 | 00:00:00.075173 | SELECT  "users".* FROM "users" WHERE "users"."id" = $1 LIMIT $2
 00:03:56.203844 | 4.7%           | 3,002,492  | 00:00:34.84346  | UPDATE "issues" SET "state" = $1, "updated_at" = $2 WHERE "issues"."id" = $3
 00:03:35.423045 | 4.3%           | 25,586     | 00:03:29.169686 | SELECT  "issue_assignments".* FROM "issue_assignments" INNER JOIN "repo_subscriptions" ON "issue_assignments"."repo_subscription_id" = "repo_subscriptions"."id" WHERE "repo_subscriptions"."user_id" = $1 AND "issue_assignments"."delivered" = $2 LIMIT $3

For more information, please check out the Expensive Queries Dev Center article.

In the Rails world you can easily make your database do unnecessary work with approaches like this (the classic N+1 query problem):

clients = Client.limit(10)

clients.each do |client|
  puts client.address.postcode
end

This code will run 11 queries; one to find the 10 clients and 10 more as it fetches the postcode from the address for each client.

If you know you’re going to be iterating over a collection of records to get associated data, use the ActiveRecord includes method to eager load all of the associated data you’ll need up front.

clients = Client.includes(:address).limit(10)

clients.each do |client|
  puts client.address.postcode
end

Now Rails will load the addresses alongside the initial clients query, so our previous 11 queries become just two:

SELECT * FROM clients LIMIT 10;
SELECT addresses.* FROM addresses
  WHERE (addresses.client_id IN (1,2,3,4,5,6,7,8,9,10));

Make sure you’ve made a first pass through your code at this point to pick up the low-hanging fruit. These types of queries can significantly impact the performance of your application.
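
If you’d like some automated help spotting these N+1 patterns, the bullet gem is a popular companion for exactly this kind of cleanup. It isn’t something the rest of this post depends on, but a minimal development setup might look like this sketch:

# Gemfile
gem 'bullet', group: :development

# config/environments/development.rb — a minimal sketch; see the bullet README
# for the full set of notification options.
config.after_initialize do
  Bullet.enable       = true
  Bullet.rails_logger = true   # log a warning whenever a query could have been eager loaded
  Bullet.add_footer   = true   # also surface warnings in the page footer during development
end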

Add Indexes to Improve Query Time

If you’ve made sure you’re doing your best to generate an efficient query with ActiveRecord and you’re still finding that a specific query is slow, you may benefit from adding an index.

Even if you’ve never added an index you’re already using this feature of your database: the primary key column always gets an index generated by the database. As you insert records they’re each assigned sequential IDs, but you can’t depend on the rows staying physically stored in that order when you go to look up a record; the database is free to rearrange them on disk.

When you create a new table the database automatically builds a sorted index of those IDs, which lets it find a record by ID without scanning the whole table. The same is true of an index on any other column: if you know you’re going to be looking up clients by their first names a lot, you’ll definitely want those names kept in sorted order.
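
For instance, with an index on first_name a lookup like this one can jump straight to the matching rows instead of scanning the whole table (Client and first_name are just the illustrative names used in this section):

# With an index on clients.first_name, Postgres can satisfy this lookup
# with an index scan rather than reading every row in the table.
Client.find_by(first_name: "Kathryn")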

Indexes are most useful on columns you actually use to find records; you wouldn’t want to index a column like client.notes, where you keep content that you’ll rarely use to look up a record.

While storage space is becoming less of a problem these days, you do need to consider the fact that adding an index consumes additional space on disk. The other downside to using indexes is that every write now happens in two places: a new client needs their first name written to the clients table as well as to the first name index. For write-heavy models you may not benefit from having an index, and it could actually slow your database down.

If you’re considering an index your best bet is always to try it locally and do some benchmarking. You might find the pghero gem useful for identifying potential targets; it will present recommended indexes in its dashboard.
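
If you want a quick read on whether a candidate index is helping, compare the query plan and timing before and after adding it. Here’s a minimal sketch you might run in a Rails console against production-like data, reusing the illustrative Client/first_name example from above:

require 'benchmark'

relation = Client.where(first_name: "Kathryn")

# Ask Postgres how it plans to execute the query; look for a sequential scan
# before the index exists and an index scan once it does.
puts relation.explain

# Time the actual execution; `load` forces the query to run.
puts Benchmark.realtime { relation.load }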

Adding an Index in Rails

The easiest way to make sure you’re indexing properly is to let Rails do it for you. If you’re adding a new column and you want it indexed, use a command similar to this one:

$ rails generate migration AddPartNumberToProducts part_number:string:index

That command will generate a migration that looks like this:

class AddPartNumberToProducts < ActiveRecord::Migration[5.0]
  def change
    add_column :products, :part_number, :string
    add_index :products, :part_number
  end
end

ActiveRecord will add the part_number column to the products table and then create a separate index for the part_number attribute.
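
If the column you want to index already exists, the migration only needs the add_index call. Here’s a minimal sketch reusing the clients and first_name example from earlier:

class AddIndexToClientsFirstName < ActiveRecord::Migration[5.0]
  # On a large, busy table you may prefer add_index with algorithm: :concurrently,
  # which avoids locking writes but also requires disable_ddl_transaction!.
  def change
    add_index :clients, :first_name
  end
end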

When you’re creating a new model in Rails and you know it’s going to have a reference to another model, you’ll almost always want an index on that foreign key. Creating the relationship with a reference will generate that index automatically:

$ rails generate migration AddUserRefToProducts user:references

That command will generate this migration:

class AddUserRefToProducts < ActiveRecord::Migration[5.0]
  def change
    add_reference :products, :user, foreign_key: true
  end
end

Caching

Caching the results from your database queries is vitally important to a production web application. It’s very unlikely that you’re reading new information every time your app queries the database.

Imagine you have a user profile page, and every time the user reloads their own profile page it generates a new query, or more likely many queries, especially if you’re just getting started with your optimization adventures. Large portions of the data retrieved by those queries are certain not to change very often, if at all. Wouldn’t it be lovely if we weren’t making round trips to the database for all of that information? Let’s set up caching in Rails.

Rails Caching with Redis

When you’re caching the data for your profile page you’re going to need a fast place to store it. There’s no point in caching if retrieving something from the cache isn’t significantly faster than going to the database. Redis is a perfect choice: it’s incredibly fast to read from a Redis cache store, and Rails works with Redis out of the box.

Add the redis gem to your Gemfile:

gem 'redis'

We’re going to add the hiredis gem as well; it’s a faster connection library that will improve performance:

gem 'hiredis'

We still need to tell Rails to start using our cache store, so add this to config/environments/production.rb:

config.cache_store = :redis_cache_store, { url: ENV['REDIS_URL'] }

There are many options for the configuration of caching that I’ve left out here for simplicity, but if you’re going to run in production you’ll probably want something more complex. You can find details in the RedisCacheStore section of the Rails guide on caching.
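
As a rough idea of what a fuller configuration might look like, here’s a sketch using options documented for RedisCacheStore; the values themselves are illustrative, not recommendations:

# config/environments/production.rb — a sketch, not a drop-in configuration.
config.cache_store = :redis_cache_store, {
  url: ENV['REDIS_URL'],
  driver: :hiredis,        # use the faster hiredis connection library
  expires_in: 1.hour,      # give cache entries a default TTL
  reconnect_attempts: 1,   # retry the connection once before giving up
  error_handler: -> (method:, returning:, exception:) {
    # Degrade gracefully: log cache errors rather than raising during a request.
    Rails.logger.warn("Redis cache error in #{method}: #{exception.message}")
  }
}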

Once you deploy your application to production you'll want a dedicated Redis instance for caching. Stale cache entries aren't removed when records change; Redis only evicts entries when they expire or when it runs out of space, so it's entirely possible you'll fill your Redis instance to capacity. That's obviously not ideal if you're trying to store other data in Redis that your application uses regularly. If you're using the Heroku Redis add-on the REDIS_URL from the example above will be set automatically; otherwise you'll want to make sure you set the environment variable yourself.

Now that you have caching up and running in production it's time to put it to use. Rails offers several caching strategies, each with its own benefits and drawbacks. We're going to focus on fragment caching here because it's the easiest strategy to use and it can provide a lot of value without much setup, but if you'd like to investigate other approaches you can read more about them in the Rails Guide to caching.

Fragment caching is so named because you can use it to cache an individual fragment of view logic. This allows you to have separate caches that can be independently invalidated for each of the view fragments that make up your page.

Rails makes fragment caching especially convenient with the cache block:

<% @products.each do |product| %>
  <% cache product do %>
    <%= render product %>
  <% end %>
<% end %>

The next time your application processes a request for that page, Rails will write a cache entry for each product keyed on (among other things) the record's updated_at timestamp. When a record is updated that timestamp changes, so the stale entry is no longer used and a fresh fragment is written in its place. Rails also expires the cache when the HTML for a particular view fragment changes, because the template digest is part of the key, so you can be sure you're always seeing what you expect.
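
When you're rendering a collection like this, Rails can also fetch all of those fragment caches in one multi-read instead of one read per product by rendering the collection with cached: true (this assumes a standard products/_product partial):

<%= render partial: 'products/product', collection: @products, cached: true %>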

If you need to cache a fragment conditionally you can also use the cache_if and cache_unless convenience methods:

<% cache_if admin?, product do %>
  <%= render product %>
<% end %>

Fragment caching is really only scratching the surface of what's possible in your caching layer, but it's a good place to start as it's relatively easy to set up and it's unlikely to create many issues for your application. I definitely encourage you to dig deeper on caching, but be aware that there are plenty of challenging bugs you can introduce if you're not careful. There's a reason caching is listed among the two hardest problems in computer science: naming things, caching, and off-by-one errors.
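
If you do want to go a level deeper, low-level caching with Rails.cache.fetch is a natural next step. Here's a minimal sketch; the popular_repos method, cache key, and ten-minute expiry are illustrative, not from this post:

# The block only runs on a cache miss; otherwise the cached value is returned.
def popular_repos
  Rails.cache.fetch("repos/popular", expires_in: 10.minutes) do
    Repo.order(issues_count: :desc).limit(20).to_a
  end
end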

Next Steps

You can make a lot of performance progress just by identifying your bottlenecks, writing smarter queries, and caching. If you're pretty confident you have all of those pieces under control, there's still plenty of work you can do to improve your database, even though we won't have time to cover it all in depth here.

You should make sure you're not leaking connections; if your application isn't managing its connections well you may need the assistance of pgbouncer.

You'll also want to be sure you know how to detect and address database locks in your application. If you suspect locking is causing you issues you might benefit from the pg-extras CLI plugin mentioned earlier. The plugin will add a pg:locks command to your CLI tool to view all current locks in your database and how long they've been held.

Finally you're going to want to learn about database bloat and how Postgres deals with it, and specifically why you might want to manage the cleanup process (VACUUM) yourself. You can learn more about the mechanism Postgres uses to prevent bloat and how you can manage it on Heroku in the Managing VACUUM on Heroku Postgres article on the Heroku Dev Center.

Experiment

Hopefully you've found some steps you can take here to optimize your database interactions in Rails. Remember to experiment with any change you make to the way you query your data, and make sure you have the metrics in place to understand your progress.

Good luck with your database adventures, may all your queries be fast and efficient.

Jonan

Originally published: April 15, 2019
