Database Insight with pg-extras

When building your application there's a lot to worry about, from choice of framework and stack to designing the application itself, to questions of when to worry about scalability. Your database shouldn't have to be one extra layer of concern. You should be able to put data in, trust it will stay safe, and finally get data back out – in a performant manner. Yet, with all the moving parts in your application, understanding issues in your database can be a huge pain. Today we're releasing pg-extras, a heroku toolbelt plugin, to provide additional insights into your database and to make working with your database easier.

Get started by installing the plugin for the heroku toolbelt:

$ heroku plugins:install git://github.com/heroku/heroku-pg-extras.git

Now you have many more commands available to provide you the insight you need within the pg namespace of the heroku toolbelt:

$ heroku help pg
...
  pg:bloat [DATABASE]                 #  show table and index bloat in your database ordered by most wasteful
  pg:blocking [DATABASE]              #  display queries holding locks other queries are waiting to be released
  pg:cache_hit [DATABASE]             #  calculates your cache hit rate (effective databases are at 99% and up)
...

You can read more on each command available and what insights you can get from it within the pg-extras readme. Lets highlight a few:

cache_hit

Each production tier plan’s RAM size constitutes the total amount of System Memory on the underlying instance’s hardware, most of which is given to Postgres and used for caching. While a small amount of RAM is used for managing each connection and other tasks, Postgres will take advantage of almost all this RAM for its cache. You can read more about how this works in our article on understanding postgres data caching.

As a guide for most web applications cache hit ratio should be in the 99%+ range.

$ heroku pg:cache_hit
       name      |         ratio
----------------+------------------------
 index hit rate | 0.99985155862675559832
 cache hit rate | 0.99999671620611908765
(2 rows)

index_usage

Premature optimization has both the cost of losing time on feature development and the risk of wasted optimizations. Indexes are one area that’s easy to ignore until you actually need them. A good index should be across a table of some reasonable size and highly selective. However indexes aren’t free, as there is a measurable cost to keeping them updated and storing them, so unused indexes which you can see with heroku pg:unused_indexes are to be avoided. With pg:index_usage you can begin to get a clear idea of how to manage/maintain your indexes. Running the command will give you output like the following:

$ heroku pg:index_usage --app dashboard
       relname       | percent_of_times_index_used | rows_in_table
---------------------+-----------------------------+---------------
 events              |                          65 |       1217347
 app_infos           |                          74 |        314057
 app_infos_user_info |                           0 |        198848

From the above you can see that the app_infos_user_info has never had an index used and could likely benefit from adding an index. Even the events table could benefit from some additional indexes. From this you could then follow a more detailed guide for getting your indexes setup.

locks

Locks are bound to happen within your database, usually these are very short lived on the order of milliseconds. In PostgreSQL fortunately writing data does not hold a lock preventing it from being read. However, you can still encounter unintentional cases where you have long lived locks due to contention. Such cases can create follower lag, cause issues for other queries and in general start to impact application performance. With the pg:locks command you can easily view all current locks and how long they've been held.

kill

Whether its lock contention, a long running analytics query, or a bad cross join there are times where you want to stop a query. The pg:kill statement will allow you to kill any currently running query by specifying its pid which is displayed with commands pg:locks and pg:ps. Or if your database is in dire straights you also have the ability to run pg:killall to kill all currently running queries. Having the ability to stop runaway queries will allow you to feel even safer when others need access to your database to get the reports they need.

The future

We've already found pg-extras incredibly useful and expect you will too. Going forward pg-extras will be the playground for new commands available power users that install the plugin. Over time some commands may leave pg-extras and become part of the toolbelt or be removed if they’re not beneficial. We welcome your input on which commands you find helpful or what else you’d like to see in pg-extras at postgres@heroku.com.

Video Transcript