PostgreSQL 9.6 is now generally available for Heroku Postgres. The main focus of this release is centered around performance. PostgreSQL 9.6 includes enhanced parallelism for key capabilities that sets the stage for significant performance improvements for a variety of analytic and transactional workloads.
With 9.6, certain actions, like individual queries, can be split up into multiple parts and performed in parallel. This means that everything from running queries, creating indexes, and sorting have major improvements that should allow a number of different workloads to execute faster than they had in prior releases of PostgreSQL. With 9.6, the PostgreSQL community, along with Heroku’s own open source contributions to this release (a special thanks to Peter Geoghegan), have laid the foundation to bring those enterprise-class features to the world’s most advanced open source relational database.
Parallelism, Performance, and Scale
Performance in the form of parallelism means that more work can be done at the same time. One of the areas where this makes a big difference is when Postgres needs to scan an entire table to generate a resultset.
Imagine for a moment that your PostgreSQL installation has a table in it called emails
that stores all of the emails being sent by customers within an application. Let’s say that one of the features that’s provided to customers as part of the application is giving counts on the number of emails being sent to particular email addresses, filtered by the type of person that’s receiving the email. That query might look something like this:
SELECT e.to
, count(*) as total
FROM emails e
WHERE e.person_type = ‘executives’
GROUP BY e.to
In this scenario, if our customers have been sending a large number of emails to executives, an index on the table would not help on person_type
column because rows with executive
in the person_type
column represent too many of the rows in the table. In that case, PostgreSQL will resort to scanning all of the rows in the database to find matches for executives
.
For relatively small tables, say thousands of rows, PostgreSQL might be able to perform this quickly. But, if the table has 100 million rows or more, that query will slow to a crawl because it needs to scan every single row. In the 9.6 release, PostgreSQL will be able to break apart the above query and search portions of the table at the same time. This should greatly speed up queries that require full table scans, which happens more often than you think in analytics-based workloads.
The performance improvements in 9.6 weren’t limited to sequential scans on large tables. Much of the work Heroku contributed to this release was in the way of improved sorting. One of the areas where you’ll see considerable improvement is when you create indexes concurrently. Under the hood, each row in a table has what’s called a Tuple Id (TID), not to be confused with an Object Id. A TID consists of two parts, a block and a row index. Together, the TID identifies where the row can be found within the physical structure of the table. Our patch to this code took the tuple ids and transformed them into a different format prior to sorting in the index which would allow PostgreSQL to sort the TIDs even faster.
With our contributions to sorting, when you want to create an index concurrently by using the CREATE INDEX CONCURRENTLY
syntax, you can experience up to a 50% performance improvement on index creation in certain cases. This is an amazing patch because when CREATE INDEX CONCURRENTLY
is used, it won’t lock writes to the table in question like CREATE INDEX
would. This allows your application to operate like it normally would without adverse effects.
Notable Improvements
Beyond the work done on parallelism, PostgreSQL 9.6 has a number of noteworthy improvements:
PostgreSQL foreign data wrapper now supports remote updates, joins and batch updates. That you can distribute workloads across many different PostgreSQL instances.
Full text search can now search for adjacent words.
Improvements to administrative tasks like VACUUM which shouldn’t scan pages unnceccesarily. This is particularly useful for tables that are append-only like
events
orlogs
.
Getting Started
When a new Heroku Postgres database is provisioned on any one of the Heroku plan tiers, whether on the Common Runtime or in Private Spaces, 9.6 will be the default version. If you have an existing database on the platform, please check out our documentation for upgrading. This is an exciting update to PostgreSQL that should have many benefits for the workloads that run on Heroku. Give PostgreSQL 9.6 a spin and let us know how we can make PostgreSQL even better. Together, we can make PostgreSQL one of the best relational databases in the business!