Faster Database Forking
June 12, 2014 by Matthew Soldo
Did you know that Heroku databases can be forked? Forking a database creates a byte-for-byte copy that can be used for testing and development. It is a useful tool that allows teams to be agile with their data.
Today, forking databases is becoming faster. Fast forking reduces the time to create a fork by hours for high-transaction databases. To quickly fork a database, simply add the --fast flag:
$ heroku addons:add heroku-postgresql:crane --fork BLUE --fast
Fast forks behave differently from regular forks. They take less time to create, but the data will be somewhat out-of-date (as much as 30 hours). If your data has not changed significantly and you have not performed any schema migrations in the last 30 hours, then fast forks are a speedy alternative to regular ones.
Forking Databases at FarmLogs
FarmLogs builds software that makes farming more efficient and profitable. To do this, they make heavy use of Heroku Postgres’ database service, storing geospatial, financial, and historical information such as precipitation, crop history, and soil types.
Database forking allows FarmLogs to rapidly develop and evolve their product. They use forks to test new schema migrations, to benchmark database queries outside of their live environment, and most interestingly, as a tool for hiring. When FarmLogs is interviewing a developer candidate, they provide them with a fork of one of their datasets to develop a new feature against. This allows the candidate to work with production data without risking any impact to FarmLogs’ production systems.
"The extra convenience Heroku provides on top of reliable Postgres hosting makes all the difference" says Jesse Vollmar, CEO of FarmLogs. “Forking and following are two great examples of features that save us a ton of time.”
Heroku Postgres fork takes advantage of our technology for storing and managing databases’ write-ahead logs, the files that capture each change made to a database. The write-ahead log’s primary purpose is to allow a database to recover consistently from a system crash, but Heroku uses it for much more.
Using the open-source WAL-E software (started and maintained by Daniel Farina, a member of the Heroku Postgres team), Heroku continually captures sixty-second chunks of the write-ahead log and stores them to a multi-datacenter blob storage service. Base backups (binary copies of the database) are also captured each day. If the underlying infrastructure of a database is ever lost, we can recover the database by restoring its latest base backup to a new database and replaying the write-ahead log segments captured since then. This forms the basis of our Continuous Protection feature.
Forking a database works by instantiating a new database, restoring the original's base backup to it, and then replaying the write-ahead log until the time at which the fork command was issued. This is also how database rollbacks work.
If a database has a high transaction volume and the base backup was not taken recently, then the write-ahead log replay step requires a significant amount of time: several hours in some cases.
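The restore-and-replay sequence can be illustrated with a toy model. This is a minimal sketch, not Heroku's or WAL-E's implementation: `WalRecord`, `BaseBackup`, and `fork_at` are hypothetical names, the database is modeled as a plain dictionary, and each log record is assumed to be a single key/value write.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WalRecord:
    """One logged change: at `timestamp`, `key` was set to `value` (toy model)."""
    timestamp: int
    key: str
    value: str


@dataclass(frozen=True)
class BaseBackup:
    """A snapshot of the whole database taken at `taken_at` (toy model)."""
    taken_at: int
    data: dict


def fork_at(backup: BaseBackup, wal: list, target_time: int) -> dict:
    """Rebuild the database state as of `target_time`.

    Start from a copy of the base backup, then replay every log record
    written after the backup and no later than the target time.
    """
    forked = dict(backup.data)  # copy so the backup itself is never mutated
    for record in sorted(wal, key=lambda r: r.timestamp):
        if backup.taken_at < record.timestamp <= target_time:
            forked[record.key] = record.value
    return forked


# Hypothetical data: timestamps are arbitrary integers.
backup = BaseBackup(taken_at=100, data={"field_a": "corn"})
wal = [
    WalRecord(timestamp=90, key="field_a", value="wheat"),   # already in the backup
    WalRecord(timestamp=120, key="field_a", value="soy"),
    WalRecord(timestamp=150, key="field_b", value="oats"),
    WalRecord(timestamp=200, key="field_b", value="rye"),    # after the fork command
]

fork = fork_at(backup, wal, target_time=160)
print(fork)
```

Passing an earlier `target_time` yields the state at that earlier moment, which is why the same mechanism can serve rollbacks as well as forks.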
Fork --fast circumvents this step to speed up the fork process. It restores the latest backup and enough WAL files for the database to reach a consistent state, but no more.
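To make the trade-off concrete, here is a rough sketch comparing how much log each kind of fork would have to replay. The timeline is invented for illustration, `segments_to_replay` is a hypothetical helper, and `backup_consistent_at` is an assumed stand-in for the point at which the restored backup reaches a consistent state; none of these come from Heroku's implementation.

```python
def segments_to_replay(wal_timestamps, backup_started_at, stop_at):
    """Return the log segment timestamps needed to bring a backup up to `stop_at`."""
    return [t for t in sorted(wal_timestamps) if backup_started_at < t <= stop_at]


# Hypothetical timeline, in minutes since the base backup started.
backup_started_at = 0
backup_consistent_at = 10   # assumed: log needed up to here for a consistent state
fork_requested_at = 600     # the fork command arrives ten hours later

# One segment per minute, mirroring the sixty-second capture interval.
wal_timestamps = list(range(1, fork_requested_at + 1))

regular = segments_to_replay(wal_timestamps, backup_started_at, fork_requested_at)
fast = segments_to_replay(wal_timestamps, backup_started_at, backup_consistent_at)
staleness = fork_requested_at - backup_consistent_at

print(f"regular fork replays {len(regular)} segments")
print(f"fast fork replays {len(fast)} segments")
print(f"fast fork data is {staleness} minutes behind the original")
```

In this model the fast fork skips most of the replay work, and the skipped segments are exactly the changes its data is missing.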
Database forking is available on Heroku Postgres’ Standard, Premium, and Enterprise tiers. If you haven’t already tried a database fork, you can do so with:
$ heroku addons:add heroku-postgresql:crane --fork DATABASE --fast
Or, if you don't already have a Standard tier database and would like access to features like fast forks, Continuous Protection, and rollbacks, you can upgrade through our Add-ons catalog now.