NoSQL, Heroku, and You
Posted by Adam
Why NoSQL Matters
“NoSQL” is a label which encompasses a wave of innovation now happening in the database space. The NoSQL movement has sparked a whirlwind of discussion, debate, and excitement in the technical community. Why is NoSQL generating so much buzz? What does it mean for you, the application developer? And what place does NoSQL have for apps running on the Heroku platform?
SQL (the language) and SQL RDBMS implementations (MySQL, PostgreSQL, Oracle, etc) have been the one-size-fits-all solution for data persistence and retrieval for decades. The rise of the web and the LAMP stack cemented the role of the relational database. But in 2010 we see a variety of application needs which are not satisfied by MySQL and friends. New problems demand new tools. High availability, horizontal scaling, replication, schemaless design, and map/reduce capability are some of the areas being explored by a new crop of datastores, all of which are loosely categorized as NoSQL.
To understand why NoSQL is important to you as an app developer, let’s consider the use cases for some of these features:
- Frequently-written, rarely read statistical data (for example, a web hit counter) should use an in-memory key/value store like Redis, or an update-in-place document store like MongoDB.
- Big Data (like weather stats or business analytics) will work best in a freeform, distributed db system like Hadoop.
- Binary assets (such as MP3s and PDFs) find a good home in a datastore that can serve directly to the user’s browser, like Amazon S3.
- Transient data (like web sessions, locks, or short-term stats) should be kept in a transient datastore like Memcache. (Traditionally we haven’t grouped memcached into the database family, but NoSQL has broadened our thinking on this subject.)
- If you need to be able to replicate your data set to multiple locations (such as syncing a music database between a web app and a mobile device), you’ll want the replication features of CouchDB.
- High availability apps, where minimizing downtime is critical, will find great utility in the automatically clustered, redundant setup of datastores like Cassandra and Riak.
Despite all the use cases described above, there will always be a place for the highly normalized, transactional, ad-hoc-query capabilities of SQL databases. We’re adding new tools to our toolbox, not removing old ones.
Polyglot Persistence – or, How Do You Pick a NoSQL Datastore?
Part of the NoSQL message is: pick the right tool for the job. The use cases outlined above should help guide your choice of datastore, as will many resources around the web like this diagram, these slides, or this video. And, like any technology, you should pick something that feels right for you and your team.
But most apps encompass multiple use cases. How do you pick one database to handle all the types of data your app needs to deal with? NoSQL answers: you don’t have to pick just one. This concept is called polyglot persistence (more details).
For example, we can imagine a web app which uses four different datastores:
- MySQL for low-volume, high-value data like user profiles and billing information
- MongoDB for high-volume, low-value data like hit counts and logs
- Amazon S3 for user-uploaded assets like photos and documents
- Memcached for temporary counters and rendered HTML
Polyglot persistence also makes it easy to dip your toes into NoSQL. Don’t migrate your existing production data – instead, use one of these new datastores as a supplementary tool. (Example: put non-critical session data or stats into Redis or Tokyo Tyrant.) And if you’re starting on a new app, you should give serious consideration to NoSQL options for your primary datastore, in addition to the venerable SQL choices.
NoSQL and the Cloud
The SQL databases we’re using today were designed over a decade ago. They were written with the constraints of 1990s hardware in mind: storage is cheap, memory and cpu are expensive. Today’s machines have different parameters. Memory and CPU are cheap, and can easily be scaled up on demand without capital expenditure using a service like Amazon EC2. But EC2, like all cloud technology, is based on virtualization. Virtualization’s weakness is I/O performance. So, the constraints of disk and memory have swapped: disk is the weak link in the chain, memory and cpu (spread out horizontally) are the strong links. It’s not surprising, then, that the datastores built a decade ago aren’t a good fit for the new parameters of cloud computing.
NoSQL databases tend to use memory over disk as the first-class write location: Redis and Memcached are in-memory only, and even systems like Cassandra use memtables for writes with asynchronous flushing to disk, preventing inconsistent I/O performance from creating write speed bottlenecks. And since NoSQL datastores typically emphasize horizontal scalability via partitioning, this puts them in an excellent position to take advantage of the elastic provisioning capability of cloud. NoSQL and cloud are a natural fit.
Database-as-a-Service is the Future
Infrastructure-as-a-service like Amazon EC2 and Rackspace Cloud are what most of us think of as “cloud.” One of the effects of these large public clouds is that apps now have extremely low latency between themselves and other apps or service providers – 1ms or less compared to 50ms+ on the open internet. This latency difference opens up vast new possibilities for what a 3rd party service provider can offer.
Database-as-as-service is one of the coming decade’s most promising business models. Already services like MongoHQ (MongoDB), Cloudant (CouchDB), and Amazon RDS (MySQL) are offering fully hosted and managed databases to apps running in EC2. Like IaaS, DaaS promises remove up-front capex costs, and bring efficiency of scale and specialization in the admin and operation of databases. Although these services are still very young, the potential benefit of being able to outsource all the headaches of running and scaling your app’s database are enormous.
DaaS also goes hand-in-glove with polyglot persistence. Thanks to database services, you won’t need to learn how to sysadmin/DBA for every datastore you use – you can instead outsource that job to a service provider specializing in each database. One of the reasons databases have historically had a tribal affiliation (someone is a “MySQL guy” or a “Postgres gal” or an “Oracle guy,” but rarely two or all three) is because of the time investment in learning how to admin whatever database you use. DaaS removes that barrier and opens up even greater possibility for polyglot persistence in production use.
Heroku’s Commitment to Database Innovation
Heroku already supports two of the most popular NoSQL databases, MongoDB and CouchDB, as add-ons: MongoHQ and Cloudant. We also support the transient key-value datastore, Memcache, via Northscale’s service.
Looking forward to the future: we have more NoSQL add-ons coming down the pipeline, such as Redis To Go. And we’ll be continuing to work with technology leaders in the NoSQL community to help them bring their database services to market. Our goal is to provide access to the cornucopia of breakthrough new database technologies from the NoSQL world, available from the Heroku add-ons catalog at the click of a button. We hope to make Heroku a great place to play with these new technologies, while still curating a list of options that are fully baked and ready for use in real production applications.
Of course, we can’t forget that Heroku currently runs the largest and most mature SQL-database-as-a-service in the world: our PostgreSQL service, packaged with every Heroku app. We’re continuing to expand and improve this service (including support for great new features in Postgres 9), because we know SQL and the apps that depend on it are here to stay. Reinforcing our commitment to polyglot persistence, we’ll be turning our Postgres service into a separately packaged add-on – still installed by default into each app, but possible to opt out, or combine with other datastore add-ons. We also hope to see other providers in the SQL-as-a-service space besides Heroku’s Postgres service and Amazon RDS.
It’s an exciting time for data, and our team here at Heroku is thrilled to take part in the continuing growth of the NoSQL movement.