Bundler Status Update

Bundler is quickly shaping up to live up to its promise as THE best way to manage your application dependencies. This afternoon we updated Heroku to the latest version, 1.0.0RC5. RC5 addresses all known outstanding issues, including those with git-sourced gems. You can see a full changelog on GitHub.

One key problem Bundler was designed to address is the shifting sands of gems updating and changing their dependencies. As many of you probably found before Bundler, a deploy could unexpectedly install new versions of gems, breaking your application. Bundler has added a new flag, “--deployment”, for this very issue.

When you run “bundle install” on your local development machine, Bundler will automatically create a Gemfile.lock file. The lock file includes a pinned version of all of your gem dependencies, for all groups. When the deploy process then uses the --deployment flag, it installs only the gem versions listed in the Gemfile.lock, which was generated on your development machine. This ensures that even if your dependent gems change, or the dependencies of those dependencies change, you won’t be surprised by an update to a different version than the one you tested against.
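To make the pinning idea concrete, here is a minimal sketch in Ruby. It is a toy model, not Bundler's resolver: the gem data, `AVAILABLE`, `LOCKED`, and `pick_versions` are all made up for illustration. It only shows why a lock keeps the installed versions stable when newer releases appear.

```ruby
require "rubygems" # for Gem::Version; loaded by default on modern Rubies

# Toy model of lock-file pinning. This is NOT how Bundler resolves gems;
# every name and version below is hypothetical.

# Versions currently published for each gem (made-up data).
AVAILABLE = {
  "rack"    => ["1.1.0", "1.2.1"],
  "sinatra" => ["1.0", "1.1.0"]
}

# What a lock file recorded on the development machine (made-up data).
LOCKED = { "rack" => "1.1.0", "sinatra" => "1.0" }

# Without a lock, take the newest published version of each gem.
# With a lock, use exactly the recorded version, or fail loudly.
def pick_versions(gems, available, locked = nil)
  gems.each_with_object({}) do |name, chosen|
    if locked
      chosen[name] = locked.fetch(name) { raise "#{name} is missing from the lock" }
    else
      chosen[name] = available.fetch(name).max_by { |v| Gem::Version.new(v) }
    end
  end
end

unlocked = pick_versions(%w[rack sinatra], AVAILABLE)         # newest of each
pinned   = pick_versions(%w[rack sinatra], AVAILABLE, LOCKED) # exactly what was tested
```

In this model the unlocked result drifts whenever a new version is published, while the pinned result changes only when the lock itself is regenerated.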

To work, the --deployment flag requires that you have a Gemfile.lock. Currently, Heroku runs “bundle install” against your application if you don’t have a Gemfile.lock, and “bundle install --deployment” if you do. Starting next month, we will begin to use the --deployment flag 100% of the time. This means you must commit a Gemfile.lock to your git repo. Simply run bundle install locally, git add and commit your Gemfile.lock, and you’ll be all set for the future.
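The current behavior and the upcoming change can be summarized in a few lines of Ruby. This is only a sketch of the rule as described above, not Heroku's actual deploy code; `install_command` and its `always_deployment` option are hypothetical names.

```ruby
require "tmpdir"

# Hypothetical helper mirroring the rule described in the text.
# It returns the command a deploy would run for the app in app_dir.
def install_command(app_dir, always_deployment: false)
  lock_present = File.exist?(File.join(app_dir, "Gemfile.lock"))
  if lock_present
    "bundle install --deployment"
  elsif always_deployment
    # Once the flag is always used, a missing lock file is an error.
    raise "Gemfile.lock is missing; commit it to your git repo"
  else
    "bundle install"
  end
end

# Usage: an empty directory first, then the same directory with a lock file.
Dir.mktmpdir do |dir|
  puts install_command(dir)
  File.write(File.join(dir, "Gemfile.lock"), "")
  puts install_command(dir)
end
```

Passing `always_deployment: true` models the policy planned for next month, where an app without a committed Gemfile.lock can no longer be installed.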

Blasting through Brazil

Our own Brazilian, Pedro Belo, will be making two stops in Brazil in the first half of August. On August 6th and 7th he’ll be speaking at the Oxente Rails 2010 conference. On August 8th he’ll be joining a local meetup in São Paulo to talk Ruby, Heroku and Beer.

If you’d like to join the meetup in São Paulo, drop Pedro a note for the location details. Ansioso para vê-lo! (Looking forward to seeing you!)

NoSQL, Heroku, and You

Why NoSQL Matters

“NoSQL” is a label which encompasses a wave of innovation now happening in the database space. The NoSQL movement has sparked a whirlwind of discussion, debate, and excitement in the technical community. Why is NoSQL generating so much buzz? What does it mean for you, the application developer? And what place does NoSQL have for apps running on the Heroku platform?

SQL (the language) and SQL RDBMS implementations (MySQL, PostgreSQL, Oracle, etc) have been the one-size-fits-all solution for data persistence and retrieval for decades. The rise of the web and the LAMP stack cemented the role of the relational database. But in 2010 we see a variety of application needs which are not satisfied by MySQL and friends. New problems demand new tools. High availability, horizontal scaling, replication, schemaless design, and map/reduce capability are some of the areas being explored by a new crop of datastores, all of which are loosely categorized as NoSQL.

To understand why NoSQL is important to you as an app developer, let’s consider the use cases for some of these features:

  • Frequently written, rarely read statistical data (for example, a web hit counter) should use an in-memory key/value store like Redis, or an update-in-place document store like MongoDB.
  • Big Data (like weather stats or business analytics) will work best in a freeform, distributed db system like Hadoop.
  • Binary assets (such as MP3s and PDFs) find a good home in a datastore that can serve directly to the user’s browser, like Amazon S3.
  • Transient data (like web sessions, locks, or short-term stats) should be kept in a transient datastore like Memcached. (Traditionally we haven’t grouped Memcached into the database family, but NoSQL has broadened our thinking on this subject.)
  • If you need to be able to replicate your data set to multiple locations (such as syncing a music database between a web app and a mobile device), you’ll want the replication features of CouchDB.
  • High availability apps, where minimizing downtime is critical, will find great utility in the automatically clustered, redundant setup of datastores like Cassandra and Riak.

Despite all the use cases described above, there will always be a place for the highly normalized, transactional, ad-hoc-query capabilities of SQL databases. We’re adding new tools to our toolbox, not removing old ones.

Polyglot Persistence – or, How Do You Pick a NoSQL Datastore?

Part of the NoSQL message is: pick the right tool for the job. The use cases outlined above should help guide your choice of datastore, as will many resources around the web like this diagram, these slides, or this video. And, like any technology, you should pick something that feels right for you and your team.

But most apps encompass multiple use cases. How do you pick one database to handle all the types of data your app needs to deal with? NoSQL answers: you don’t have to pick just one. This concept is called polyglot persistence (more details).

For example, we can imagine a web app which uses four different datastores:

  • MySQL for low-volume, high-value data like user profiles and billing information
  • MongoDB for high-volume, low-value data like hit counts and logs
  • Amazon S3 for user-uploaded assets like photos and documents
  • Memcached for temporary counters and rendered HTML
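
In code, this kind of split often shows up as a thin layer that routes each kind of data to its own backend. The sketch below is one hypothetical way to express that in Ruby. `PolyglotStore`, its route names, and the in-memory Hash stand-ins are assumptions for illustration; a real app would call each datastore's client library instead.

```ruby
# Routes each kind of data to a separate backend.
# The backends here are plain in-memory Hashes standing in for
# MySQL, MongoDB, Amazon S3, and Memcached; no real clients are used.
class PolyglotStore
  # Which backend handles each kind of data (names are illustrative).
  ROUTES = {
    profile:  :sql,       # low-volume, high-value
    hit:      :document,  # high-volume, low-value
    upload:   :blob,      # user-uploaded assets
    fragment: :cache      # temporary, safe to lose
  }.freeze

  def initialize
    # One Hash per backend, created on first use.
    @backends = Hash.new { |hash, name| hash[name] = {} }
  end

  def write(kind, key, value)
    backend_for(kind)[key] = value
  end

  def read(kind, key)
    backend_for(kind)[key]
  end

  # Name of the backend responsible for a given kind of data.
  def backend_name(kind)
    ROUTES.fetch(kind) { raise ArgumentError, "unknown data kind: #{kind}" }
  end

  private

  def backend_for(kind)
    @backends[backend_name(kind)]
  end
end

store = PolyglotStore.new
store.write(:profile, "user:1", { name: "Ada" })
store.write(:hit, "/pricing", 42)
```

Keeping the routing table in one place means a single kind of data can later be moved to a different datastore without touching the rest of the app.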

Polyglot persistence also makes it easy to dip your toes into NoSQL. Don’t migrate your existing production data – instead, use one of these new datastores as a supplementary tool. (Example: put non-critical session data or stats into Redis or Tokyo Tyrant.) And if you’re starting on a new app, you should give serious consideration to NoSQL options for your primary datastore, in addition to the venerable SQL choices.

NoSQL and the Cloud

The SQL databases we’re using today were designed over a decade ago. They were written with the constraints of 1990s hardware in mind: storage is cheap, memory and CPU are expensive. Today’s machines have different parameters. Memory and CPU are cheap, and can easily be scaled up on demand without capital expenditure using a service like Amazon EC2. But EC2, like all cloud technology, is based on virtualization. Virtualization’s weakness is I/O performance. So, the constraints of disk and memory have swapped: disk is the weak link in the chain, memory and CPU (spread out horizontally) are the strong links. It’s not surprising, then, that the datastores built a decade ago aren’t a good fit for the new parameters of cloud computing.

NoSQL databases tend to use memory over disk as the first-class write location: Redis and Memcached are in-memory only, and even systems like Cassandra use memtables for writes with asynchronous flushing to disk, preventing inconsistent I/O performance from creating write speed bottlenecks. And since NoSQL datastores typically emphasize horizontal scalability via partitioning, this puts them in an excellent position to take advantage of the elastic provisioning capability of cloud. NoSQL and cloud are a natural fit.
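
The memtable idea can be sketched in a few lines of Ruby. This is a toy, not Cassandra's implementation: `MemtableStore` and its threshold are hypothetical, the flush runs inline rather than asynchronously in the background, and the commit log that real systems use for durability is left out.

```ruby
# Toy write path: writes land in an in-memory table first and are
# moved to "disk" in batches. The flush here is synchronous for
# simplicity; a real store would do it in the background.
class MemtableStore
  attr_reader :flushed_segments

  def initialize(flush_threshold)
    @flush_threshold = flush_threshold
    @memtable = {}
    @flushed_segments = [] # stand-in for sorted files on disk
  end

  # A write only touches memory unless the memtable is full.
  def write(key, value)
    @memtable[key] = value
    flush if @memtable.size >= @flush_threshold
  end

  # Reads check memory first, then flushed segments from newest to oldest.
  def read(key)
    return @memtable[key] if @memtable.key?(key)
    @flushed_segments.reverse_each do |segment|
      return segment[key] if segment.key?(key)
    end
    nil
  end

  # Move the current memtable to "disk" as one key-sorted segment.
  def flush
    return if @memtable.empty?
    @flushed_segments << @memtable.sort.to_h
    @memtable = {}
  end
end

store = MemtableStore.new(2)
store.write("a", 1)
store.write("b", 2) # reaches the threshold and triggers a flush
store.write("a", 3) # the newer value shadows the flushed one on read
```

Because each write only updates a Hash in memory, the caller never waits on a disk write per record, which is the property the paragraph above describes.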

Database-as-a-Service is the Future

Infrastructure-as-a-service offerings like Amazon EC2 and Rackspace Cloud are what most of us think of as “cloud.” One of the effects of these large public clouds is that apps now have extremely low latency between themselves and other apps or service providers – 1ms or less compared to 50ms+ on the open internet. This latency difference opens up vast new possibilities for what a third-party service provider can offer.

Database-as-a-service is one of the coming decade’s most promising business models. Already services like MongoHQ (MongoDB), Cloudant (CouchDB), and Amazon RDS (MySQL) are offering fully hosted and managed databases to apps running in EC2. Like IaaS, DaaS promises to remove up-front capex costs, and to bring efficiency of scale and specialization to the admin and operation of databases. Although these services are still very young, the potential benefit of being able to outsource all the headaches of running and scaling your app’s database is enormous.

DaaS also goes hand-in-glove with polyglot persistence. Thanks to database services, you won’t need to learn how to sysadmin/DBA for every datastore you use – you can instead outsource that job to a service provider specializing in each database. One of the reasons databases have historically had a tribal affiliation (someone is a “MySQL guy” or a “Postgres gal” or an “Oracle guy,” but rarely two or all three) is because of the time investment in learning how to admin whatever database you use. DaaS removes that barrier and opens up even greater possibility for polyglot persistence in production use.

Heroku’s Commitment to Database Innovation

Heroku already supports two of the most popular NoSQL databases, MongoDB and CouchDB, as add-ons: MongoHQ and Cloudant. We also support the transient key-value datastore, Memcache, via Northscale’s service.

Looking forward to the future: we have more NoSQL add-ons coming down the pipeline, such as Redis To Go. And we’ll be continuing to work with technology leaders in the NoSQL community to help them bring their database services to market. Our goal is to provide access to the cornucopia of breakthrough new database technologies from the NoSQL world, available from the Heroku add-ons catalog at the click of a button. We hope to make Heroku a great place to play with these new technologies, while still curating a list of options that are fully baked and ready for use in real production applications.

Of course, we can’t forget that Heroku currently runs the largest and most mature SQL-database-as-a-service in the world: our PostgreSQL service, packaged with every Heroku app. We’re continuing to expand and improve this service (including support for great new features in Postgres 9), because we know SQL and the apps that depend on it are here to stay. Reinforcing our commitment to polyglot persistence, we’ll be turning our Postgres service into a separately packaged add-on – still installed by default into each app, but possible to opt out, or combine with other datastore add-ons. We also hope to see other providers in the SQL-as-a-service space besides Heroku’s Postgres service and Amazon RDS.

It’s an exciting time for data, and our team here at Heroku is thrilled to take part in the continuing growth of the NoSQL movement.

Teambox on Heroku

More and more developers are using Heroku as a SaaS deployment platform. By creating their applications on top of Heroku, they can leverage our architecture and security model to provide SaaS to their customers easily. Today we want to highlight a new favorite, Teambox.

Teambox is an open-source, Twitter-like collaboration tool for companies, organizations, and teams. Teams around the world use it to collaborate, keep in touch, track tasks, and much more.

The Teambox team has made it easy to install on Heroku as well. This screencast walks you through the instructions from start to finish in just 5 minutes. Give it a try yourself, and try out their collaboration tool.

Default to Bamboo

Deployment stacks have been a huge success. For many developers, `heroku create --stack bamboo` has become the default whenever creating new apps. With the latest versions of Rails 2 and Rails 3 both requiring the Bamboo stack, we’re excited to make Bamboo the new default.

Effective immediately, all newly created apps will default to the Bamboo stack with REE 1.8.7. You can still use the old Aspen stack if you’d like by simply specifying `heroku create --stack aspen`. Existing apps stay on the stack they are on unless you explicitly migrate them.

A key feature of Bamboo is that it eliminates pre-installed gems. This provides app developers with considerably more flexibility in managing their apps. You can easily use any version of any gem by simply including it in your .gems file or Bundler Gemfile. You’ll need to remember to include all gems you are using. If you’re using a Gemfile, this is automatically done for you. If you’re using .gems, please make sure to include all the gems you use, INCLUDING rails!
