Ruby 2.4 Released: Faster Hashes, Unified Integers and Better Rounding

The Ruby maintainers continued their annual tradition by gifting us a new Ruby version to celebrate the holiday: Ruby 2.4 is now available and you can try it out on Heroku.

Ruby 2.4 brings some impressive new features and performance improvements to the table, here are a few of the big ones:

Binding#irb - Runtime Invocation for IRB
Unified Integers - Fixnum and Bignum are now Integer
Rounding Changes - More Accurate Kernel#sprintf Rounding
- Background:
Hash Changes - Open Addressing for Cache Utilization via Full Cycle LCG
- Background:

Binding#irb

Have you ever used p or puts to get the value of a variable in your code? If you’ve been writing Ruby the odds are pretty good that you have. The alternative REPL Pry (https://pry.github.io/) broke many of us of this habit, but installing a gem to get a REPL during runtime isn’t always an option, or at least not a convenient one.

Enter binding.irb, a new native runtime invocation for the IRB REPL that ships with Ruby. Now you can simply add binding.irb to your code to open an IRB session and have a look around:

# ruby-2.4.0
class SuperConfusing
  def what_is_even_happening_right_now
    @x = @xy[:y] ** @x

    binding.irb
    # open a REPL here to examine @x, @xy,
    # and possibly your life choices
  end
end

One Integer to Rule Them All

Ruby previously used 3 classes to handle integers: the abstract super class Integer, the Fixnum class for small integers and the Bignum class for large integers. You can see this behavior yourself in Ruby 2.3:

# ruby-2.3.3
irb> 1.class
# => Fixnum
irb> (2**100).class
# => Bignum
irb> Fixnum.superclass
# => Integer
irb> Bignum.superclass
# => Integer

Ruby 2.4 unifies the Fixnum and Bignum classes into a single concrete class Integer:

# ruby-2.4.0
irb> 1.class
# => Integer
irb> (2**100).class
# => Integer

Why Did We Ever Have Two Classes of Integer?

To improve performance Ruby stores small numbers in a single native machine word whenever possible, either 32 or 64 bits in length depending on your processor. A 64-bit processor has a 64-bit word length; the 64 in this case describes the size of the registers on the processor.

The registers allow the processor to handle simple arithmetic and logical comparisons, for numbers up to the word size, by itself; which is much faster than manipulating values stored in RAM.

On my laptop it's more than twice as fast for me to add 1 to a Fixnum a million times than it is to do the same with a Bignum:

# ruby-2.3.3
require "benchmark"

fixnum = 2**40
bignum = 2**80

n = 1_000_000

Benchmark.bm do |x|
  x.report("Adding #{fixnum.class}:") { n.times { fixnum + 1 } }
  x.report("Adding #{bignum.class}:") { n.times { bignum + 1 } }
end

# =>
#                     user     system      total        real
# Adding Fixnum:  0.190000   0.010000   0.200000 (  0.189790)
# Adding Bignum:  0.460000   0.000000   0.460000 (  0.471123)

When a number is too big to fit in a native machine word Ruby will store that number differently, automatically converting it to a Bignum behind the scenes.

How Big Is Too Big?

Well, that depends. It depends on the processor you’re using, as we’ve discussed, but it also depends on the operating system and the Ruby implementation you’re using.

Wait It Depends on My Operating System?

Yes, different operating systems use different C data type models.

When processors first started shipping with 64-bit registers it became necessary to augment the existing data types in the C language, to accommodate larger register sizes and take advantage of performance increases.

Unfortunately, The C language doesn't provide a mechanism for adding new fundamental data types. These augmentations had to be accomplished via alternative data models like LP64, ILP64 and LLP64.

LL-What Now?

LP64, IL64 and LLP64 are some of the data models used in the C language. This is not an exhaustive list of the available C data models but these are the most common.

The first few characters in each of these acronyms describe the data types they affect. For example, the "L" and "P" in the LP64 data model stand for long and pointer, because LP64 uses 64-bits for those data types.

These are the sizes of the relevant data types for these common data models:

|       | int | long | long long | pointer |
|-------|-----|------|-----------|---------|
| LP64  | 32  | 64   | NA        | 64      |
| ILP64 | 64  | 64   | NA        | 64      |
| LLP64 | 32  | 32   | 64        | 64      |

Almost all UNIX and Linux implementations use LP64, including OS X. Windows uses LLP64, which includes a new long long type, just like long but longer.

So the maximum size of a Fixnum depends on your processor and your operating system, in part. It also depends on your Ruby implementation.

Fixnum Size by Ruby Implementation

| Fixnum Range         | MIN             | MAX              |
|----------------------|-----------------|------------------|
| 32-bit CRuby (ILP32) | -2**30          | 2**30 - 1        |
| 64-bit CRuby (LLP64) | -2**30          | 2**30 - 1        |
| 64-bit CRuby (LP64)  | -2**62          | 2**62 - 1        |
| JRuby                | -2**63          | 2**63 - 1        |

The range of Fixnum can vary quite a bit between Ruby implementations.

In JRuby for example a Fixnum is any number between -2⁶³ and 2⁶³-1. CRuby will either have Fixnum values between -2³⁰ and 2³⁰-1 or -2⁶² and 2⁶²-1, depending on the underlying C data model.

Your Numbers Are Wrong, You're Not Using All the Bits

You're right, even though we have 64 bits available we're only using 62 of them in CRuby and 63 in JRuby. Both of these implementations use two's complement integers, binary values that use one of the bits to store the sign of the number. So that accounts for one of our missing bits, how about that other one?

In addition to the sign bit, CRuby uses one of the bits as a FIXNUM_FLAG, to tell the interpreter whether or not a given word holds a Fixnum or a reference to a larger number. The sign bit and the flag bit are at opposite ends of the 64-bit word, and the 62 bits left in the middle are the space we have to store a number.

In JRuby we have 63 bits to store our Fixnum, because JRuby stores both Fixnum and Bignum as 64-bit signed values; they don't need a FIXNUM_FLAG.

Why Are They Changing It Now?

The Ruby team feels that the difference between a Fixnum and a Bignum is ultimately an implementation detail, and not something that needs to be exposed as part of the language.

Using the Fixnum and Bignum classes directly in your code can lead to inconsistent behavior, because the range of those values depends on so many things. They don't want to encourage you to depend on the ranges of these different Integer types, because it makes your code less portable.

Unification also significantly simplifies Ruby for beginners. When you're teaching your friends Ruby you longer need to explain the finer points of 64-bit processor architecture.

Rounding Changes

In Ruby Float#round has always rounded floating point numbers up for decimal values greater than or equal to .5, and down for anything less, much as you learned to expect in your arithmetic classes.

# ruby-2.3.3
irb> (2.4).round
# => 2
irb> (2.5).round
# => 3

During the development of Ruby 2.4 there was a proposal to change this default rounding behavior to instead round to the nearest even number, a strategy known as half to even rounding, or Gaussian rounding (among many other names).

# ruby-2.4.0-preview3
irb> (2.4).round
# => 2
irb> (2.5).round
# => 2
irb> (3.5).round
# => 4

The half to even strategy would only have changed rounding behavior for tie-breaking; numbers that are exactly halfway (.5) would have been rounded down for even numbers, and up for odd numbers.

Why Would Anyone Do That?

The Gaussian rounding strategy is commonly used in statistical analysis and financial transactions, as the resulting values less significantly alter the average magnitude for large sample sets.

As an example let's generate a large set of random values that all end in .5:

# ruby-2.3.3
irb> halves = Array.new(1000) { rand(1..1000) + 0.5 }
# => [578.5...120.5] # 1000 random numbers between 1.5 and 1000.5

Now we'll calculate the average after forcing our sum to be a float, to ensure we don't end up doing integer division:

# ruby-2.3.3
irb> average = halves.inject(:+).to_f / halves.size
# => 510.675

The actual average of all of our numbers is 510.675, so the ideal rounding strategy should give us a rounded average be as close to that number as possible.

Let's see how close we get using the existing rounding strategy:

# ruby-2.3.3
irb> round_up_average = halves.map(&:round).inject(:+).to_f / halves.size
# => 511.175
irb> (average - round_up_average).abs
# => 0.5

We're off the average by 0.5 when we consistently round ties up, which makes intuitive sense. So let's see if we can get closer with Gaussian rounding:

# ruby-2.3.3
irb> rounded_halves = halves.map { |n| n.to_i.even? ? n.floor : n.ceil }
# => [578...120]
irb> gaussian_average = rounded_halves.inject(:+).to_f / halves.size
# => 510.664
irb> (average - gaussian_average).abs
# => 0.011000000000024102

It would appear we have a winner. Rounding ties to the nearest even number brings us more than 97% closer to our actual average. For larger sample sets we can expect the average from Gaussian rounding to be almost exactly the actual average.

This is why Gaussian rounding is the recommended default rounding strategy in the IEEE Standard for Floating-Point Arithmetic (IEEE 754).

So Ruby Decided to Change It Because of IEEE 754?

Not exactly, it actually came to light because Gaussian rounding is already the default strategy for the Kernel#sprintf method, and an astute user filed a bug on Ruby: "Rounding modes inconsistency between round versus sprintf".

Here we can clearly see the difference in behavior between Kernel#sprintf and Float#round:

# ruby 2.3.3
irb(main):001:0> sprintf('%1.0f', 12.5)
# => "12"
irb(main):002:0> (12.5).round
# => 13

The inconsistency in this behavior prompted the proposed change, which actually made it into one of the Ruby 2.4 preview versions, ruby-2.4.0-preview3:

# ruby 2.4.0-preview3
irb(main):006:0> sprintf('%1.0f', 12.5)
# => "12"
irb(main):007:0> 12.5.round
# => 12

In ruby-2.4.0-preview3 rounding with either Kernel#sprintf or Float#round will give the same result.

Ultimately Matz decided this fix should not alter the default behavior of Float#round when another user reported a bug in Rails: "Breaking change in how #round works".

The Ruby team decided to compromise and add a new keyword argument to Float#round to allow us to set alternative rounding strategies ourselves:

# ruby 2.4.0-rc1
irb(main):001:0> (2.5).round
# => 3
irb(main):008:0> (2.5).round(half: :down)
# => 2
irb(main):009:0> (2.5).round(half: :even)
# => 2

The keyword argument :half can take either :down or :even and the default behavior is still to round up, just as it was before.

Why `preview` Versions Are Not for Production

Interestingly before the default rounding behavior was changed briefly for 2.4.0-preview3 there was an unusual Kernel#sprintf bug in 2.4.0-preview2:

# ruby 2.4.0-preview2
irb> numbers = (1..20).map { |n| n + 0.5 }
# => => [1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5, 11.5, 12.5, 13.5, 14.5, 15.5, 16.5, 17.5, 18.5, 19.5, 20.5]
irb> numbers.map { |n| sprintf('%1.0f', n) }
# => ["2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "12", "14", "14", "16", "16", "18", "18", "20", "20"]

In this example Kernel#sprintf appears to be rounding numbers less than 12 up as though it was using the Float#round method's default behavior, which was still in place at this point.

The preview releases before and after 2.4.0-preview2, both 2.4.0-preview1 and 2.4.0-preview3, show the expected sprintf behavior, consistent with ruby-2.3.3:

# ruby 2.4.0-preview1
irb> numbers.map { |n| sprintf('%1.0f', n) }
# => ["2", "2", "4", "4", "6", "6", "8", "8", "10", "10", "12", "12", "14", "14", "16", "16", "18", "18", "20", "20"]

# ruby 2.4.0-preview3
irb> numbers.map { |n| sprintf('%1.0f', n) }
# => ["2", "2", "4", "4", "6", "6", "8", "8", "10", "10", "12", "12", "14", "14", "16", "16", "18", "18", "20", "20"]

I discovered this by accident while researching this article and started digging through the 2.4.0-preview2 changes to see if I could identify the cause. I found this commit from Nobu:

commit 295f60b94d5ff6551fab7c55e18d1ffa6a4cf7e3
Author: nobu <nobu@b2dd03c8-39d4-4d8f-98ff-823fe69b080e>
Date:   Sun Jul 10 05:27:27 2016 +0000

    util.c: round nearly middle value

    * util.c (ruby_dtoa): [EXPERIMENTAL] adjust the case that the
      Float value is close to the exact but unrepresentable middle
      value of two values in the given precision, as r55604.

    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55621 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

`Kernel#sprintf` Accuracy in Ruby 2.4

This was an early effort by Nobu to handle cases where floating point numbers rounded inconsistently with Kernel#sprintf in ruby-2.3.3 (and before):

# ruby-2.3.3
irb> numbers = (0..9).map { |n| "5.0#{n}5".to_f }
# => [5.005, 5.015, 5.025, 5.035, 5.045, 5.055, 5.065, 5.075, 5.085, 5.095]
irb> numbers.map { |n| sprintf("%.2f", n) }
# => ["5.00", "5.01", "5.03", "5.04", "5.04", "5.05", "5.07", "5.08", "5.08", "5.09"]

In the example above notice that 5.035 and 5.045 both round to 5.04. No matter what strategy Kernel#sprintf is using this is clearly unexpected. The cause turns out to be the unseen precision beyond our representations.

Not to worry though, the final version of Nobu's fixes resolves this issue, and it will be available in Ruby 2.4.

Kernel#sprintf will now consistently apply half to even rounding:

# ruby-2.4.0-rc1
irb> numbers = (0..9).map { |n| "5.0#{n}5".to_f }
# => [5.005, 5.015, 5.025, 5.035, 5.045, 5.055, 5.065, 5.075, 5.085, 5.095]
irb> numbers.map { |n| sprintf("%.2f", n) }
# => ["5.00", "5.02", "5.02", "5.04", "5.04", "5.06", "5.06", "5.08", "5.08", "5.10"]

Better Hashes

Ruby 2.4 introduces some significant changes to the hash table backing Ruby's Hash object. These changes were prompted by Vladimir Makarov when he submitted a patch to Ruby's hash table earlier this year.

If you have a couple of hours to spare that issue thread is an entertaining read, but on the off-chance you're one of those busy developers I'll go through the major points here. First we need to cover some Ruby Hash basics.

If you're already an expert on Ruby hash internals feel free to skip ahead and read about the specific hash changes in Ruby 2.4.

How Ruby Implements Hash

Let's imagine for a moment that we have a severe case of "not invented here" syndrome, and we've decided to make our own Hash implementation in Ruby using arrays. I'm relatively certain we're about to do some groundbreaking computer science here so we'll call our new hash TurboHash, as it's certain to be faster than the original:

# turbo_hash.rb
class TurboHash
  attr_reader :table

  def initialize
    @table = []
  end
end

We'll use the @table array to store our table entries. We gave ourselves a reader to access it so it's easy to peek inside our hash.

We're definitely going to need methods to set and retrieve elements from our revolutionary hash so let's get those in there:

# turbo_hash.rb
class TurboHash
  # ...

  def [](key)
    # remember our entries look like this:
    # [key, value]

    find(key).last
  end

  def find(key)
    # Enumerable#find here will return the first entry that makes
    # our block return true, otherwise it returns nil.

    @table.find do |entry|
      key == entry.first
    end
  end

  def []=(key, value)
    entry = find(key)

    if entry
      # If we already stored it just change the value
      entry[1] = value
    else
      # otherwise add a new entry
      @table << [key, value]
    end
  end
end

Excellent, we can set and retrieve keys. It's time to setup some benchmarking and admire our creation:

require "benchmark"

legacy = Hash.new
turbo  = TurboHash.new

n = 10_000

def set_and_find(target)
  target = rand

  target[key] = rand
  target[key]
end

Benchmark.bm do |x|
  x.report("Hash: ") { n.times { set_and_find(legacy) } }
  x.report("TurboHash: ") { n.times { set_and_find(turbo) } }
end

#                  user     system      total        real
# Hash:        0.010000   0.000000   0.010000 (  0.009026)
# TurboHash:  45.450000   0.070000  45.520000 ( 45.573937)

Well that could have gone better, our implementation is about 5000 times slower than Ruby's Hash. This is obviously not the way Hash is actually implemented.

In order to find an element in @table our implementation traverses the entire array on each iteration; towards the end we're checking nearly 10k entries one at a time.

So let's come up with something better. The iteration is killing us, if we can find a way to index instead of iterating we'll be way ahead.

If we knew our keys were always going to be integers we could just store the values at their indexes inside of @table and look them up by their indexes later.

The issue of course is that our keys can be anything, we're not building some cheap knock-off hash that can only take integers.

We need a way to turn our keys into numbers in a consistent way, so "some_key" will give us the same number every time, and we can regenerate that number to find it again later.

It turns out that the Object#hash is perfect for this purpose:

irb> "some_key".hash
# => 3031662902694417109
irb> "some_other_key".hash
# => -3752665667844152731

irb> "some_key".hash
# => 3031662902694417109

The Object#hash will return unique(ish) integers for any object in Ruby, and you'll get the same number back every time you run it again with an object that's "equal" to the previous object.

For example, every time you create a string in Ruby you'll get a unique object:

irb> a = "some_key"
# => "some_key"
irb> a.object_id
# => 70202008509060

irb> b = "some_key"
# => "some_key"
irb> b.object_id
# => 70202008471340

These are clearly distinct objects, but they will have the same Object#hash return value because a == b:

irb> a.hash
# => 3031662902694417109
irb> b.hash
# => 3031662902694417109

These hash return values are huge and sometimes negative, so we're going to use the remainder after dividing by some small number as our index instead:

irb> a.hash % 11
# => 8

We can use this new number as the index in @table where we store the entry. When we want to look up an item later we can simply repeat the operation to know exactly where to find it.

This raises another issue however, our new indexes are much less unique than they were originally; they range between 0 and 10. If we store more than 11 items we are certain to have collisions, overwriting existing entries.

Rather than storing the entries directly in the table we'll put them inside arrays called "bins". Each bin will end up having multiple entries, but traversing the bins will still be faster than traversing the entire table.

Armed with our new indexing system we can now make some improvements to our TurboHash.

Our @table will hold a collection of bins and we'll store our entries in the bin that corresponds to key.hash % 11:

# turbo_hash.rb
class TurboHash
  NUM_BINS = 11

  attr_reader :table

  def initialize
    # We know our indexes will always be between 0 and 10
    # so we need an array of 11 bins.
    @table = Array.new(NUM_BINS) { [] }
  end

  def [](key)
    find(key).last
  end

  def find(key)
    # now we're searching inside the bins instead of the whole table
    bin_for(key).find do |entry|
      key == entry.first
    end
  end

  def bin_for(key)
    # since hash will always return the same thing we know right where to look
    @table[index_of(key)]
  end

  def index_of(key)
    # a pseudorandom number between 0 and 10
    key.hash % NUM_BINS
  end

  def []=(key, value)
    entry = find(key)

    if entry
      entry[1] = value
    else
      # store new entries in the bins
      bin_for(key) << [key, value]
    end
  end
end

Let's benchmark our new and improved implementation:

                 user     system      total        real
Hash:        0.010000   0.000000   0.010000 (  0.012918)
TurboHash:   3.800000   0.010000   3.810000 (  3.810126)

So that's pretty good I guess, using bins decreased the time for TurboHash by more than 90%. Those sneaky Ruby maintainers are still crushing us though, let's see what else we can do.

It occurs to me that our benchmark is creating 10_000 entries but we only have 11 bins. Each time we iterate through a bin we're actually going over a pretty large array now.

Let's check out the sizes on those bins after the benchmark finishes:

Bin:  Relative Size:          Length:
----------------------------------------
0     +++++++++++++++++++     (904)
1     ++++++++++++++++++++    (928)
2     +++++++++++++++++++     (909)
3     ++++++++++++++++++++    (915)
4     +++++++++++++++++++     (881)
5     +++++++++++++++++++     (886)
6     +++++++++++++++++++     (876)
7     ++++++++++++++++++++    (918)
8     +++++++++++++++++++     (886)
9     ++++++++++++++++++++    (952)
10    ++++++++++++++++++++    (945)

That's a nice even distribution of entries but those bins are huge. How much faster is TurboHash if we increase the number of bins to 19?

                 user     system      total        real
Hash:        0.020000   0.000000   0.020000 (  0.021516)
TurboHash:   2.870000   0.070000   2.940000 (  3.007853)

Bin:  Relative Size:          Length:
----------------------------------------
0     ++++++++++++++++++++++  (548)
1     +++++++++++++++++++++   (522)
2     ++++++++++++++++++++++  (547)
3     +++++++++++++++++++++   (534)
4     ++++++++++++++++++++    (501)
5     +++++++++++++++++++++   (528)
6     ++++++++++++++++++++    (497)
7     +++++++++++++++++++++   (543)
8     +++++++++++++++++++     (493)
9     ++++++++++++++++++++    (500)
10    +++++++++++++++++++++   (526)
11    ++++++++++++++++++++++  (545)
12    +++++++++++++++++++++   (529)
13    ++++++++++++++++++++    (514)
14    ++++++++++++++++++++++  (545)
15    ++++++++++++++++++++++  (548)
16    +++++++++++++++++++++   (543)
17    ++++++++++++++++++++    (495)
18    +++++++++++++++++++++   (542)

We gained another 25%! That's pretty good, I bet it gets even better if we keep making the bins smaller. This is a process called rehashing, and it's a pretty important part of a good hashing strategy.

Let's cheat and peek inside st.c to see how Ruby handles increasing the table size to accommodate more bins:

/* https://github.com/ruby/ruby/blob/ruby_2_3/st.c#L38 */

#define ST_DEFAULT_MAX_DENSITY 5
#define ST_DEFAULT_INIT_TABLE_SIZE 16

Ruby's hash table starts with 16 bins. How do they get away with 16 bins? Weren't we using prime numbers to reduce collisions?

We were, but using prime numbers for hash table size is really just a defense against bad hashing functions. Ruby has a much better hashing function today than it once did, so the Ruby maintainers stopped using prime numbers in Ruby 2.2.0.

What's This Other Default Max Density Number?

The ST_DEFAULT_MAX_DENSITY defines the average maximum number of entries Ruby will allow in each bin before rehashing: choosing the next largest power of two and recreating the hash table with the new, larger size.

You can see the conditional that checks for this in the add_direct function from st.c:

/* https://github.com/ruby/ruby/blob/ruby_2_3/st.c#L463 */

if (table->num_entries > ST_DEFAULT_MAX_DENSITY * table->num_bins) {...}

Ruby's hash table tracks the number of entries as they're added using the num_entries value on table. This way Ruby doesn't need to count the entries to decide if it's time to rehash, it just checks to see if the number of entries is more than 5 times the number of bins.

Let's implement some of the improvements we stole from Ruby to see if we can speed up TurboHash:

class TurboHash
  STARTING_BINS = 16

  attr_accessor :table

  def initialize
    @max_density = 5
    @entry_count = 0
    @bin_count   = STARTING_BINS
    @table       = Array.new(@bin_count) { [] }
  end

  def grow
    # use bit shifting to get the next power of two and reset the table size
    @bin_count = @bin_count << 1

    # create a new table with a much larger number of bins
    new_table = Array.new(@bin_count) { [] }

    # copy each of the existing entries into the new table at their new location,
    # as returned by index_of(key)
    @table.flatten(1).each do |entry|
      new_table[index_of(entry.first)] << entry
    end

    # Finally we overwrite the existing table with our new, larger table
    @table = new_table
  end

  def full?
    # our bins are full when the number of entries surpasses 5 times the number of bins
    @entry_count > @max_density * @bin_count
  end

  def [](key)
    find(key).last
  end

  def find(key)
    bin_for(key).find do |entry|
      key == entry.first
    end
  end

  def bin_for(key)
    @table[index_of(key)]
  end

  def index_of(key)
    # use @bin_count because it now changes each time we resize the table
    key.hash % @bin_count
  end

  def []=(key, value)
    entry = find(key)

    if entry
      entry[1] = value
    else
      # grow the table whenever we run out of space
      grow if full?

      bin_for(key) << [key, value]
      @entry_count += 1
    end
  end
end

So what's the verdict?

                  user     system      total        real
Hash:        0.010000   0.000000   0.010000 (  0.012012)
TurboHash:   0.130000   0.010000   0.140000 (  0.133795)

We lose. Even though our TurboHash is now 95% faster than our last version, Ruby still beats us by an order of magnitude.

All things considered, I think TurboHash fared pretty well. I'm sure there are some ways we could further improve this implementation but it's time to move on.

At long last we have enough background to explain what exactly is about to nearly double the speed of Ruby hashes.

What Actually Changed

Speed! Ruby 2.4 hashes are significantly faster. The changes introduced by Vladimir Makarov were designed to take advantage of modern processor caching improvements by focusing on data locality.

This implementation speeds up the Ruby hash table benchmarks in average by more 40% on Intel Haswell CPU.

https://github.com/ruby/ruby/blob/ruby_2_4/st.c#L98

Oh Good! What?

Processors like the Intel Haswell series use several levels of caching to speed up operations that reference the same region of memory.

When the processor reads a value from memory it doesn't just take the value it needs; it grabs a large piece of memory nearby, operating on the assumption that it is likely going to be asked for some of that data in the near future.

The exact algorithms processors use to determine which bits of memory should get loaded into each cache are somewhat difficult to discover. Manufacturers consider these strategies to be trade secrets.

What is clear is that accessing any of the levels of caching is significantly faster than going all the way out to pokey old RAM to get information.

How Much Faster?

Real numbers here are almost meaningless to discuss because they depend on so many factors within a given system, but generally speaking we can say that L1 cache hits (the fastest level of caching) could speed up memory access by two orders of magnitude or more.

An L1 cache hit can complete in half a nanosecond. For reference consider that a photon can only travel half a foot in that amount of time. Fetching from main memory will generally take at least 100 nanoseconds.

Got It, Fast... Therefore Data Locality?

Exactly. If we can ensure that the data Ruby accesses frequently is stored close together in main memory, we significantly increase our chances of winning a coveted spot in one of the caching levels.

One of the ways to accomplish this is to decrease the overall size of the entries themselves. The smaller the entries are, the more likely they are to end up in the same caching level.

In our TurboHash implementation above our entries were stored as simple arrays, but in ruby-2.3.3 table entries were actually stored in a linked list. Each of the entries contained a next pointer that pointed to the next entry in the list. If we can find a way to get by without that pointer and make the entries smaller we will take better advantage of the processor's built-in caching.

The new approach in ruby.2.4.0-rc1 actually goes even further than just removing the next pointer, it removes the entries themselves. Instead we store the entries in a separate array, the "entries array", and we record the indexes for those entries in the bins array, referenced by their keys.

This approach is known as "open addressing".

Open Addressing

Ruby has historically used "closed addressing" in its hash table, also known as "open hashing". The new alternative approach proposed by Vladimir Makarov uses "open addressing", also known as "closed hashing". I get that naming things is hard, but this can really get pretty confusing. For the rest of this discussion, I will only use open addressing to refer to the new implementation, and closed addressing to refer to the former.

The reason open addressing is considered open is that it frees us from the hash table. The table entries themselves are not stored directly in the bins anymore, as with a closed addressing hash table, but rather in a separate entries array, ordered by insertion.

Open addressing uses the bins array to map keys to their index in the entries array.

Let's set a value in an example hash that uses open addressing:

# ruby-2.4.0-rc1
irb> my_hash["some_key"] = "some_value"

When we set "some_key" in an open addressing hash table Ruby will use the hash of the key to determine where our new key-index reference should live in the bins array:

irb> "some_key".hash
# => -3336246172874487271

Ruby first appends the new entry to the entries array, noting the index where it was stored. Ruby then uses the hash above to determine where in the bins array to store the key, referencing that index.

Remember that the entry itself is not stored in the bins array, the key only references the index of the entry in the entries array.

Determining the Bin

The lower bits of the key's hash itself are used to determine where it goes in the bins array.

Because we're not using all of the available information from the key's hash this process is "lossy", and it increases the chances of a later hash collision when we go to find a bin for our key.

However, the cost of potential collisions is offset by the fact that choosing a bin this way is significantly faster.

In the past, Ruby has used prime numbers to determine the size of the bins array. This approach gave some additional assurance that a hashing algorithm which didn't return evenly distributed hashes would not cause a single bin to become unbalanced in size.

The bin size was used to mod the computed hash, and because the bin size was prime, it decreased the risk of hash collisions as it was unlikely to be a common factor of both computed hashes.

Since version 2.2.0 Ruby has used bin array sizes that correspond to powers of two (16, 32, 64, 128, etc.). When we know the bin size is going to be a factor of two we're able to use the lower two bits to calculate a bin index, so we find out where to store our entry reference much more quickly.

What's Wrong with Prime Modulo Mapping?

Dividing big numbers by primes is slow. Dividing a 64-bit number (a hash) by a prime can take more than 100 CPU cycles for each iteration, which is even slower than accessing main memory.

Even though the new approach may produce more hash collisions, it will ultimately improve performance, because collisions will probe the available bins linearly.

Linear Probing

The open addressing strategy in Ruby 2.4 uses a "full cycle linear congruential generator".

This is just a function that generates pseudorandom numbers based on a seed, much like Ruby's Rand#rand method.

Given the same seed the Rand#rand method will generate the same sequence of numbers, even if we create a new instance:

irb> r = Random.new(7)
# => #<Random:0x007fee63030d50>
irb> r.rand(1..100)
# => 48
irb> r.rand(1..100)
# => 69
irb> r.rand(1..100)
# => 26

irb> r = Random.new(7)
# => #<Random:0x007fee630ca928>
irb> r.rand(1..100)
# => 48
irb> r.rand(1..100)
# => 69
irb> r.rand(1..100)
# => 26

# Note that these values will be distinct for separate Ruby processes.
# If you run this same code on your machine you can expect to get different numbers.

Similarly a linear congruential generator will generate the same numbers in sequence if we give it the same starting values.

Linear Congruential Generator (LCG)

This is the algorithm for a linear congruential generator:

X_n+1 = (a * X_n + c ) % m

For carefully chosen values of a, c, m and initial seed X₀ the values of the sequence X will be pseudorandom.

Here are the rules for choosing these values:

m must be greater than 0 (m > 0)
a must be greater than 0 and less than m (0 < a < m)
c must be greater than or equal to 0 and less than m (0 <= c < m)
X₀ must be greater than or equal to 0 and less than m (0 <= X₀ < m)

Implemented in Ruby the LCG algorithm looks like this:

irb> a, x_n, c, m = [5, 7, 3, 16]
# => [5, 7, 3, 16]

irb> x_n = (a * x_n + c) % m
# => 6
irb> x_n = (a * x_n + c) % m
# => 1
irb> x_n = (a * x_n + c) % m
# => 8

For the values chosen above that sequence will always return 6, 1 and 8, in that order. Because I've chosen the initial values with some additional constraints, the sequence will also choose every available number before it comes back around to 6.

An LCG that returns each number before returning any number twice is known as a "full cycle" LCG.

Full Cycle Linear Congruential Generator

For a given seed we describe an LCG as full cycle when it will traverse every available state before returning to the seed state.

So if we have an LCG that is capable of generating 16 pseudorandom numbers, it's a full cycle LCG if it will generate a sequence including each of those numbers before duplicating any of them.

irb> (1..16).map { x_n = (a * x_n + c) % m }.sort
# => [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]

These are the additional rules we must use when choosing our starting values to make an LCG full cycle:

c can't be 0 (c != 0)
m and c are relatively prime (the only positive integer that divides both of them is 1)
(a - 1) is divisible by all prime factors of m
(a - 1) is divisible by 4 if m is divisible by 4

The first requirement makes our LCG into a "mixed congruential generator". Any LCG with a non-zero value for c is described as a mixed congruential generator, because it mixes multiplication and addition.

If c is 0 we call the generator a "multiplicative" congruential generator (MCG), because it only uses multiplication. An MCG is also known as a Lehmer Random Number Generator (LRNG).

The last 3 requirements in the list up above make a mixed cycle congruential generator into a full cycle LCG. Those 3 rules by themselves are called the Hull-Dobell Theorem.

Hull-Dobell Theorem

The Hull-Dobell Theorem describes a mixed congruential generator with a full period (one that generates all values before repeating).

In Ruby 2.4 Vladimir has implemented an LCG that satisfies the Hull-Dobell Theorem, so Ruby will traverse the entire collection of bins without duplication.

Remember that the new hash table implementation uses the lower bits of a key's hash to find a bin for our key-index reference, a reference that maps the entry's key to its index in the entries table.

If the first attempt to find a bin for a key results in a hash collision, future attempts will use a different means of calculating the hash.

The unused bits from the original hash are used with the collision bin index to generate a new secondary hash, which is then used to find the next bin.

When the first attempt results in a collision the bin searching function becomes a full cycle LCG, guaranteeing that we will eventually find a home for our reference in the bins array.

Since this open addressing approach allows us to store the much smaller references to entries in the bins array, rather than the entirety of the entries themselves, we significantly decrease the memory required to store the bins array.

The new smaller bins array then increases our chances of taking advantage of the processor caching levels, by keeping this frequently accessed data structure close together in memory. Vladimir improved the data locality of the Ruby hash table.

So Ruby is Faster and Vladimir Is Smart?

Yup! We now have significantly faster hashes in Ruby thanks to Vladimir and a whole host of other Ruby contributors. Please make sure you make a point of thanking the Ruby maintainers the next time you see one of them at a conference.

Contributing to open source can be a grueling and thankless job. Most of the time contributors only hear from users when something is broken, and maintainers can sometimes forget that so many people are appreciating their hard work every day.

Want to Make a Contribution Yourself?

The best way to express your gratitude for Ruby is to make a contribution.

There are all sorts of ways to get started contributing to Ruby, if you're interested in contributing to Ruby itself check out the Ruby Core community page.

Another great way to contribute is by testing preview versions as they’re released, and reporting potential bugs on the Ruby issues tracker. Watch the Recent News page (RSS feed) to find out when new preview versions are available.

Is that everything new in Ruby 2.4?

Not even close. There are many more interesting updates to be found in the Ruby 2.4 ChangeLog.

Here are a few of my favorites that I didn't have time to cover:

Thank you so much for reading, I hope you have a wonderful holiday.

Video Transcript