At RubyKaigi I caught up with Matz, Koichi, and Aaron Patterson aka Tenderlove to talk about Ruby 3x3 and our path so far to reach that goal. We discussed Koichi’s guild proposal, just-in-time compilation and the future of Ruby performance.
Jonan: Welcome everyone. Today we are doing an interview to talk about new features coming in Ruby 3. I am here with my coworkers from Heroku, Sasada Koichi and Yukihiro Matsumoto, along with Aaron Patterson from GitHub.
Jonan: So, last year at RubyKaigi you announced an initiative to speed up Ruby by three times by the release of version three. Tell us more about Ruby 3x3.
Matz: In the design of the Ruby language we have been primarily focused on productivity and the joy of programming. As a result, Ruby was sometimes too slow, because we focused less on run-time efficiency. So we've tried many things to make Ruby faster. For example, the engine in Ruby 1.8 was very slow; it was written by me. Then Koichi came in and we replaced the virtual machine. The new virtual machine runs many times faster. Ruby and the Ruby community have continued to grow, and some people still complain about the performance. So we are trying new things to boost the performance of the virtual machine. Even though we are an open source project and not a business, I felt it was important for us to set some kind of goal, so I named it Ruby 3x3. The goal is to make Ruby 3 run three times faster than Ruby 2.0. Other languages, for example Java, use the JIT technique, just-in-time compilation; we don't use that yet in Ruby. So by using that kind of technology and with some other improvements, I think we can accomplish the three times boost.
Aaron: So it’s called Ruby 3x3, three times three is nine and JRuby is at nine thousand. Should we just use JRuby?
Jonan: Maybe we should. So Ruby 3x3 will be three times faster. How are you measuring your progress towards that goal? How do we know? How do you check that?
Matz: Yes, that's an important point. In the Ruby 3x3 project we are comparing the speed of Ruby 3.0 with the speed of Ruby 2.0. We completed many performance improvements in Ruby 2.1 through 2.3, and we want to include that effort in Ruby 3x3, so the baseline is Ruby 2.0. That's the comparison.
Aaron: So your Rails app will not likely be three times faster on Ruby 3?
Matz: Yeah. A simple micro-benchmark may run three times faster, but we are worried that a real-world application could actually be slower; it could happen. So we are going to set up some benchmark suites to measure Ruby 3x3, and we will measure our progress toward the three times goal using them. We haven't set them all up yet, but they will likely include at least optcarrot (an NES emulator) and some small Rails applications, because Rails is the major application framework for the Ruby language. We'll include several other types of benchmarks as well.
Jonan: So, Koichi recently made some changes to GC in Ruby. We now use a generational garbage collector. Beyond the improvements that have already been made to GC, what possibility is there for further improvement that could get us closer to Ruby 3x3? Do you think GC changes are going to be part of our progress there?
Koichi: As Matz says, Ruby's garbage collector is an important component, and it used to have a huge overhead. With the recent generational garbage collector, though, I don't think there is nearly as much overhead; maybe only ten percent of a Ruby program's time is spent in GC, something like that. Even if we made garbage collection ten times faster, we would only be attacking that ten percent of the overall time. So sure, we should do more for garbage collection, but we have lots of other, more impactful ideas. If we have time and specific requests for GC changes, we will certainly consider those.
Aaron: … and resources...
Aaron: The problem is that for us at GitHub, since we do out-of-band garbage collections, garbage collection time makes no difference to request performance anyway. Even if garbage collection is only ten percent of the program and we reduce that to zero, so garbage collection takes no time at all, that's not three times faster, so we wouldn't make our goal. So maybe GC isn't a good place to focus for the Ruby 3x3 improvements.
Matz: Yeah we have already added the generational garbage collector and incremental garbage collection. So in some cases, some applications, large web applications for example, may no longer need to do that out-of-band garbage collection.
Aaron: Yeah, I think the only reason we are doing it is because we are running Ruby 2.1 in production but we're actually on the path to upgrading. We did a lot of work to get us to a point where we could update to Ruby 2.3, it may be in production already. My team and I did the work, somebody else is doing the deployment of it, so I am not sure if it is in production yet but we may soon be able to get rid of out-of-band collection anyway.
Matz: Yes in my friend's site, out-of-band collection wasn’t necessary after the deployment of Ruby 2.3.
Jonan: So the GC situation right now is that GC is only maybe about ten percent of the time it takes to run any Ruby program anyway. So, even if we cut that time by half, we're not going to win that much progress.
Matz: It's no longer a bottleneck so the priority is lower now.
Jonan: At RailsConf, Aaron talked about memory and memory fragmentation in Ruby. If I remember correctly, it looked to me like we were defragging memory, which is addressed; in my understanding that means we just point to it by address, so we don't need to put those pieces of memory close together. I'm sure there's a reason we might want to do that; maybe you can explain it, Aaron.
Aaron: Sure. One of the issues we have at GitHub is that our heap gets fragmented. Our web server forks worker processes, and eventually all of the memory pages get copied out to the children; this is due to fragmentation. When you have a fragmented heap and you allocate objects, you are allocating into those free slots, and since you're doing writes into those slots, the pages get copied into the child processes. So what would be nice is if we could eliminate or reduce that fragmentation, and then maybe we wouldn't copy the child pages so much. Reducing fragmentation like that can also improve locality, but not necessarily. If you are able to improve locality by storing objects close to each other in memory, they hit caches more easily, and if they hit those caches, you get faster access, but you can't predict that. It may or may not help, and it definitely won't get us to Ruby 3x3 on its own.
Matz: Do you have any proof of this? Or a plan?
Aaron: Any plan? Well yes, I prepared a patch that...
Matz: Making it easier to separate the heap.
Aaron: Yes, two separate heaps. For example, classes: we'll allocate classes into a separate heap, because we know that classes are probably not going to get garbage collected, so we can put those in a specific location.
Koichi: Do you have plans to use threads at GitHub?
Aaron: Do I have plans to use threads at GitHub? Honestly, I don't know. I doubt it. Probably not. We'll probably continue to use unicorn in production. Well I mean we could but I don't see why. I mean we're working pretty well and we're pretty happy using unicorn in production so I don't think we would switch. Honestly, I like the presentation that you gave about guilds, if we could use a web server based on guilds, that would be, in my opinion, the best way.
Matz: Yes, I think it's promising.
Jonan: So these guilds you mentioned (Koichi spoke about guilds at RubyKaigi), maybe now is a good time to discuss that. Do you want to tell us about guilds? What are they, and how do they affect plans for Ruby 3x3?
Matz: We have three major goals in Ruby 3. One of them is performance: making programs really run three times faster. The second goal is a new concurrency model, which would be implemented by something like guilds.
Koichi: So concurrency and parallelism utilize some CPU cores.
Matz: Yeah, I say concurrency just because the guild is the concurrency model from the programmer's view. Implementation-wise it should be parallelism.
Koichi: I'm asking about the motivation of the concurrency.
Matz: Motivation of the concurrency?
Koichi: Not only the performance but also the model.
Matz: Well, we already have threads. Threads are mostly OK, but they don't run in parallel, due to the existing GIL. So guilds are partly a performance optimization; concurrency via guilds may make threaded or concurrent programs faster. But the main topic is data abstraction for concurrent programs.
Jonan: OK. So while we are on the topic of threads I am curious. I've heard people talk about how it might be valuable to have a higher level of abstraction on top of threads because threads are quite difficult to use safely. Have you all thought about adding something in addition to threads that maybe protects us from ourselves a little bit around some of those problems? Is that what guilds are?
Aaron: Yes, that's essentially what the guild is, it's a higher level abstraction so you can do parallel programming safely versus threads where it's not safe at all. It's just...
Koichi: Yes. The problem with concurrency in Ruby now is sharing mutable objects between threads. The idea of guilds, or the abstraction more than guilds specifically, is to prohibit sharing of mutable objects.
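(As an editorial aside: the hazard Koichi is describing can be sketched in plain Ruby today. This is our illustration, not code from the interview; it shows why unrestricted sharing of mutable state between threads is unsafe.)

```ruby
# Two threads sharing one mutable object. `counter += 1` is a
# read-modify-write, so a thread switch between the read and the write
# can lose an increment. On MRI the GIL makes lost updates rare but not
# impossible; guilds aim to rule out this kind of sharing entirely.
counter = 0
threads = 10.times.map do
  Thread.new { 1_000.times { counter += 1 } }
end
threads.each(&:join)
puts counter  # at most 10_000; may be less if updates are lost
```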
Jonan: So when I create an object how would I get it into a guild? If I understand correctly, you have two guilds - A and B - and they both contain mutable objects. With the objects in A, you could run a thread that used only those objects, and run a different thread that used only objects in B, and then you would eliminate this problem and that's why guilds will exist. But how do I put my objects into guilds or move them between guilds? Have you thought about it that far yet?
Matz: Yeah, a guild is some kind of bin, a container of objects. You cannot access the objects inside a guild from outside, because the objects are members of the guild. However, you can transfer objects from one guild to another. After transferring, the objects can be accessed in the destination guild.
Jonan: I see, OK. So the objects that are in a guild can't be accessed from outside that guild; other guilds can't get access to them. Then immutable objects are not members of guilds. They are outside.
Koichi: So immutable objects are something like freelance objects. Freelance objects are immutable, so any guild can access them because there are no read-write conflicts.
Jonan: So you would just use pointers to point to those immutable objects?
Koichi: Yes. Also, I want to note that immutable doesn't mean frozen. Frozen objects can contain mutable objects. By immutable I mean objects whose children, all the way down, are also immutable.
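(An editorial aside illustrating Koichi's point: `freeze` in Ruby is shallow, so a frozen container can still hold mutable objects.)

```ruby
# Frozen is shallow: the array itself rejects mutation, but the strings
# inside it are still mutable.
list = ["a", "b"].freeze

begin
  list << "c"        # mutating the frozen array raises
rescue => e
  puts e.class       # RuntimeError (a FrozenError subclass on newer Rubies)
end

list[0] << "!"       # the inner string is not frozen, so this works
puts list.inspect    # ["a!", "b"]
```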
Jonan: So if we had a nested hash, some large data structure, we would need to freeze every object in that in order to reference it this way. Is there a facility in Ruby right now to do that? I think I would have to iterate over that structure freezing objects manually today.
Matz: Not yet.
Jonan: So there might be?
Matz: We need to provide something to freeze these objects.
Aaron: A deep freeze.
Matz: Yes, deep freeze.
Jonan: Deep Freeze is the name of this feature maybe? I think that would be an excellent name for it.
Aaron: I like deep freeze. (Koichi would like to note that the name for this feature has not yet been determined)
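(As an editorial aside: until such a feature exists, a recursive freeze can be hand-rolled. The `deep_freeze` name below is our hypothetical choice, not a decided API, and this minimal sketch only walks hashes and arrays.)

```ruby
# A hand-rolled deep freeze: recursively freeze a nested structure.
# The method name `deep_freeze` is hypothetical.
def deep_freeze(obj)
  case obj
  when Hash
    obj.each { |k, v| deep_freeze(k); deep_freeze(v) }
  when Array
    obj.each { |e| deep_freeze(e) }
  end
  obj.freeze
end

config = { "db" => { "host" => "localhost", "ports" => [5432] } }
deep_freeze(config)

puts config.frozen?                # true
puts config["db"]["host"].frozen?  # true: the nested string is frozen too
```

A real implementation would also need to handle arbitrary objects (their instance variables) and guard against cyclic structures.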
Jonan: I think you mentioned it earlier but maybe you could tell us a little more about just in time compilation, the JIT, and how we might approach that in Ruby 3.
Matz: The JIT is a pretty attractive technology for gaining performance, and as part of the Ruby 3x3 effort we are probably going to introduce some kind of JIT. Many other virtual machines have introduced JITs built on LLVM. However, personally, I don't want to use an LLVM JIT for Ruby 3, just because LLVM itself is a huge project, and it's much younger than Ruby. Ruby is more than twenty years old, and it will possibly live for twenty more years or even longer, so relying on another huge project is kind of dangerous.
Aaron: What do you think of Shyouhei’s stuff?
Matz: The optimizer?
Matz: Yeah, it's quite interesting, but its application is kind of limited. We have to measure it.
Koichi: I think Shyouhei’s project is a good first step, but we need more time to consider it.
Jonan: Can you explain what it is?
Aaron: Yeah, so Shyouhei, what he did was he...
Aaron: Yeah, he introduced a de-optimization framework that essentially lets us copy old instructions, the de-optimized instructions, back into the existing instruction sequences. So he can optimize instructions, and if anything happens that would... well, I guess I should step back a little bit. If you write 2 + 4 in Ruby, typically the plus operator is not overridden. So if you can make that assumption, then maybe we can collapse that down and replace it with just six. Right?
Jonan: I see.
Aaron: But if somebody were to override the plus method, we would have to stop doing that, because we wouldn't know what plus does anymore. And to do that, we have to de-optimize and go back to the original instructions we had before. So what Shyouhei did was introduce this de-optimization framework. It allows us to take those old instructions and copy them back in, in case someone does something like what I described, overriding plus.
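(An editorial aside showing Aaron's point in runnable form. The folding itself happens inside the VM; here we just demonstrate why the "plus is not overridden" assumption is load-bearing, using `alias_method` to redefine and then restore `Integer#+`. This assumes a Ruby version with a unified `Integer` class.)

```ruby
# Why a constant-folding JIT needs de-optimization: collapsing `2 + 4`
# into 6 is only valid while Integer#+ has its default definition.
puts 2 + 4   # 6

class Integer
  alias_method :__orig_plus, :+
  def +(other)
    __orig_plus(other).__orig_plus(1)  # off-by-one plus, for demonstration
  end
end

puts 2 + 4   # now 7: a cached, folded 6 would be wrong

# "De-optimize": restore the original method so the old answer is valid again.
class Integer
  alias_method :+, :__orig_plus
end
puts 2 + 4   # 6 again
```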
Matz: The JRuby people have implemented very nice de-optimization technology. They built just such a de-optimization framework on the Java Virtual Machine, so on this topic at least they are a bit ahead of us.
Aaron: The other thing to keep in mind is that if you watch the JRuby or JRuby plus Truffle work, if you read any of the papers about it, there are tradeoffs; a JIT isn't free. We have to take into consideration how much memory it will require. People hearing this shouldn't think, "oh, let's just add a JIT, that's all we have to do and then it will be done." It's much harder; there are more tradeoffs than simply adding a JIT.
Jonan: Yes. So there was an older implementation, RuJIT, the Ruby JIT, but RuJIT had some memory issues didn't it?
Koichi: Yes, quite severe. It needed a lot of memory. That memory consumption is controllable, however, so we can configure how much memory it can use.
Jonan: OK, so you just set a limit for how much the JIT uses and then it would do the best it could with what you had given it, basically?
Koichi: RuJIT can improve the performance of micro-benchmarks but I’m not sure about the performance in larger applications.
Jonan: So, for Rails applications maybe we should call it "Ruby 1.2x3" or something.
Aaron: I think that's an interesting question to bring up because if a Rails application is part of the base benchmarks, are we really going to make a Rails application three times faster?
Matz: We need to decide how we calculate our performance numbers pretty soon. This is a big problem, I think. So maybe for some kinds of operations, such as concatenating...
Aaron: Concatenation, yeah.
Matz: … or temporary variable creation or something like that, we can improve the performance.
Aaron: So, I think it's interesting if we come up with a benchmark that does a lot of string concatenation. We could swap in a different implementation for that; for example, what if we used ropes instead? If we did that, maybe string concatenation would become very fast, but we wouldn't really have improved the virtual machine at all, right? So how do we balance those things, does that make sense?
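(An editorial aside on the rope idea Aaron mentions: a rope represents a string as a tree, so concatenation just links two nodes instead of copying characters. This toy `Rope` class is our illustration, far simpler than a production rope, which would balance the tree and support slicing.)

```ruby
# A toy rope: concatenation is O(1) because it only allocates a new node;
# the full string is materialized only when someone asks for it.
class Rope
  def initialize(left, right = nil)
    @left  = left
    @right = right
  end

  def +(other)
    Rope.new(self, other)  # no character copying here
  end

  def to_s
    @right ? "#{@left}#{@right}" : @left.to_s
  end
end

r = Rope.new("Hello, ") + Rope.new("Ruby ") + Rope.new("3x3!")
puts r.to_s  # Hello, Ruby 3x3!
```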
Matz: Unlike a typical application, a language can be applied anywhere: it can be used to write Rails applications, science applications, or games. So I don't think improving that one thing will necessarily change the situation. We have to do everything: maybe introduce ropes, introduce a JIT in some form, and introduce some other mechanisms as well to see that improvement. We have to do it.
Aaron: So maybe the key is in the benchmarks that we have. We have something doing a lot of string concatenations, something doing a lot of math, maybe something doing, I don't know, I/O. Something like that?
Matz: Yeah. We have to. We cannot measure with one single application; we need several.
Matz: And in the Rails application we have to avoid database access, just because access to the database can be very slow. That we cannot improve.
Jonan: So, along with the JIT, you've also talked about some type changes to coming to Ruby 3 and the optional static types. Can you tell us about that?
Matz: Yeah, the third major goal of Ruby 3 is adding some kind of static typing while keeping the duck typing, so some kind of structure for soft typing or something like that. The main goal of the type system is to detect errors early. Adding this kind of static type checking or type inference does not affect runtime.
Matz: It’s just a compile time check. Maybe you can use that kind of information in IDEs so that the editors can use that data for their code completion or something like that, but not for performance improvement.
Aaron: You missed out on a really good opportunity for a pun.
Jonan: Did I? What was the pun?
Aaron: You should have said, "What type of changes will those be?"
Jonan: What type of type changes would those be? Yes. I've been one-upped once again, by pun-master Aaron here.
Aaron: I was holding it in, I really wanted to say something.
Jonan: You looked up there suddenly and I thought, did I move on too early from the JIT discussion? No, it was a pun. That was the pun alert face that happened there, good. I'm sorry that we missed the pun. So, to summarize then, the static type system is not something that will necessarily improve performance...
Jonan: ...but it would be an optional static type system, and it would allow you to check some things before you're running your program and actually running into errors.
Matz: Yeah, and if you catch those errors early you can improve your productivity.
Jonan: Yes, developer productivity.
Jonan: Which is, of course, the primary goal of Ruby, or developer happiness rather, not necessarily productivity. So, the JIT, this just in time compiler, right now Ruby has ahead of time compilation (AOT) optionally? There's some kind of AOT stuff that you can do in Ruby?
Matz: I don't code with it.
Aaron: “Some, kind of”.
Aaron: It has a framework built in to allow you to build your own AOT compiler. It has the tools in there to let you build an AOT compiler, and I think you wrote a gem, the...
Koichi: Yeah, Yomikomu.
Jonan: OK. Yomikomu is an AOT compiler for Ruby. Can you describe just a little bit what that means? What ahead of time compilation would mean in this case? What does it do?
Koichi: Ruby compiles at runtime, and we can store the compiled binary to the file system, a database, or somewhere else. The Yomikomu gem uses this feature, writing out instruction sequences to the file system at runtime so we can skip the compile step in the future. It's only a small improvement, I think, maybe 30%.
Matz: 30% is huge.
Jonan: That seems like a pretty good improvement to me.
Koichi: I do think so.
Aaron: We just need a few more 30% improvements then Ruby 3x3 is done.
Matz: Yeah, that means 30% of the time is spent in the compiler.
Koichi: Yeah, in 2.3.
Matz: That’s huge!
Aaron: That's what I said!
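(An editorial aside: the mechanism Koichi describes is exposed on MRI through the `RubyVM::InstructionSequence` API, which Yomikomu builds on. A minimal sketch of dumping bytecode and loading it back:)

```ruby
# MRI can serialize compiled bytecode and load it back later,
# skipping the parse/compile step on the next run.
iseq   = RubyVM::InstructionSequence.compile("40 + 2")
binary = iseq.to_binary  # a String of serialized bytecode, safe to cache on disk

loaded = RubyVM::InstructionSequence.load_from_binary(binary)
puts loaded.eval  # 42
```

In a real AOT setup the binary would be written to a file keyed by the source path, then loaded instead of re-compiling when the same file is required again.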
Jonan: So, rather than JIT, have you thought about maybe a little-too-late compiler? We could just compile after the program runs, and then we wouldn't need to compile it at all. Maybe it wouldn't be as popular as a just-in-time compiler.
Aaron: One thing I think would be interesting, one thing that I'd like to try someday, is to take the bytecode that's been written out and analyze it. So we could know for example that we can use this trick that Shyouhei’s doing with constant folding. Since we have all of the bytecode written out, you should be able to tell by analyzing the bytecode whether or not... actually maybe you couldn't tell that. I was going to say we could analyze the bytecode and optimize it with code, rewriting an optimized version to disk. But since you can do so much stuff at runtime, I don't know if it would work in all cases.
Koichi: This is exactly what the JIT or some kind of comparable approach aims to do.
Jonan: So, in cases like you were talking about earlier, where plus can be overridden in Ruby, you would assume plus is not overridden and just put six, writing the result of that expression directly into the bytecode. Then this framework would allow you, if someone later overrode the plus method dynamically while the program was running, to swap it out again for the old implementation.
Aaron: So basically the public service announcement is: "don't do that."
Jonan: Don't do that. Don't override plus.
Aaron: Just stop it.
Jonan: Just stop it. You're going to make the Ruby team's life harder.
Koichi: Yes, lots harder.
Jonan: OK. Is there anything else you would like to add about Ruby 3? Anything we didn't touch on today that might be coming?
Matz: You know, we’ve been working on Ruby 3 for maybe two years right now, but we are not expecting to release in a year or even two. Maybe by 2020?
Aaron: Does that mean that we have to wait, are we really going to wait for Ruby 3 to introduce guilds? Or are we going to introduce that before Ruby 3?
Matz: Before Ruby 3 I guess.
Matz: Yeah, we still have a lot of things to do to implement guilds.
Aaron: Of course.
Matz: For example, the garbage collection is pretty difficult. Isolated threads can't access the same objects in the same space, so it will be very difficult to implement garbage collection. We've had a lot of issues like that in the past, so it could take years. But once it's done, we would be happy to introduce guilds in maybe Ruby 2... 6?
Aaron: 2.6, yeah.
Matz: So this is because we don't want to break compatibility. So if a program isn’t using guilds it should run the same way.
Jonan: So the way we get immutable objects in Ruby today is frozen objects, and they can't be unfrozen.
Koichi: Freezing is a one-way operation.
Jonan: OK. So then, a friend asked me when I described guilds, he writes a lot of Haskell, he asked me when we are going to have "real immutable objects", and I don't quite know what he means. Is there some distinction between an immutable object in Ruby and an immutable object in a different language that's important?
Matz: For example in Haskell, everything is immutable, it’s that kind of language, everything is immutable from day one.
Matz: But in Ruby we have mutable objects, so under that kind of situation we need a whole new construct.
Aaron: Frozen objects really should be immutable. They are really immutable.
Aaron: I don't...
Jonan: You don't know what this person who air-quoted me "real immutable" was saying?
Aaron: Yeah I don't know why they would say "real immutable".
Jonan: Should I unfriend him on Facebook? I think I'm going to after this.
Matz: At least tell him if you want "real immutable" go ahead and use Haskell.
Jonan: I think that's an excellent option, yeah.
Aaron: You just need to say to them, quit "Haskelling" me.
Jonan: I should, I’ll just tell them to quit "Haskelling" me about immutable objects. Well, it has been a pleasure. Thank you very much for taking the time. We've run a little bit longer than promised but I think it was very informative, so hopefully people get a lot out of it. Thank you so much for being here.