A Survey of Inlining Heuristics


A survey of inlining heuristics | Max Bernstein



June 3, 2026

Compilers, especially method just-in-time compilers, operate on one function at a time. It is a natural code unit size, especially for a dynamic language JIT: at a given point in time, what more information can you gather about other parts of a running, changing system?

I don’t have any data to back this up (maybe I should go gather some), but on average, methods are small. Especially in languages such as Ruby that use method dispatch for everything, even instance variable (attribute, field, …) lookups, they are small. And everywhere.

This makes the compiler sad. If we are to continue to anthropomorphize them, compilers like having more context so they can optimize better. Consider the following silly-looking example that is actually representative of a surprising amount of real-world code:

    class Point
      attr_reader :x, :y

      def initialize(x, y)
        @x = x
        @y = y
      end

      def distance(other)
        Math.sqrt((@x - other.x)**2 + (@y - other.y)**2)
      end
    end

    def distance_from_origin(x, y)
      Point.new(x, y).distance(Point.new(0, 0))
    end

Right now, in the distance_from_origin method, I count 8 different method calls:

Point.new

Point#initialize

Point.new

Point#initialize

Point#distance

Float#**

Float#**

Math.sqrt

(Technically more, but the ivar lookups (including attr_reader!), addition, and subtraction are generally specialized and don’t push a frame, even in the interpreter.)
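One rough way to check part of this count is to let the interpreter report the calls itself. The sketch below uses `TracePoint` with the `:call` event, which fires only for methods defined in Ruby, so it sees `Point#initialize` and `Point#distance` but not the C-implemented calls in the list. The variable names (`ruby_calls`, `trace`) are mine, and the exact set of events can vary between Ruby versions.

```ruby
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  def distance(other)
    Math.sqrt((@x - other.x)**2 + (@y - other.y)**2)
  end
end

def distance_from_origin(x, y)
  Point.new(x, y).distance(Point.new(0, 0))
end

# Count :call events per method name. :call only covers methods written in
# Ruby; C-implemented methods such as Math.sqrt do not show up here.
ruby_calls = Hash.new(0)
trace = TracePoint.new(:call) { |tp| ruby_calls[tp.method_id] += 1 }

result = nil
trace.enable { result = distance_from_origin(3.0, 4.0) }

puts result
ruby_calls.each { |name, count| puts "#{name}: #{count}" }
```

Adding `:c_call` to the event list shows the C-implemented calls as well, though what appears there depends on how a given Ruby version implements things like `Class#new`.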

Furthermore, there are at least two heap allocations: one for each Point instance.
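A quick, non-rigorous way to see the allocations is to compare `GC.stat(:total_allocated_objects)` before and after a call. This is only a sketch: the counter is process-wide, the exact number depends on the Ruby version and platform, and the warm-up call is my own assumption about how to keep one-time allocations out of the measurement.

```ruby
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  def distance(other)
    Math.sqrt((@x - other.x)**2 + (@y - other.y)**2)
  end
end

def distance_from_origin(x, y)
  Point.new(x, y).distance(Point.new(0, 0))
end

# Warm up once so that one-time allocations are less likely to be counted.
distance_from_origin(3.0, 4.0)

# total_allocated_objects only ever grows, so the difference is the number
# of objects allocated between the two reads.
before = GC.stat(:total_allocated_objects)
distance_from_origin(3.0, 4.0)
after = GC.stat(:total_allocated_objects)

allocated = after - before
puts "objects allocated by one call: #{allocated}"
```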

Last, there is a bunch of memory traffic to and from Point instances.

This all is a huge bummer! What should be a simple math operation is now overwhelmed with a bunch of other stuff. Point is certainly not a zero-cost abstraction.

Even if we had a bunch of other optimizations such as load-store elimination or escape analysis, they would not be able to do much: pretty much everything escapes and is effectful. That is, unless we inline. Inlining is the lever that enables a bunch of other optimization passes to kick in.
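To make that concrete, here is a hand-written sketch of what a compiler could end up with for this example. It is not the output of any real JIT: the two extra method names are hypothetical, and the second step assumes `x` and `y` are known to be numbers, which a real compiler would have to guard on.

```ruby
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  def distance(other)
    Math.sqrt((@x - other.x)**2 + (@y - other.y)**2)
  end
end

def distance_from_origin(x, y)
  Point.new(x, y).distance(Point.new(0, 0))
end

# Step 1: Point.new, Point#initialize, Point#distance, and the attr_readers
# are copied into the caller. Once nothing escapes, the two Point objects
# can be replaced by plain locals.
def distance_from_origin_inlined(x, y)
  a_x = x
  a_y = y
  b_x = 0
  b_y = 0
  Math.sqrt((a_x - b_x)**2 + (a_y - b_y)**2)
end

# Step 2: with b_x and b_y known to be 0, the subtractions fold away
# (assuming x and y are numeric).
def distance_from_origin_simplified(x, y)
  Math.sqrt(x**2 + y**2)
end

original   = distance_from_origin(3.0, 4.0)
inlined    = distance_from_origin_inlined(3.0, 4.0)
simplified = distance_from_origin_simplified(3.0, 4.0)
puts [original, inlined, simplified].inspect
```

None of the simplifications in step 2 are visible to the compiler while the work is still spread across separate method bodies.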

Inlining: the “easy” part

I wrote about the design and implementation of Cinder’s inliner (FB link, personal blog link) a couple of years ago. I wrote about arguably the simplest part, which is copying the callee body into the caller. It took me at least a week to get working. Probably closer to months if you consider all the plumbing through the rest of the JIT. In February during a small hackathon, I watched my colleague k0kubun prototype that bit of the inliner inside ZJIT in about 30 minutes.

There is more to do when pretty much every part of the VM is observable from the guest language: both Python and Ruby allow inspecting the state of the locals, the call stack, etc. from user code. Sampling profilers also expect some amount of breadcrumbs to work with to inspect the stack. So there’s some more machinery still required to pretend like the callee function was not inlined. I talk about this a little bit in the Cinder blog post.

Even so, all of that can probably be designed and wired together in a couple of months. Then you will find yourself tuning the inliner for the next 10 years. This is much harder.

When: the harder part

The thing that makes inlining difficult, especially in a method JIT, is that you are trying to make an entire (dynamic!) system faster but you are only looking through a microscope and only capable of local reasoning. Whereas other optimizations such as strength reduction, inline caches, and value numbering are an unalloyed good for the generated code, inlining can have negative effects. It is also perhaps the first optimization people add that has non-local impact.

If you inline wrong, your code size might blow up. This might thrash your CPU’s caches. Bummer, but happens to the best of us.

But also, if you inline wrong, you might get in the way of other helpful optimizations: if you hit some size limit after inlining method A, you might never get to inline B, which is the key to unlocking the performance of the method you are trying to optimize.
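Here is a toy model of that A-versus-B problem. Everything in it is hypothetical: the `CallSite` struct, the sizes, the benefit scores, and the budget are made-up numbers for illustration, not taken from any real compiler. It only shows that a greedy inliner with a fixed size budget gives different answers depending on the order in which it visits candidates.

```ruby
# Hypothetical call-site record: callee name, callee size (say, in IR
# instructions), and an estimated benefit from inlining it.
CallSite = Struct.new(:name, :size, :benefit)

# Greedily inline candidates in the given order, skipping any candidate
# that no longer fits in the remaining size budget.
def inline_greedily(candidates, budget)
  inlined = []
  remaining = budget
  candidates.each do |site|
    next if site.size > remaining
    inlined << site.name
    remaining -= site.size
  end
  inlined
end

a = CallSite.new(:A, 90, 1)   # big callee, little payoff
b = CallSite.new(:B, 40, 10)  # smaller callee, large payoff
budget = 100

# Visiting call sites in source order spends the budget on A first.
in_source_order = inline_greedily([a, b], budget)

# Visiting them by estimated benefit per unit of size picks B instead.
by_benefit_density =
  inline_greedily([a, b].sort_by { |site| -site.benefit.fdiv(site.size) }, budget)

puts "source order:       #{in_source_order.inspect}"
puts "by benefit density: #{by_benefit_density.inspect}"
```

The hard part in practice is that the benefit column does not exist up front; estimating it is what the heuristics are for.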

Last, inlining might hurt compile time. In situations where latency is paramount (think: interactive client JavaScript), adding tons more code into the fray might add noticeable hiccups, even if the long-term throughput improves. As always, in-band compilation is a trade-off because any time you spend compiling, you are not executing code.

You have to write your compiler to reason about all of this stuff. So you have heuristics. For example, here is Michael Pollan’s inliner heuristic:

Inline methods. Mostly small. Not too many.

I did a survey of a bunch of compilers, mostly JIT compilers, to see what their inlining heuristics look like. I also read (skimmed) some papers to see what those folks had to say. I wonder if they agree.

This post was a long time coming. I started working on it about five years ago but...
