A Survey of Inlining Heuristics

A survey of inlining heuristics | Max Bernstein

June 3, 2026

Compilers, especially method just-in-time compilers, operate on one function at a time. It is a natural code unit size, especially for a dynamic language JIT: at a given point in time, what more information can you gather about other parts of a running, changing system?

I don’t have any data to back this up—maybe I should go gather some—but on average, methods are small. Especially in languages such as Ruby that use method dispatch for everything, even instance variable (attribute, field, …) lookups, they are small. And everywhere.

This makes the compiler sad. If we are to continue to anthropomorphize them, compilers like having more context so they can optimize better. Consider the following silly-looking example that is actually representative of a surprising amount of real-world code:

class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  def distance(other)
    Math.sqrt((@x - other.x)**2 + (@y - other.y)**2)
  end
end

def distance_from_origin(x, y)
  Point.new(x, y).distance(Point.new(0, 0))
end

Right now, in the distance_from_origin method, I count 8 different method calls:

Point.new

Point#initialize

Point.new

Point#initialize

Point#distance

Float#**

Float#**

Math.sqrt

(Technically more, but the ivar lookups (including attr_reader!), addition, and subtraction are generally specialized and don’t push a frame, even in the interpreter.)

Furthermore, there are at least two heap allocations: one for each Point instance.

Last, there is a bunch of memory traffic to and from Point instances.

This all is a huge bummer! What should be a simple math operation is now overwhelmed with a bunch of other stuff. Point is certainly not a zero-cost abstraction.

Even if we had a bunch of other optimizations such as load-store elimination or escape analysis, they would not be able to do much: pretty much everything escapes and is effectful. That is, unless we inline. Inlining is the lever that enables a bunch of other optimization passes to kick in.
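To make that concrete, here is a hand-written sketch of what could be left of distance_from_origin once everything is inlined and the two Point allocations, which no longer escape, are removed. The name distance_from_origin_inlined is made up for illustration, and the sketch assumes nobody has redefined Point or its methods; a real JIT would need guards or invalidation for that.

```ruby
# Original definitions, repeated so the sketch is self-contained.
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  def distance(other)
    Math.sqrt((@x - other.x)**2 + (@y - other.y)**2)
  end
end

def distance_from_origin(x, y)
  Point.new(x, y).distance(Point.new(0, 0))
end

# Hypothetical result of inlining Point.new, Point#initialize,
# Point#distance, and the attr_reader calls, then dropping the two
# Point objects because they are never observed outside this method.
def distance_from_origin_inlined(x, y)
  Math.sqrt((x - 0)**2 + (y - 0)**2)
end
```

No frames pushed for Point, no allocations, no ivar traffic. Later passes could also fold away the subtraction of 0 if they know the operand types.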

Inlining: the “easy” part

I wrote about the design and implementation of Cinder’s inliner (FB link, personal blog link) a couple of years ago. I wrote about arguably the simplest part, which is copying the callee body into the caller. It took me at least a week to get working. Probably closer to months if you consider all the plumbing through the rest of the JIT. In February during a small hackathon, I watched my colleague k0kubun prototype that bit of the inliner inside ZJIT in about 30 minutes.

There is more to do when pretty much every part of the VM is observable from the guest language: both Python and Ruby allow inspecting the state of the locals, the call stack, etc. from user code. Sampling profilers also expect some amount of breadcrumbs to work with to inspect the stack. So there’s some more machinery still required to pretend like the callee function was not inlined. I talk about this a little bit in the Cinder blog post.

Even so, all of that can probably be designed and wired together in a couple of months. Then you will find yourself tuning the inliner for the next 10 years. This is much harder.

When: the harder part

The thing that makes inlining difficult, especially in a method JIT, is that you are trying to make an entire (dynamic!) system faster but you are only looking through a microscope and only capable of local reasoning[1]. Whereas other optimizations such as strength reduction, inline caches, and value numbering are an unalloyed good for the generated code, inlining can have negative effects. It is also perhaps the first optimization people add that has non-local impact.

If you inline wrong, your code size might blow up. This might thrash your CPU’s caches. Bummer, but happens to the best of us.

But also, if you inline wrong, you might get in the way of other helpful optimizations: if you hit some size limit after inlining method A, you might never get to inline B, which is the key to unlocking the performance of the method you are trying to optimize.
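A toy sketch of that failure mode, assuming a greedy inliner that walks call sites in order and spends from a fixed size budget. CallSite, greedy_inline, and the sizes are all hypothetical and not taken from any real compiler.

```ruby
# Hypothetical call-site record: a callee name and its size in IR
# instructions.
CallSite = Struct.new(:name, :size)

# Greedy sketch: visit call sites in order and inline each one that
# still fits in the remaining size budget.
def greedy_inline(call_sites, budget)
  inlined = []
  call_sites.each do |site|
    next if site.size > budget  # does not fit; leave it as a call
    budget -= site.size
    inlined << site.name
  end
  inlined
end

sites = [
  CallSite.new(:a, 80),  # large callee, visited first
  CallSite.new(:b, 40),  # the callee we actually wanted inlined
]

p greedy_inline(sites, 100)          # => [:a]
p greedy_inline(sites.reverse, 100)  # => [:b]
```

With a budget of 100, inlining a first leaves only 20, so b never gets in. Visiting the same call sites in the other order gives the opposite result, which is why the order and the budget both end up being things you tune.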

Last, inlining might hurt compile time. In situations where latency is paramount (think: interactive client JavaScript), adding tons more code into the fray might add noticeable hiccups, even if the long-term throughput improves. As always, in-band compilation is a trade-off because any time you spend compiling, you are not executing code.

You have to write your compiler to reason about all of this stuff. So you have heuristics. For example, here is Michael Pollan’s inliner heuristic:

Inline methods. Mostly small. Not too many.
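Taken literally, that heuristic might look something like the sketch below. The thresholds and the should_inline? name are invented for illustration, and "mostly" is flattened into a hard cutoff, which real compilers generally do not do.

```ruby
# Hypothetical thresholds; real compilers tune these empirically.
MAX_CALLEE_SIZE = 30         # "Mostly small."
MAX_INLINES_PER_CALLER = 8   # "Not too many."

# Decide whether to inline one more callee into the current caller.
# callee_size is measured in IR instructions; inlines_so_far counts
# the callees already inlined into this caller.
def should_inline?(callee_size, inlines_so_far)
  callee_size <= MAX_CALLEE_SIZE && inlines_so_far < MAX_INLINES_PER_CALLER
end
```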

I did a survey of a bunch of compilers, mostly JIT compilers, to see what their inlining heuristics look like. I also read (skimmed) some papers to see what those folks had to say. I wonder if they agree.

This post was a long time coming. I started working on it about five years ago but...
