When can the C++ compiler devirtualize a call? – Arthur O'Dwyer – Stuff mostly about C++
When can the C++ compiler devirtualize a call?
Someone recently asked me about devirtualization optimizations: When do they happen? When can we rely on devirtualization? Do different compilers do devirtualization differently? As usual, this led me down an experimental rabbit-hole. The answer seems to be: Modern compilers devirtualize calls to final methods pretty reliably. But there are many interesting corner cases (including some I haven’t thought of, I’m sure!), and different compilers do catch different subsets of those corner cases.
First, let’s observe that devirtualization can (probably?) be done more effectively via link-time optimization (LTO), using whole-program analysis. I don’t know anything about the state of the art in link-time devirtualization, and it’s hard to experiment with on Compiler Explorer, so I’m not going to talk about LTO at all. We’re looking purely at what the compiler itself can do.
There are basically two situations where the compiler knows enough to devirtualize. They don’t have much in common:
When we know the instance’s dynamic type
The archetypical case here is
void test() {
    Apple o;
    o.f();
}
It doesn’t matter if Apple::f is virtual; all virtual dispatch ever does is invoke the method on the actual dynamic type of the object, and here we know the actual dynamic type is exactly Apple. Static and dynamic dispatch should give us the same result in this case.
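The claim that static and dynamic dispatch agree here can be checked directly. What follows is only a minimal sketch: the base class Fruit, the return values, and the function names via_object and via_qualified_name are made up for illustration; only the name Apple comes from the snippet above.

```cpp
// Hypothetical hierarchy; only the name Apple appears in the article.
struct Fruit {
    virtual ~Fruit() = default;
    virtual int f() const { return 1; }
};

struct Apple : Fruit {
    int f() const override { return 2; }
};

// Ordinary member call on a local object. The call is nominally virtual,
// but the dynamic type of o is known to be exactly Apple.
int via_object() {
    Apple o;
    return o.f();
}

// Explicitly qualified call. A qualified name suppresses virtual dispatch,
// so this is always a direct call to Apple::f.
int via_qualified_name() {
    Apple o;
    return o.Apple::f();
}
```

Both functions return the same value, which is exactly why a compiler is free to turn the first form into the second.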
A sufficiently smart compiler will use dataflow analysis to optimize non-trivial cases such as
Derived d;
Base *p = &d;
p->f();
It turns out that even this simple dodge is enough to fool MSVC and ICC. The next test case is
Derived da, db;
Base *p = cond ? &da : &db;
p->f();
This is too much for Clang, but GCC actually manages to survive it… until you move the conversions to Base* inside the conditional! Here is where even GCC’s analysis fails (Godbolt):
Derived da, db;
Base *p = cond ? (Base*)&da : (Base*)&db;
p->f();
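To compare compilers on the three pointer cases above, it helps to have them in one self-contained translation unit that can be pasted into Compiler Explorer. This is a sketch rather than the original test file: the definitions of Base and Derived, the return values, and the function names one_hop, two_candidates, and two_candidates_with_casts are assumptions made for illustration.

```cpp
// Hypothetical definitions; the article only shows the call sites.
struct Base {
    virtual ~Base() = default;
    virtual int f() { return 1; }
};

struct Derived : Base {
    int f() override { return 2; }
};

// Case 1: the pointer is initialized from a single local object.
int one_hop() {
    Derived d;
    Base *p = &d;
    return p->f();
}

// Case 2: the conditional yields a Derived*, converted to Base* afterwards.
int two_candidates(bool cond) {
    Derived da, db;
    Base *p = cond ? &da : &db;
    return p->f();
}

// Case 3: each operand is converted to Base* before the conditional.
int two_candidates_with_casts(bool cond) {
    Derived da, db;
    Base *p = cond ? (Base*)&da : (Base*)&db;
    return p->f();
}
```

All three functions always call Derived::f at run time; the interesting question is only whether the generated code still contains an indirect call through the vtable.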
When we know a “proof of leafness” for its static type
Okay, let’s suppose that we’re receiving a pointer from somewhere else in the system. We know its static type (e.g. Derived*), but we don’t know the actual dynamic type of the object instance to which it points. Still, the compiler can devirtualize a call to Derived::f if it can somehow prove that no type in the entire program can ever override Derived::f.
Proof-by-final
The simplest “proof of leafness” is if you’ve marked Derived as final.
struct Base {
    virtual int f();
};
struct Derived final : public Base {
    int f() override { return 2; }
};
int test(Derived *p) {
    return p->f();
}
A pointer of type Derived* must point to an object instance that is “at least Derived”; that is, Derived or one of its children. Since Derived is final, it isn’t allowed to have children; therefore the dynamic type of the instance must be exactly Derived, and the compiler can devirtualize this call.
Or, you can mark the specific method Derived::f as final.
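A per-method final might look like the following sketch. The second virtual function g, the class MoreDerived, the return values, and the names call_f and call_g are hypothetical additions, there only to contrast a final method with a non-final one in the same class.

```cpp
struct Base {
    virtual ~Base() = default;
    virtual int f() { return 1; }
    virtual int g() { return 10; }
};

struct Derived : Base {
    int f() final { return 2; }      // nothing below Derived may override f
    int g() override { return 20; }  // g can still be overridden
};

// Derived itself is not final, so it may have children...
struct MoreDerived : Derived {
    int g() override { return 30; }
    // int f() override;  // ...but this would be ill-formed: f is final
};

// p->f() can only ever mean Derived::f, whatever the dynamic type is.
int call_f(Derived *p) { return p->f(); }

// p->g() may resolve to an override in a child, so in general the
// compiler needs more information before it can devirtualize this call.
int call_g(Derived *p) { return p->g(); }
```

Passing a MoreDerived to both functions shows the difference: call_f still lands in Derived::f, while call_g reaches MoreDerived::g.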
The same analysis should apply no matter whether Derived::f is declared in Derived itself, or inherited from Base. So for example the compiler should be equally able to devirtualize
struct Base {
    virtual int f() { return 1; }
};
struct Derived final : public Base {};
int test(Derived *p) {
    return p->f();
}
GCC, Clang, and MSVC pass this test (Godbolt, case one); ICC 21.1.9 is fooled.
An utterly bizarre proof-of-leafness is to observe that when class C’s destructor is final, C must be childless: if C had a child, the child would have to have a destructor (since you can’t make a class without a destructor), which would then override C’s destructor, which isn’t allowed. Clang actually both warns on final destructors, and optimizes on them. Every other vendor considers this situation very silly and doesn’t dignify it with a codepath as far as I can tell.
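For concreteness, a final destructor might be spelled as in this sketch. The base class, the return values, and the commented-out Child are assumptions added for illustration; only the class name C comes from the paragraph above.

```cpp
// Hypothetical base class with a virtual destructor.
struct Base {
    virtual ~Base() = default;
    virtual int f() { return 1; }
};

// C is not marked final, and neither is C::f. Only the destructor is.
struct C : Base {
    ~C() final = default;
    int f() override { return 2; }
};

// Any child of C would implicitly declare a destructor that overrides
// the final ~C(), so a declaration like this one is ill-formed:
// struct Child : C {};

// Since C can have no children, p must point to exactly a C.
int test(C *p) {
    return p->f();
}
```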
Proof-by-internal-linkage
A class whose name has internal linkage cannot be named outside the current translation unit. Therefore, it cannot be derived from outside the current translation unit, either! As long as it has no children in the current TU (or at least no children that override its methods), calls to its virtual functions are devirtualizable.
namespace {
    class BaseImpl : public Base {};
}
int test(Base *p) {
    return static_cast<BaseImpl*>(p)->f();
}
If p really does point to an object instance that is “at least BaseImpl,” then the compiler can prove that the instance must be exactly BaseImpl. (And if p doesn’t point to an instance that is “at least BaseImpl,” the program has undefined behavior anyway.)
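Here is a slightly fuller sketch of the same pattern, written as a single file so that it compiles on its own. Unlike the snippet above, this BaseImpl overrides f so that the result is observable; that override, the return values, and the factory function make_impl are hypothetical additions.

```cpp
// Public interface; in a real project this would sit in a header.
struct Base {
    virtual ~Base() = default;
    virtual int f() { return 1; }
};

// Implementation detail with internal linkage: no other translation unit
// can name BaseImpl, so no other translation unit can derive from it.
namespace {
    class BaseImpl : public Base {
    public:
        int f() override { return 2; }
    };
}

// Hypothetical factory, so that callers can obtain a BaseImpl without
// ever seeing its name.
Base *make_impl() {
    static BaseImpl instance;
    return &instance;
}

// The cast asserts that p is "at least BaseImpl". BaseImpl has no children
// in this translation unit, so the callee can only be BaseImpl::f.
int test(Base *p) {
    return static_cast<BaseImpl*>(p)->f();
}
```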
This strikes me as a case that might actually come up pretty commonly in real codebases. It’s common to have a base class exposed publicly in the header file, and then one or more derived implementations scoped tightly to a single .cpp file. If you go the extra mile and put those derived implementations into anonymous namespaces, you might be helping out the compiler’s devirtualization...