Musings on Tracing in PyPy | PyPy
Last summer, Shriram Krishnamurthi asked on<br>Twitter:
"I'm curious what the current state of tracing JITs is. They used to be all the<br>rage for a while, then I though I heard they weren't so effective, then I<br>haven't heard of them at all. Is the latter because they are ubiquitous, or<br>because they proved to not work so well?"
I replied with my personal (pretty subjective) opinions about the<br>question in a lengthy Twitter thread (which also spawned an even lengthier<br>discussion). I wanted to turn what I wrote there into a blog post to make it<br>more widely available (Twitter is no longer easily consumable without an<br>account), and also because I'm mostly not using Twitter anymore. The blog post<br>is still somewhat terse, but I've written a small background section and tried to at<br>least add links to further information. Please ask in the comments if something<br>is particularly unclear.
Background
I'll first explain a few terms that are central to the rest of the post. JIT compilers<br>are compilers that do their work at runtime, interleaved with (or concurrent with)<br>the execution of the program. There are (at least) two common general styles of<br>JIT compiler architectures. The most common one is that of a method-based JIT,<br>which will compile one method or function at a time. Then there are tracing JIT<br>compilers, which generate code by tracing the execution of the user's program.<br>They often focus on loops as their main unit of compilation.
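To make "tracing the execution" a bit more concrete, here is a minimal sketch of the recording step only. Everything in it is an assumption made for illustration: the three-instruction toy bytecode, the `run` function and the textual trace format are hypothetical and are not how PyPy represents programs or traces. The interpreter starts recording the first time it reaches the loop header, logs every operation it executes, turns each branch decision into a guard, and stops once execution arrives back at the header. A real tracing JIT would only start recording after a loop has become hot, and would then optimize the trace and compile it to machine code.

```python
# Toy bytecode (hypothetical): each instruction is an (opcode, argument) pair.
PROGRAM = [
    ("ADD_I", None),              # 0: acc = acc + i   (loop header)
    ("DEC_I", None),              # 1: i = i - 1
    ("JUMP_IF_I_POSITIVE", 0),    # 2: if i > 0, jump back to instruction 0
    ("RETURN", None),             # 3: return acc
]


def run(program, i, loop_header):
    """Interpret `program` and record one iteration of the loop at `loop_header`.

    Returns (result, trace). `trace` is a list of strings describing the
    operations of one loop iteration, or None if the loop never closed.
    """
    acc = 0
    pc = 0
    trace = None           # operations recorded so far while tracing, else None
    finished_trace = None  # the completed trace once the loop has closed
    while True:
        if pc == loop_header and finished_trace is None:
            if trace is None:
                trace = []              # first arrival at the header: start recording
            else:
                finished_trace = trace  # back at the header: the loop is closed
                trace = None
        op, arg = program[pc]
        if op == "ADD_I":
            acc += i
            if trace is not None:
                trace.append("acc = acc + i")
            pc += 1
        elif op == "DEC_I":
            i -= 1
            if trace is not None:
                trace.append("i = i - 1")
            pc += 1
        elif op == "JUMP_IF_I_POSITIVE":
            taken = i > 0
            if trace is not None:
                # A branch becomes a guard that records which way execution went.
                trace.append("guard(i > 0)" if taken else "guard(i <= 0)")
            pc = arg if taken else pc + 1
        elif op == "RETURN":
            return acc, finished_trace
        else:
            raise ValueError("unknown opcode: %r" % (op,))


result, trace = run(PROGRAM, 3, loop_header=0)
print(result)
print(trace)
```

The recorded trace is a straight-line sequence of operations with no control flow of its own. The guard is the point where compiled code would have to leave the trace if `i > 0` stopped holding.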
Then there is the distinction between a "regular" JIT compiler and a<br>meta-JIT. A regular JIT is built to compile one specific source language to<br>machine code. A meta-JIT is a framework for building JIT compilers for a<br>variety of different languages, reusing as much machinery as possible between<br>the different implementations.
Personal and Project Context
Some personal context: my perspective is informed by nearly two<br>decades<br>of work on PyPy. PyPy's implementation language, RPython, has support for a<br>meta-JIT, which allows it to reuse its JIT infrastructure for the various<br>Python versions that we support (currently we do releases of PyPy2.7 and<br>PyPy3.10 together). Our meta-JIT infrastructure has also been used for various<br>experimental implementations of other languages and systems, such as:
PyPy's regular expression engine
RPySom, a tiny Smalltalk
Ruby
PHP
Prolog
Racket
a database (SQLite)
Lox, the language of Crafting Interpreters
an ARM and RISC-V emulator
and many more
Those implementations reached various degrees of maturity; many of them are<br>research software and aren't maintained any more.
PyPy sets itself the goal of being extremely compatible with all the<br>quirks of the Python language. We don't change the Python language to make<br>things easier to compile and we support the introspection and debugging<br>features of Python. We try very hard to have no opinions on language design.<br>The CPython core developers come up with the semantics, we somehow deal with<br>them.
Meta-tracing
PyPy started using a tracing<br>JIT approach<br>not because we thought method-based just-in-time compilers are bad.<br>Historically we had<br>tried<br>to implement a method-based meta-JIT that was using partial evaluation (we wrote<br>three or four method-based prototypes that all weren't as good as we hoped).<br>After all those experiments<br>failed<br>we switched to the tracing<br>approach, and only at this<br>point did our meta-JIT start producing interesting performance.
In the meta-JIT context, tracing has good properties, because tracing has<br>relatively understandable behavior and it's easy(ish) to tweak how things work<br>with extra annotations in the interpreter<br>source.
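As a rough illustration of what such annotations look like, here is a minimal sketch of an interpreter loop carrying RPython's `JitDriver` hints. The toy three-opcode bytecode and the `interpret` function are hypothetical and made up for this example. The `JitDriver` class, its `greens` and `reds` arguments, and the `jit_merge_point` and `can_enter_jit` hints come from `rpython.rlib.jit` as I understand it. The fallback stub is an assumption added only so the sketch also runs as ordinary Python when RPython isn't installed, in which case the hints do nothing.

```python
try:
    from rpython.rlib.jit import JitDriver
except ImportError:
    # Assumption for illustration: a do-nothing stand-in so the sketch runs
    # as plain Python when RPython is not available.
    class JitDriver(object):
        def __init__(self, **kwargs):
            pass

        def jit_merge_point(self, **kwargs):
            pass

        def can_enter_jit(self, **kwargs):
            pass


# "Green" variables identify a position in the user's program; "red" variables
# are the remaining runtime state of the interpreter loop.
jitdriver = JitDriver(greens=["pc", "program"], reds=["acc", "i"])


def interpret(program, i):
    """Run a toy program (hypothetical bytecode, one character per opcode).

    '+' adds i to the accumulator, '-' decrements i,
    '<' jumps back to the start of the program while i > 0.
    """
    acc = 0
    pc = 0
    while pc < len(program):
        # Hint: this is the top of the interpreter's dispatch loop.
        jitdriver.jit_merge_point(pc=pc, program=program, acc=acc, i=i)
        op = program[pc]
        if op == "+":
            acc += i
            pc += 1
        elif op == "-":
            i -= 1
            pc += 1
        elif op == "<":
            if i > 0:
                pc = 0
                # Hint: a backward jump in the user's program, i.e. a place
                # where a loop may close and tracing may start.
                jitdriver.can_enter_jit(pc=pc, program=program, acc=acc, i=i)
            else:
                pc += 1
        else:
            raise ValueError("unknown opcode")
    return acc


print(interpret("+-<", 3))
```

The interpreter itself stays an ordinary loop over bytecodes. The annotations only tell the meta-JIT which variables describe the position in the user's program and where loops in that program can close.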
Another reason why meta-tracing often works well for PyPy is that it can often<br>slice through the complicated layers of Python quite effectively and remove a<br>lot of overhead. Python is often described as simple, but I think that's<br>actually a misconception. On the implementation level it's a very big and<br>complicated language, and it is also continuously getting new features every<br>year (the language is quite a bit more complicated than JavaScript, for<br>example[1]).
Truffle
Later Truffle came along<br>and made a method-based meta-JIT using partial evaluation work. However, Truffle<br>(and Graal) has had significantly more people working on it and much more<br>money invested. In addition, it at first required quite a specific style of<br>AST-based interpreters (in<br>the last few years they have also added support for bytecode-based<br>interpreters).
It's still my impression that getting similar results with Truffle is more<br>work for language<br>implementers<br>than with RPython, and the warmup of<br>Truffle can often be pretty bad. But Truffle is definitely an existence proof that<br>meta-JITs don't have to be based on tracing.
Tracing, the good
Let's now actually get to the heart of Shriram's question and discuss some of<br>the advantages of tracing that go beyond the ease of using tracing for a<br>meta-JIT.
Tracing allows for doing very aggressive partial<br>inlining,<br>Following just...