Zig's New Relationship with LLVM | Loris Cro's Blog
Loris Cro
Personal Website
About<br>Twitter<br>Twitch<br>YouTube<br>GitHub
Zig's New Relationship with LLVM
September 28, 2020<br>min read • by<br>Loris Cro
and<br>Andrew Kelley
While not yet at version 1.0, Zig is about to reach a new level of maturity and stability.
In the early days, Zig was but a thin frontend in front of LLVM. This was instrumental for getting started quickly and filling in gaps of Andrew’s knowledge as a compiler developer. Now, the training wheels of the bicycle are coming off, and LLVM is transitioning into an optional component.
The work to replace the current C++ compiler implementation with a new pure Zig version has begun. Moving to a self-hosted implementation is usually considered a step towards maturity, with most benefits being felt by developers of the language itself. As an example, Go lost some speed of compilation by switching to the self-hosted compiler but, in exchange, it streamlined the toolchain, removed dependencies, and improved the whole development experience.<br>The move to a self-hosted compiler for Zig has similar advantages for the core contributors, but it also makes LLVM an optional dependency , increases compilation speed (instead of losing it), and adds an amazing feature for debug builds of your code: incremental compilation with in-place binary patching , another unique Zig feature.<br>Speeding up compilation<br>Most languages offer some form of caching to speed up compilation, starting from C’s compilation units, up to modules, packages, and other comparable boundaries in more modern languages.<br>Zig also implements a caching system that comes particularly handy when building a project that mixes C and Zig source code, or when using Zig as a C compiler with the zig cc command. Zig keeps track of all the files involved in the compilation, so it can very easily know when an object file can be reused, and this is only one of the advantages of using Zig to compile C code.<br>Zig sources always get bundled into a single compilation unit, so the caching system in its current form doesn’t provide any speedup when editing and recompiling a pure Zig project. The upside is that, not only compiling Zig code is very fast, but also that incremental compilation will provide smart caching for Zig code, more than making up for what we can’t get from simple caching.<br>Incremental compilation<br>Incremental compilation is a form of caching that acts at a higher granularity level than normal “compilation unit”-level caching. The Rust blog has a great post that explains how it works.
Taken from the Rust blog post linked above.<br>In the case of Rust, the compiler builds a dependency graph at the AST level and then saves it to disk alongside the cached intermediate results (object files). When a new compilation is requested, the compiler will be able to easily notice which parts of the AST have changed and thus invalidate all the intermediate results that depend on it.<br>One important detail about this graph is the fact that the right-most box is always invalidated. In other words, the final executable is always re-linked from scratch starting from a mix of old and newly generated object files. It’s clear that this has to be the case, since the final executable depends on everything else and so any meaningful change to the code will invalidate it, but this is where the Zig self-hosted compiler brings a new ingenious idea to the table.<br>In-Place Binary Patching<br>As of Zig version 0.6.0, regardless of the type of release (debug, release-safe, release-fast), there is always a final step delegated to LLVM, which takes at least 70% of the total compilation time including when compiling a debug build, where optimizations aren’t even enabled.<br>The self-hosted compiler will not depend on LLVM for debug builds and will be able to cut compilation time considerably, basically reducing that 70% to almost zero , just by virtue of being a simpler piece of software compared to LLVM.<br>On top of that, since the compiler will have full control over the whole process, it will generate machine code using an ad-hoc strategy optimized for incremental compilation, allowing the compiler to patch the final executable in-place with the new changes.<br>In-place binary patching is based on a granularity of top-level declarations. Each global variable and function can be independently patched because the final binary is structured as a sequence of loosely coupled blocks. Another important characteristic is that all this information is kept in memory, so the compiler will stay open between compilations.<br>If you want to see the self-hosted compiler in action, here’s a 5 minute demo by Andrew:
Designing machine code for incremental compilation<br>Efficient in-place binary patching is something that can only be accomplished by tightly coupling the compiler frontend and backend. Part of the reason this feature is so rarely seen in the wild is that it goes against our better sense of abstraction...