QBE - Compiler Backend
QBE<br>compiler backend
QBE 1.3 - Release notes
qbe-1.3
QBE 1.3 took a while to cook, but it is the most significant release<br>since 1.0 with around 7k new lines of code and 1.5k deleted ones.<br>In addition to the usual bug fixes, QBE gained a new and original IL<br>matching algorithm, new optimizations from Roland Paterson-Jones,<br>Scott Graham added support for the Windows ABI, and I implemented a plan<br>suggested by Michael Forney to have QBE produce position-independent code<br>(as in shared objects). QBE is teamwork, and I am happy to thank all the<br>contributors to this release. In the rest of the notes we will take a<br>closer look at some of the meat of the release.
Faster
Every once in a while, the QBE fly stings an outstanding programmer.<br>This time, Roland was the victim! Roland suggested we look at the<br>coremark benchmark<br>to give us a concrete but simple playground to optimize. Initial<br>measurements with qbe-1.2 revealed that we were far behind<br>our “70% of gcc -O2” goal, closer to 40%.<br>We decided to address this for the 1.3 release.
An early inspection of profiling data revealed that the performance<br>gap boiled down in large part to how two functions are treated:<br>ee_isdigit and<br>crcu8.<br>Interestingly, these functions are not really idiomatic C; for instance,<br>ee_isdigit is typically inlined textually, uses<br>&& instead of & and skips the<br>superfluous ternary operator; as for CRC, it is best implemented<br>using a pre-computed table. This observation was a bit disappointing<br>because, aside from QBE's lack of inlining, it did not point us to a<br>general source of overhead that would apply to other compilation<br>loads. On the other hand, it is expected that CPU-bound<br>code spends a majority of time in compact code sections.
Nonetheless, we implemented numerous optimizations (GVN/GCM, loop<br>optimization, if-elimination, CFG simplification, …) that<br>we could try on both coremark and more realistic usages like the<br>Hare test suite. In the end, we<br>decided to keep only a subset of vetted passes and now score more<br>than 63% of the performance of commercial compilers on vanilla<br>coremark. Notably, we excluded inlining from the optimizations set<br>to postpone solving its incompatibility with the streaming per-function<br>compilation model of QBE. Modifying the coremark benchmark to inline<br>the ee_isdigit function and use a simpler branch-free<br>implementation of crcu8<br>makes QBE reach its 70% goal. The new optimizations should also<br>benefit Hare users: I measured a 33% improvement on the Hare<br>test suite against qbe-1.2 (1.7s vs 2.6s).
Smarter
Since its early days, QBE used a bottom-up tree-numbering algorithm<br>inspired by Ken Thompson's Plan9<br>C compiler (see 5.5. Addressability). The algorithm is fairly<br>generic but has some subtleties in dealing elegantly with associativity<br>and commutativity of arithmetic operators. It has been a long-standing<br>goal of mine to implement a metaprogramming solution to this problem.<br>QBE 1.3 sees the realization of this goal.
A new OCaml tool called mgen is used to compile lispy<br>IL patterns into idiomatic C code that matches them. The mgen<br>tool will look for special comment blocks containing IL patterns and<br>inline the matching C code right below these blocks. The generated<br>C code is designed to look idiomatic in qbe and works similarly to<br>the hand-written logic pre 1.3. See the<br>isel.c<br>file for the current use of mgen.
In more detail, instruction DAGs are matched by following a numbering<br>approach like in Ken Thompson's compiler. Then, mgen<br>associates each number with a bitset indicating which of the toplevel<br>user patterns are matched by the current IL node (temporary); the most<br>suitable pattern can then be selected by handwritten logic. Patterns<br>can include variables which can be collected by running a matcher<br>program. These programs are also generated by mgen in a<br>simple bytecode language which the<br>runmatch()<br>function can interpret.
I expect that in the future mgen is used to simplify<br>instruction selection in more backends and maybe even to recognize<br>IL patterns such as bit rotations in optimization passes.
Nicer
For the 1.3 release, QBE also stung Scott Graham.<br>Scott generously upstreamed his implementation of the Windows ABI,<br>originally found in a fun derivative<br>work. The assembly generated by QBE remains AT&T syntax and is best<br>compiled by the mingw assembler, although I have not tried it myself on Windows.<br>Compiling for Windows is now as simple as passing -t amd64_win<br>to QBE.
Last but not least, QBE improved its support for position-independent<br>code and is now able to link smoothly with and even produce shared objects on<br>most targets. The main blocker to date has been the lack of support for<br>indirect access to globals (e.g., global offset table on ELF). This is now<br>possible at the level of IL through the support of a new extern<br>“dynamic constant” flag (DYNCONST in the IL spec).<br>For example, to access a variable dlvar from a dynamically-linked<br>library, one would...