Adopting the Parallel DWARF linker in dsymutil | Jonas Devlieghere<br>On Apple platforms, the development experience was designed around making the compile-link-debug cycle as fast as possible. For debugging, that means that rather than processing large amounts of DWARF to link it into the final binary, the linker leaves the debug info in the object files and records a debug map that tells the debugger where to find it. When you’re debugging locally, that’s all you need. But if you want to archive the debug info for crash reporting or remote debugging, you need a way to produce a self-contained bundle. That’s where dsymutil comes in.<br>dsymutil is more than a DWARF concatenator. It’s an optimizing linker that leverages the One Definition Rule (ODR) to deduplicate types across compilation units. In a C++ project, every translation unit that includes a header gets its own copy of the DWARF for every type defined in that header.<sup>1</sup> dsymutil identifies equivalent types and keeps only one canonical copy. For large C++ projects, this makes the difference between fitting within Mach-O’s 4GB limit or not. To perform these optimizations, dsymutil needs to parse and semantically analyze the DWARF, which is where most of the time goes.<br>The classic DWARF linking algorithm was single-threaded by design. Debug info for a large project can easily reach hundreds of gigabytes. To avoid loading all of it into memory at once, dsymutil processes one compile unit at a time and streams the output. That constraint makes parallelizing the core linking loop nontrivial. Over the years we’ve made incremental improvements, like processing architectures in parallel and running the analysis and cloning phases in lockstep on separate threads, but the fundamental bottleneck remained: the ODR uniquing happens on one thread. A parallel DWARF linker that can unique types across threads has existed in LLVM for a while. 
Building that was an incredible effort, but unfortunately it wasn’t quite production-ready due to some major limitations.<br>The Qualification Problem<br>The biggest challenge with dsymutil has always been qualification. When we upstreamed dsymutil to LLVM, we qualified it by generating bug-for-bug identical DWARF. We did the same thing when we rewrote the cloning phase to use the lockstep algorithm. Having binary-identical output meant we could run diff on two dSYMs to convince ourselves a change was truly NFC (no functional change).<br>The parallel linker can’t produce binary-identical output. It processes compile units concurrently, so the order in which types are encountered and deduplicated is different. The output is semantically the same (or should be), but the DWARF structure, and hence the bytes, differ. That means the binary compatibility approach that qualified every previous dsymutil change doesn’t apply here.<br>Without a way to compare the output semantically, we had no way to confirm the correctness of the DWARF generated by the parallel linker. It’s relatively easy to spot-check small things in tests, but that doesn’t scale to even medium-sized projects. The really tricky issues only surface at debug time, when the debugger starts misbehaving. In order to even consider the parallel linker in dsymutil, we needed a tool that could tell us, concretely, how the parallel linker’s output differs from the classic linker’s.<br>Semantic DWARF Diffing<br>Although DWARF looks like a tree of tags and attributes, it’s really a directed acyclic graph. Attributes can reference DIEs in other parts of the tree. A variable references its type, a type references its members’ types, a subprogram references its parameter types, and so on. Comparing two DWARF outputs means matching nodes across two graphs and verifying that their attributes and reachable subgraphs are equivalent.<br>You can’t do this by diffing dwarfdump text. 
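As a rough illustration of why a textual comparison falls short, here is a minimal Python sketch. It is not the tool described in this post: the `Die` record, its fields, and the sample offsets and names are all hypothetical, and real DWARF has many more attributes and reference kinds. Two DIE lists that describe the same variable and type at different offsets and in a different order compare unequal entry by entry, but compare equal once each DIE is keyed by a stable name and its type reference is resolved.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Die:
    """Hypothetical, heavily simplified stand-in for a DWARF DIE."""
    offset: int                     # position in the output; differs between linkers
    tag: str
    name: str                       # stand-in for a stable id such as a linkage name
    type_ref: Optional[int] = None  # offset of the referenced type DIE, if any


def signature(die, by_offset):
    """Describe a DIE by tag, name, and the signature of the type it references."""
    if die.type_ref is None:
        return (die.tag, die.name, None)
    return (die.tag, die.name, signature(by_offset[die.type_ref], by_offset))


def semantic_view(dies):
    """Key every DIE by (tag, name) instead of by offset."""
    by_offset = {d.offset: d for d in dies}
    return {(d.tag, d.name): signature(d, by_offset) for d in dies}


def semantic_diff(a, b):
    """Return the keys whose offset-free signatures differ or exist on one side only."""
    va, vb = semantic_view(a), semantic_view(b)
    return sorted(k for k in va.keys() | vb.keys() if va.get(k) != vb.get(k))


# Same content, different offsets and ordering (all values are made up).
classic = [
    Die(0x0B, "DW_TAG_base_type", "int"),
    Die(0x12, "DW_TAG_variable", "_ZL7counter", type_ref=0x0B),
]
parallel = [
    Die(0x30, "DW_TAG_variable", "_ZL7counter", type_ref=0x4A),
    Die(0x4A, "DW_TAG_base_type", "int"),
]
# A genuinely different output: the variable now references another type.
broken = [
    Die(0x30, "DW_TAG_variable", "_ZL7counter", type_ref=0x4A),
    Die(0x4A, "DW_TAG_base_type", "long"),
]

print(classic == parallel)              # raw comparison sees a difference
print(semantic_diff(classic, parallel)) # offset-free comparison does not
print(semantic_diff(classic, broken))   # a real divergence is still reported
```

The recursion in `signature` terminates only because the sketch assumes the reference graph is acyclic, as described above.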
The offsets are different, the ordering of DIEs may differ, and the cross-references point to different positions. That’s without even considering that the dwarfdump output for any real-world project is too big for most tools to handle. What you need is to anchor the comparison on stable identifiers like linkage names, declaration coordinates, and type signatures, then walk the graph from there, comparing attributes and children structurally.<br>We prototyped a semantic diffing tool and ran it on clang, comparing the classic and parallel linker output. Out of roughly 5 million DIEs, it identified about 50,000 differences. We haven’t verified all of those results, and the tool itself is far from production-ready, but it was sufficient to give us a concrete picture of where the two linkers diverge.<br>Determinism<br>The single biggest blocker to adopting the parallel linker was its non-determinism. Reproducible builds are non-negotiable for any serious build tool. Without them, you lose the ability to cache, bisect, and verify your artifacts.<br>The...