Recent LLVM hash table improvements | MaskRay
2026-06-07
LLVM has several hash tables. They used quadratic probing with<br>in-band sentinel keys (empty, tombstone); recent work has been replacing<br>that with linear probing with tombstone key removed.
DenseMap (replacement for<br>std::unordered_map):<br>DenseMapInfo::getEmptyKey() /<br>getTombstoneKey().
SmallPtrSet (replacement for<br>std::unordered_set): hard-coded -1<br>(empty) and -2 (tombstone).
StringMap (replacement for<br>std::unordered_map)
For the open-addressed DenseMap and<br>SmallPtrSet, pointers, references, and iterators are<br>invalidated by insert. StringMap is different: each entry<br>lives in a heap-allocated StringMapEntry node, so<br>entry pointers survive grow. std::unordered_map, being<br>node-based, keeps surviving-element pointers valid across both insert<br>and erase and only invalidates the erased element's own iterator. LLVM<br>code rarely needs that stronger contract — callers do not hold<br>long-lived references into the container across mutation — and that gap<br>is what gives pass to relocating erase and bit-array occupancy.
Recently,
Tombstones have been removed from DenseMap and SmallPtrSet.<br>erase() also invalidates pointers.
DenseMap has also retired its empty-key sentinel, leading to<br>significant performance improvements. DenseMap with integer keys<br>(int/unsigned/size_t) had<br>-1/-2 reserved — a footgun, now fixed.
TODO: pending patch on StringMap
SmallPtrSet
SmallPtrSet has a small mode, used when the number of elements is<br>below a threshold, and a large mode. The tombstone state was removed by<br>#96762 in<br>2024. The patch changed erase() operation to invalidate<br>iterators, requiring treewide fixes.
The large mode used a quadratic probing with two sentinel keys<br>(empty, tombstone).
Knuth TAOCP Vol. 3 §6.4 Algorithm R describes an algorithm that<br>avoids lazy deletion. #197637 has<br>implemented it.
I've investigated Robin Hood Hashing and Abseil Swiss Table family<br>implementations - both would lead to inferior performance. Robin Hood<br>Hashing primarily improves find-miss on high-load factors but imposes a<br>small cost on find-hit and inserts.
DenseMap
Two major changes have been merged:
Replace the tombstone state with Algorithm R deletion. (#200595)
Replace the empty state with a bitarray. (#201281)
What the workload looks like
An instrumented clang counts every (KeyT, ValueT)<br>operation while compiling<br>llvm/lib/Analysis/ScalarEvolution.cpp. 597<br>distinct DenseMap/DenseSet types,<br>~186M user ops:
op<br>count<br>probes mean
find-hit<br>65.2M<br>1.55
find-miss<br>65.7M<br>1.28
insert<br>47.8M<br>1.71
erase<br>7.0M
grow<br>1.5M
Lookups are ~70% of the load; insert ~26%, erase ~4%. Probe means sit<br>between 1.3 and 1.7 — well inside what linear probing handles.<br>operator[] is classified at the bucket it lands on: a<br>present key is counted as find-hit, an empty slot as insert.
A few instantiations carry most of the traffic:
type<br>find-hit<br>find-miss (mean)<br>insert<br>erase<br>grow<br>peak
DenseMap<br>680K<br>24.8M (1.03)<br>202K<br>202K<br>9.3K<br>1 KiB
DenseMap<br>2.8M (2.02)<br>2.2M<br>2.2M<br>11<br>1 MiB
DenseMap *><br>3.0M<br>1.4M (2.26)<br>540K<br>439K<br>13<br>4 MiB
DenseMap<br>772K<br>1.2M (1.94)<br>143K<br>9.0K<br>64 KiB
DenseMap<br>1.3M<br>98K (2.95)<br>175K<br>173K<br>15<br>258 KiB
DenseMap<br>2.8M<br>2.0M (1.52)<br>2.0M<br>124K<br>1 KiB
runs 25.5M lookups at 1.03<br>mean — volume is not clustering.<br>is the single largest<br>erase consumer at 2.2M; this is the workload where the Algorithm R<br>relocating-erase callback earns its keep. Its find-miss column is zero<br>because every access in llvm/lib/IR/Value.cpp goes through<br>operator[] or erase — there is no<br>find/lookup call site, so an empty-slot probe<br>is bucketed as insert rather than find-miss. The bucket's value-slot<br>address is captured by<br>ValueHandleBase *&Entry = Handles[V] and stored as a<br>linked-list back-pointer (PrevPtr), which is exactly what<br>the new OnMoved callback refreshes when Algorithm R shifts<br>neighbors. StringMapEntry peaks at 4 MiB, the regime where<br>the used-bit array's byte overhead matters most. Structural and<br>graph-pointer keys (DeclarationName,<br>LazyCallGraph::Node *) still cost a couple of extra probes<br>per miss.
Code size matters. SIMD tables may be a poor fit.
Linear probing +<br>Algorithm R deletion (#200595)
Linear probing needs a strong pointer hash. The old<br>DenseMapInfo::getHashValue left bump-allocated<br>pointers — clang's dominant key shape — sharing high bits and collapsing<br>onto a short bucket range. Quadratic probing had masked this. #197390<br>switched to a stronger mixer, unblocking both SmallPtrSet and<br>DenseMap.
#200595 replaces quadratic probing plus tombstones with linear<br>probing plus Algorithm R. Two bucket states, no lazy markers. Erase now<br>relocates following entries to close the hole, so entry pointers may be<br>invalidated. The new erase(Key, OnMoved) and<br>erase(iterator, OnMoved) overloads fire a callback once per<br>shifted bucket; ValueHandleBase::RemoveFromUseList uses<br>this to refresh PrevPtr.
The first attempt had a war story. The original landed as #199615,<br>then got reverted by #200421<br>after a SCEV crash: PoisoningVH...