Two Indexed Hash Tables

Two Indexed Hash Tables | Vladimir Makarov

My career path has been mostly about programming languages and compiler implementation. Most high-level languages have built-in hash tables, and compilers and interpreters extensively use hash tables as the most effective search data structure. So I have a natural interest in hash tables and hashing techniques.

In recent years, the Swiss table design has become popular. Swiss tables are direct open-addressing hash tables with mainly linear probing. They use SIMD to probe quickly even when the table is nearly full and a very high load factor (87.5%) to save memory used by direct addressing tables. Well-known implementations include abseil and Boost.

Open-addressing hash tables with linear probing were first described in a 1958 publication by the famous Russian academician A. Ershov. In the publication, Ershov (he was a director of a postgraduate school where I studied) gave an empirical estimation based on the Monte Carlo method of the average number of probes for 50% load. Later, average probe counts for open-addressing tables with linear probing depending on the table load were obtained for a simplified model of the table. You can find them in Knuth’s book “The Art of Computer Programming”, volume 3. For load 87.5%, successful and unsuccessful search require correspondingly about 4 and 32 probes on average.

I’ve never been comfortable with the high load factors and slow iterators that come with direct open addressing. So I took a different angle: decrease the load factor and separate the probe metadata from the elements entirely, using indexes to bridge the two. Here’s what came out of it. Spoiler : geomean performance of the resulting indexed open-addressing table on different benchmarks is better than the one of the best direct open-addressing tables.

The core design ideas

Many hash tables store key-value pairs directly in the array used for probing. This has two performance problems on modern CPUs:

Poor cache locality. Probing touches full key-value slots. This wastes cache lines when keys or values are large. We could keep pointers to key-value pairs in the table instead of the pairs themselves to decrease the size of unused slots, but essentially this turns the table into an indexed one. Iteration is even worse. You walk every slot in the array, skipping empty ones. That is especially wasteful when the table is sparse or the elements are large.

Branch misprediction. At 50% load, roughly half the probes hit an occupied slot when searching for a key not in the table. The branch on an empty/busy slot becomes a coin flip, which is the worst case for branch prediction. Each misprediction costs ~15-20 cycles on current x86.

I wanted to improve both.

ihtab separates probe metadata from element storage entirely:

There is a compact array of 7-bit hash tags (one byte per tag slot). Eight consecutive tags form a group. The array is probed using SIMD, which checks a group of tags at once. This replaces many unpredictable per-slot branches with a single group comparison and branch, which considerably reduces mispredictions. The 7-bit tags also reduce the probability of a spurious index check to 1/128.

The actual elements live in a separate array . They are referenced by 32-bit indexes stored in another separate array alongside the tags. Probing only touches the small tag and index arrays. Cache lines aren’t wasted on key-value data until a tag match confirms the slot is worth examining.

There is also a small bitmap (one bit per element) recording which elements have been deleted. Iteration walks the dense element array and checks the bitmap. No scanning through empty tag slots, no touching the tag or index arrays at all.

Deleted elements are also marked in the index array with a tombstone value (~0, all bits set). They are skipped during searching. An alternative would be to reserve a dedicated tag value for tombstones, detecting deletions without reading the index array. But that adds complexity and slows down the common case when there are no deleted elements.

Here is the data structure layout for a table with a capacity of 8 elements:

Memory usage. When keys or values are large, ihtab can actually use less memory than a direct open-addressing table. In direct-addressing tables, there are always empty element slots whose memory is wasted. In ihtab, empty tag and index slots take only 5 bytes. Elements themselves are stored densely in the element array with no wasted space.

Table growth. New elements are appended at the position indicated by bound. Insertions continue until the element array is full. At that point, the table rebuilds. Deleted elements are removed from the element array. And if there is still not enough room for a new element, all arrays are doubled in size. The tag, index, and bitmap data are recomputed from the surviving elements.

Table load factor. The tag and index arrays have at most 50% occupancy...

Two Indexed Hash Tables

Related Articles

Claude Fable 5

US Government directive to suspend access to Fable 5 and Mythos 5

Is AI ruining our skills? Early results are in – and they're not good

The Anatomy of an AI-Native Org

Apertus – Open Foundation Model for Sovereign AI