Extracting Cycles
The IPv4 Parser AI Couldn't Have Written
2026-06-20
SWAR Zig AI
I was reading Daniel Lemire’s blog, which I highly recommend; specifically: Parsing IP addresses quickly (portably, without SIMD magic).
Reading the title and given his track record, I braced myself for some sort of insane SWAR (SIMD Within A Register) parsing technique, maybe combined with obscure bit twiddling that I’m too stupid to understand without reading it 12 times, or something along those lines.
To my surprise, Lemire showcased a linear scan function mostly written by AI.
Although I agree with his conclusion that he shouldn’t retire just yet, I think we should avoid relying on AI to generate fast code altogether!!
Before diving into my claim, let’s first satisfy our hunger for SWAR and bit tricks!
A fast, non-SIMD IPv4 parser
The first thing I do when I approach such problems is to get a feel for the valid inputs. When I think about IPv4, three examples quickly come to mind:
192.168.1.1
255.255.255.255
0.0.0.0
To keep things simple, we are going to handle only the dotted-decimal notation!
From there, we start noticing some things:
The valid range for each triplet (an octet written as one to three decimal digits) is [0, 255].
The input length ranges from 7 to 15 characters.
Each string must contain exactly 3 dots ('.').
Reading input:
Since the input string will always fit into 16 bytes, we can smash it into two u64s, loading bytes [0..8] into the first u64, which we will call head, and bytes [len - 8..len] into the second u64, which we will refer to as tail.
Unfortunately, when address.len == 7, we run into two problems: the first read will access 1 byte past the end of the input (since it reads 8 bytes), and the second read will access 1 byte before the start of our input string, given the pointer math (7 - 8 = -1).
Luckily, the first problem is an acceptable quirk that we can simply document in the API contract: strings are almost always null-terminated or part of a larger buffer, and in both cases we get a safe 8-byte read!
As for the second problem, we have to adjust our subtraction so that, when the input length is 7, it doesn’t underflow and we don’t read before the start of the string.
Doing so, the two reads become:
var head: u64 = undefined;
@memcpy(std.mem.asBytes(&head), address[0..8]);
var tail: u64 = undefined;
const len_is_7 = address.len == 7;
const begin = if (len_is_7) 0 else address.len - 8;
const end = begin + 8;
@memcpy(std.mem.asBytes(&tail), address[begin..end]);
tail <<= if (len_is_7) 8 else 0; // More on that one later!
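To see what the two overlapping reads actually capture, here is a minimal, self-contained sketch. The `Windows` struct and the `loadWindows` name are hypothetical, introduced only for illustration. The sketch assumes the caller guarantees that 8 bytes are readable from the start of the string, which is the API contract described above. It slices through the raw pointer so that the 7-character case is not rejected by Zig's slice bounds check in safe builds. The final adjustment of tail is left out on purpose, since the post explains it later.

```zig
const std = @import("std");

// Hypothetical container for the two 8-byte windows (illustrative only).
const Windows = struct { head: u64, tail: u64 };

/// Hypothetical wrapper around the two reads.
/// Assumes 7 <= address.len <= 15 and that at least 8 bytes are readable
/// starting at address.ptr (null-terminated string or larger buffer).
fn loadWindows(address: []const u8) Windows {
    std.debug.assert(address.len >= 7 and address.len <= 15);

    // First window: bytes [0..8]. Going through .ptr skips the slice
    // bounds check, so the len == 7 case relies on the API contract.
    var head: u64 = undefined;
    @memcpy(std.mem.asBytes(&head), address.ptr[0..8]);

    // Second window: the last 8 bytes, clamped to offset 0 when len == 7
    // so that we never read before the start of the string.
    const begin: usize = if (address.len == 7) 0 else address.len - 8;
    var tail: u64 = undefined;
    @memcpy(std.mem.asBytes(&tail), address.ptr[begin..][0..8]);

    return .{ .head = head, .tail = tail };
}

pub fn main() void {
    const w = loadWindows("192.168.1.1");
    std.debug.print("head = '{s}'\ntail = '{s}'\n", .{
        std.mem.asBytes(&w.head),
        std.mem.asBytes(&w.tail),
    });
}
```

For "192.168.1.1" this prints head = '192.168.' and tail = '.168.1.1': the two windows overlap in the middle, and together they cover the whole string.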
Note: Any decent compiler will compile the if expressions in the tail read without emitting branches!
String to integer conversion:
Let’s now have a look at what happens with the IP address "192.168.1.1":
head = '192.168.'
tail = '.168.1.1'
The head variable looks like a great candidate for SWAR integer parsing!!
It’s a technique that Lemire discusses a lot on his blog. It boils down to performing parallel operations inside a regular general-purpose register instead of a dedicated SIMD one!
How SWAR integer-parsing works:
Typically, when parsing a string like "192" into its integer representation, we iterate through each byte to verify it falls within the valid ASCII range of 0x30 ('0') to 0x39 ('9'). For each valid character, we subtract the ASCII offset, multiply the result by its positional value (its power of 10), and add it to a running total. Once the iteration is complete, the total holds the final integer value.
All of that iterative work can be done in a branchless manner with the following snippet:
Note: In Zig, arithmetic operators followed by % wrap around on overflow instead of treating it as an error (for example, +% is wrapping addition).
// On a little-endian CPU, the first byte in memory lands in the lowest
// byte of the register, so the string reads right-to-left in hex.
// Memory layout: ['1', '9', '2'] -> 0x31, 0x39, 0x32
const input: u32 = 0x32_39_31;
// Subtract the ASCII offset (0x30) from all 3 bytes simultaneously
// using a single wrapping subtraction (-%).
const d = input -% 0x30_30_30; // = 0x32_39_31 - 0x30_30_30 = 0x02_09_01
// Now we need to multiply each digit by its positional value:
// 1 * 100, 9 * 10, 2 * 1 -> multiplier 0x64_0A_01
const value = d *% 0x64_0A_01;
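The same steps can be wrapped into a small runnable sketch. The `parseThreeDigits` name is hypothetical, and the function performs no validation: it assumes exactly three ASCII digits. The value is kept in a u32, so the partial products that land above the third lane spill into the fourth byte or wrap away, which is why the wrapping multiplication matters.

```zig
const std = @import("std");

/// Hypothetical helper: converts exactly three ASCII digits to their
/// numeric value modulo 256, with no validation of the input.
fn parseThreeDigits(text: *const [3]u8) u32 {
    // Pack the bytes the way a little-endian load would:
    // text[0] ends up in the lowest lane.
    const input: u32 = @as(u32, text[0]) |
        (@as(u32, text[1]) << 8) |
        (@as(u32, text[2]) << 16);

    // Turn the three ASCII characters into digit values in one subtraction.
    const d = input -% 0x30_30_30;

    // Lane 2 of the product accumulates
    // text[0] * 100 + text[1] * 10 + text[2] * 1.
    const value = d *% 0x64_0A_01;

    // Extract the third lane.
    return (value >> 16) & 0xFF;
}

pub fn main() void {
    std.debug.print("{d}\n", .{parseThreeDigits("192")});
}
```

Running it prints 192. The lower lanes cannot carry into the third one, since for digit values they hold at most 9 and 99 respectively.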
Our result now resides in the third lane and could easily be extracted with (value & 0x00FF0000) >> 16. However, we can play smarty-pants and use a purpose-built x86_64 instruction introduced with the BMI2 extension: pext.
pext accepts two operands, an input and a mask, and packs the input bits selected by the 1 bits of the mask contiguously into the lowest bits of the result.
Using this, our extraction simply becomes pext(value, 0xFF0000); the reason for involving such an obscure operation will become clearer later!
Since this function will happily parse anything, including invalid triplets like "999" or even non-digit characters, we need a validation step…
Usually, SWAR validation is simply a range check to see if our processed characters fall nicely between 0x00 and 0x09. We can achieve this without looping by exploiting the most significant bit, 0x80, in each lane. If we add 0x76 to a valid digit like 0x09, it...