Using WebAssembly SIMD to Fingerprint CPU Models from the Browser


Fingerprinting CPUs from the Browser with WebAssembly SIMD
by Anthony Manikhouth

Contents
Falling into a massive rabbit hole
How WASM SIMD leaks the hardware
Bypassing the 0.1ms wall
Mapping the Silicon
Classify the results
Spoofing and compiler drift
What's next

Fig. 0 - PCA of SIMD operation timings across multiple CPU models, showing clear clustering by brand and model.

Disclaimer: I'm not a hardware engineer. I'm just an engineer with some experience in browsers and WebAssembly who accidentally fell into a massive rabbit hole. If you design CPUs for a living, please forgive my inaccuracies and loose terminology.

For years, browsers have tried to close the leaks exploited by fingerprinting scripts. They let users or privacy tools spoof the User-Agent, add noise to the Canvas API, and restrict the information returned by the WebGL API. The end goal is to make every browser look like the same generic box.

But there is a fundamental limitation in the browser sandbox: it still has to run on physical hardware. And physical metal has a signature.

I recently built an experimental WebAssembly fingerprinting system that bypasses standard browser privacy mitigations by timing how different CPUs execute SIMD (Single Instruction, Multiple Data) operations.

Here's how I might be able to identify the exact CPU model you're using with nearly 82% accuracy, and the brand with 95% accuracy.

Falling into a massive rabbit hole

A year ago, while researching WASM and browser fingerprinting, I stumbled upon a markdown file in the WebAssembly/relaxed-simd repo. It warned of potential risks for browser engines:

Identifying underlying characteristics of the device (processor information if the engine that implements this proposal used the most optimal lowerings)

Possibly identifying information about the browser currently in use (i.e. if a sufficiently motivated user writes a hand tuned Wasm function when engines use different lowerings for instructions)

At that time, I already knew how some SIMD operations could leak hardware capabilities, revealing the architecture (x86 vs ARM) or supported extensions (AVX-512, FMA, etc.). That alone is a powerful fingerprinting vector, and difficult for an attacker to spoof reliably, but I felt there was more to it.

How WASM SIMD leaks the hardware

A quick reminder on how SIMD works in WASM: SIMD instructions were added to the WASM specification through the SIMD proposal, which introduced a portable subset of vector operations that, in most cases, map directly to commonly used hardware instructions. The browser's engine, such as V8 or JavaScriptCore (JSC), is responsible for mapping these operations to native machine code for the host CPU.

If a direct 1-to-1 hardware mapping exists, the operation executes natively with very low latency. This latency varies across CPU models due to the physical circuit design, pipeline depth, and structural efficiency of their execution units.

However, for operations like complex byte permutations (i32x4_shuffle, i64x2_shuffle), CPUs have different vector routing capabilities depending on their underlying architectures. If a CPU lacks a dedicated hardware path for a specific operation, the compiler needs to emit a slow multi-instruction emulation fallback to comply with the WASM specification.

This is where the fingerprinting opportunity emerged.

I realized that by placing these SIMD operations in a dependency chain loop, where the output of one operation is the input to the next, I could force the CPU to reveal the raw instruction latency: the time one operation takes to produce its result before the next can begin.
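To make the dependency-chain idea concrete, here is a minimal sketch in TypeScript. It is a scalar analog only: the real kernels are WASM SIMD operations, and the names `step`, `dependentChain`, and `independentChains` are hypothetical, chosen for illustration. In the first loop every iteration needs the previous result, so the work is serialized and latency dominates. In the second, four chains never read each other's values, so a CPU is free to overlap them.

```typescript
// Scalar analog of a dependency chain. The article's kernels are WASM SIMD
// operations; these function names are hypothetical and for illustration only.

// One cheap integer step (a linear congruential update, modulo 2^32).
function step(x: number): number {
  return (Math.imul(x, 1664525) + 1013904223) >>> 0;
}

// Latency-bound: each iteration consumes the previous iteration's output,
// so the next step cannot start until the current result exists.
function dependentChain(seed: number, iterations: number): number {
  let x = seed >>> 0;
  for (let i = 0; i < iterations; i++) {
    x = step(x);
  }
  return x;
}

// Throughput-bound: four chains that never read each other's values,
// so the hardware may execute them in parallel.
function independentChains(seed: number, iterations: number): number {
  let a = seed >>> 0;
  let b = (seed + 1) >>> 0;
  let c = (seed + 2) >>> 0;
  let d = (seed + 3) >>> 0;
  for (let i = 0; i < iterations; i += 4) {
    a = step(a);
    b = step(b);
    c = step(c);
    d = step(d);
  }
  // Combine the lanes so none of the work can be discarded as unused.
  return (a ^ b ^ c ^ d) >>> 0;
}
```

Both functions return a value derived from every step so an optimizer cannot simply drop the loop. A JavaScript JIT adds its own effects, so this shows the shape of the loop rather than a usable measurement kernel.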

That being said, even when the mapping doesn't exist and the engine relies on software emulation, the operation is still tremendously fast. How do you measure something this fast in a browser?

Bypassing the 0.1ms wall

Following the discovery of vulnerabilities such as Spectre, browser vendors implemented various countermeasures against timing attacks. For instance, browsers reduce the resolution of timers like performance.now() to around 0.1ms, with the exact value depending on the browser and context.

This means you cannot execute a single instruction and measure the time it takes with performance.now(). These instructions execute on nanosecond timescales on modern chips, so you would only ever observe the timer's minimum resolution of 0.1ms.

I overcame this limitation by placing these SIMD operations in massive loops of millions of iterations. By running a large block of operations and dividing the measured duration by the iteration count, you reduce the impact of timer quantization.
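The amortization can be sketched as follows, in TypeScript. This is an illustration under stated assumptions, not the article's actual harness: `Clock`, `coarsen`, `nsPerIteration`, and `quantizationErrorNs` are hypothetical names, and `coarsen` only simulates a reduced-resolution timer by rounding down. The point is the arithmetic: if the timer resolution is r and the block runs N iterations, quantization alone contributes at most r / N of error per iteration. With r = 0.1ms and N = 10,000,000, that is about 0.01ns.

```typescript
// Hypothetical helpers for illustration; not taken from the article's code.
type Clock = () => number; // returns a timestamp in milliseconds

// Simulate a coarsened timer by rounding down to a multiple of `resolutionMs`.
function coarsen(clock: Clock, resolutionMs: number): Clock {
  return () => Math.floor(clock() / resolutionMs) * resolutionMs;
}

// Time `iterations` runs of `kernel` as a single block and return the
// average cost in nanoseconds per iteration.
function nsPerIteration(
  kernel: (iterations: number) => number,
  iterations: number,
  clock: Clock = () => performance.now(),
): number {
  const start = clock();
  const sink = kernel(iterations);
  const elapsedMs = clock() - start;
  // Inspect the kernel's result so the call cannot be treated as dead code.
  if (Number.isNaN(sink)) throw new Error("kernel returned NaN");
  return (elapsedMs * 1e6) / iterations;
}

// Upper bound on the per-iteration error caused by timer quantization alone:
// both timestamps are rounded, so the block duration is off by less than one
// resolution step, which is then divided across all iterations.
function quantizationErrorNs(resolutionMs: number, iterations: number): number {
  return (resolutionMs * 1e6) / iterations;
}
```

In a browser, the kernel passed in would be the exported WASM loop, and the default clock would already be coarsened by the browser itself. The bound covers quantization only; loop overhead, scheduling noise, and frequency scaling still show up in the measurement.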

With the timing method being as precise as it can be, it was time to verify the theory and run these operations on real devices. I used a wide range of different hardware, including desktops and...
