zlib-rs in Firefox


zlib-rs in Firefox - Trifecta Tech Foundation


2026-06-16
Author: Folkert de Vries
Tags: zlib-rs, data compression

As of 151.0.0, Firefox uses zlib-rs for gzip (de)compression. This is very exciting, and has both performance and safety advantages.

We first started talking to Mozilla engineers in summer 2024, and it took 2 years to actually get zlib-rs into production. What took us so long?

Integrating zlib-rs into the Firefox codebase

Switching to zlib-rs is not entirely trivial: we present zlib-rs as a drop-in compatible replacement, but there are some asterisks to this claim. We change the algorithms that are used at the different compression levels (in a way that is consistent with zlib-ng, but inconsistent with stock zlib), so the exact output bytes and output length can change slightly.

The Firefox test suite tested for the exact output bytes in some cases, and for the (rough) output length in more. This is a good fail-safe against messing up the compression configuration, but it meant that all of these tests needed to be updated.

Firefox also adds a prefix to all symbols: instead of inflate it uses MOZ_Z_inflate to prevent symbol clashes. We've long supported prefixing the symbol name in various ways, so getting this to work was just a matter of configuration.
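As a rough illustration of what symbol prefixing means on the Rust side, the sketch below builds a prefixed name at compile time and exports a function under it. The prefixed! macro, the hard-coded MOZ_Z_ prefix, and the demo_version function are all hypothetical and are not zlib-rs's actual configuration mechanism. The #[unsafe(export_name = ...)] syntax assumes Rust 1.82 or newer.

```rust
/// Hypothetical helper: prepend a fixed prefix to an identifier at compile
/// time. A real build would take the prefix from configuration instead.
macro_rules! prefixed {
    ($name:ident) => {
        concat!("MOZ_Z_", stringify!($name))
    };
}

/// Placeholder function (not part of zlib). The linker sees it as
/// `MOZ_Z_demo_version`, so it cannot clash with another library's
/// unprefixed `demo_version`.
#[unsafe(export_name = prefixed!(demo_version))]
pub extern "C" fn demo_version() -> u32 {
    1
}

/// The symbol name the function above is exported under.
pub const EXPORTED_NAME: &str = prefixed!(demo_version);
```

Rust callers still use the plain Rust name; only the name visible to the linker changes.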

So some work was needed, but the changes were straightforward. All seemed well, until...

Intel CPU bug

We started seeing crashes. The logs showed that a bounds check had failed that logically couldn't fail. Of course, we're lucky that we even got a bounds check failure; in C you'd just get silent data corruption.

We could not reproduce the issue locally, and as more reports came in, a pattern started to emerge: our implementation triggered the infamous Intel Raptor Lake CPU bug.

This generation of CPUs is plagued by instability and degradation issues. Something in our code was prone to triggering these issues, but of course we had no idea what, or even how to track it down.

Eventually Fabian Giesen wrote "Oodle 2.9.14 and Intel 13th/14th gen CPUs", which identifies the problem as a particular instruction used in writing the result of Huffman coding to memory. Zlib also uses Huffman coding, and zlib-rs turned out to also use the offending instruction.

Still, finding and shipping the solution in Firefox was not quick. This May, shortly after the 151 release, Mozilla engineers shipped the patch, which was covered under the headline "After a year, Firefox finally stops crashing on Intel's Raptor Lake CPUs — Mozilla releases new version patch critical flaw on Intel 13th-gen and 14th-gen CPUs".

Fixing the bug

Once you know what to look for, fixing the issue is reasonably straightforward. We had this function:

https://godbolt.org/z/GjfYdPe3x

    pub fn push_dist(&mut self, dist: u16, len: u8) {
        let buf = &mut self.buf.as_mut_slice()[self.filled..][..3];
        let [dist1, dist2] = dist.to_le_bytes();

        buf[0] = dist1;
        buf[1] = dist2;
        buf[2] = len;

        self.filled += 3;
    }

This code is dead simple: we assign three byte values to consecutive indices of an array. But the assembly for this function (with LLVM 22) contains this store to memory from ch, the register name for bits 8-15 of RCX:

mov byte ptr [rsi + rdi + 1], ch

Due to the hardware bug, occasionally this instruction will actually write bits 0-7 instead, causing the crashes we were seeing.
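To make the effect concrete, here is a small model of the two outcomes for the distance bytes. It only mimics the described symptom in software (the low byte landing where the high byte belongs). The function names are made up for this illustration, and nothing here touches the actual hardware behavior.

```rust
/// The intended result: `dist` stored as two little-endian bytes.
pub fn store_correct(dist: u16) -> [u8; 2] {
    let [lo, hi] = dist.to_le_bytes();
    [lo, hi]
}

/// Model of the faulty store: the second byte should come from bits 8-15,
/// but bits 0-7 are written in its place.
pub fn store_faulty(dist: u16) -> [u8; 2] {
    let [lo, _hi] = dist.to_le_bytes();
    [lo, lo]
}
```

For dist = 0x1234 the correct bytes are [0x34, 0x12], while the faulty store produces [0x34, 0x34], which reads back as 0x3434. A distance that silently changes like this no longer matches the data it describes, which is the kind of corruption a later bounds check can trip over.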

To work around LLVM emitting this particular instruction, we use a tiny bit of unsafe code (LLVM is clever, so this was the simplest way we've found to have it generate the right thing):

    pub fn push_dist(&mut self, dist: u16, len: u8) {
        let buf = &mut self.buf.as_mut_slice()[self.filled..][..3];

        let bytes = dist.to_le_bytes();
        // SAFETY: `buf` is exactly 3 bytes long, so a 2-byte write at its start is in bounds.
        unsafe { buf.as_mut_ptr().cast::<[u8; 2]>().write_unaligned(bytes) }
        buf[2] = len;

        self.filled += 3;
    }
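Both versions should write exactly the same bytes; only the generated machine code is meant to differ. The sketch below wraps them in a hypothetical SymBuf type (the real surrounding type is not shown in this post) so that the equivalence can be checked. It says nothing about which instructions a given compiler version emits.

```rust
/// Hypothetical stand-in for the buffer type that owns `push_dist`.
pub struct SymBuf {
    buf: Vec<u8>,
    filled: usize,
}

impl SymBuf {
    /// Create a zero-filled buffer with room for `n` bytes.
    pub fn with_capacity(n: usize) -> Self {
        Self { buf: vec![0; n], filled: 0 }
    }

    /// Original form: three separate byte stores.
    pub fn push_dist_indexed(&mut self, dist: u16, len: u8) {
        let buf = &mut self.buf.as_mut_slice()[self.filled..][..3];
        let [dist1, dist2] = dist.to_le_bytes();
        buf[0] = dist1;
        buf[1] = dist2;
        buf[2] = len;
        self.filled += 3;
    }

    /// Workaround form: both distance bytes are written with one 2-byte store.
    pub fn push_dist_unaligned(&mut self, dist: u16, len: u8) {
        let buf = &mut self.buf.as_mut_slice()[self.filled..][..3];
        let bytes = dist.to_le_bytes();
        // SAFETY: `buf` is exactly 3 bytes long, so a 2-byte write at its
        // start stays in bounds.
        unsafe { buf.as_mut_ptr().cast::<[u8; 2]>().write_unaligned(bytes) }
        buf[2] = len;
        self.filled += 3;
    }

    /// The bytes written so far.
    pub fn as_bytes(&self) -> &[u8] {
        &self.buf[..self.filled]
    }
}
```

Pushing (0x1234, 7) through either method leaves the bytes 0x34, 0x12, 7 in the buffer. The slice expression [self.filled..][..3] still panics if fewer than three bytes remain, so the unsafe write never goes out of bounds.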

The fix in Firefox by Mike Hommey is here. The patch has been upstreamed into zlib-rs and we will continue to carry that patch for the foreseeable future: it's a marginal amount of unsafe that is easily vetted. These are the sacrifices we make to run reliably on a variety of platforms.

It turns out that LLVM 23 no longer emits the offending instruction, although I believe that is serendipitous and not deliberate. When we bump our MSRV to a version that requires LLVM 23 (e.g. for custom allocators and c-variadic functions) we can drop this workaround.

Results

So why go through all of this trouble? Because zlib-rs is faster. Much faster. On Linux x86_64 in particular, the speedup is almost silly. These benchmarks from zlib-py compare stock zlib with zlib-rs:

ONE-SHOT DECOMPRESSION

Benchmark                   CPython zlib   zlib_py    Speedup
decompress 1 KB level=1     7.1 us         1.3 us     5.66x faster
decompress 1 KB level=6     7.0 us         2.1 us     3.34x faster
decompress 1 KB level=9     7.0 us         2.1 us     3.33x faster
decompress 64 KB level=1    219.4 us       6.8 us     32.50x faster
decompress 64 KB level=6    218.6 us       7.6 us     28.70x faster
decompress 64 KB level=9    217.9 us       7.9 us     27.53x faster
decompress 1 MB level=1     3.41 ms        128.0 us   26.61x faster
decompress 1 MB level=6     3.42 ms...
