Show HN: Lossless GIF recompression via exhaustive search

arusekk1 pts0 comments

Lossless GIF recompression via exhaustive search | Arusekk blog

A bit of history#

GIF is the oldest widespread compressed image format.<br>Today, it is mostly famous for allowing animations in an image file, but I am not so interested in that use.

Actually, it turns out this was the only image format ever supported by NCSA Mosaic.<br>Your website must have a GIF fallback for all critical images if it ever wants to truly support old browsers. I&rsquo;m not talking old as in old Chromium.<br>I mean actual Mosaic, Netscape, IE, Netsurf, Dillo, Konqueror - like those you can try on oldweb.today.<br>(You are probably not interested in supporting them, but this is a fun exercise.)

I really wanted the 1-Click Linux website to look acceptable even in the oldest browsers,<br>so I decided that I would use with fallback to GIF.<br>I believe each modern web feature, if used in production, should be reasonably widely compatible on itself already,<br>and then only have one fallback, which should be the one with absolute 1000% compatibility.

Problem#

The problem is, GIF compression is not impressive.<br>Let&rsquo;s face it, in 2026 you should most likely only use SVG and WebP (lossy for photos, lossless for small-pallette drawings/logos).<br>Not PNG, not JPEG, and God forbid AI, DWG or any other proprietary format (looking at you, DICOM).<br>Well, okay, at least on the web (as the name says, WebP).

A partial solution is to use a small image.<br>The truly old devices have truly small screen sizes, like 240x320 Nokia phones.<br>So an icon or logo fallback can safely be 128x128, as 256x256 might not even fit on the screen.

Can we do better? Yes. There is an entire field dedicated to image optimisation, and it starts with stripping metadata.<br>Then we can remove unused colors from the palette, then remove rarely used colors from the palette, and so on.

But none of the steps above mention actual compression itself!

ZopfliPNG#

I heard about zopflipng. PNG uses DEFLATE, the compression format known from ZIP and GZIP. This is a variant of LZ77 with Huffman coding.<br>In this format, and quite commonly in other compression formats, there are many different ways to represent the exact same uncompressed data.

DEFLATE is also the name of an algorithm that generates a reasonably compressed input,<br>and it can be tuned to spend more time compressing in hopes to achieve better compression (for the same data, remember?).<br>But there are many syntactically valid DEFLATE streams that are never produced by DEFLATE the algorithm,<br>which is kind of funny, because you can make a ZIP that contains itself, but anyway,<br>some of them are even better than the largest &lsquo;compression level&rsquo;.

Now, Zopfli is the software that performs an exhaustive search across all possible syntactically valid streams,<br>in order to find what is actually the smallest number of bits to represent the given uncompressed input.<br>Then ZopfliPNG is the variant that does it for PNG and also explores the PNG pixel encodings.<br>We need to be careful here, because finding the actual best compression for an arbitrary format can be equivalent to solving the halting problem.<br>But the compression formats we talk about today have some helpful invariants guaranteeing the search always halts.

ZopfliGIF?#

There is no ZopfliGIF, but there is flexiGIF, which does almost exactly that.<br>It is obviously a wonderful tool, and you should go use it on all your GIFs.<br>But the thing is, GIF uses a very different compression scheme - LZW.<br>And I found a baffling remark in its README, saying that it can leave files larger than the original algorithm.<br>This was very suspicious.<br>So I set out on an exploration. &lsquo;It was made by a single human - therefore I can understand it, too.&rsquo; - thought I.

LZW#

Then I found it a bit difficult to understand, because like all the papers from that time,<br>the original description insists on building the data format around the algorithm.<br>But we, decades later, already know that this is a mistake:<br>for us, interoperability-focused people, the data format is more interesting,<br>of course, than the algorithm.<br>Changing the format requires changing the decoder software.<br>Keep the format, change the encoder software, and enjoy still using the same decoders.<br>Compatibility.

To save you some time, let me describe it to you: compressed stream consists of actions,<br>each either saying &lsquo;yield this byte&rsquo;,<br>or naming a previous action and saying &rsquo;re-do all of this operation again, but then also add the first byte of the action that followed&rsquo;.<br>(The edge case of choosing the last action works just fine, and is called KwKwK in the paper.)<br>(GIF also has an &rsquo;end of data&rsquo; action and a &lsquo;zero the state&rsquo; action, but this is not relevant.)

Simple, right?<br>Well, it does sound simpler to me at least. Way simpler than starting the understanding by reading 9 lines of compression pseudocode,<br>and 9 lines of corresponding decompression pseudocode.

Then the algorithm can be...

format rsquo compression algorithm image data

Related Articles