Hellishly Slow Level 13 DEFLATE Compression - Kirill's journalHellishly Slow Level 13 DEFLATE Compression<br>Kirill A. Korinsky at Kirill's journal<br>May 12, 2026<br>IT Research Notes
Abstract<br>DEFLATE level 13 is a deliberately impractical libdeflate compression<br>level: the output remains standard DEFLATE, while the encoder spends far<br>more time searching parse, Huffman, and block split choices. On Silesia<br>it saves 86'990 bytes, 0.134%, over level 12 and runs 56.4x slower; this<br>cost is acceptable only when data is compressed once and distributed<br>many times.
Why DEFLATE<br>DEFLATE remains worth optimising because its decoders are already<br>everywhere: HTTP content encoding, ZIP archives, PNG internals, software<br>distribution, backup tools, and embedded formats still use the same LZ77<br>plus Huffman design. The decoder contract is fixed, but the encoder<br>still chooses matches, block boundaries, and Huffman tables; better<br>choices improve size without changing compatibility.<br>The baseline is libdeflate<br>level 12, one of the strongest practical DEFLATE encoders. The level 13<br>implementation was contributed upstream as<br>pull request.<br>Level 13 Mechanics<br>Level 13 keeps libdeflate’s near optimal parser, but makes its choices<br>more expensive. It searches the full 32 KiB DEFLATE window, permits 15<br>optimisation passes, and applies static Huffman optimisation to blocks<br>up to 50'000 input bytes. No format extension is involved.<br>For text like data, level 13 delays block size commitment. It samples up<br>to 64 KiB from the current block start; if the sample contains no NULL<br>byte and at most 97 distinct byte values, the soft block size rises from<br>300'000 to 1'000'000 bytes. The assumption is simple: a stable byte<br>distribution lets one Huffman table cover more data.<br>The parser broadens the minimum cost search. It can choose the cheapest<br>offset for each match length, estimate initial Huffman costs from<br>literal and match length statistics, estimate offset slot frequencies<br>from cached matches, and compare measured dynamic Huffman cost against<br>static Huffman and literal only encodings.<br>Block splitting is also delayed. The compressor stores up to nine split<br>candidates with parser state, then scores the full block and a bounded<br>shortest path over candidate intervals. A single split can win on cost;<br>a multi split path must beat the full block by at least 512 bits. The<br>selected parse is cached for final flushing.<br>The slowness is bounded. Search passes, split candidates, and block<br>sizes are capped, so level 13 cannot enter an unbounded optimisation<br>loop like<br>turtledeflate or<br>broader file optimizers such as<br>ECT.<br>Regression Policy<br>Development used a zero compression regression policy against<br>the Silesia corpus:<br>many approaches were tried, but only changes that strictly decreased at<br>least one compressed file, without increasing any other compressed file,<br>survived into the final level 13 configuration.<br>The Silesia corpus is small enough for repeated development, but mixed<br>enough to punish single file tuning: it contains text, binaries,<br>databases, images, and structured data.<br>Silesia Results<br>libdeflate level 12 versus level 13 on Silesiafilelevel 12 size / timelevel 13 size / timesize difftime diffdickens3'688'552 / 1'289 ms3'684'671 / 83'512 ms-3'881 (-0.105%)+82'223 (+6378.8%)mozilla18'267'490 / 4'959 ms18'235'120 / 110'754 ms-32'370 (-0.177%)+105'795 (+2133.4%)mr3'448'571 / 1'627 ms3'443'723 / 16'260 ms-4'848 (-0.141%)+14'633 (+899.4%)nci2'766'224 / 7'935 ms2'758'044 / 673'648 ms-8'180 (-0.296%)+665'713 (+8389.6%)ooffice2'998'130 / 424 ms2'995'604 / 8'676 ms-2'526 (-0.084%)+8'252 (+1946.2%)osdb3'642'347 / 798 ms3'641'341 / 4'942 ms-1'006 (-0.028%)+4'144 (+519.3%)reymont1'702'796 / 1'005 ms1'699'494 / 81'839 ms-3'302 (-0.194%)+80'834 (+8043.2%)samba5'135'662 / 2'889 ms5'122'812 / 141'227 ms-12'850 (-0.250%)+138'338 (+4788.4%)sao5'255'575 / 333 ms5'255'358 / 1'687 ms-217 (-0.004%)+1'354 (+406.6%)webster11'565'754 / 6'452 ms11'555'293 / 475'196 ms-10'461 (-0.090%)+468'744 (+7265.1%)x-ray5'754'248 / 305 ms5'748'141 / 3'276 ms-6'107 (-0.106%)+2'971 (+974.1%)xml633'760 / 1'104 ms632'518 / 69'504 ms-1'242 (-0.196%)+68'400 (+6195.7%)total64'859'109 / 29'120 ms64'772'119 / 1'670'521 ms-86'990 (-0.134%)+1'641'401 (+5636.7%)Level 13 saves 86'990 bytes across the corpus, a 0.134% reduction, and<br>adds 1'641'401 ms of runtime. The strongest relative result is nci at<br>0.296%; sao changes by only 0.004%.