Reconstructing SQLite's float-to-text without reading the source

byolopes1 pts0 comments

GitHub - lucasolopes/haruspex · GitHub

/" data-turbo-transient="true" />

Skip to content

Search or jump to...

Search code, repositories, users, issues, pull requests...

-->

Search

Clear

Search syntax tips

Provide feedback

--><br>We read every piece of feedback, and take your input very seriously.

Include my email address so I can be contacted

Cancel

Submit feedback

Saved searches

Use saved searches to filter your results more quickly

-->

Name

Query

To see all available qualifiers, see our documentation.

Cancel

Create saved search

Sign in

/;ref_cta:Sign up;ref_loc:header logged out"}"<br>Sign up

Appearance settings

Resetting focus

You signed in with another tab or window. Reload to refresh your session.<br>You signed out in another tab or window. Reload to refresh your session.<br>You switched accounts on another tab or window. Reload to refresh your session.

Dismiss alert

{{ message }}

lucasolopes

haruspex

Public

Notifications<br>You must be signed in to change notification settings

Fork

Star

main

BranchesTags

Go to file

CodeOpen more actions menu

Folders and files<br>NameNameLast commit message<br>Last commit date<br>Latest commit

History<br>3 Commits<br>3 Commits

LICENSE

LICENSE

README.md

README.md

reproduce.py

reproduce.py

View all files

Repository files navigation

What changes when you upgrade a critical system? I'll tell you — with numbers.

A real case, reproducible in one line (python reproduce.py, included here), against the<br>real binary — not against theory.

The problem

You're about to upgrade a dependency (or rewrite a legacy system) that runs in critical<br>production. The one question that decides the cutover is simple, and nobody answers it well:<br>what changes in behavior?

Formal verification compares idealized models — it is blind, by design, to what the real<br>binary actually does: floating point, overflow, rounding, environment quirks.

Parallel-run (running old and new side by side) only sees what traffic exercises — the<br>rare case that breaks next month doesn't show up in today's test.

So you flip the cutover in the dark and find the divergence when a customer complains.

What I did

I took a public, deterministic target — SQLite's number renderer (CAST(REAL AS TEXT),<br>which turns a floating-point number into the text you read) — and treated it as a<br>black box : I only ask input → output, without reading the source .

I pointed my tool at it two independent ways. Both converged on the same defect, on their own.<br>And crucially: the tool reports your SQLite version's reality — so it never "fails to<br>reproduce"; it tells you the truth for whatever build you run.

The findings (run reproduce.py to see them for your build)

1 — Round-trip loss (the objective anchor). Take the text SQLite emits, read it back as a<br>number. If the number read back ≠ the original, SQLite lost information. On the<br>15-significant-digit default (SQLite ≤ ~3.51 — I measured 3.50.4, and confirmed the same on a<br>clean 3.46.1), this happens for 93.9% of doubles drawn from a uniform 64-bit sweep (the<br>same 500k sweep used for the reconstruction below) — the least cherry-picked sample there is<br>(every representable double equally likely). Short "nice" decimals round-trip fine; any<br>full-precision or computed double is at risk. It's not opinion — an objective property you<br>check in a loop, and reproduce.py prints the exact rate for your build.

2 — Blind reconstruction. Without reading the source, the tool reconstructed the<br>renderer's behavior and matched it on 99.15% of a 500k sample drawn by a deterministic<br>sweep across the whole 64-bit space (all magnitudes: tiny, huge, subnormal — not a<br>hand-picked range). The small residual (0.85% on 3.50.4 ) is not my reconstruction<br>being wrong — it's SQLite's own dtoa not being correctly rounded at that version (e.g.,<br>where the correct value is …904, SQLite prints …905). This residual varies by version —<br>the tool prints yours. The three versions I ran end to end, same-version pairs:

SQLite version<br>round-trip loss<br>reconstruction residual

3.46.1<br>93.9%<br>0.01%

3.50.4<br>93.9%<br>0.85%

3.51.1<br>93.9%<br>0.85%

Round-trip loss is stable across 15-digit-default builds (it's a function of the digit count<br>— 15 digits simply isn't enough for a double). The residual varies (it's a function of that<br>build's dtoa quality).

Why 99.15% and 93.9% don't contradict each other (the obvious skeptical question): they<br>measure different things. 99.15% is how faithfully I reproduce SQLite's behavior; 93.9% is<br>how much SQLite's behavior loses information. I reproduce, with high fidelity, a behavior<br>that is itself lossy. Two orthogonal axes pointing at the same defect by independent paths.

3 — Exhaustive census (completeness, not a sample). Over all 1,048,576 consecutive<br>doubles starting at 1.0 , the tool hands you the complete list of the ones that diverge —<br>not a sample, a full census of the region. I picked a dense, well-behaved band on purpose<br>so the demo is trivial to verify; the value isn't the band's...

sqlite reproduce search behavior tool version

Related Articles