Reconstructing SQLite's float-to-text without reading the source

GitHub - lucasolopes/haruspex · GitHub

/" data-turbo-transient="true" />

Search or jump to...

Search code, repositories, users, issues, pull requests...

-->

Clear

Search syntax tips

Provide feedback

--> We read every piece of feedback, and take your input very seriously.

Include my email address so I can be contacted

Cancel

Submit feedback

Saved searches

Use saved searches to filter your results more quickly

-->

Name

Query

To see all available qualifiers, see our documentation.

Cancel

Create saved search

/;ref_cta:Sign up;ref_loc:header logged out"}" Sign up

Appearance settings

Resetting focus

You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session. You switched accounts on another tab or window. Reload to refresh your session.

Dismiss alert

lucasolopes

haruspex

Public

Notifications You must be signed in to change notification settings

Fork

Star

main

BranchesTags

Go to file

CodeOpen more actions menu

Folders and files NameNameLast commit message Last commit date Latest commit

History 3 Commits 3 Commits

LICENSE

README.md

reproduce.py

View all files

Repository files navigation

What changes when you upgrade a critical system? I'll tell you — with numbers.

A real case, reproducible in one line (python reproduce.py, included here), against the real binary — not against theory.

The problem

You're about to upgrade a dependency (or rewrite a legacy system) that runs in critical production. The one question that decides the cutover is simple, and nobody answers it well: what changes in behavior?

Formal verification compares idealized models — it is blind, by design, to what the real binary actually does: floating point, overflow, rounding, environment quirks.

Parallel-run (running old and new side by side) only sees what traffic exercises — the rare case that breaks next month doesn't show up in today's test.

So you flip the cutover in the dark and find the divergence when a customer complains.

What I did

I took a public, deterministic target — SQLite's number renderer (CAST(REAL AS TEXT), which turns a floating-point number into the text you read) — and treated it as a black box : I only ask input → output, without reading the source .

I pointed my tool at it two independent ways. Both converged on the same defect, on their own. And crucially: the tool reports your SQLite version's reality — so it never "fails to reproduce"; it tells you the truth for whatever build you run.

The findings (run reproduce.py to see them for your build)

1 — Round-trip loss (the objective anchor). Take the text SQLite emits, read it back as a number. If the number read back ≠ the original, SQLite lost information. On the 15-significant-digit default (SQLite ≤ ~3.51 — I measured 3.50.4, and confirmed the same on a clean 3.46.1), this happens for 93.9% of doubles drawn from a uniform 64-bit sweep (the same 500k sweep used for the reconstruction below) — the least cherry-picked sample there is (every representable double equally likely). Short "nice" decimals round-trip fine; any full-precision or computed double is at risk. It's not opinion — an objective property you check in a loop, and reproduce.py prints the exact rate for your build.

2 — Blind reconstruction. Without reading the source, the tool reconstructed the renderer's behavior and matched it on 99.15% of a 500k sample drawn by a deterministic sweep across the whole 64-bit space (all magnitudes: tiny, huge, subnormal — not a hand-picked range). The small residual (0.85% on 3.50.4 ) is not my reconstruction being wrong — it's SQLite's own dtoa not being correctly rounded at that version (e.g., where the correct value is …904, SQLite prints …905). This residual varies by version — the tool prints yours. The three versions I ran end to end, same-version pairs:

SQLite version round-trip loss reconstruction residual

3.46.1 93.9% 0.01%

3.50.4 93.9% 0.85%

3.51.1 93.9% 0.85%

Round-trip loss is stable across 15-digit-default builds (it's a function of the digit count — 15 digits simply isn't enough for a double). The residual varies (it's a function of that build's dtoa quality).

Why 99.15% and 93.9% don't contradict each other (the obvious skeptical question): they measure different things. 99.15% is how faithfully I reproduce SQLite's behavior; 93.9% is how much SQLite's behavior loses information. I reproduce, with high fidelity, a behavior that is itself lossy. Two orthogonal axes pointing at the same defect by independent paths.

3 — Exhaustive census (completeness, not a sample). Over all 1,048,576 consecutive doubles starting at 1.0 , the tool hands you the complete list of the ones that diverge — not a sample, a full census of the region. I picked a dense, well-behaved band on purpose so the demo is trivial to verify; the value isn't the band's...

Reconstructing SQLite's float-to-text without reading the source

Related Articles

(no title)

Is AI ruining our skills? Early results are in – and they're not good

The Anatomy of an AI-Native Org

ZCode – Harness for GLM-5.2

Apertus – Open Foundation Model for Sovereign AI