GitHub - lucasolopes/haruspex · GitHub
/" data-turbo-transient="true" />
Skip to content
Search or jump to...
Search code, repositories, users, issues, pull requests...
-->
Search
Clear
Search syntax tips
Provide feedback
--><br>We read every piece of feedback, and take your input very seriously.
Include my email address so I can be contacted
Cancel
Submit feedback
Saved searches
Use saved searches to filter your results more quickly
-->
Name
Query
To see all available qualifiers, see our documentation.
Cancel
Create saved search
Sign in
/;ref_cta:Sign up;ref_loc:header logged out"}"<br>Sign up
Appearance settings
Resetting focus
You signed in with another tab or window. Reload to refresh your session.<br>You signed out in another tab or window. Reload to refresh your session.<br>You switched accounts on another tab or window. Reload to refresh your session.
Dismiss alert
{{ message }}
lucasolopes
haruspex
Public
Notifications<br>You must be signed in to change notification settings
Fork
Star
main
BranchesTags
Go to file
CodeOpen more actions menu
Folders and files<br>NameNameLast commit message<br>Last commit date<br>Latest commit
History<br>3 Commits<br>3 Commits
LICENSE
LICENSE
README.md
README.md
reproduce.py
reproduce.py
View all files
Repository files navigation
What changes when you upgrade a critical system? I'll tell you — with numbers.
A real case, reproducible in one line (python reproduce.py, included here), against the<br>real binary — not against theory.
The problem
You're about to upgrade a dependency (or rewrite a legacy system) that runs in critical<br>production. The one question that decides the cutover is simple, and nobody answers it well:<br>what changes in behavior?
Formal verification compares idealized models — it is blind, by design, to what the real<br>binary actually does: floating point, overflow, rounding, environment quirks.
Parallel-run (running old and new side by side) only sees what traffic exercises — the<br>rare case that breaks next month doesn't show up in today's test.
So you flip the cutover in the dark and find the divergence when a customer complains.
What I did
I took a public, deterministic target — SQLite's number renderer (CAST(REAL AS TEXT),<br>which turns a floating-point number into the text you read) — and treated it as a<br>black box : I only ask input → output, without reading the source .
I pointed my tool at it two independent ways. Both converged on the same defect, on their own.<br>And crucially: the tool reports your SQLite version's reality — so it never "fails to<br>reproduce"; it tells you the truth for whatever build you run.
The findings (run reproduce.py to see them for your build)
1 — Round-trip loss (the objective anchor). Take the text SQLite emits, read it back as a<br>number. If the number read back ≠ the original, SQLite lost information. On the<br>15-significant-digit default (SQLite ≤ ~3.51 — I measured 3.50.4, and confirmed the same on a<br>clean 3.46.1), this happens for 93.9% of doubles drawn from a uniform 64-bit sweep (the<br>same 500k sweep used for the reconstruction below) — the least cherry-picked sample there is<br>(every representable double equally likely). Short "nice" decimals round-trip fine; any<br>full-precision or computed double is at risk. It's not opinion — an objective property you<br>check in a loop, and reproduce.py prints the exact rate for your build.
2 — Blind reconstruction. Without reading the source, the tool reconstructed the<br>renderer's behavior and matched it on 99.15% of a 500k sample drawn by a deterministic<br>sweep across the whole 64-bit space (all magnitudes: tiny, huge, subnormal — not a<br>hand-picked range). The small residual (0.85% on 3.50.4 ) is not my reconstruction<br>being wrong — it's SQLite's own dtoa not being correctly rounded at that version (e.g.,<br>where the correct value is …904, SQLite prints …905). This residual varies by version —<br>the tool prints yours. The three versions I ran end to end, same-version pairs:
SQLite version<br>round-trip loss<br>reconstruction residual
3.46.1<br>93.9%<br>0.01%
3.50.4<br>93.9%<br>0.85%
3.51.1<br>93.9%<br>0.85%
Round-trip loss is stable across 15-digit-default builds (it's a function of the digit count<br>— 15 digits simply isn't enough for a double). The residual varies (it's a function of that<br>build's dtoa quality).
Why 99.15% and 93.9% don't contradict each other (the obvious skeptical question): they<br>measure different things. 99.15% is how faithfully I reproduce SQLite's behavior; 93.9% is<br>how much SQLite's behavior loses information. I reproduce, with high fidelity, a behavior<br>that is itself lossy. Two orthogonal axes pointing at the same defect by independent paths.
3 — Exhaustive census (completeness, not a sample). Over all 1,048,576 consecutive<br>doubles starting at 1.0 , the tool hands you the complete list of the ones that diverge —<br>not a sample, a full census of the region. I picked a dense, well-behaved band on purpose<br>so the demo is trivial to verify; the value isn't the band's...