HN: SteelSpine replay tool for debugging AI agents

SteelSpine AI™ — Debug any AI agent. Capture every run. Replay any state.

Debug AI agents · Capture · Compare · Replay

Wrap any agent in one command.

Capture every event. Compare runs. Replay from any state.

Cryptographically signed end-to-end — EU AI Act Article 12 ready out of the box.

Scroll to continue ↓

Your AI ran.

Something went wrong.

You have no idea why.

01 Run it once — SteelSpine AI records every decision your agent makes as a structured causal event. No code changes.

02 Run it twice — SteelSpine AI shows you the exact moment the two runs diverged and exactly what caused it.

03 Verify it — Every run gets a cryptographic audit trail. Tamper with a single event — detected instantly.

That's it. One command. Full history. Proof it wasn't touched.

No vendor lock-in. Runs locally. Works with anything.

Show me more → Just get it

EU AI Act Article 12 enforcement starts August 2, 2026 · make your AI auditable in one command — how →

Debug · Capture · Compare · Replay · Signed end-to-end · EU AI Act Art.12 Ready

Why did your agent

do that?

Your agent failed. You have logs. You still don't know why —

and it won't remember any of it next session.

SteelSpine AI fixes both. Zero code changes.

Start 14-Day Free Trial How it works →

CA$29.99/mo after trial · Cancel anytime · No vendor lock-in

63% of 100-step agent tasks fail at 99% per-step accuracy¹

46% of developers don't trust what their AI outputs²

32% output quality is the #1 blocker to production²

no known tool combines replay + proof + memory

¹ Vellum / Towards Data Science, 2025 · ² Stack Overflow Developer Survey, 49,000 respondents, 2025 · ³ Gartner, 2025

The Short Version

The problem is real. The gap is wide. No known tool closes it.

Other tools give you traces. A trace shows you what happened — it doesn't let you replay it, prove it, or remember it next session. SteelSpine AI does all three.

01 Debug

At 99% accuracy per step, a 100-step agent still fails 63% of the time — the math compounds. When it fails, your logs say "completed successfully." SteelSpine AI shows you the exact event where it went wrong and why, then lets you replay it deterministically from that point. Every failure is permanently recorded — find it days, months, or years later.

Zero code changes

02 Remember

Every LLM call starts from zero. No memory of last session, no entity context, no continuity. Gartner found a 20% customer churn increase when agents lose session context³ — and stuffing more context past 100k tokens doubles inference time and quadruples cost. Change one URL and SteelSpine AI injects persistent memory into every request — no framework changes, ever.

One URL change

03 Prove

LangSmith, Galileo, Arize — they all give you traces. Traces show you what happened. They cannot prove nothing was changed. SteelSpine AI's SHA-256 rolling hash chain detects any edit, deletion, or insertion to any event, past or present. Cryptographically.

Patents Pending

See It In Action

Add it to any agent in 30 seconds.

View terminal demo on asciinema.org →

↑ A real refund-bot run. Watch SteelSpine catch the policy violation in real time.

Or read it as a sequence:

steelspine — agent session

# Wrap your agent — nothing else changes

$ steelspine run python my_agent.py

✓ Run captured: run_0047 | 312 events | 4.2s

✓ Verdict: SUCCEEDED — hash chain clean

Divergence detected vs run_0046 — auto-compare running

# Find out exactly where two runs split

$ steelspine compare

↳ Divergence at event 187: param "query" changed

↳ 3 downstream decisions invalidated — root cause isolated

# Cryptographic proof of what your AI decided

$ steelspine verify-run

✓ SHA-256 chain: CLEAN | 312/312 events verified | Audit ready

Beyond Capture

Infrastructure for AI agents. Not a logging library.

The capture-and-audit demo above is the first 10% of what SteelSpine does. Underneath the CLI is a five-layer infrastructure stack — every piece runs locally, no cloud dependency, no vendor lock-in.

Layer 1

Capture & Replay

Wrap any agent or command. Stream stdout/stderr to a hash-chained event log. Replay offline against any captured state.

steelspine run · replay-run · branch-create

Layer 2

Cryptographic Audit

HMAC-SHA256 + Ed25519 chain. Tamper-evident. Independently verifiable by an auditor with just the public key. EU AI Act Article 12 compliant out of the box. Optional hardening: compliance_mode auto-enables RFC 3161 timestamping via eIDAS-accredited TSA; --pq-sign adds ML-DSA-65 post-quantum signatures (NIST FIPS 204) for long-archive audits.

verify-run · pack-create · pack-verify

Layer 3

Persistent Memory

Transparent proxy in front of any OpenAI-compatible LLM. Auto-injects relevant context into every prompt. Promotes durable facts to long-term entity store. The same agent remembers across sessions.

memory-agent · memory recall · entities

Layer 4

Adapters & Ingress

OpenTelemetry receiver for LangChain & OTel agents. Filesystem-drop,...

HN: SteelSpine replay tool for debugging AI agents

Related Articles

Elevated error rates on requests to multiple models

Donald Trump and sons to be 'forever' exempt from tax audits

PopuLoRA: Co-Evolving LLM Populations for Reasoning Self- Play

Old Reddit Is Down

The ultimate female fantasy – A feminist critique of Beauty and the Beast