Show HN: Argus – Capture, replay and QA every Claude Code session your team runs

Alpha is open · sign in to start Is Claude Cowork actually working for your users and clients? Argus captures every session of your users across your team or organization, then reads across thousands of them at once — surfacing where skills are holding, where they're quietly breaking, and which patterns are pointing at the next thing you should build. Sign in to Argus Read a sample session Magic-link sign in. No password. Free during alpha.

arguslab.co/northwind/dashboard AArgus/Nnorthwind Sessions captured 142+12%

Cost $58.41+8%

Tokens 18.4M+14%

Active users 8flat

Errors 3−47%

SessionsErrors

Spot checks Cache hit rate86% Avg session cost$0.41 P50 latency78 ms Tools / session14.2

Recent sessions workspace

Refactor the Stripe webhook handler to retry on idempotency conflicts…runningsara@northwind.io$0.84

Sync Linear backlog with the GitHub release milestone…completedjames@northwind.io$1.42

Draft the weekly customer-success digest from HubSpot + Slack threads…flaggedpriya@northwind.io$4.21

Live preview · Argus dashboard, one workspace

The problem Once it ships, you go blind. Claude Cowork's built-in telemetry tells you a skill was invoked. It can't tell you whether it worked, whether the user took the answer, whether you should ship a fix tomorrow.

№ 01The counter that says nothing. Cowork's built-in telemetry logs your skill as Skill: 3. Three invocations. Across 412 sessions for one client this month, you have a single number per skill — invocation count. Nothing about which versions ran, what they returned, whether the user accepted the answer or had to ask twice. observed412 sessions surfaced1 aggregate counter

№ 02The skill that broke quietly. You shipped weekly-review four weeks ago. Across the first 38 sessions, users accepted the answer on the first turn. Across the next 9, they re-asked, rephrased, switched tools. Something started failing on session 39. Nobody noticed — the cost line stayed flat and no single session looked broken on its own. skillweekly-review · v1.2.0 patternfirst-turn 96% → 22%

№ 03The skill that wasn't there yet. Across the team's traffic this month, “turn this Linear ticket into a release-notes entry” came up fourteen times in eight different phrasings. Each one got a different ad-hoc answer; one user gave up. A skill is waiting to be written there. No telemetry surface — yours or anyone else's — will ever find it. pattern“linear → release notes” sessions14 · 8 phrasings · no skill

The instrument What the counters can't see. Argus runs as a Claude Cowork plugin. It captures every session — prompts, assistant responses, every tool call, every follow-up — in plain text, stitched back into the conversation the user actually had. The qualitative layer that makes everything else possible.

№ 01Did the user accept the answer? A skill that works ends the conversation. A skill that doesn't gets re-prompted, rephrased, abandoned. Argus captures the user's exact follow-ups so the Agent can see, at a glance across hundreds of sessions, which versions of which skill are landing on the first turn — and which aren't. Captured · used in: first-turn acceptance, follow-up patterns

№ 02Did the assistant ask the user to do its job? Skills should answer questions, not ask new ones. When a skill is under-specified, the assistant stalls — “could you clarify…”, “which one did you mean…” — and the user does the work the skill was meant to do. Argus captures the stalls so the Agent can show where they cluster. Captured · used in: stall frequency, under-specified prompts

№ 03What did the tool actually return? The MCP call succeeded. Status 200. But the payload was empty, or a 400-row dump, or a JSON that didn't match what the skill asked for. Argus captures the tool's plain-text output beside the assistant's response, so the Agent can flag the sessions where the skill kept going on bad input. Captured · used in: tool-output mismatches, silent failures

№ 04What did users ask for that no skill could handle? The prompts your customisations don't yet cover. Same export, same lookup, same wrangle — captured verbatim, even when nothing answered them. The Agent reads across the unmet-prompts corpus and surfaces patterns ready to become the next skill. Captured · used in: unmet-prompt clustering, skill candidates

The loop From every session, a catalogue that improves itself. Five moves, read bottom-up — raw work at the foundation, refined knowledge on top. Each layer rests on the one beneath it; the loop settles new and refined skills back into the next session.

Refined knowledge

Raw work

№ 04 Layer 5

Refine Rate the work; an agent refines weak skills and drafts the ones your usage is asking for.

Agent · plannedrate · refine · draft

№ 03 Layer 4

Review Replay grouped by skill; analyse every invocation across the org.

Org catalogue SkillsMCPsAgentsPlugins

№ 02 Layer 3

Clean & structure Redacted and tenant-isolated, then each session is rebuilt into a complete,...

Show HN: Argus – Capture, replay and QA every Claude Code session your team runs

Related Articles

Is AI ruining our skills? Early results are in – and they're not good

The Anatomy of an AI-Native Org

Apertus – Open Foundation Model for Sovereign AI

How to Earn a Billion Dollars

Italy's Meloni says Trump 'made up' story that she 'begged' him for photo at G7