Agent Privacy

Maintaining Privacy With Agents: What Actually Works When Sensitive Data Is Part of the Workflow | Jack Davis


Contents

Scope
How I evaluated it
Finding 1: feature parity is overstated; the real unit is the harness contract
Finding 2: the highest-value interception point is usually the transit leg, not the prompt hook
Finding 3: a usable privacy layer needs more than allow and block
Finding 4: selective routing needs both contextual filtering and deterministic coverage
Finding 5: live behavior matters more than isolated hook tests
Design implications in agent-privacy
Evidence from the current report window
Limits
What the repository contributes
Conclusion


The practical privacy question for agent systems is whether an agent harness exposes reliable control points before model-visible context is assembled, and whether those control points are strong enough to support more than a binary allow-or-block policy. This matters because mixed-sensitivity workflows are common: useful and sensitive content are often intertwined, and a privacy layer that can only stop work is not a usable answer.

The repository I am releasing, agent-privacy, came out of exploring that problem across multiple agent harnesses. Privacy can be improved materially when the system treats information flow as the primary problem, models harness differences explicitly, and routes content through more than one outcome. The design that emerged from that work is built around four actions (allow, redact, handoff, and block) because anything simpler proved either too weak or too disruptive.
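To make the four actions concrete, here is a minimal sketch of a decision function. It is not the repository's code: the `Finding` fields, the severity scale, and the rules in `decide` are all assumptions chosen for illustration. In particular, I am assuming that handoff means keeping the content away from the remote model and routing it to a local one.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    REDACT = "redact"
    HANDOFF = "handoff"
    BLOCK = "block"


@dataclass(frozen=True)
class Finding:
    """One sensitive item found in a piece of content (hypothetical shape)."""
    label: str      # illustrative labels such as "email" or "account_number"
    severity: int   # assumed scale: 1 = low, 3 = must never reach a model
    maskable: bool  # True if the text is still useful once the span is masked


def decide(findings: list[Finding]) -> Action:
    """Pick one of the four outcomes using assumed, illustrative rules."""
    if not findings:
        return Action.ALLOW
    if any(f.severity >= 3 for f in findings):
        return Action.BLOCK
    if all(f.maskable for f in findings):
        return Action.REDACT
    # Sensitive and not maskable without destroying meaning:
    # assumed here to mean "route to a local model instead".
    return Action.HANDOFF
```

The point of the sketch is the shape of the return type: with only allow and block available, the two middle branches would have to collapse into one of the extremes.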

Scope

This is not a general survey of AI privacy. It is an exploration of privacy controls for CLI-style agent harnesses that can read files, run shell commands, call tools, and re-inject those results into the next model turn. The central question is where sensitive data can still be intercepted before it becomes model-visible context, and what kinds of control those interception points actually support in live behavior.

This question sits in the middle zone that much privacy advice skips. “Never use sensitive data” and “scrub everything first” are both cleaner than dealing with a prompt that is mostly safe except for one identifier, a tool result that contains the answer plus a customer reference, or shell output that is operationally useful but not clean enough to pass through unchanged. The project treats privacy as an information-flow problem: where does new information enter the harness, what sees it next, and what can still be altered before it reaches the model?
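The "mostly safe except for one identifier" case can be sketched in a few lines. This is an illustration only: the `CUST-` reference format and the placeholder text are hypothetical, and a real deployment would need patterns for its own identifiers.

```python
import re

# Hypothetical customer-reference format, used only for this example.
CUSTOMER_REF = re.compile(r"\bCUST-\d{4,}\b")


def redact(text: str) -> tuple[str, int]:
    """Mask customer references and report how many were replaced."""
    return CUSTOMER_REF.subn("[CUSTOMER_REF]", text)


masked, count = redact("Ticket for CUST-48213 is resolved; restart the worker.")
print(masked)  # Ticket for [CUSTOMER_REF] is resolved; restart the worker.
print(count)   # 1
```

The operationally useful part of the tool result survives, and only the identifier is withheld from the next model turn.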

How I evaluated it

The core method was iterative implementation and live testing across harnesses. I wired up prompt hooks, pre-tool hooks, and post-tool hooks, then tested what actually passed through, what could be blocked, what could be replaced, and what only produced advisory behavior. Several of the most important differences only became obvious in live turns, after tool execution, or under failure conditions.
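One way to record what that testing reveals is a small capability record per interception point, plus a function that downgrades the desired action to what the hook can actually enforce. The sketch below is hypothetical: the field names, the string action names, and the downgrade rules are assumptions, and it makes no claim about what any specific harness supports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HookCapability:
    """What one interception point was observed to support (assumed fields)."""
    can_block: bool    # the hook can stop the payload from going forward
    can_replace: bool  # the hook can substitute a modified payload


def enforceable(desired: str, cap: HookCapability) -> str:
    """Map a desired action onto what this hook can enforce."""
    if desired == "allow":
        return "allow"
    if desired == "redact":
        if cap.can_replace:
            return "redact"
        # Cannot rewrite the payload: fail closed if possible.
        return "block" if cap.can_block else "advisory"
    # Assumption: both "block" and "handoff" require stopping the
    # original payload before it reaches the remote model.
    return desired if cap.can_block else "advisory"
```

An "advisory" result is the case described above where the hook fires but cannot change what the model sees, so the policy can only warn.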

That testing work shaped the implementation itself. It included building the local filter service, evaluating OpenAI Privacy Filter as the contextual PII layer, integrating local Qwen models for fallback or pass-through behavior, and separating the shared privacy logic from the harness-specific adapters. Privacy Filter was a useful model to evaluate in this role because it is built to scan unstructured text, identify sensitive items like names or account numbers, and mask them in one pass, while still using surrounding context to make better decisions than regex alone. It is also designed to run locally. Each time a harness exposed a weaker control surface, an ambiguous contract, or a gap between synthetic and live behavior, the design had to adapt around that reality.
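The layering described here, a contextual detector alongside pattern-based checks, can be sketched as follows. This is a simplified illustration under stated assumptions: the `contextual` callable is a stand-in for whatever model does the contextual detection, not a real API for Privacy Filter or Qwen, and the email regex is deliberately crude.

```python
import re
from typing import Callable

Span = tuple[int, int, str]  # (start, end, label)

# Deliberately simple pattern; real coverage would need more rules.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def deterministic_spans(text: str) -> list[Span]:
    """Pattern-based detection that does not depend on any model."""
    return [(m.start(), m.end(), "email") for m in EMAIL.finditer(text)]


def scan(text: str, contextual: Callable[[str], list[Span]]) -> tuple[list[Span], bool]:
    """Combine both layers; report a degraded state if the model layer fails."""
    spans = deterministic_spans(text)
    degraded = False
    try:
        spans += contextual(text)  # hypothetical contextual detector
    except Exception:
        # Keep the deterministic results and flag the scan as degraded.
        degraded = True
    return sorted(set(spans)), degraded
```

Returning the degraded flag alongside the spans lets the caller decide whether a deterministic-only result is good enough to proceed or whether the content should be held back.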

I also used the repository’s own operational artifacts as evidence. The current pii-guard report window covers activity from mid-May through mid-June 2026 and records 6,343 screening decisions, including action counts, event distributions, scanner activity, degraded states, and latency characteristics.
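A report like this can be derived from per-decision log records. The sketch below assumes a hypothetical record shape (`action`, `latency_ms`, optional `degraded`); the actual pii-guard log format is not shown in this article, and the sample records are invented for the example rather than taken from the report.

```python
from collections import Counter
from statistics import median


def summarize(decisions: list[dict]) -> dict:
    """Aggregate screening decisions into counts and a median latency."""
    latencies = [d["latency_ms"] for d in decisions]
    return {
        "total": len(decisions),
        "actions": dict(Counter(d["action"] for d in decisions)),
        "degraded": sum(1 for d in decisions if d.get("degraded")),
        "median_latency_ms": median(latencies) if latencies else None,
    }


# Invented sample records, for illustration only.
sample = [
    {"action": "allow", "latency_ms": 10},
    {"action": "redact", "latency_ms": 30, "degraded": True},
    {"action": "allow", "latency_ms": 20},
]
print(summarize(sample))
```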

I compared the live behavior of Claude Code, GitHub Copilot CLI, and Codex CLI within this system. The comparison focused on which interception points actually worked for privacy, what payload they exposed, whether they supported denial, replacement, annotation, or handoff, and how those behaviors changed under degradation or unsupported paths.

Finding 1: feature parity is overstated; the real unit is the harness contract

The first large finding is that “supports hooks” is not a meaningful privacy claim by itself. Hooks are general-purpose harness features. They can be used for notifications, logging, workflow control,...
