Sandboxes that lie about their secrets

Picture four scenarios from a single agent product:

Your agent calls OpenAI. The SDK reads $OPENAI_API_KEY from the environment, drops it into an Authorization header, and you want that call to succeed normally.

Mid-session, a prompt injection convinces the agent to POST $OPENAI_API_KEY to a domain you've never heard of. You want that request stopped before the destination learns anything useful about your credential inventory.

Mid-run, the agent's observability SDK streams a session trace (tool calls, headers, request objects) to your trace store. That trace still contains the secret placeholders, and you want the push to succeed without those getting swapped back for the real values on the way out.

In a release sandbox, an install script tries to send $NPM_TOKEN to an unknown host while building artifacts you're about to publish. Stopping the request isn't enough. Once a build runtime tries something like this, you can't trust its outputs anymore.

Four things your sandbox has to get right, and they're all different.

Only the first scenario is really about substitution. When the agent calls an allowed host, secret injection swaps the placeholder for the real credential and the call goes through. The other three are harder, because by then a credential has turned up somewhere it shouldn't and you have to decide what to do about it. Sometimes you drop the request. Sometimes you let the bare placeholder through as harmless data. Sometimes the safest move is to kill the runtime outright. Substitution is where most sandboxes stop, and everything interesting lives past that line.

So injection can't be the whole story. It's one move the boundary can make, and it sits inside something larger: a network policy that has to settle two questions at once, where a credential is allowed to become real, and what happens the rest of the time. The real secret never leaves the host. The guest gets a placeholder, and a secret-aware network boundary decides, request by request, what that placeholder turns into. There are four ways it can go: substitute, pass through, block (and optionally log), or terminate the sandbox. The rest of this post is one section per outcome.
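The four outcomes can be sketched as a small decision function. This is an illustrative model only, not microsandbox's implementation: `SecretRule`, `passThroughHosts`, `onViolation`, and `decide` are hypothetical names invented for this sketch, and the request is flattened to a single string to keep it short.

```typescript
// Hypothetical model of the four outcomes. None of these names come from
// the microsandbox API; they exist only to illustrate the decision.
type Outcome = "substitute" | "pass-through" | "block" | "terminate";

interface SecretRule {
  placeholder: string;              // what the guest sees, e.g. "$MSB_OPENAI_API_KEY"
  allowedHosts: string[];           // hosts where the secret may become real
  passThroughHosts: string[];       // hosts that may receive the bare placeholder (assumed)
  onViolation: "block" | "terminate"; // what to do everywhere else (assumed)
}

interface OutboundRequest {
  host: string;
  payload: string; // headers and body flattened into one string for illustration
}

// Returns null when the request does not carry the placeholder at all,
// in which case this secret's rule has nothing to decide.
function decide(req: OutboundRequest, rule: SecretRule): Outcome | null {
  if (req.payload.indexOf(rule.placeholder) === -1) return null;
  if (rule.allowedHosts.indexOf(req.host) !== -1) return "substitute";
  if (rule.passThroughHosts.indexOf(req.host) !== -1) return "pass-through";
  return rule.onViolation;
}
```

Under these assumptions, the OpenAI call from the first scenario resolves to `substitute`, the trace push to `pass-through`, and a request to an unknown host to `block` or `terminate`, depending on how strict the rule is.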

Start with the boundary

Three things matter here: the real secret, the placeholder the guest sees, and the network boundary where policy applies. The real secret stays on the host. The guest VM only ever sees a placeholder string in its environment.

The workload behaves normally. It reads the env var, hands it to an SDK, drops it into an Authorization header, passes it to a CLI. The value just isn't the real credential. The placeholder only matters once outbound traffic reaches a host the secret is allowed to reach.

    import { Sandbox } from "microsandbox";

    await using sb = await Sandbox.builder("agent")
      .image("python")
      .secret((s) =>
        s
          .env("OPENAI_API_KEY")
          .value(process.env.OPENAI_API_KEY!)
          .allowHost("api.openai.com")
          .requireTlsIdentity(true)
          .injectHeaders(true)
          .injectQuery(false)
          .injectBody(false),
      )
      .create();

That's a policy. OPENAI_API_KEY is exposed to the guest as a placeholder. The real value can only be substituted for api.openai.com. TLS identity must be verified first. The injection flags are spelled out here for clarity, but they're also the defaults: headers and Basic auth get substitution, while query params and request bodies stay off unless you opt in.

The placeholder string itself is deterministic. For an env named OPENAI_API_KEY, the guest sees $MSB_OPENAI_API_KEY, not an opaque token like msb_placeholder_8f2e1c. When that value shows up in a log, a stack trace, an error message, or an exported transcript, you can tell at a glance which slot it represents. Traces become self-describing, snapshot tests stay stable across runs, and post-incident review doesn't require decoding a substitution table.
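A minimal sketch of what deterministic naming buys you in practice. The `$MSB_` prefix comes from the example above; the helper names `placeholderFor` and `findSecretSlots` are hypothetical, and the regex assumes env names use only uppercase letters, digits, and underscores.

```typescript
// Prefix taken from the example in the text ("$MSB_OPENAI_API_KEY").
const PLACEHOLDER_PREFIX = "$MSB_";

// Hypothetical helper: derive the placeholder the guest would see for an env name.
function placeholderFor(envName: string): string {
  return PLACEHOLDER_PREFIX + envName;
}

// Hypothetical helper: list which secret slots appear in a log line or trace.
// Assumes env names match [A-Z][A-Z0-9_]*; returns each slot once, in order seen.
function findSecretSlots(text: string): string[] {
  const matches = text.match(/\$MSB_[A-Z][A-Z0-9_]*/g) || [];
  const slots: string[] = [];
  for (const m of matches) {
    const slot = m.slice(PLACEHOLDER_PREFIX.length);
    if (slots.indexOf(slot) === -1) slots.push(slot);
  }
  return slots;
}
```

Because the mapping is a pure function of the env name, a reviewer can run something like `findSecretSlots` over an exported transcript and read off the affected slots without consulting a lookup table.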

A deterministic placeholder does advertise what credentials this runtime carries. That's exactly what Case 2 below addresses: blocking unknown destinations at the network boundary keeps the inventory private, while deterministic naming inside the runtime keeps it readable.

See the Secrets docs and TypeScript SDK reference for the full surface.

Case 1: substitute for an allowed host

The expected case. Your agent calls OpenAI, GitHub, Stripe, npm, or another service it actually needs. The placeholder appears in outbound traffic, the destination matches the secret's allow list, and microsandbox substitutes the real value at the boundary before letting the request continue.
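As a rough illustration of that flow, the sketch below substitutes a placeholder in request headers only when the destination is on the allow list. It is not how microsandbox is implemented: `SecretPolicy` and `substituteHeaders` are hypothetical names, host matching is a plain string comparison, and TLS identity verification is left out entirely.

```typescript
// Hypothetical policy shape for this sketch; not the microsandbox API.
interface SecretPolicy {
  placeholder: string;    // what the guest sends, e.g. "$MSB_OPENAI_API_KEY"
  realValue: string;      // known only on the host side of the boundary
  allowedHosts: string[]; // destinations where the secret may become real
}

type Headers = { [name: string]: string };

// Returns a new header map. The placeholder is replaced with the real value
// only for allowed hosts; for any other host the headers come back unchanged,
// and one of the other outcomes (pass through, block, terminate) would apply.
function substituteHeaders(host: string, headers: Headers, policy: SecretPolicy): Headers {
  const allowed = policy.allowedHosts.indexOf(host) !== -1;
  const out: Headers = {};
  for (const name of Object.keys(headers)) {
    const value = headers[name];
    out[name] = allowed
      ? value.split(policy.placeholder).join(policy.realValue)
      : value;
  }
  return out;
}
```

The sketch touches headers only, which mirrors the defaults described earlier: query parameters and request bodies are not rewritten unless you opt in.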

What makes this more than plain substitution is that the destination is part of the policy. The credential only becomes real for a specific host, and...
