Let’s talk about encrypted reasoning – A Few Thoughts on Cryptographic Engineering
Skip to content
Home
Menu
Let’s talk about encrypted reasoning
Matthew Green<br>in AI, attacks
May 29, 2026May 29, 2026
3,141 Words
This is a quick post I wanted to write about a hobby project I spent a weekend on. It has little to do with real cryptography, and mostly doesn’t expose a particularly exciting vulnerability. But it did teach me a lot about frontier LLM APIs and coding agents. It also got me certified as an OpenAI "cyber researcher" which is something that doesn’t happen every day.
In any case, please keep your expectations low. Who knows, perhaps someone else will find something exciting to do with this.
What’s encrypted reasoning?
Last week I decided it’d be fun to set up an OpenClaw agent. I still don’t know why I did this. I have no use for another AI in my life, and I realized this fact almost immediately after I got through the (surprisingly difficult!) configuration process. But configuring the agent to talk to Claude exposed me to something way more interesting: I got a cool error. The kind of error that cryptographers can’t resist:
Screenshot<br>" data-large-file="https://blog.cryptographyengineering.com/wp-content/uploads/2026/05/img_5689.jpg?w=700" width="1024" height="812" src="https://blog.cryptographyengineering.com/wp-content/uploads/2026/05/img_5689.jpg?w=1024" alt="" class="wp-image-9058" style="aspect-ratio:1.2610719573455782;width:245px;height:auto" srcset="https://blog.cryptographyengineering.com/wp-content/uploads/2026/05/img_5689.jpg?w=1024 1024w, https://blog.cryptographyengineering.com/wp-content/uploads/2026/05/img_5689.jpg?w=150 150w, https://blog.cryptographyengineering.com/wp-content/uploads/2026/05/img_5689.jpg?w=300 300w, https://blog.cryptographyengineering.com/wp-content/uploads/2026/05/img_5689.jpg?w=768 768w, https://blog.cryptographyengineering.com/wp-content/uploads/2026/05/img_5689.jpg 1320w" sizes="(max-width: 1024px) 100vw, 1024px" />
This intrigued me. What in the world was a signature doing in an LLM’s "thinking" block? Why would thinking blocks be signed in the first place? And if the thinking blocks are signed, then that means tampering with thinking blocks must have security implications. And there went my weekend.
After twenty hours and about 5 million Codex tokens, I wasn’t much smarter. But I had learned a few things.
First, the basics. You probably know that most LLM providers expose an API so you can write apps that talk to the model. For Claude, this is called the Messages API, while OpenAI calls it Responses. These APIs handle the ordinary tasks you’d expect an application to need from an LLM. They (1) allow you to set an application-level "instructions" (or ‘developer’) prompt for your application. They let you (2) provide ordinary textual prompts, and get back responses from the LLM. They also (3) provide bookkeeping, for example, listing the number of tokens you’ve used.
For reasoning LLMs, they also do something I did not previously know about, and this is central to the error message above. They also send you the contents of the model’s hidden "reasoning" or "thinking" fields. Note that this data is not the stuff you see on ChatGPT when you ask it a question: those strings are merely summaries. The model’s actual reasoning (called "chain-of-thought", CoT) is normally kept private and held back by the server.
However, the APIs work differently: for various reasons (which we’ll get into below), an encrypted copy of the raw CoT reasoning data is actually sent down to the application.
If you’re like me, you should now have three questions: how, why, and so what?
The how is the easiest to answer: for both providers, "thinking"/"reasoning" are sent down to the client as JSON. Each contains a blob of Base64-encoded stuff. The API documentation informs us that this data contains opaque reasoning, and that you’re not meant to look at it; you’re just supposed to ship it back to the server on the next turn.
Let’s break that rule.
The content of the blocks varies slightly between providers, but the core of each is a random-looking string that appears to be an authenticated ciphertext. You don’t need to be Sherlock Holmes to deduce this. First, it grows and shrinks depending on how hard the model thinks. And second, tampering with any of the ciphertext-looking data produces a recognizable API error when you send it back in.
Thanks to AI, I can make nice diagrams. Here’s what OpenAI’s reasoning blocks look like:
This GPT 5.5 diagram is partly a guess. This assumes they’re based on the Fernet token standard.
And here’s Anthropic’s wildly overcomplicated equivalent:
Although it’s called a "signature", there appears to be no actual signature here (I ran a bunch of tests on that 64-byte field.) The various opaque fields all mutually authenticate: you can’t change any of them or swap for fields from other blocks, but you can mess with everything else. The...