How others build agent memory, and what I took from each | Falconer Notes

Starting from zero, every time

An engineer at Falconer asks our agent what’s safe to change to ship a new payments retry path. They’ve owned this code for two years. The agent answers like they’re new to the codebase. It explains what the orchestration layer is, where the idempotency keys live, the basics of the retry queue. None of that is useful.

A different user prefers tight bullet lists for meeting summaries. The agent returns three paragraphs of flowing prose.

Another user hates em dashes in writing. Every draft the agent produces is laced with them.

None of this is a knowledge problem. The agent can look things up. It’s an identity and style problem the agent has no way to solve from a single conversation. By the time the user has corrected the tone or the framing three or four times, the conversation ends and the next one starts cold. Every interaction starts at zero.

That’s the failure mode agent signals were built to fix.

An agent signal is a short, durable, self-stated thing about the user. Their role. Their team. What they own. How they prefer to communicate. How they like things written. The signals get pulled into the system prompt at the start of every agent conversation, so the agent can write with that context baked in. Same questions, different answers. The codebase question gets an answer that assumes you know the codebase. The meeting summary comes back as bullets. The draft doesn’t have em dashes.

That’s the whole feature. The substance is everything underneath: how signals get in, what counts as a signal in the first place, and what to do when there are too many to fit. The first two are about prompt design and ingestion plumbing. The last one is where the design choices get interesting, because it turns out each production AI memory system I studied has solved it differently, and the differences matter.

Two write paths, one read path

There are two write paths for agent signals in Falconer, and one read path that gathers them up at conversation start.

Automatic extraction

Every six hours, a background job scans every conversation flagged for extraction. For each user, it loads the user’s messages from those conversations (assistant turns and tool output are ignored), passes them to a deliberately restrictive LLM prompt, and gets back structured actions: create a new signal, update an existing one, or skip. Returning zero signals from a conversation is the expected outcome, not the exception. The prompt aggressively rejects anything that isn’t a durable self-statement. Task context like “currently debugging payments” gets skipped because it varies across conversations. Org-level facts like “Falconer uses Postgres” get skipped because they apply to everyone in the org. Neither is a signal about who the user is.
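The create / update / skip contract can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Falconer's actual code: the `Signal` and `Action` shapes, the field names, and the rule that extraction never rewrites a manually entered signal are all hypothetical, and the LLM call that produces the actions is left out.

```python
from dataclasses import dataclass
from typing import Literal, Optional


@dataclass
class Signal:
    id: int
    text: str
    source: Literal["extracted", "manual"]


@dataclass
class Action:
    # Hypothetical shape for what the extraction prompt returns.
    kind: Literal["create", "update", "skip"]
    text: Optional[str] = None       # new or revised signal text
    signal_id: Optional[int] = None  # target of an update


def user_messages_only(conversation: list[dict]) -> list[str]:
    """Keep user turns; drop assistant turns and tool output."""
    return [m["content"] for m in conversation if m["role"] == "user"]


def apply_actions(signals: list[Signal], actions: list[Action]) -> list[Signal]:
    """Return a new signal list with create/update actions applied."""
    result = [Signal(s.id, s.text, s.source) for s in signals]
    next_id = max((s.id for s in result), default=0) + 1
    for action in actions:
        if action.kind == "create" and action.text:
            result.append(Signal(next_id, action.text, "extracted"))
            next_id += 1
        elif action.kind == "update" and action.text:
            for s in result:
                # Assumption: extraction never rewrites a manual signal.
                if s.id == action.signal_id and s.source == "extracted":
                    s.text = action.text
        # "skip" (or anything malformed) changes nothing.
    return result
```

An empty action list leaves the signals untouched, which matches the expectation that most conversations yield nothing.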

Manual entry

Users can add or edit signals directly in a settings page. This path matters more than it sounds. The user is the ground truth about themselves; the extraction LLM is a guess. If the system ever silently overwrites or contradicts something a user typed in, users stop trusting what’s stored about them. So manual signals get treated as sacred. Foreshadowing the tiering design later: they live in their own bucket and the system promises never to consolidate or drop them.
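One way to make that promise structural rather than a convention is to keep the two sources in separate buckets and only ever let automation touch one of them. The sketch below is hypothetical: `SignalStore`, the cap, and the "drop oldest extracted first" policy are placeholders for illustration, not the tiering design itself.

```python
from dataclasses import dataclass, field


@dataclass
class SignalStore:
    manual: list[str] = field(default_factory=list)     # user-typed; automation never edits these
    extracted: list[str] = field(default_factory=list)  # LLM guesses; eligible for trimming

    def trim_extracted(self, max_total: int) -> list[str]:
        """Drop the oldest extracted signals until the total fits; return what was dropped."""
        room = max(max_total - len(self.manual), 0)
        overflow = max(len(self.extracted) - room, 0)
        dropped = self.extracted[:overflow]
        self.extracted = self.extracted[overflow:]
        return dropped

    def all_signals(self) -> list[str]:
        return self.manual + self.extracted
```

Even when the cap is smaller than the number of manual signals, `trim_extracted` empties only the extracted bucket and leaves the manual one intact.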

The read path

At the start of every agent conversation, I pull all of the user’s signals and render them as a bullet block under ## About this user in the system prompt. No retrieval step. No semantic search. The full list goes in.
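A minimal sketch of that read path, assuming signals arrive as plain strings. The function names and the way the block is joined onto a base prompt are illustrative assumptions; only the `## About this user` heading comes from the description above.

```python
def render_user_block(signals: list[str]) -> str:
    """Render every signal as a bullet under a fixed heading. No filtering, no ranking."""
    if not signals:
        return ""
    bullets = "\n".join(f"- {s.strip()}" for s in signals)
    return f"## About this user\n{bullets}"


def build_system_prompt(base_prompt: str, signals: list[str]) -> str:
    """Append the user block to the base prompt, or return the base prompt unchanged."""
    block = render_user_block(signals)
    return f"{base_prompt}\n\n{block}" if block else base_prompt
```

The absence of any query argument is the point: nothing about the current turn influences which signals are included.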

That last part deserves explaining, because the obvious instinct is “this should be RAG.” It shouldn’t. Tens of signals per user, not thousands. The math says inject everything. The model is better at deciding which signals are relevant to the current turn than any retrieval scheme I’d build. This was the first design decision I made, and the reason I made it was that ChatGPT and Claude Code had both arrived at the same conclusion, independently, at much larger scale.
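The back-of-envelope version of that math can be written down. Everything numeric here is an assumption for illustration: the roughly four characters per token rule of thumb, the signal lengths, and the budget are not measurements from Falconer.

```python
def estimated_tokens(signals: list[str], chars_per_token: int = 4) -> int:
    """Rough token estimate for a bullet block (assumed ~4 characters per token)."""
    total_chars = sum(len(s) + 3 for s in signals)  # +3 for the "- " prefix and newline
    return -(-total_chars // chars_per_token)       # ceiling division


def fits_budget(signals: list[str], budget_tokens: int) -> bool:
    """True when the whole list can be injected without exceeding the budget."""
    return estimated_tokens(signals) <= budget_tokens
```

Under these assumptions, 40 signals of about 80 characters each come to roughly 800 tokens, while 5,000 such signals would come to roughly 100,000. The first is a rounding error in a modern context window; the second is where retrieval would start to earn its keep.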

That’s what got me to read everything I could find about how they actually work.

What others have figured out

Three production AI memory systems mattered most to the design: OpenAI’s ChatGPT memory, Anthropic’s Claude Code memory, and the MemGPT / Letta core-memory architecture. Each has a different shape, and each makes a different bet about what’s worth solving in engineering and what’s worth handing to the LLM.

ChatGPT memory

OpenAI’s memory feature stores explicit memories as flat, timestamped one-liners. No categories. No tags. On top of that, ChatGPT periodically generates “User Knowledge Memories”, AI-summarized dense paragraphs about the user that get regenerated when raw memory grows beyond some size. The AI-summarized dossier gets injected into every conversation, alongside any explicit one-liner memories the user has saved. Neither layer relies on retrieval.
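To make the two-layer shape concrete, here is a toy model of the description above. It is not OpenAI's implementation: the class, the regeneration trigger (a count of new entries), and the `summarize` callable standing in for an LLM call are all assumptions, and treating the explicit one-liners as the "raw memory" being summarized is a simplification.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass
class TwoLayerMemory:
    summarize: Callable[[list[str]], str]  # stand-in for an LLM summarization call
    regen_threshold: int                   # hypothetical trigger; the real one is unknown
    explicit: list[tuple[datetime, str]] = field(default_factory=list)
    dossier: str = ""
    _summarized_count: int = 0

    def remember(self, text: str, now: Optional[datetime] = None) -> None:
        """Append a flat, timestamped one-liner; regenerate the dossier once enough accumulate."""
        self.explicit.append((now or datetime.now(timezone.utc), text))
        if len(self.explicit) - self._summarized_count >= self.regen_threshold:
            self.dossier = self.summarize([t for _, t in self.explicit])
            self._summarized_count = len(self.explicit)

    def context_block(self) -> str:
        """Both layers are injected whole; there is no retrieval step."""
        lines = [f"[{ts:%Y-%m-%d}] {text}" for ts, text in self.explicit]
        return "\n".join(([self.dossier] if self.dossier else []) + lines)
```

As with the read path earlier, `context_block` takes no query: the dossier and every one-liner go in regardless of what the conversation is about.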

The most useful reverse-engineering I read...
