Causal graph memory for LLMs. Flat token cost, no matter how long the session runs

w4mwati1 pts0 comments

GitHub - raphaelwkago-sketch/rudi: Causal graph memory for LLMs - flat token cost regardless of session length. · GitHub



Rudi

Causal graph memory for LLMs. Flat token cost, no matter how long the session runs.

Every LLM API call re-sends the whole conversation. Cost grows every turn; eventually you hit the context limit. Rudi replaces the growing transcript with a dependency graph of decisions — and injects only the slice relevant to the current task. Turn 10,000 costs about the same as turn 10.

The 30-second version

In a 43-turn software-architecture session (building a Notes API turn by turn), the standard "re-send the full transcript" approach was sending ~38,000 input tokens by the final turn. Rudi sent 6,782 — for the same task, same model, same answer quality.

| Turn | Rudi input | Full-transcript input | Savings |
|---|---|---|---|
| | 382 | 340 | |
| 10 | 1,467 | 6,999 | 4.8× |
| 20 | 3,581 | 17,385 | 4.9× |
| 30 | 4,128 | 26,821 | 6.5× |
| 43 | 6,782 | 38,320 | 5.7× |

Totals across all 43 turns: 152,222 input tokens (Rudi) vs 828,369 (full transcript), which is 5.4× fewer tokens. The absolute gap widens as the session goes on, because Rudi's per-turn input is bounded while the transcript's grows linearly.
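The savings figures can be re-derived from the reported token counts. The snippet below is only a sanity check on the arithmetic, not part of Rudi; the `savings` helper and the `samples` dict are names made up for this illustration.

```python
# Per-turn input token counts as reported in the benchmark table:
# turn -> (Rudi input, full-transcript input).
samples = {
    10: (1_467, 6_999),
    20: (3_581, 17_385),
    30: (4_128, 26_821),
    43: (6_782, 38_320),
}


def savings(rudi_tokens: int, transcript_tokens: int) -> float:
    """Ratio of full-transcript input tokens to Rudi input tokens."""
    return transcript_tokens / rudi_tokens


for turn, (rudi, full) in samples.items():
    ratio = savings(rudi, full)
    print(f"turn {turn}: {ratio:.1f}x fewer tokens, absolute gap {full - rudi:,}")

# Totals across all 43 turns, as reported.
total_ratio = savings(152_222, 828_369)
print(f"all 43 turns: {total_ratio:.1f}x fewer tokens")
```

Note that the ratio dips between turn 30 and turn 43 while the absolute gap keeps growing, which is what the bounded-versus-linear argument predicts.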

These numbers are from a run with fold disabled — graph slicing alone. See below for the measured fold result.

Cost of the entire 43-turn run on Claude Haiku 4.5: $0.34.

Fold in action (second run)

At turn 29 of a separate run, fold fired for the first time:

    turn 28: input=5,075 tokens  active nodes=24
    [fold] d1–d8 (8 nodes, 20 hard rules) → stub d25
    [fold] d9–d16 (8 nodes, 20 hard rules) → stub d26
    [fold] d17–d21 (5 nodes, 16 hard rules) → stub d27
    turn 29: active nodes=6 (dropped 24 → 6)
    turn 30: input=2,865 tokens ← down 44% from turn 28

21 live nodes compressed into 3 stubs. 56 hard rules preserved verbatim. Input tokens nearly halved mid-session, automatically. That's the sawtooth: the graph gets smaller as the conversation gets longer.

It doesn't just stay small — it stays correct

Cheap context is worthless if the model forgets the rules. So the same benchmark plants 6 callback traps late in the session and checks whether decisions made dozens of turns earlier are still honored.

| Turn | Trap | Result |
|---|---|---|
| 38 | Add logout: must use the exact auth mechanism chosen on turn 1 | |
| 39 | Profile endpoint: must scope via turn-1 auth and turn-2 DB | |
| 40 | Admin CSV export: a rule that was folded away banned cross-user data | ✅ surfaced |
| 41 | Email full notes: a folded rule banned note contents in email | ✅ surfaced |
| 42 | "Store the token in localStorage": conflicts with turn-1 hard rule | ✅ blocked |
| 43 | "Permanently delete a note": turn-11 chose soft-delete | ✅ flagged |

6 / 6. (First benchmark run: fold disabled, slicing only.) The two that matter most are traps 3 and 4 (turns 40 and 41): those rules had been compressed out of the active context by the time the trap was sprung, and the model still caught them, because hard rules are preserved verbatim on the fold stub. That's the whole thesis: forget the prose, keep the constraints.

How it works

Every model response is parsed into decision nodes, each linked backward to the decisions it depends on:

    node = {
        id, text,
        depends_on: [...],       # backward edges: what this decision rests on
        hard_rules: [...],       # binding constraints; the worker must halt if violated
        revises, exception_to,   # full replacement vs. narrow carve-out
        status, turn, pinned
    }
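One way to make the node shape concrete is a small dataclass. This is a sketch, not the code in `rudi.py`: the field names come from the schema above, but the types, the defaults, and the `"active"` status value are assumptions, and the example decisions are illustrative.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DecisionNode:
    """Hypothetical typed version of the node schema; types are assumed."""
    id: str
    text: str
    depends_on: list[str] = field(default_factory=list)  # backward edges
    hard_rules: list[str] = field(default_factory=list)  # binding constraints
    revises: Optional[str] = None        # id of a decision this fully replaces
    exception_to: Optional[str] = None   # id of a decision this narrowly carves out of
    status: str = "active"               # assumed value; real statuses are not documented here
    turn: int = 0
    pinned: bool = False


# Illustrative nodes: a turn-1 auth decision and a later decision that rests on it.
auth = DecisionNode(
    id="d1",
    text="Choose the auth mechanism",  # illustrative wording
    hard_rules=["Never store the token in localStorage"],  # illustrative rule
    turn=1,
    pinned=True,
)
logout = DecisionNode(id="d2", text="Add logout", depends_on=["d1"], turn=38)

print(logout.depends_on, auth.hard_rules)
```

Using `default_factory` for the list fields avoids sharing one mutable list across nodes.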

Slice, don't dump. Before each turn, Rudi injects only the nodes reachable from the current task — not the transcript.
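Slicing can be read as a backward reachability walk over `depends_on` edges. The sketch below assumes the current task has already been mapped to a set of starting node ids; how Rudi actually picks those starting nodes is not described here, and `slice_graph` is a hypothetical name.

```python
from collections import deque


def slice_graph(graph: dict[str, list[str]], start_ids: list[str]) -> set[str]:
    """Return every node id reachable from start_ids via depends_on edges.

    `graph` maps a node id to the ids it depends on (its backward edges).
    """
    seen: set[str] = set()
    queue = deque(start_ids)
    while queue:
        node_id = queue.popleft()
        if node_id in seen or node_id not in graph:
            continue  # already visited, or a dangling reference
        seen.add(node_id)
        queue.extend(graph[node_id])
    return seen


# Two independent branches: d4 -> d2 -> d1, and d5 -> d3.
graph = {
    "d1": [],
    "d2": ["d1"],
    "d3": [],
    "d4": ["d2"],
    "d5": ["d3"],
}

context = slice_graph(graph, ["d4"])
print(sorted(context))  # ['d1', 'd2', 'd4']
```

Only the `d4` branch is injected; `d3` and `d5` stay out of the prompt because nothing the current task depends on reaches them.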

Fold. When a branch of decisions goes reachability-dead, a background pass compresses it into a one-line stub. Hard rules survive the fold verbatim,...
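A fold step might look roughly like the following. This is a sketch under assumptions, not the logic in `fold.py`: the caller decides which ids are dead (the reachability test itself is not specified here), nodes are plain dicts, and the stub's text format and `"stub"` status are invented for illustration.

```python
def fold(nodes: dict[str, dict], dead_ids: list[str], stub_id: str, turn: int) -> dict:
    """Replace the given dead nodes with one stub, copying hard rules verbatim."""
    kept_rules: list[str] = []
    for node_id in dead_ids:
        node = nodes.pop(node_id)               # remove the full node from the live graph
        kept_rules.extend(node["hard_rules"])   # rules are copied unchanged, never summarized
    stub = {
        "id": stub_id,
        "text": f"[folded {len(dead_ids)} nodes: {dead_ids[0]}..{dead_ids[-1]}]",
        "depends_on": [],
        "hard_rules": kept_rules,
        "status": "stub",  # assumed status value
        "turn": turn,
        "pinned": False,
    }
    nodes[stub_id] = stub
    return stub


# Illustrative graph: d1 and d2 are treated as dead, d3 stays live.
nodes = {
    "d1": {"id": "d1", "text": "long prose ...", "hard_rules": ["No cross-user data in exports"]},
    "d2": {"id": "d2", "text": "more prose ...", "hard_rules": ["No note contents in email"]},
    "d3": {"id": "d3", "text": "current work", "hard_rules": []},
}

stub = fold(nodes, ["d1", "d2"], stub_id="d4", turn=29)
print(sorted(nodes), stub["hard_rules"])
```

The prose of `d1` and `d2` is gone after the call, but both constraints are still attached to `d4`, so a later turn that touches the stub still sees them.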
