Terminal Apps Need a DOM


Paul Querna | 2026-06-30 | 9 min read

When we were building Squire, C1's software factory, we hit a slightly absurd problem: the AI tools were also built for humans.

Squire could give an agent work. But Claude Code, Codex, Pi, and similar AI harnesses present themselves as terminal apps first. Their live interface is a TUI made for a person: a prompt, a streaming response, approval screens, file-change panes, and a cursor waiting for the next instruction. Another agent can type into that interface. It still needs to know whether the response is done, whether an approval screen appeared, or whether the cursor has returned to the prompt.

That is the problem agent-tui solves. It runs the target program on a PTY, keeps the terminal state alive in a daemon, exposes the rendered screen as text or an outline with stable refs, and lets a client snapshot, press keys, and wait for named screen state. It gives terminal apps the same kind of queryable surface that made browser automation useful.

agent-tui is open source. We are publishing it under the Apache-2.0 license at github.com/ConductorOne/agent-tui. The design comes from our experience with agent-browser: give the agent something it can query instead of a pile of pixels. agent-tui applies that idea to terminal apps.

One common Squire pattern is an orchestration agent: a coding harness receives the assignment, then drives another harness to do the work. In this demo, OpenAI's Codex uses agent-tui to drive the Pi harness through a real terminal session.

Agents driving agents

agent-tui starts the outer Codex TUI, waits for @codex.input, types the task, and presses enter. Codex then runs the command sequence below, which starts a second agent-tui daemon around Pi.

The Pi side is just another agent-tui session:

    agent-tui daemon run
    agent-tui spawn -- pi --offline --no-extensions --no-context-files --no-skills
    agent-tui wait --ref '@pi.input'
    agent-tui type --to '@pi.input' 'Reply with the token formed by joining INNER, AGENT, and OK with `_`.'
    agent-tui press --to '@pi.input' ''
    agent-tui snapshot --mode text

The result is two live screens: one where Codex receives the task, and one where Pi answers inside the nested session.

Vercel's AI SDK harnesses frame agent CLIs as provider-specific surfaces, not one generic wrapper. agent-tui takes the same approach to terminal screens: keep each app's shape, then expose the parts an agent can query.

Why terminal apps need structure

Most useful terminal programs were built for humans, not machines.

htop, vim, lazygit, psql, language REPLs, and newer agent CLIs such as Claude Code and Codex have different jobs. They share one automation problem: the live interface is a terminal session. It owns a PTY, redraws a grid of cells, and expects a person to infer what changed. Some tools expose batch modes. Many do not. Even when a batch mode exists, it is often a different interface from the human session you need to observe, interrupt, or steer.
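To make the PTY point concrete, here is a minimal Python sketch, not taken from agent-tui, that runs the same child process twice: once on a pipe and once on a pseudo-terminal. The helper name `run_on_pty` is hypothetical, and the sketch assumes a Unix-like system where the standard `pty` module is available. The child only reports whether its stdout is a terminal, which is the check many interactive programs use to decide how to behave.

```python
import os
import pty
import subprocess
import sys


def run_on_pty(argv, timeout=10.0):
    """Run argv with stdin/stdout/stderr attached to a new PTY; return its output.

    Hypothetical helper for illustration only. It blocks until the child
    closes its side of the terminal, so it suits short-lived commands.
    """
    master_fd, slave_fd = pty.openpty()
    try:
        proc = subprocess.Popen(
            argv, stdin=slave_fd, stdout=slave_fd, stderr=slave_fd, close_fds=True
        )
    finally:
        # The parent keeps only the master side; the child holds the slave.
        os.close(slave_fd)

    chunks = []
    try:
        while True:
            try:
                data = os.read(master_fd, 4096)
            except OSError:
                break  # e.g. EIO on Linux once the child side is closed
            if not data:
                break  # other platforms signal the same condition with b""
            chunks.append(data)
    finally:
        os.close(master_fd)
        proc.wait(timeout=timeout)
    return b"".join(chunks).decode("utf-8", errors="replace")


# The child prints whether it believes it is talking to a terminal.
CHILD = [sys.executable, "-c", "import sys; print(sys.stdout.isatty())"]

piped = subprocess.run(CHILD, capture_output=True, text=True).stdout.strip()
on_pty = run_on_pty(CHILD).strip()
print("pipe:", piped)
print("pty: ", on_pty)
```

The same program gives a different answer depending on how it is launched, which is one reason a batch or piped mode is not a substitute for the live session.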

An agent can write bytes to a terminal easily. The hard part is knowing what happened after those bytes landed. A full-screen program may repaint in place, move the cursor, enter an alternate screen, update one field, and never print a clean line that says "ready."
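A small illustration of repainting in place, as a sketch under simplifying assumptions: the toy `render` function below (a hypothetical name) understands only printable characters and the standard cursor-position sequence `ESC [ row ; col H`. Real terminal emulators handle far more, and this is not how agent-tui models the screen. It only shows that the final screen cannot be read off the byte stream by stripping escapes.

```python
import re

# CUP (cursor position): ESC [ <row> ; <col> H, with 1-based coordinates.
CSI_CUP = re.compile(r"\x1b\[(\d+);(\d+)H")


def render(stream, rows=3, cols=20):
    """Apply a byte stream to a fixed grid and return the visible lines.

    Toy model: supports only printable characters and CUP. Writes outside
    the grid are dropped.
    """
    grid = [[" "] * cols for _ in range(rows)]
    row = col = 0
    pos = 0
    while pos < len(stream):
        match = CSI_CUP.match(stream, pos)
        if match:
            row = int(match.group(1)) - 1
            col = int(match.group(2)) - 1
            pos = match.end()
            continue
        if 0 <= row < rows and 0 <= col < cols:
            grid[row][col] = stream[pos]
        col += 1
        pos += 1
    return ["".join(line).rstrip() for line in grid]


# The program draws a status line, then later overwrites one field in place.
stream = "\x1b[1;1Hstatus: busy" + "\x1b[1;9Hidle"

print(repr(CSI_CUP.sub("", stream)))  # 'status: busyidle'
print(repr(render(stream)[0]))        # 'status: idle'
```

The stripped stream reads `status: busyidle`, while the screen a person sees reads `status: idle`. Nothing in the stream announces that the state changed; only the rendered grid does.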

The usual choices make bad contracts with the terminal. Escape-sequence parsing treats the byte stream as the API. Rendered-text scraping throws away state. Sleeping between keystrokes punts the problem to the scheduler, which means the script works until CI is slow or one prompt lands in a state you did not match.
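The difference between sleeping and waiting on state can be sketched in a few lines of Python. This is an illustration of the general pattern, not agent-tui's implementation: `wait_for` and `FakeScreen` are hypothetical names, and the fake screen stands in for whatever query reports that a prompt is visible.

```python
import time


def wait_for(predicate, timeout=5.0, interval=0.05):
    """Poll predicate until it returns True or timeout seconds elapse.

    Returns True if the condition was observed, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


class FakeScreen:
    """Stand-in for a terminal whose prompt appears after `delay` seconds."""

    def __init__(self, delay):
        self._ready_at = time.monotonic() + delay

    def prompt_visible(self):
        return time.monotonic() >= self._ready_at


fast = FakeScreen(delay=0.1)
slow = FakeScreen(delay=60.0)

# A fixed sleep has to guess the delay. Waiting on state adapts to it and
# reports failure explicitly instead of typing into the wrong screen.
print(wait_for(fast.prompt_visible, timeout=2.0))
print(wait_for(slow.prompt_visible, timeout=0.2))
```

A fixed `time.sleep(1)` would pass for the fast screen and silently proceed on the slow one. The polling version returns as soon as the condition holds and returns `False` when it never does, so the caller can decide what to do.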

Give the terminal a DOM

Vim is a good stress test: it is a full-screen editor, not a command that prints lines. Here agent-tui drives a real Vim session through refs instead of sleeps.

In the recording, the left pane issues agent-tui commands and the right pane is the Vim PTY. The driver waits for the buffer, reads the mode, enters insert mode, writes hello world, saves hello-world.txt, and checks the file contents.

Raw terminal text is a bad handle for this job. agent-tui exposes an outline: a tree of screen regions with roles and stable refs. A ref is a name for something on the screen. It lets an agent say "the Vim mode indicator" instead of "row 24, column 1."

    agent-tui spawn -- vim notes.md
    agent-tui wait --ref '@vim.buffer'   # vim has rendered
    agent-tui --json snapshot --select '@vim.mode' | jq -c '.data.outline.nodes[0]'

    {"durable":true,"ref":"@vim.mode","role":"mode","value":"normal"}

@vim.mode is durable. It names the same part of the screen whether the value is normal, insert, or something else. Refs can be queried with a small selector language: [role=buffer][focused],...
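To illustrate the idea of an outline with durable refs, here is a hedged Python sketch. It is not agent-tui's data model or selector grammar. The `Node` fields mirror the keys in the JSON node shown above, plus a `focused` flag taken from the selector example. The `@vim` root ref, the `by_ref` and `select` helpers, and the parsing rules are assumptions for illustration. The toy selector handles only the two clause shapes visible in the text: `[key=value]` and a bare `[flag]`.

```python
import re
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Node:
    ref: str
    role: str
    value: str = ""
    focused: bool = False
    durable: bool = True
    children: List["Node"] = field(default_factory=list)


def walk(node: Node) -> Iterator[Node]:
    """Yield node and all of its descendants, depth first."""
    yield node
    for child in node.children:
        yield from walk(child)


def by_ref(root: Node, ref: str) -> Optional[Node]:
    """Look a node up by name rather than by screen coordinates."""
    return next((n for n in walk(root) if n.ref == ref), None)


_CLAUSE = re.compile(r"\[(\w+)(?:=([^\]]*))?\]")
_SELECTOR = re.compile(r"(?:\[\w+(?:=[^\]]*)?\])+")


def select(root: Node, selector: str) -> List[Node]:
    """Return nodes matching every clause of a toy attribute selector."""
    if not _SELECTOR.fullmatch(selector):
        raise ValueError(f"unsupported selector: {selector!r}")
    clauses = [(m.group(1), m.group(2)) for m in _CLAUSE.finditer(selector)]

    def matches(node: Node) -> bool:
        for key, expected in clauses:
            actual = getattr(node, key, None)
            if expected is None:
                if actual is not True:  # bare [flag] requires a true boolean
                    return False
            elif str(actual) != expected:
                return False
        return True

    return [n for n in walk(root) if matches(n)]


# Hypothetical outline for a Vim screen.
outline = Node("@vim", "app", children=[
    Node("@vim.buffer", "buffer", focused=True),
    Node("@vim.mode", "mode", value="normal"),
])

print([n.ref for n in select(outline, "[role=buffer][focused]")])

# The value changes when Vim enters insert mode; the ref does not.
by_ref(outline, "@vim.mode").value = "insert"
print(by_ref(outline, "@vim.mode").value)
```

The point of the sketch is the lookup shape: callers address `@vim.mode` by name and read its current value, so a mode change updates the value without invalidating the handle.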
