-->
HTML as a Native Data Format for LLMs | AST-as-HTML - LJ
We built a document platform where an AI assistant designs marketing documents (flyers, brochures, one-pagers) inside brand-approved rails, and humans finish them by clicking into the rendered page and typing. Getting the agent to author templates, the structural layouts those documents are built from, turned out to hinge on a single unfashionable decision:<br>We encode our templates as HTML, not JSON. And the agent's main editing tool is "rewrite the whole thing."<br>That inverts most of the current advice about building agents on structured data. It also turned out to be the cheaper, sturdier choice: fewer tokens burned, faster responses, and better data integrity on every edit. This post is about why.<br>The Problem<br>A document template in our system is a tree: a document contains pages, pages contain blocks, blocks contain text atoms, image atoms, styled containers, and slots that reference reusable widgets. Every node carries attributes: CSS classes, length budgets for copy, type-scale choices, slot constraints. Themes paint the tree through CSS variables, so a template never hardcodes a color; it says bg-primary and the active brand theme decides what that means.<br>We wanted an assistant that could build and restructure these trees conversationally: "Design a full-width widget with a rounded content box on the left and a stat panel on the right." And we wanted its output to land in the same editor, with the same undo semantics and the same validation, as a human's edits.<br>The Obvious Design, and Why We Didn't Ship It<br>The textbook approach is to store the tree as JSON and give the model a granular tool API:<br>insertNode(parentId, type, index)<br>setAttribute(nodeId, key, value)<br>moveNode(nodeId, newParentId, index)<br>removeNode(nodeId)<br>We've built agents like this. They work, but they under-perform in three predictable ways:<br>You're teaching a bespoke schema from scratch. 
Every node type, every attribute, every containment rule has to be spelled out in the prompt, and the model's only fluency is whatever your prompt bought. It has seen your JSON schema zero times in training.<br>Granular tools invite granular failure. Building a twelve-node layout takes a dozen round trips. Each call can reference a stale id, a wrong parent, an index that shifted two calls ago. The tree passes through eleven intermediate states, each a chance to strand the agent somewhere invalid, and each a state your renderer might have to survive.<br>The model can't "see" its work. With mutation-by-tool-call, the model's picture of the current tree is a mental reconstruction from its own call history. Drift is inevitable.<br>The Inversion<br>Our templates serialize to plain HTML with a small attribute grammar:
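A minimal sketch of the idea, not the platform's actual serializer: the `TemplateNode` shape, the choice of `div` for every element, and the `max-length` attribute are all assumptions made for illustration. Only `data-type`, `data-name`, the `data-*` convention, and the `bg-primary` theme class come from the description in this post. The sketch covers one direction of the round trip, AST to markup:

```typescript
// Hypothetical AST node shape, for illustration only.
interface TemplateNode {
  type: string;                       // node kind, emitted as data-type
  name: string;                       // stable human label, emitted as data-name
  classes?: string[];                 // CSS classes, e.g. theme tokens like bg-primary
  attrs?: { [key: string]: string };  // other attributes, emitted as data-*
  text?: string;                      // text content, used only for leaf nodes
  children?: TemplateNode[];
}

// Escape characters that would break attribute values or text content.
function escapeHtml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Serialize one node (and its subtree) to indented markup.
// Assumption: every node becomes a <div>; a real grammar may use other tags.
function toMarkup(node: TemplateNode, depth: number = 0): string {
  const pad = new Array(depth + 1).join("  ");
  const parts: string[] = [
    `data-type="${escapeHtml(node.type)}"`,
    `data-name="${escapeHtml(node.name)}"`,
  ];
  if (node.classes && node.classes.length > 0) {
    parts.push(`class="${escapeHtml(node.classes.join(" "))}"`);
  }
  const attrs = node.attrs || {};
  for (const key of Object.keys(attrs)) {
    parts.push(`data-${key}="${escapeHtml(attrs[key])}"`);
  }
  const open = `${pad}<div ${parts.join(" ")}>`;
  const children = node.children || [];
  if (children.length === 0) {
    return `${open}${escapeHtml(node.text || "")}</div>`;
  }
  const inner = children.map((child) => toMarkup(child, depth + 1)).join("\n");
  return `${open}\n${inner}\n${pad}</div>`;
}

// Hypothetical widget: a themed block holding one length-budgeted text atom.
const widget: TemplateNode = {
  type: "block",
  name: "stat-widget",
  classes: ["bg-primary"],
  children: [
    { type: "text", name: "headline", attrs: { "max-length": "60" }, text: "Fast & simple" },
  ],
};

const markup = toMarkup(widget);
console.log(markup);
```

The structure the model has to produce is then ordinary nested elements with attributes, which is the point the rest of this section builds on.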
Every node is an element. data-type names the node kind, data-name is a stable human label, data-* carries attributes, and a parser converts this to and from the internal AST. This predates our agent work; it existed so templates could round-trip through a human-editable markup view.<br>When we built the template-authoring agent, we made its primary tool embarrassingly blunt:<br>set_template_markup(markup: string, summary: string)<br>// "Replace the ENTIRE template with new markup."<br>That's it. The model reads the current markup (sent fresh with every request, so it always edits what the user actually sees), writes the complete new tree, and the client validates and applies it in one shot.<br>The result surprised us with how little prompting it needed. LLMs have deep, pre-trained fluency in HTML: nesting, attributes, class strings, ids. We didn't teach a format; we borrowed one the model already speaks natively. The prompt spends its budget on our semantics (what a widget-slot means, which theme tokens exist, how length budgets work) instead of on syntax. A grammar digest generated from the same config file the visual editor uses keeps the two from drifting.<br>And because each edit is a whole tree, there are no intermediate states. The edit is coherent or it's rejected, one parse at the boundary:<br>const parsed = parseMarkupToAst(markup);<br>if (parsed?.type !== "block") reject("widgets need a single block root");<br>editor.markupCurrent = format(build(parsed)); // same funnel as human edits<br>Every agent edit flows through the exact pipeline every human edit uses. Same validation, same undo, same reactive preview. The agent isn't a privileged actor with its own write path; it's just another author.<br>Closed Containers in the Attic<br>Working with JSON is like hunting through closed storage containers in an attic. 
The labels are inside the lid; you have to open each container to learn what it is, and because JSON nests, you're opening containers inside containers inside containers. The containers only make sense if you brought the packing list: without knowing the key names and schema in advance, you can't even ask the right questions. And when you're done rummaging, the way out is a run of identical...