LLM as a Web Server

jin codes · July 3, 2026 · 8 min read

At June’s Prompt Poets Society meetup, two lines from JD Trask’s talk stuck with me.

Web servers are text in, text out.

And, a bit later:

LLMs are text in, text out.

So what happens if the LLM is the web server? Not a model writing code for a web server, but the model serving every new page itself, fresh HTML invented on the spot.

It’s a terrible idea for all the obvious reasons: every click costs real money, takes seconds instead of milliseconds, and comes back a little different every time. But I was curious, so I built it.

The experiment is called 🔥 token burner 🔥: a web app where the LLM is effectively the view layer.

Instead of talking to a model through a chat box and getting text back, you talk to it through a web page and get another web page back. Buttons, forms, charts, games, controls, and layouts are all just different shapes a conversation can take. Clicking a button isn’t “using the app” in the normal sense. It’s sending the next message.

So it’s not a vibe coding tool, even if it looks like one on the surface. The generated HTML is not the end product. It is the conversation.

Here’s what that looks like in practice. You seed a conversation with a plain-text prompt:

And the reply comes back as a page, not a paragraph:

Not everything on the page costs a new turn. Selecting a planet here expands its details with ordinary client-side JS. But a button like “Deep Dive into Earth” sends the next message in the conversation, and the model renders a fresh page in response. The model draws that line itself while writing each page: what it can handle with the content already on it, and what deserves a whole new turn.

From that one explorer page I drilled into Mars, then Jupiter, then Earth. Three replies, three siblings branching off the same parent turn:

Nobody designed these pages. Each one is a fresh generation, and the model gave each branch its own art direction. Mars went ochre, Jupiter went regal gold, and Earth got the friendly blue treatment. Those links are the live turns, so you can pick up this exact conversation and branch it somewhere I never went.

How a request actually becomes a page

The mechanics are simpler than they sound. Not every interaction round-trips to the model: plenty of pages use ordinary client-side JS for animations or local UI state. But any element the model wants to trigger a new turn carries a data-prompt attribute (or a data-prompt-template for forms). A small injected script turns clicks and submits on those elements into a POST to /c/:convoId/:turnId/act with that prompt as the body. A generated button is literally the next message, waiting to be sent:

Explore Mars →
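To make the click-to-message step concrete, here is a minimal sketch of what such an injected script might compute before it sends anything. Only the data-prompt and data-prompt-template attribute names and the /act route come from the description above. The ActionSource shape, the function names, and especially the {{field}} placeholder syntax are assumptions for illustration; the actual template format isn't described here.

```typescript
// Hypothetical view of a clicked or submitted element. Only the two attribute
// names and the /act route are from the post; the rest is assumed.
interface ActionSource {
  prompt?: string;                 // value of data-prompt
  promptTemplate?: string;         // value of data-prompt-template (forms)
  fields?: Record<string, string>; // submitted form values, keyed by input name
}

interface ActRequest {
  method: "POST";
  url: string;
  body: string;
}

// ASSUMPTION: placeholders look like {{fieldName}}. Unknown fields become "".
function fillTemplate(template: string, fields: Record<string, string>): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (_match: string, name: string) => {
    const value = fields[name];
    return value === undefined ? "" : value;
  });
}

// Returns the request for a new turn, or null when the element carries no
// prompt and should be left to ordinary client-side JS.
function buildActRequest(
  convoId: string,
  turnId: string,
  source: ActionSource
): ActRequest | null {
  let prompt: string | undefined;
  if (source.promptTemplate !== undefined) {
    prompt = fillTemplate(source.promptTemplate, source.fields === undefined ? {} : source.fields);
  } else {
    prompt = source.prompt;
  }
  if (prompt === undefined || prompt.trim() === "") return null;
  return {
    method: "POST",
    url: `/c/${encodeURIComponent(convoId)}/${encodeURIComponent(turnId)}/act`,
    body: prompt,
  };
}
```

In a browser, a delegated click or submit listener could read the two attributes off the event target, call buildActRequest, and pass the result to fetch. Elements without either attribute return null and keep their normal client-side behavior.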

On the server, three things happen:

Load history. A recursive SQL query walks up the conversation tree via parent_turn_id, so branched conversations don’t leak sibling turns into context.

Call the model with a forced tool. The new prompt gets appended, and the model is called with toolChoice locked to a single tool, render_page({ body_html, summary, style_hints }). It has no choice but to hand back structured HTML.

Render and respond. The body_html gets dropped into a shared template (Tailwind, D3, Chart.js, Three.js, Tone.js all preloaded) and served back as the actual HTTP response for GET /c/:convoId/:turnId.
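The history-loading step can be sketched as follows. The post only says the query is recursive and follows parent_turn_id, so the SQL text, the turns table name, and the id, prompt, and summary columns are assumptions. The loadAncestry function is a hypothetical in-memory equivalent that shows why sibling branches never enter the context: the walk only ever moves from a turn to its parent.

```typescript
// Assumed row shape; only parent_turn_id is named in the post.
interface Turn {
  id: string;
  parent_turn_id: string | null;
  prompt: string;
  summary: string;
}

// ASSUMPTION: one way such a query could look (SQLite/Postgres-style recursive
// CTE, with "?" as the leaf turn id). Table and column names are illustrative.
const ANCESTRY_SQL = `
WITH RECURSIVE ancestry AS (
  SELECT id, parent_turn_id, prompt, summary, 0 AS depth
  FROM turns WHERE id = ?
  UNION ALL
  SELECT t.id, t.parent_turn_id, t.prompt, t.summary, a.depth + 1
  FROM turns t JOIN ancestry a ON t.id = a.parent_turn_id
)
SELECT id, parent_turn_id, prompt, summary FROM ancestry ORDER BY depth DESC;
`;

// In-memory equivalent: follow parent links from the leaf up to the root,
// then return the path root-first so it reads in conversation order.
function loadAncestry(turns: Map<string, Turn>, leafId: string): Turn[] {
  const path: Turn[] = [];
  const seen = new Set<string>();
  let currentId: string | null = leafId;
  while (currentId !== null) {
    if (seen.has(currentId)) throw new Error(`cycle at turn ${currentId}`);
    seen.add(currentId);
    const turn: Turn | undefined = turns.get(currentId);
    if (turn === undefined) throw new Error(`unknown turn ${currentId}`);
    path.push(turn);
    currentId = turn.parent_turn_id;
  }
  return path.reverse();
}
```

With an explorer turn that has Mars, Jupiter, and Earth as children, loading the ancestry of the Mars turn yields just the explorer turn followed by the Mars turn. Jupiter and Earth share the parent but are never visited.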

The one detail that made the whole thing click for me is that the model never sees the raw HTML it generated on previous turns. Only the summary string it writes about itself each turn gets replayed as assistant history. Context stays small. The model is deliberately wasteful with output tokens (hence the name) but frugal about what gets fed back in.
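A rough sketch of that replay rule, assuming a generic user/assistant message format: each earlier turn contributes its prompt and its summary, and the stored HTML is never read. The StoredTurn and ChatMessage shapes and the buildMessages name are hypothetical. Pairing each summary with the prompt that produced it is also an assumption, since the post only states that summaries are replayed as assistant history.

```typescript
type ChatMessage = { role: "user" | "assistant"; content: string };

// Assumed storage shape. body_html and summary are fields of render_page in
// the post; how they are persisted is not described.
interface StoredTurn {
  prompt: string;    // the message that triggered this turn
  summary: string;   // the model's own description of the page it rendered
  body_html: string; // kept for serving the page, never fed back to the model
}

// Builds the message list for the next model call from a root-first ancestry.
function buildMessages(ancestry: StoredTurn[], newPrompt: string): ChatMessage[] {
  const messages: ChatMessage[] = [];
  for (const turn of ancestry) {
    messages.push({ role: "user", content: turn.prompt });
    // Only the summary is replayed; turn.body_html is intentionally ignored.
    messages.push({ role: "assistant", content: turn.summary });
  }
  messages.push({ role: "user", content: newPrompt });
  return messages;
}
```

The context grows by one short summary per turn no matter how large each generated page was, which is what keeps input small while output stays expensive.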

The parts that needed real engineering

Once you’re not just doing a single fun completion, the toy version stops being enough:

Generation is async. A turn gets inserted as status: "pending", the LLM call runs in the background, and the page polls a /status endpoint every second or so, showing a spinner until it’s done.

Branching and dedup. Because turns are a tree keyed by parent_turn_id, the same prompt fired from the same parent turn reuses the existing generated page instead of burning a fresh generation.

The system prompt has to say what it can’t fake. I explicitly forbid the model from faking “backend functionality” client-side. Anything that needs real logic has to round-trip through another turn instead.

Navigation is the browser’s job. The model is banned from rendering back buttons. A generated “back” button can’t actually return anywhere; it would just burn a fresh turn imitating an earlier page. The browser’s real back button already works, and walking back up the tree to branch off somewhere new is half the point.
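The pending-status and dedup items above can be sketched together with a small in-memory store. Everything here is a hypothetical stand-in for the real database: the TurnStore class, its method names, and the "done" status are assumptions. Only the "pending" status, parent_turn_id, and the reuse rule for the same prompt fired from the same parent come from the list.

```typescript
type TurnStatus = "pending" | "done"; // "done" is an assumed name

interface TurnRecord {
  id: string;
  parent_turn_id: string | null;
  prompt: string;
  status: TurnStatus;
  body_html?: string;
}

class TurnStore {
  private turns = new Map<string, TurnRecord>();
  private byParentAndPrompt = new Map<string, string>();
  private nextId = 1;

  // Reuses the turn already created for this (parent, prompt) pair; otherwise
  // inserts a new pending turn. The caller starts a generation only if created.
  createOrReuse(parentId: string | null, prompt: string): { turn: TurnRecord; created: boolean } {
    const key = JSON.stringify([parentId, prompt]);
    const existingId = this.byParentAndPrompt.get(key);
    if (existingId !== undefined) {
      const existing = this.turns.get(existingId);
      if (existing !== undefined) return { turn: existing, created: false };
    }
    const turn: TurnRecord = {
      id: `t${this.nextId++}`,
      parent_turn_id: parentId,
      prompt,
      status: "pending",
    };
    this.turns.set(turn.id, turn);
    this.byParentAndPrompt.set(key, turn.id);
    return { turn, created: true };
  }

  // Called by the background job once the model has returned a page.
  complete(id: string, bodyHtml: string): void {
    const turn = this.turns.get(id);
    if (turn === undefined) throw new Error(`unknown turn ${id}`);
    turn.body_html = bodyHtml;
    turn.status = "done";
  }

  // What a /status handler would report to the polling page.
  status(id: string): TurnStatus | undefined {
    const turn = this.turns.get(id);
    return turn === undefined ? undefined : turn.status;
  }
}
```

Keying on the (parent, prompt) pair rather than the prompt alone is what lets the same button text mean different things on different branches while a repeated click on one page stays free.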

(And yes, since conversation links are shareable, opening someone else’s branch means running whatever JS their prompts talked the model into writing. It’s a toy; treat it like one.)

Is it actually usable?
