Building domain-specific AI chatbots with Claude

Building domain-specific AI chatbots: JazzQuery, RennQuery, WristQuery, and Celiac vs Me
Greg Wilson's Tech Blog

I keep building the same thing: a chatbot that’s a genuine expert in exactly one niche and politely useless outside it. Ask it anything in its lane and it’s sharp, opinionated, and sourced. Ask it to write you a Python script or summarize the news and it declines and steers you back.

There are four of them now:

JazzQuery — jazz: musicians, sessions, sidemen, labels, discographies.

RennQuery — Porsche: models, generations, specs, motorsport, the people.

WristQuery — watches: brands, references, movements, complications, collecting.

Celiac vs Me — a chat companion for living with celiac disease, grounded in federal medical references and my own book of the same name.

They look like four different products. Under the hood they’re the same machine with the paint swapped. This post is that machine: the web architecture, the two-model trick that keeps the bills small, the curated-source approach that keeps the answers honest, and the per-site knowledge bases that make a general-purpose model behave like a specialist — including a local mirror of a chunk of Wikipedia, a 33-million-row jazz discography, and a full book manuscript loaded into a search index.

As always, these are evening-and-weekend projects squeezed around a day job — which is only realistic because I don’t write the code by hand anymore. I do the architecture, the judgment calls, and the data curation; Claude Code does the building. Running four of these without it becoming a second job — or, so far, a real expense — is the whole point of the design. (And if you want to spin up your own, there’s a build kit at the end.)

The four sites at a glance

| Site | The hard part |
| --- | --- |
| JazzQuery (jazz) | “Who played drums on this session, and did these three ever record together?” Answered from a local copy of the Discogs catalog, not the model’s memory. |
| RennQuery (Porsche) | Decades of generations and engine codes where a confident-but-wrong answer is worse than no answer. Grounded in a local Wikipedia mirror. |
| WristQuery (watches) | Reference numbers, caliber specs, and a market where “current value” genuinely needs a live web search, but most questions don’t. |
| Celiac vs Me (celiac disease) | Medical accuracy plus lived experience. The model leans on public-domain federal guidance for the facts and my book for the human side. |

The differences are all in the knowledge layer. Everything around it (the web app, the model routing, the guardrails, the cost controls) is shared.

One framework, four facades

Every site is the same stack:

| Layer | Why |
| --- | --- |
| Client: React 19 + Vite (SPA) | A real-time chat UI with streaming, citations, and feedback isn’t a static page. Unlike this blog (zero JS), the chat apps earn their JavaScript. |
| Server: a serverless backend that can stream | One command to deploy, runs close to users, scales to zero between requests, push-to-main auto-deploys. |
| Session: a per-session stateful object | One addressable actor per browser session with its own embedded SQLite: chat history, the rate-limit log, the list of domains a crawler has blocked. |
| Analytics + knowledge: SQLite, with FTS5 full-text search | Anonymous query/usage logs, the pre-computed answer cache, and the domain knowledge tables (the Wikipedia mirror, the discography, the book). |
| Chat plumbing: a WebSocket router + the Vercel AI SDK | A thin router maps each session’s WebSocket to its stateful object; the AI SDK’s streamText handles the model call, tool loop, and streaming. |

The request flow for a single question:

```
Browser ──WebSocket──▶ Server ──▶ Session object (one per browser)
                                   ├─ 1. cached answer?  (zero-cost fast path)
                                   ├─ 2. rate-limited?   (30/hour per session)
                                   ├─ 3. Haiku gate      (on-topic? needs the web?)
                                   └─ 4. Sonnet + tools  (search_wiki / discogs / web)
Browser ◀──streamed text + citations──┘
```

The heart of it is one streamText call. Simplified, but this is really the shape:

```ts
// Imports assume AI SDK v5 and its Anthropic provider package.
import { streamText, stepCountIs } from "ai";
import { anthropic } from "@ai-sdk/anthropic";

const result = streamText({
  model: anthropic(MAIN_MODEL), // claude-sonnet-4-6
  messages: [CACHED_SYSTEM, ...history],
  tools: {
    search_wiki: searchWikiTool(db), // free, always on
    // attach web_search ONLY when the gate flagged the turn FRESH:
    ...(gate.needsSearch
      ? {
          web_search: anthropic.tools.webSearch_20260209({
            maxUses: 1,
            allowedDomains,
          }),
        }
      : {}),
  },
  stopWhen: stepCountIs(6), // bound the tool/reasoning loop
  onFinish: logTurnUsageAndCost,
});
```

Responses stream back token-by-token over the socket as Server-Sent-Event-style chunks, with source citations emitted as their own source-url parts so the UI can render them as little pills under each answer. The client paces the reveal at a steady ~220 characters/second so it reads like typing instead of bursting in jerky clumps.
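One way to picture that pacing is a small buffer that absorbs bursty chunks and releases characters in proportion to elapsed time. This is a minimal sketch under assumptions, not the sites' actual client code: `PacedReveal`, `push`, and `take` are hypothetical names, and only the ~220 characters/second figure comes from the text above.

```typescript
// Hypothetical sketch of client-side reveal pacing; all names are illustrative.
const CHARS_PER_SECOND = 220; // rate quoted in the post

class PacedReveal {
  private buffer = ""; // text received from the socket but not yet shown
  private carry = 0;   // fractional characters owed from earlier ticks

  // Called whenever a streamed chunk arrives, however bursty.
  push(chunk: string): void {
    this.buffer += chunk;
  }

  // Called once per animation frame with the milliseconds since the last call.
  // Returns the next slice of text to append to the visible message.
  take(elapsedMs: number): string {
    const owed = this.carry + (elapsedMs * CHARS_PER_SECOND) / 1000;
    const whole = Math.floor(owed);

    if (whole >= this.buffer.length) {
      // Caught up with the stream: show what is left and drop unused credit,
      // so a later burst is not dumped on screen all at once.
      const rest = this.buffer;
      this.buffer = "";
      this.carry = 0;
      return rest;
    }

    this.carry = owed - whole;
    const slice = this.buffer.slice(0, whole);
    this.buffer = this.buffer.slice(whole);
    return slice;
  }
}
```

In a browser this would typically be driven from a `requestAnimationFrame` loop, appending whatever `take` returns to the message element. Carrying the fractional remainder between frames keeps the average rate steady even when a frame only earns three or four characters.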

A nice side effect of the shared design: when I learn something on one site, all four get it. The “never narrate a lookup” rule I’ll describe below started as a JazzQuery bug fix and is now in every system prompt.
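The four-step request flow from the diagram earlier in this section can be sketched as one function that takes the cheapest exit first. This is a rough illustration under stated assumptions, not the real session object: `handleTurn`, `SessionState`, `Gate`, `Deps`, and `TurnResult` are hypothetical names, the `classify` and `answer` callbacks stand in for the two model calls, and the sliding one-hour window is a guess at how the 30/hour limit might be counted.

```typescript
// Hypothetical sketch of the per-turn pipeline; the real session object differs.
const LIMIT_PER_HOUR = 30; // from the post
const HOUR_MS = 60 * 60 * 1000;

interface Gate {
  onTopic: boolean;
  needsSearch: boolean;
}

interface Deps {
  cache: Map<string, string>;                         // stand-in for the pre-computed answer cache
  classify: (q: string) => Promise<Gate>;             // stand-in for the cheap gate model
  answer: (q: string, gate: Gate) => Promise<string>; // stand-in for the main model + tools
}

type TurnResult =
  | { kind: "cached"; text: string }
  | { kind: "rate_limited" }
  | { kind: "declined" }
  | { kind: "answered"; text: string };

class SessionState {
  private turnTimes: number[] = []; // timestamps (ms) of turns that got past the limiter

  // Sliding one-hour window (an assumption; the post only states 30/hour).
  allow(now: number): boolean {
    this.turnTimes = this.turnTimes.filter((t) => now - t < HOUR_MS);
    if (this.turnTimes.length >= LIMIT_PER_HOUR) return false;
    this.turnTimes.push(now);
    return true;
  }
}

async function handleTurn(
  q: string,
  session: SessionState,
  deps: Deps,
  now: number = Date.now(),
): Promise<TurnResult> {
  // 1. Cached answer: no model call, and it does not count against the limit.
  const cached = deps.cache.get(q.trim().toLowerCase());
  if (cached !== undefined) return { kind: "cached", text: cached };

  // 2. Rate limit, checked before any money is spent.
  if (!session.allow(now)) return { kind: "rate_limited" };

  // 3. Cheap gate: is the question on-topic, and does it need the web?
  const gate = await deps.classify(q);
  if (!gate.onTopic) return { kind: "declined" };

  // 4. Main model with tools, only for turns that survived steps 1-3.
  return { kind: "answered", text: await deps.answer(q, gate) };
}
```

The ordering is the design choice worth noticing: each step is more expensive than the one before it, so a turn that can exit early never reaches the main model.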

Two models: a cheap bouncer...
