Generation is cheap, the decisions are the artifact

Generation is cheap. The decisions are the artifact. NEW·noemica is now an MCP server . Let your coding agent run user feedback before you ship.Set it up → field study · the loop on itself · part 2 Generation is cheap. The decisions are the artifact. Where formal testing methods stop and user feedback begins. Formal tests prove the system did what the spec said. User feedback proves someone could figure out what to do. They aren’t the same thing, and the gap between them is wider than it sounds. Two requests, in order. Notice how different they feel to do. Write a test for your product’s main flow. Write a test that would have caught the bug your users hit yesterday. The first is a walk in the park. Pick a level, unit, integration, end-to-end, whichever feels closest to the bug class. Open the page, click the buttons in order, assert the states. CI runs green. Coverage ticks up. Your coding agent does this out of the box, no questions asked. The second is, by definition, hard. To write that test you would have had to already know: what motivation a real user brought to the page, what they read into your copy, where they hesitated, which moment of dissonance made them give up. None of that lives in the code. None of it can be inferred from the spec. You either: - You knew it ahead of time, in which case the bug wouldn’t have shipped. - You didn’t, in which case there’s no test to write. That gap doesn’t close by going up a level of formality: Types check that the values you pass match the shapes you declared. Lints check that the code follows the rules you wrote down. Unit tests check that functions return what you asserted. Most LLMs’ world model already covers the above layers on the first generation.Integration tests check that components compose. End-to-end tests check that the whole flow does what the spec says it should. With the correct configuration your coding agent will be able to autonomously recover from any backpressure that arises from any of the above layers. Though, from your perspective, it will work on first iteration against the coding agent.All of these are conformance checks. Every one of them is asking the same question, at different scales: did the system do what you told it to do? User feedback is asking a different question entirely: could an inbound user, under their own motivation, with their own prior experience, and their own expectations, figure out what they were supposed to do? That question doesn’t reduce to spec conformance. The misread was already embedded in the spec. This isn’t hypothetical. While the experiment in part 1 was running, an iteration shipped a version of the running page whose dominant copy was literally “Close this tab.” A participant read it as an instruction. She did. The page rendered exactly to its spec. No formal test would have flagged the copy, the copy was the spec. What surfaced the bug was a person trying to use the product under their own motivation, deciding the page meant what it said, and acting on it. That’s the entire distinction in one frame.

“A user study moves the cost. You pay in dollars and time instead of trust.” The user can’t be in the spec. The spec couldn’t have caught the “Close this tab” bug. No spec could have. To write one that would, you’d need the user inside the loop, reading every copy change and telling you what they think it means before you shipped it. But they’re not inside the loop. They can’t be. They’re the people paying you to do the building. That’s the unstated tension at the center of every product. Your users sit at every position in the same loop: they pay the bills so they are the ones you’re trying to satisfy they are also the ones actually using the product so they are the only ones who can tell you how to build it Same group of people, four roles. They arrive with a finite reservoir of trust, and every shipped bug, every misread copy line, every click against expectation draws a small amount out of it. You’d love them present for the feedback. You can’t keep spending their trust capital teaching yourself how to build for them. The contradiction has no clean resolution. The only tool we have that partly resolves it is the user study, which is imperfect by definition: a synthetic motivation, a fabricated context, a finite-time approximation of the real thing. You iterate, you ship, and you spend trust regardless. A user study moves the cost. You pay in dollars and time instead of trust. The mistakes still happen. They just don’t happen in front of the people you’re billing.

“There’s no formula for what a motivated person will do with your product. The only way to find out is to let one walk through it.” The asymmetry, at every level. Validating any formal test is cheap. You write the assertion, run it, see green or red, move on. Type, lint, unit, integration, e2e, doesn’t matter. Whatever the test was supposed to prove, the answer arrives in seconds. Generating a useful test is asymmetrically expensive, and...

Generation is cheap, the decisions are the artifact

Related Articles

The Newest Instagram "Exploit" Is the Goofiest I've Seen

It's Not Just X. It's Y

Amazon, Facebook, FBI have access to a private intelligence-sharing network

Show HN: GoPeek – open links in live mini browser windows without new tabs

Agent Memory: An Anatomy