Why most AI evals would miss the Linear sales email failure

Linear's sales agent emailed an existing customer six times with the wrong company name. It is easy to call that bad AI outreach. But the email was only the visible part. The real failure happened earlier, when the system decided it was allowed to send without proving the facts that decision depended on.

Tenure research · Jun 22, 2026 · ~7 min read

Jean-Michel Lemieux @jmwind · 10:43 AM · 6/22/26
Hey @karrisaarinen, friendly heads-up from the field. I've received 6+ emails from someone on your sales team with comical AI-slop. Wrong company name & already a customer, etc... Always love a good laugh, but you may want to skip-level this one?
@linear.app)
Re: Linear at Quantum Innovations
Stop the AI slop pls. You got the company wrong, you didn't look at my email domain, and we already use Linear. Thinking of canceling now.

Jean-Michel Lemieux<br>Developer · spellbook.com

Karri Saarinen @karrisaarinen · 2h
Thanks and apologies. Not ideal, will check with the team what caused this. Agree that emailing existing customers and 6 times is the dumbest thing

TL;DR
Most people describe bad AI outreach as a generation problem. The message was awkward, repetitive, or poorly personalized.
But the larger failure usually happens one step earlier. Before anything gets written, the system has to know who the recipient is, which company they belong to, whether they are already a customer, whether the account allows outreach, and whether this person has already been contacted too many times.
If those checks are wrong or missing, a better model does not solve the problem. It just writes a cleaner version of the wrong action.
That is why agent evaluation has to look upstream. It should ask what the system checked before acting, not only whether the final output reads well.
GroundEval is built around that question: what did the agent search, fetch, cite, and have permission to use before it answered or acted?

The wrong lesson
The email was not the first failure

The easy reaction to the Linear email is to laugh at the output. The company name is wrong. The recipient is already a customer. The same sequence had already hit them multiple times. Then the CEO replies publicly, and the whole thing becomes another example of AI slop.

But that framing lets the system off too easily.

The embarrassing part is the email everyone saw. The more important part happened before the email existed. Somewhere upstream, the system had enough wrong or unchecked state to decide that this person should be contacted at all.

That is the part a better subject line would not fix. A warmer tone would not fix it. Even a model that writes beautiful outbound copy would still have sent the wrong message if it never checked the basic facts first.

In the Linear case, the pre-send checks were the whole story. Does the company name match the recipient's domain? Is this contact already a customer? Has this sequence already run too many times? If those answers are wrong or never checked, generation is already starting from a failed state.
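As a rough illustration, the three questions above can be written as explicit checks that run before any text is generated. This is a minimal sketch, not a description of Linear's system: the `Contact` record, the `domain_to_company` lookup, the `max_sends` limit, and all sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Contact:
    # Hypothetical record; a real CRM would carry far more state.
    email: str
    company_name: str    # the name the draft email would use
    is_customer: bool
    sequence_sends: int  # times this sequence has already emailed them


def failed_pre_send_checks(
    contact: Contact,
    domain_to_company: Dict[str, str],
    max_sends: int = 2,  # assumed limit, chosen only for illustration
) -> List[str]:
    """Return the names of the checks that fail; an empty list means none did."""
    failures = []

    # Does the company name match the recipient's email domain?
    domain = contact.email.rsplit("@", 1)[-1].lower()
    known_company = domain_to_company.get(domain)
    if known_company is None or known_company.lower() != contact.company_name.lower():
        failures.append("company_mismatch")

    # Is this contact already a customer?
    if contact.is_customer:
        failures.append("already_customer")

    # Has this sequence already run too many times?
    if contact.sequence_sends >= max_sends:
        failures.append("too_many_sends")

    return failures


# Illustrative values only, loosely mirroring the three failures described above.
contact = Contact("someone@example.com", "Quantum Innovations", True, 6)
print(failed_pre_send_checks(contact, {"example.com": "Example Corp"}))
# prints ['company_mismatch', 'already_customer', 'too_many_sends']
```

Any non-empty result means the draft should never reach the generation step, no matter how good the model is.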

The visible failure was a bad email. The earlier failure was simpler: the system did not prove that the email should be sent.

The dependency list
Before the message exists, the system has facts to prove

Outbound email looks simple from the outside. Pick a contact, write a message, send it. Inside a real company, the action depends on a stack of state that has to be true.

I
Recipient state
Is this person a prospect, an active customer, a former customer, a partner, an employee, or someone who should never receive this sequence?

II
Company mapping
Does the company name in the email match the account linked to the recipient, the email domain, and the current CRM record?

III
Account status
Does the account already use the product, have an open opportunity, have an assigned owner, or sit under a suppression rule?

IV
Outreach history
How many times has this person been contacted, through which channel, by which team, and with what response?

V
Permission to act
Given all of that state, is this automation allowed to send, or should it suppress, route to a human, or do nothing?

If any one of those checks fails, the right behavior is not "write a better email." The right behavior is "do not send." That is why calling this a content quality problem misses the failure mode. The generated text is only the artifact left at the scene.
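One way to picture the dependency list is as a gate that defaults to not sending. The sketch below is an assumption-laden illustration, not GroundEval or any vendor's implementation: `OutreachState`, `Decision`, `decide`, and the `max_touches` limit are hypothetical names. It treats a fact that was never checked (`None`) differently from a fact that was checked and failed, and routing unchecked state to a human is a design choice made here for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class Decision(Enum):
    SEND = "send"
    SUPPRESS = "suppress"
    ROUTE_TO_HUMAN = "route_to_human"


@dataclass
class OutreachState:
    """Hypothetical snapshot of what was verified before sending.

    None means the fact was never checked, which is not the same as False.
    """
    recipient_type: Optional[str] = None           # "prospect", "customer", ...
    company_matches_domain: Optional[bool] = None  # company mapping
    account_suppressed: Optional[bool] = None      # account status
    prior_touches: Optional[int] = None            # outreach history


def decide(state: OutreachState, max_touches: int = 2) -> Tuple[Decision, List[str]]:
    """Permission to act: send only when every fact is checked and passes."""
    unchecked = [name for name, value in vars(state).items() if value is None]
    if unchecked:
        # Assumption: unproven state goes to a person instead of being guessed.
        return Decision.ROUTE_TO_HUMAN, ["unchecked:" + name for name in unchecked]

    reasons = []
    if state.recipient_type != "prospect":
        reasons.append("recipient_not_prospect")
    if not state.company_matches_domain:
        reasons.append("company_mismatch")
    if state.account_suppressed:
        reasons.append("account_suppressed")
    if state.prior_touches >= max_touches:
        reasons.append("too_many_touches")

    if reasons:
        return Decision.SUPPRESS, reasons
    return Decision.SEND, []


print(decide(OutreachState("customer", False, False, 6)))
print(decide(OutreachState()))  # nothing was checked, so nothing is sent
```

The returned reasons double as an audit trail: an evaluation can inspect why the gate decided as it did without reading the email at all.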

What most evals see
Evaluating the email is too late

A conventional evaluation...
