The Information Theory Behind Why AI Writing Sucks

The Information Theory Behind Why AI Writing Sucks | Pangram Labs

Table of contents
Voice as a probability distribution
The RLHF trap and the "Annotator Consensus Dialect"
The illusion of camouflage (why prompting for style fails)
The failure of temperature and friends
So what?

Disclosure: An AI language model was used during the editing process to draft technical descriptions and suggest structural and prose improvements. Several suggestions from AI were used in the final version of the article.

I have read an embarrassingly large amount of fiction, especially science fiction. I also use every flagship AI model that is released for my software engineering job.

Those two sets of experiences left me with a gnawing feeling that AI has a shockingly uniform "voice" when compared to a high-functioning human author.

Anyone with a love for literature has felt what I'm talking about. I've read stories by about five thousand different authors, but I honestly think that even if you've only read a half-dozen authors you'll notice that each author occupies their own stylistic space.

Compared to the unique voices of human writers, AI writing sounds remarkably uniform. It turns out that there's a good reason for this, and it has to do with information theory.

Voice as a probability distribution

A unique authorial "voice" is not random, and it is not average. It is a specific probability distribution — let's call it P_author. When an author writes, they sample from a highly idiosyncratic process. They have specific conditional probabilities for how they implement concepts, pacing, vocabulary, and other stylistic tools.

What makes a voice recognizable are the low-frequency, high-impact choices that an author makes consistently (the long tail of the distribution). If I say "Ted Chiang", you'll immediately think about how syntactically plain but semantically dense his sentences are (it's a style I admire, but as this parenthetical demonstrates, I cannot emulate). If I say "Ursula K. Le Guin", you'll think about how she can be so clear and grounded but still give a lyrical feel — I can't really describe her style well, but readers of Le Guin know what I mean.

Ultimately what I'm getting at is that the right way to measure how "AI-like" a text sounds is not to check whether it's predictable in general — most competent writing is somewhat predictable — but to measure the KL divergence between the model's output distribution and a specific author's distribution: D_KL(P_author || Q_model). For those unfamiliar with KL divergence, this measures how badly the model's distribution fails to cover the author's choices (to be specific, it's measuring the expected extra information cost of encoding samples from P using a code optimized for Q). When this divergence is large and structured, you hear a voice.
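To make the measurement concrete, here is a minimal sketch of D_KL(P || Q) for two small discrete distributions. The categories and probabilities are hypothetical toy values invented for illustration, not measurements of any real author or model, and the helper name `kl_divergence` is my own.

```python
import math

def kl_divergence(p, q):
    """Return D_KL(P || Q) in bits for discrete distributions given as dicts.

    Each dict maps an outcome to its probability. The result is the expected
    number of extra bits needed to encode samples from P with a code built for Q.
    """
    total = 0.0
    for outcome, p_prob in p.items():
        if p_prob == 0.0:
            continue  # outcomes P never produces contribute nothing
        q_prob = q.get(outcome, 0.0)
        if q_prob == 0.0:
            return math.inf  # Q cannot represent something P actually does
        total += p_prob * math.log2(p_prob / q_prob)
    return total

# Hypothetical toy distributions over how a sentence might be opened.
p_author = {"plain": 0.25, "fragment": 0.25, "inverted": 0.5}
q_model = {"plain": 0.75, "fragment": 0.125, "inverted": 0.125}

print(kl_divergence(p_author, q_model))  # cost of covering the author with the model
print(kl_divergence(q_model, p_author))  # the reverse direction gives a different number
```

The two directions disagree because KL divergence is not symmetric. Putting the author first means the penalty falls on choices the author makes often and the model makes rarely, which matches the "long tail" intuition above.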

The RLHF trap and the "Annotator Consensus Dialect"

During pre-training, a large language model learns an approximation of the broad distribution of human text. This base distribution, Q_base, is enormously wide. In its latent space it contains the capacity to approximate almost any P_author.

The trap begins with alignment. To make the model safe and useful, labs apply techniques such as Reinforcement Learning from Human Feedback (RLHF). The details vary, but the bottom line is that the model is optimized to produce outputs that score well against a reward signal derived from human (or AI) preferences.

This does not push the model toward the statistical average of English. It pushes it toward something with a different probability distribution — let's call this the Annotator Consensus Dialect.

The mechanism is this: when judges (gig workers, domain experts, or whoever is hired to rate outputs) evaluate responses, idiosyncratic writing creates high variance in ratings. My style of writing might score 5/5 from one rater and 2/5 from another. But a sterile, symmetrical, heavily hedged response might score 4/5 across the board. Averaged over raters, the hedged response wins, so the safest way for the optimizer to maximize expected reward is to collapse variance. The result is the conversational equivalent of hotel lobby decor.
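The rating arithmetic can be sketched in a few lines. The scores reuse the 5/5, 2/5, and 4/5 figures from the paragraph above. The number of raters (four) and the even split between fans and detractors are assumptions made for illustration, not data from any real annotation run.

```python
from statistics import mean, pvariance

# Hypothetical ratings on a 1-5 scale from four raters (illustrative only).
idiosyncratic = [5, 2, 5, 2]  # half the raters love the voice, half do not
hedged = [4, 4, 4, 4]         # nobody loves it, nobody objects

def summarize(ratings):
    """Return (mean rating, population variance) for a list of scores."""
    return mean(ratings), pvariance(ratings)

for name, ratings in [("idiosyncratic", idiosyncratic), ("hedged", hedged)]:
    avg, var = summarize(ratings)
    print(f"{name}: mean={avg}, variance={var}")
```

Under these assumptions the hedged response has the higher mean and zero variance, so an optimizer that only sees expected reward has no reason to prefer the distinctive voice.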

You might say, "Joe, this isn't a fair characterization! Newer alignment techniques are explicitly designed to preserve diversity!" That is true, but the newer methods still optimize for a notion of "preferred" output, which still penalizes high-variance risk-taking relative to safe, broadly acceptable prose.

This is a testable claim (I haven't tested it, but it's testable). If you measured the KL divergence between...
