It's Not Just X. It's Y

Against the Quantification of Integrity

When the measure of language becomes its target, it ceases to be good language.

💡 Nerd Rating: 1/5. I discuss the origins of certain linguistic tics in LLMs and what they mean for writing, student assessment, and thinking.

"It's not x, it's y." Large Language Models gravitate toward this type of construction, called negative parallelism. It has its uses: it sets up a contrast. It's useful, especially, for reframing assumptions: "You think it's like that, but it's really like this." It's all over social media, especially on LinkedIn, and the construction has sparked a backlash amid an ongoing war against automated language production. If you use em-dashes – you might be a bot. If you describe things that delve, quietly, or genuinely (or create lists of three, like that one), you might be a bot.

Recent overuse by language models has led many to declare it bad writing. I'm not so sure. Nobody called JFK a lazy writer when he said, "ask not what your country can do for you – ask what you can do for your country." Negative parallelism is a rhetorical device, and any rhetorical device is only as lazy or inspired as what it contains.

Automated Language Production

Now, we have AI detectors that claim to protect you from the witch hunt by looking for these patterns. You take your own writing and you run it through Grammarly, which will analyze word patterns that AI detectors might flag. Then it offers ideas for how to change them, which a) gives Grammarly the power to write for you and b) makes your writing lose any sense of rhythm or intent.

Grammarly's review of this section has flagged 27 examples of text I should change to avoid the accusation that I am a machine. For example, Grammarly identified the above phrase – "automated language production" – as 11 times more likely to be AI. It suggests that a human would be "against mechanized language synthesis" instead. The simple two-word combo "align with" was flagged as 43x more likely to be AI-generated. Real humans say "corresponds." These are small suggestions that add up until the result resembles nothing I chose. The human voice replaced by a machine trying to sound human.
As a result, I just paid Pangram – another AI-detection company – $20 to verify that a recently submitted journal article wasn't AI-generated before submission. It wasn't, and I knew it wasn't. Pangram agreed. That's what I paid for: not to learn whether I wrote it, but to be told it wouldn't flag me. Because if Pangram's AI system found me guilty, that's the end of my career. That's literally extortion.

And if it had flagged it, then what? It would give me a score (four valuations: high, very likely, somewhat likely, human) to assign my integrity a category. In the ecosystem we're all building, I'd have to use Grammarly to rephrase everything: using a machine to write for me to prove that I didn't use a different machine to write for me.

A Culture Hostile to Reason

Our instinct in making sense of these machines is to examine the training data. That training data is no longer "just the Web." The web is the raw meat, but this sausage is heavily pre- and post-processed. Post-training optimizes the model for whatever it's designed to do. This includes techniques such as RLHF (reinforcement learning from human feedback) and RLVR (reinforcement learning with verifiable rewards). RLHF has humans rank replies, then the system emphasizes the kinds of replies they preferred. RLVR is weirder, and I suspect it's why we see "It's not X, it's Y" so often.

Dismissing negative parallelism as lazy gets in the way of understanding why it's showing up everywhere. This type of language is such a powerful framework for thinking that we mistake it for a model's capacity for thought. We credit computation for the work that's done by language.

Weird Dogs

RLVR isn't a structure that watches for words and triggers some sub-process. Instead, you train a model, like you would any model. When that model is done, it predicts tokens. Lots of people are still in denial about this.
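To make "it predicts tokens" concrete, here is a minimal sketch in Python. It assumes a toy vocabulary of three candidates with made-up scores: the `toy_scores` values and the `rank_candidates` helper are hypothetical and not drawn from any real model. It turns scores into probabilities with a softmax, then ranks the candidates from most to least likely.

```python
import math

def rank_candidates(scores):
    """Convert raw candidate scores to probabilities and rank them, most likely first."""
    # Softmax, subtracting the largest score first for numerical stability.
    top = max(scores.values())
    exps = {token: math.exp(score - top) for token, score in scores.items()}
    total = sum(exps.values())
    probs = {token: value / total for token, value in exps.items()}
    return sorted(probs.items(), key=lambda pair: pair[1], reverse=True)

# Hypothetical scores for the token that follows "It's not X, it's"
toy_scores = {"Y": 3.0, "complicated": 1.5, "banana": -2.0}

ranked = rank_candidates(toy_scores)
for token, probability in ranked:
    print(f"{token}: {probability:.3f}")
```

A real model does this over a vocabulary of many thousands of tokens, and it usually samples from the ranked list rather than always taking the top entry.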
Token prediction involves producing a list of candidate tokens based on their statistical distribution in the training data, then ranking them by their likelihood given the previous words in the prompt or sequence. RLVR intervenes by having the model solve math problems by writing its way to a solution, reproducing the language we would use when thinking out loud about how to solve it. When the model arrives at the correct answer, the language it used most often to get there is then emphasized in the finished model. This is (partly) what the industry calls reasoning.

What day was it that we saw that weird dog?

So, think of it like this: You are sitting with a friend. Your phones are dead. Your friend asks: what day was it that we saw that weird dog? You start by saying, "It was Thursday." Your friend says: "No, it wasn't Thursday, because Thursday I was out of town." So you say...
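The RLVR step described above (check the answer, then emphasize the language that led to correct ones) can be cartooned in a few lines of Python. This is a sketch of the idea as this essay tells it, not how any lab implements RLVR: real systems adjust model weights, whereas a plain counter stands in for "emphasis" here. The traces, the phrase list, and the helper names are all invented for illustration.

```python
from collections import Counter

# Hypothetical reasoning traces for the problem "17 + 26", each with its final answer.
TRACES = [
    ("It's not 33, it's 43, because 7 + 6 carries a one.", 43),
    ("The answer is 33.", 33),
    ("Wait, it's not 42, it's 43 once you add the carry.", 43),
    ("Probably 42.", 42),
]
CORRECT = 43
PHRASES = ["it's not", "the answer is", "probably"]

def verified_reward(answer, correct):
    """Verifiable reward: 1 if the final answer checks out, otherwise 0."""
    return 1 if answer == correct else 0

def emphasized_phrases(traces, correct, phrases):
    """Count how many rewarded traces contain each phrase."""
    counts = Counter()
    for text, answer in traces:
        if verified_reward(answer, correct) == 1:
            lowered = text.lower()
            for phrase in phrases:
                if phrase in lowered:
                    counts[phrase] += 1
    return counts

emphasis = emphasized_phrases(TRACES, CORRECT, PHRASES)
print(emphasis.most_common())
```

In this made-up data, only "it's not" accumulates any count, because it is the only phrase that appears in the traces that reached the right answer. The checker never looks at the wording. It only checks the answer, and the wording comes along for the ride.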
