Language models are weird for the same reason human cultures are weird

David Oks


You can’t have adaptive learning without strange tics

May 06, 2026


Cultures of the world, photographed by Irving Penn

In November 2025, shortly after OpenAI released GPT-5.1 (a new model that promised “a smarter, more conversational ChatGPT”), a small set of users started to notice something weird. GPT-5.1 was indeed smarter and more conversational; but it also had a strange habit of referring to things as “goblins.” For a time this was treated as a quirk (language models, after all, do all sorts of strange things), and nobody gave it much thought.

But soon things started to get stranger. With each new model release in the months that followed (5.2 in December, 5.3 in February, 5.4 in March, 5.5 in April), OpenAI’s models became more and more insistent on talking about goblins. Soon the bestiary expanded to include not only goblins but also gremlins, trolls, ogres, raccoons, and pigeons; and by the early months of 2026, the goblin tic had become prominent enough to be disruptive. Contractual liabilities were “legal goblins”; the debugging process meant hunting “chaos goblins”; a point would be announced with “here’s the important goblin.” One programmer counted more than twenty unprompted goblin references in a single session. When asked to produce a unicorn, the model would draw a goblin; sometimes it would refer to itself as a “Goblin-Pilled Transformer.”

What had started as a curio became an annoyance, and the goblin obsession had to be curbed. By the time OpenAI released GPT-5.5, it had added a system prompt to its Codex programming harness, instructing the model to “never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user’s query.” A few days later, someone noticed the line on GitHub; people started wondering why OpenAI’s models seemed so interested in goblins; and OpenAI decided to explain the whole affair with an interesting blog post called “Where the goblins came from.”

So where did the goblins come from?
If you were to answer in one word, you could do pretty well with “overfitting.” If you were allowed to answer in two words, you could perhaps do a better job with “chunky post-training.” (Don’t worry if either of those terms is unfamiliar; we’ll return to them.)

The models were trained to do something; the feedback signal from that training process was coarse enough that the model struggled to distinguish which features of its output earned the reward and which ones were incidental; and the result was that along with the things it was trained to do, the model picked up weird quirks of its learning regimen. And then those quirks were magnified with each generation of model, since the outputs the models produced were used to train future generations of models.

This is a feature of language models that is both weird and ubiquitous. Our AI systems are unbelievably capable learners that are also defined by weird fixations and tics from the process that created them. The models learn to fixate on particular words, like “delve” or “tapestry” or “testament”; they learn to lean on certain annoying grammatical constructions, like “it’s not X, it’s Y”; they learn to hedge even when hedging is unnecessary.

And there are countless more subtle basins of weirdness. Using the word “elucidate” makes certain open models much more likely to output code in response, even if your question has nothing to do with code. (This might be because post-training datasets for programming tend to pair stiff, formal prose with code samples, and the model learns to treat the one as a cue for the other.)

And this is true even for the world’s best models. Somewhere in its post-training process, Anthropic’s Claude Opus 4.5 learned that ambiguous phrasing signals a puzzle to be solved. So if you tell it that “I accidentally locked my son in his room and his friend is crying,” it will say that you’ve given it “an amusing little riddle” and tell you that “the answer is that your ‘stubborn boy’ is a donkey.”

There are two ways to understand these quirks. The first is the natural engineering response: these are technical bugs, artifacts of imperfect training, and they reflect how different language models are from human brains. I don’t think that this view is necessarily wrong; there’s quite a lot to be said for it.

But there’s another way of understanding these systems, which I find more promising if we want to understand why these systems are both so extraordinarily capable and yet so prone to these strange behaviors. That view is this. Language models are adaptive systems: they are systems that adjust their behavior in response to feedback from the outside world, and make that adjustment in a way that tends to improve future...
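
The earlier point about coarse feedback can be made concrete with a toy sketch. This is not OpenAI’s training setup, and every name and number in it is invented for illustration: a hypothetical grader (`reward`) pays for a playful tone and never looks at goblin words, while a hypothetical retraining step (`next_generation`) upweights whole rewarded outputs, incidental features included. Because goblin words happen to ride along with playful outputs in the starting policy, their overall rate climbs with each generation trained on the previous one’s outputs.

```python
# Toy model: an output is a pair (playful, goblin). All probabilities below
# are made up; the only thing that matters is that goblin words are more
# common in playful outputs than in plain ones.

def reward(playful: bool, goblin: bool) -> float:
    """Hypothetical grader: pays for a playful tone, ignores goblin words."""
    return 1.0 if playful else 0.0

def next_generation(policy: dict, weight: float = 2.0) -> dict:
    """Retrain on whole outputs: rewarded outputs become `weight` times likelier.

    The credit goes to the entire output, so any feature that co-occurs
    with the rewarded one is reinforced along with it.
    """
    scaled = {
        out: p * (weight if reward(*out) > 0 else 1.0)
        for out, p in policy.items()
    }
    total = sum(scaled.values())
    return {out: p / total for out, p in scaled.items()}

def goblin_rate(policy: dict) -> float:
    """Overall probability that an output contains a goblin word."""
    return sum(p for (playful, goblin), p in policy.items() if goblin)

# Starting policy: goblin words are rare overall (7%) but appear in
# 30% of playful outputs and about 1% of plain ones.
policy = {
    (True, True): 0.06,
    (True, False): 0.14,
    (False, True): 0.01,
    (False, False): 0.79,
}

rates = [goblin_rate(policy)]
for _ in range(10):
    policy = next_generation(policy)  # each generation learns from the last
    rates.append(goblin_rate(policy))

print([round(r, 3) for r in rates])
```

In this sketch the grader never rewards goblins directly, yet the goblin rate rises from its starting 7% toward the 30% rate found among playful outputs. Real post-training is far messier, but the shape of the problem is the same: a reward that lands on the whole output cannot say which part of the output earned it.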
