Stochastic Parrots: Frequently Unasked Questions

Stochastic Parrots 🦜: Frequently Unasked Questions | by Emily M. Bender | May, 2026 | Medium


Emily M. Bender


It’s been a bit over five years since the Stochastic Parrots paper (Bender, Gebru et al. 2021) was published (and somewhat longer since Google made it an enormous news story by firing my co-authors). During that time, I have been watching the phrase stochastic parrot(s) on social media, initially out of linguistic interest (it’s rare to get to see how a coinage develops from its very beginning). In the early days, most usage I saw was from people referring to the paper, and then people who had read the paper referring to large language models as stochastic parrots. Eventually, though, the phrase outran the paper, as people picked it up as a way to refer to LLMs.

Tracking this phrase also provides a window into parts of the online discourse about “AI” that I would otherwise be unlikely to see. In that discourse, I see a lot of misconceptions about a) how large language models work and b) my own work on this topic. Accordingly, it seems like a fitting time to do some debunking, answering questions that people frequently fail to ask. What you’ll find below aren’t questions, but the various statements that people make when perhaps they should have stopped and asked a question.

To keep this grounded in the actual text in question, here is where we introduce the term in the original paper:

Text generated by an LM is not grounded in communicative intent, any model of the world, or any model of the reader’s state of mind. It can’t have been, because the training data never included sharing thoughts with a listener, nor does the machine have the ability to do that. This can seem counter-intuitive given the increasingly fluent qualities of automatically generated text, but we have to account for the fact that our perception of natural language text, regardless of how it was generated, is mediated by our own linguistic competence and our predisposition to interpret communicative acts as conveying coherent meaning and intent, whether or not they do [89, 140].

The problem is, if one side of the communication does not have meaning, then the comprehension of the implicit meaning is an illusion arising from our singular human understanding of language (independent of the model). Contrary to how it may seem when we observe its output, an LM is a system for haphazardly stitching together sequences of linguistic forms it has observed in its vast training data, according to probabilistic information about how they combine, but without any reference to meaning: a stochastic parrot. (p.616–617)
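As a toy illustration of "probabilistic information about how they combine", the sketch below builds a table of which word followed which in a tiny corpus and then samples from it. It is a deliberately crude stand-in, not a description of any real language model: actual LMs are neural networks over subword tokens with long contexts. The function names `build_bigram_table` and `extrude`, and the sample corpus, are hypothetical and chosen only for this example.

```python
import random
from collections import defaultdict


def build_bigram_table(text):
    """Map each word to the list of words observed immediately after it."""
    words = text.split()
    table = defaultdict(list)
    for current, following in zip(words, words[1:]):
        table[current].append(following)
    return dict(table)


def extrude(table, start, max_words, seed=None):
    """Sample a word sequence by repeatedly picking an observed successor.

    Repeated successors stay in the list, so more frequent continuations
    are proportionally more likely to be chosen. Nothing here consults
    meaning; only co-occurrence in the input text is used.
    """
    rng = random.Random(seed)
    output = [start]
    while len(output) < max_words:
        successors = table.get(output[-1])
        if not successors:
            break  # this word was never followed by anything in the corpus
        output.append(rng.choice(successors))
    return " ".join(output)


# Hypothetical toy corpus, assumed only for illustration.
corpus = "the parrot repeats the phrase and the phrase repeats the parrot"
table = build_bigram_table(corpus)
print(extrude(table, "the", 8, seed=0))
```

Every adjacent pair of words in the output was seen somewhere in the corpus, yet the sequence as a whole need not have been, which is the "stitching together" the quoted passage describes.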

The phrase stochastic parrots was one attempt (among several) to make vivid what it is that large language models, when used to synthesize text, are doing. In later work (Mystery AI Hype Theater 3000, The AI Con), I’ve also added synthetic text extruding machine as a way to describe systems that closely model which bits of words tend to co-occur in their input data and can be used to, well, extrude synthetic text.

Bender says “AI is a stochastic parrot”

I have never and will never say that “AI” is a stochastic parrot, because I reject “AI” as a way to describe technologies (LLMs or otherwise). Also, the Stochastic Parrots paper, written in Sept-Oct 2020, was not a paper about “AI” at all, but a paper about the risks and harms associated with the drive for ever larger language models, which, at that point, mostly weren’t being used to extrude synthetic text. (OpenAI had made GPT-2 and GPT-3 available for playing with, but this was still two years before they imposed ChatGPT on the world and synthetic text suddenly became everyone’s problem.) The term “AI” appears only once, near the end of the paper, where we write:

Work on synthetic human behavior is a bright line in ethical AI development, where downstream effects need to be understood and modeled in order to block foreseeable harm to society and different social groups. (p.619)

I believe this particular insight, and its phrasing, is due to Margaret Mitchell (aka Shmargaret). In the years since, this observation has unfortunately been repeatedly reinforced: work on synthetic human behavior continued apace, and the foreseeable harms (predictably) came to pass.

Bender says [some model] is “just” a stochastic parrot

Indulge me in a little digression into linguistics here. The word just is the kind of word that evokes a scale or ranking. For example, She is just 5 feet tall places her on a scale of height and furthermore suggests that her height is further down that scale than would be expected or desirable or just normal/normative. So someone who says that I say that some model is “just” a stochastic parrot is also attributing a scale, perhaps of functionality (or, in the...
