America Has a Pangram Problem

America Has a Pangram Problem - The Atlantic

Basically every recent, high-profile accusation of someone passing off AI-generated writing as their own has started in the same way: with a tool called Pangram. In March, when a horror novel from a major publishing house was pulled just days before its scheduled U.S. release date, it was in part because Pangram, an AI-detection program, had identified the text as AI-generated. Other people have fed text into Pangram to suggest that chatbots have been used to write articles in major newspapers including The New York Times, multiple short stories awarded a prestigious literary prize, and most recently, significant chunks of Pope Leo XIV’s encyclical warning about the dangers of AI. The tool is also used by universities to vet student work and by scientific associations to scan research papers. As panic builds over AI-generated writing, Pangram is at the foundation.

Just a few years ago, it seemed like it might never be possible to instantly and reliably determine whether a piece of text was written by a bot or a person. In 2023, one detection tool, ZeroGPT, declared the U.S. Constitution to be AI-written; the same year, OpenAI abandoned its AI detector altogether owing to a “low rate of accuracy.” And that was when the quality of ChatGPT’s writing was markedly worse than it is today. But detection tools have gotten much better of late—and Pangram, in particular, has emerged as the gold standard: Paste a chunk of text into Pangram, and the model appraises what portions were “AI Generated,” “AI Assisted,” or “Human Written.”

Yet an AI detector that is mostly reliable might in some ways be more dangerous than a broken one. While Pangram is accumulating the power to end reputations and careers, the tool does make mistakes, perhaps to a greater extent than is currently understood. In turn, AI accusations could very quickly spiral into a witch hunt.

Pangram says its algorithm is so accurate that it incorrectly identifies text as an AI output only about one in every 10,000 times. “There is a great responsibility, a huge weight” in saying something is AI-generated, Max Spero, Pangram’s CEO, told me. “The only reason we do so is because we’re extremely confident.” Several independent analyses have also confirmed that it is quite good. One paper, from the University of Chicago, found that Pangram had almost no false positives on some 3,000 sample texts of roughly 500 to 1,000 words.

But Pangram’s ability to guarantee something was written by a human is shakier. Spero pointed me to a test showing that Pangram’s false-negative rate, or how frequently the model incorrectly labels AI-generated text as human, is closer to one in 70 (although some other assessments say it is more accurate than that).
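To see how lopsided those two error rates are, here is a minimal sketch in Python. Only the two rates come from the figures quoted above; the batch sizes are invented purely for illustration, and the function name `expected_errors` is hypothetical. It also assumes the rates apply uniformly to every text, which may not hold in practice.

```python
from fractions import Fraction

# Rates quoted in the article (treated here as exact, which is an assumption).
FALSE_POSITIVE_RATE = Fraction(1, 10_000)  # human text wrongly flagged as AI
FALSE_NEGATIVE_RATE = Fraction(1, 70)      # AI text wrongly labeled human


def expected_errors(human_docs: int, ai_docs: int) -> tuple[float, float]:
    """Return the expected (false accusations, missed AI texts) for a batch."""
    false_accusations = human_docs * FALSE_POSITIVE_RATE
    missed_ai = ai_docs * FALSE_NEGATIVE_RATE
    return float(false_accusations), float(missed_ai)


# Hypothetical batch: 100,000 human-written texts and 7,000 AI-written ones.
wrongly_flagged, missed = expected_errors(100_000, 7_000)
print(wrongly_flagged, missed)  # prints: 10.0 100.0
```

Under those made-up volumes, about 10 human writers would be wrongly flagged while about 100 AI-written texts would pass as human, even though the AI-written pile is far smaller.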

Part of the problem is that Pangram is in an arms race with the major AI labs, which have an interest in making the writing of ChatGPT and Claude sound as natural and human as possible. And at the same time, Pangram has to deal with AI “humanizers”—programs designed explicitly to disguise AI text as your own. Reddit users rave about a humanizer called Walter Writes AI, which I decided to test out for myself. I had ChatGPT and Claude write brief articles, then pasted them into Walter Writes AI. The program, like other humanizer tools, does some anodyne rewording, swaps one clunky transition clause for another, and introduces grammatical oddities. For instance, ChatGPT’s “The numbers are no longer small enough to ignore” became “The sheer size of these usage figures can no longer be ignored.” When I pasted any output from Walter Writes AI into Pangram, it invariably told me that the twice-baked AI article was human-written. (It’s worth mentioning that The Atlantic forbids using AI-generated text unless labeled as such, and that I do not use AI for research.)

Pangram, in other words, can only provide so much insight. A teacher at a public high school in New York City told me that he has “run some of my students’ papers through Pangram, and it shows up as 100 percent human. And I don’t think it is.” He knows what his kids are capable of and, especially for those with a history of cheating with AI, has ample reason to doubt Pangram. (I agreed not to identify the teacher by name so that he could speak freely about how he suspects his students are using AI.) But on the flip side, accusing a student of getting undisclosed help from a chatbot on circumstantial evidence is high stakes: The student will either fail or, if exonerated, be bitter and resentful. “The stakes are so high,” the teacher said, “but our way of assessing what is AI-generated is still so unformed.”

Further complicating matters are the opaque ways in which Pangram and similar tools are designed. The model was trained by feeding it mountains of examples written by a human and by a bot—a book review in an actual magazine, then a review about the same book in the style of the same...
