The dangerous unknowns at the heart of LLMs

The Yale Review | Melanie Mitchell: What We Get Wrong About AI

Jagged Intelligence

Melanie Mitchell

IN 2023, A FEW MONTHS after OpenAI released the AI chatbot ChatGPT, Terrence Sejnowski, a neuroscientist and pioneer in the field of neural networks, wrote:

Something is beginning to happen that was not expected even a few years ago. A threshold was reached, as if a space alien suddenly appeared that could communicate with us in an eerily human way. . . . Some aspects of their behavior appear to be intelligent, but if it’s not human intelligence, what is the nature of their intelligence?

Sejnowski’s astonishment at ChatGPT’s language abilities was shared by longtime AI researchers and ordinary people alike. The chatbot could generate fluent natural language. It could answer questions; write essays, poems, and rap lyrics; compose text in the style of famous authors; do students’ homework; and generate convincing peer reviews of scientific papers.

ChatGPT was the first large language model (LLM) chatbot to be easily accessible to the general public, and other companies soon followed with competitors, such as Google’s Gemini, Anthropic’s Claude, Meta’s Llama, and Microsoft’s Copilot. Improved versions have been released every few months. While early versions of LLMs—the systems underlying today’s chatbots—were dismissed as “stochastic parrots” and “autocomplete on steroids,” current ones often give the appearance of understanding language, and the physical and social worlds described by language, in a deep, humanlike way.

AI has become increasingly skillful. It can engage in conversations on seemingly any topic, write complex pieces of code, and generate extraordinarily realistic images and videos, prizewinning art, and chart-topping songs. Large AI systems have recently earned gold medals at the International Mathematical Olympiad, helped humans solve long-standing problems in mathematics and biology, and contributed to major improvements in weather prediction and drug design, among other achievements. In 2024, in recognition of this astounding progress, AI researchers were awarded Nobel Prizes in both Physics and Chemistry. An avalanche of tech-company blog posts, breathless media, and assertions from AI experts has communicated to the public an overriding narrative: after decades of unfulfilled promises, true artificial intelligence has finally arrived, and it will change everything about our lives.

THAT LLMS APPEAR to understand language, though, does not mean they actually understand it as humans do. Indeed, while AI boosters have touted the superhuman capabilities of LLMs and their astounding successes, other AI users have noticed, and reported on, their puzzling, unhumanlike failures, which have not gone away as these systems have progressed. How can a system that has exceeded human performance on advanced math problems sometimes fail at simple elementary-school-level problems? Why do these systems answer a question perfectly when it is worded one way but struggle when it is worded in a different but (to a human) equivalent way? How can a system that generates accurate and incisive summaries of books also produce similarly confident and authoritative-sounding summaries of nonexistent titles? How can a system that has been extensively trained to refuse dangerous requests be easily fooled by “prompt engineering” into cheerfully providing the prohibited information?

For humans, one kind of skill can often predict abilities at similar skills; this is not the case in the jagged landscape of AI.

In general, today’s AI systems perform extremely well until, often unexpectedly, they don’t. They are inconsistent, lack a sense of when they should be confident or uncertain about their answers, are susceptible to manipulative prompting, and struggle with tasks that differ sufficiently from their training data. For example, one study performed by AI researchers at Apple showed that simply adding irrelevant information to simple word problems in a widely used mathematics benchmark test caused several AI models to perform dramatically worse than they did when given the problems without the extraneous information. Here’s one example (the irrelevant added information is the clause about five kiwis being smaller than average): “Oliver picks 44 kiwis on Friday. Then he picks 58 kiwis on Saturday. On Sunday, he picks double the number of kiwis he did on Friday, but 5 of them were a bit smaller than average. How many kiwis does Oliver have?” In 2025, the researchers found that all of the models they tested performed substantially worse on such variations than on the original problems.
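
The arithmetic in the kiwi problem can be checked in a few lines. What follows is a minimal Python sketch, not anything taken from the study: the variable names are illustrative, and the `distracted_total` line shows only one hypothetical way a solver could be misled by the extra clause, not a documented model output.

```python
# Quantities stated in the kiwi word problem quoted above.
friday = 44
saturday = 58
sunday = 2 * friday  # "double the number of kiwis he did on Friday"

# Distractor: five of Sunday's kiwis are "a bit smaller than average".
# Size does not change how many kiwis there are, so the correct total ignores it.
smaller_than_average = 5

total = friday + saturday + sunday
print("correct total:", total)

# Hypothetical mistake (an assumption for illustration): treating the
# irrelevant detail as if those kiwis should be subtracted from the count.
distracted_total = total - smaller_than_average
print("total if the distractor is wrongly subtracted:", distracted_total)
```

The correct count depends only on the three daily quantities; the clause about size never enters the sum.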

A new term has been coined to describe AI in its current form: “jagged intelligence.” The term captures the fact that the landscape of AI capabilities is profoundly uneven: the tools demonstrate excellent abilities on certain problems but surprising failures on other similar...
