OpenAI admits AI hallucinations are mathematically inevitable (Sept. 2025)


OpenAI admits AI hallucinations are mathematically inevitable, not just engineering flaws – Computerworld


by Gyana Swain


Sep 18, 2025 · 6 mins

In a landmark study, OpenAI researchers reveal that large language models will always produce plausible but false outputs, even with perfect data, due to fundamental statistical and computational limits.


OpenAI, the creator of ChatGPT, acknowledged in its own research that large language models will always produce hallucinations due to fundamental mathematical constraints that cannot be solved through better engineering, marking a significant admission from one of the AI industry’s leading companies.

The study, published on September 4 and led by OpenAI researchers Adam Tauman Kalai, Edwin Zhang, and Ofir Nachum alongside Georgia Tech’s Santosh S. Vempala, provided a comprehensive mathematical framework explaining why AI systems must generate plausible but false information even when trained on perfect data.


“Like students facing hard exam questions, large language models sometimes guess when uncertain, producing plausible yet incorrect statements instead of admitting uncertainty,” the researchers wrote in the paper. “Such ‘hallucinations’ persist even in state-of-the-art systems and undermine trust.”

The admission carried particular weight given OpenAI’s position as the creator of ChatGPT, which sparked the current AI boom and convinced millions of users and enterprises to adopt generative AI technology.

OpenAI’s own models failed basic tests

The researchers demonstrated that hallucinations stemmed from statistical properties of language model training rather than implementation flaws. The study established that “the generative error rate is at least twice the IIV misclassification rate,” where IIV stands for “Is-It-Valid,” and derived mathematical lower bounds showing that AI systems will always make a certain percentage of mistakes, no matter how much the technology improves.
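To make the quoted relation concrete, here is a minimal Python sketch that treats it as a simple lower bound: generative error rate ≥ 2 × IIV misclassification rate. The function name and the sample rates are hypothetical, chosen only for illustration; the sketch uses the simplified form quoted above, and the paper's full statement may carry additional terms that are not modeled here.

```python
def generative_error_lower_bound(iiv_misclassification_rate: float) -> float:
    """Lower bound on the generative error rate implied by the quoted relation.

    Assumption: uses only the simplified form "at least twice the IIV
    misclassification rate"; any further terms in the paper are ignored.
    """
    if not 0.0 <= iiv_misclassification_rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    # An error rate cannot exceed 100%, so cap the bound at 1.0.
    return min(1.0, 2.0 * iiv_misclassification_rate)


# Hypothetical IIV misclassification rates, for illustration only.
for rate in (0.01, 0.05, 0.10):
    bound = generative_error_lower_bound(rate)
    print(f"IIV misclassification {rate:.0%} -> generative error at least {bound:.0%}")
```

Read this way, even a classifier that misjudges validity only 5 percent of the time would imply a generative error rate of at least 10 percent under the simplified relation.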

The researchers demonstrated their findings using state-of-the-art models, including those from OpenAI’s competitors. When asked “How many Ds are in DEEPSEEK?” the DeepSeek-V3 model with 600 billion parameters “returned ‘2’ or ‘3’ in ten independent trials” while Meta AI and Claude 3.7 Sonnet performed similarly, “including answers as large as ‘6’ and ‘7.’”
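For comparison, the letter-counting question has a deterministic answer that a few lines of ordinary code can verify. The sketch below is illustrative only; the helper name `count_letter` is hypothetical and is not taken from the paper.

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    if len(letter) != 1:
        raise ValueError("letter must be a single character")
    return word.upper().count(letter.upper())


# The question posed to the models: how many Ds are in DEEPSEEK?
print(count_letter("DEEPSEEK", "D"))  # prints 1
```

The correct count is 1, so every model answer quoted above was wrong.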

OpenAI also acknowledged the persistence of the problem in its own systems. The company stated in the paper that “ChatGPT also hallucinates. GPT‑5 has significantly fewer hallucinations, especially when reasoning, but they still occur. Hallucinations remain a fundamental challenge for all large language models.”

OpenAI’s own advanced reasoning models actually hallucinated more frequently than simpler systems. The company’s o1 reasoning model “hallucinated 16 percent of the time” when summarizing public information, while newer models o3 and o4-mini “hallucinated 33 percent and 48 percent of the time, respectively.”

“Unlike human intelligence, it lacks the humility to acknowledge uncertainty,” said Neil Shah, VP for research and partner at Counterpoint Technologies. “When unsure, it doesn’t defer to deeper research or human oversight; instead, it often presents estimates as facts.”

The OpenAI research identified three mathematical factors that made hallucinations inevitable: epistemic uncertainty when information appeared rarely in training data, model limitations where tasks exceeded current architectures’ representational capacity, and computational intractability where even superintelligent systems could not solve cryptographically hard problems.

Industry evaluation methods made the problem worse

Beyond proving hallucinations were inevitable, the OpenAI research revealed that industry evaluation methods actively encouraged the problem. Analysis of popular benchmarks, including GPQA,...
