Similarities between human psychopathology and errors in LLMs


Does ChatGPT need a psychiatrist? Similarities between human psychopathology and errors in large language models | NPP—Digital Psychiatry and Neuroscience


Subjects: Medical research, Scientific community

Abstract

Two striking phenomena of the human mind encountered in mental healthcare are hallucinations and confabulations: perceiving things that are not there, or filling memory gaps with invented stories. Interestingly, contemporary artificial intelligence systems, such as large language models (LLMs) and automatic speech recognition tools, show remarkably similar errors. They are known to “hallucinate” words or “confabulate” facts when information is missing, producing output that feels coherent but is false. In this article, we explore these parallels between psychiatric symptoms in humans and mistakes in model output. By comparing how and why these errors arise, we aim to illuminate shared computational principles underlying predictive systems. These comparisons highlight both the risks of relying on imperfect AI systems and the opportunity to use them as computational mirrors to better understand the human mind. The reverse may also hold: knowledge of psychiatric symptoms may help improve AI systems and reduce their error rates.

Lay Summary

We compare errors in artificial intelligence systems, such as large language models and speech recognition tools, with hallucinations and confabulations seen in psychiatry. Both can produce fluent but incorrect outputs when information is missing. These similarities suggest shared underlying principles of prediction. Understanding these parallels may help improve the reliability of AI systems and offer new insights into how the human brain constructs perception and memory.


Introduction: a curious parallel

Since their emergence in late 2022, generative large language models (LLMs) such as ChatGPT and speech-to-text tools such as Whisper have rapidly gained popularity worldwide, transforming practices across education, healthcare, research and business (Box 1). Despite their utility, their tendency to generate misinformation remains a significant concern. For example, ChatGPT sometimes produces text that appears plausible but is in fact inaccurate, including fabricated references or incorrect answers. Advanced automatic speech recognition (ASR) systems such as Whisper introduce transcription errors, sometimes severe, with error rates ranging from roughly 1% in controlled settings to 80% in some real-world applications [1], producing output that is nonsensical or unfaithful to the provided source input [2].

These errors are often labelled hallucinations, yet this term requires clarification. In humans, hallucinations are false perceptions: sensory experiences that occur without a corresponding external stimulus. LLM errors involve no perception in any phenomenological sense and are better described as confabulations: fabricated yet plausible constructions generated without intent to deceive [3]. ASR systems differ structurally from text-only LLMs because they transform acoustic signals into text, which makes the analogy to hallucinations more fitting. However, the system does not perceive sound in a human sense; it performs probabilistic pattern matching over acoustic features. The analogy to hallucinations is therefore functional rather than experiential: ASR output may be incongruent with the input, resembling false perceptual inference under degraded conditions.

At first glance, these model errors may appear to be simple software bugs. However, they resemble psychiatric phenomena in intriguing ways. For instance, auditory verbal hallucinations (AVH), i.e. hearing voices in the absence of external stimuli, and confabulations, i.e. confidently but incorrectly recalled memories, offer compelling parallels. In both biological and artificial systems, outputs can emerge that are coherent, confident, and context-sensitive, yet detached from external reality. We argue that these errors are not merely technical glitches, but windows into predictive processing. This intersection between AI and psychiatry presents a unique opportunity for both fields to...
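The idea of “false perceptual inference under degraded conditions” can be made concrete with a toy model. The following is a minimal sketch under a deliberately simplified assumption that is ours, not the authors': a predictive system combines a Gaussian prior expectation with a Gaussian sensory signal, weighting each source by its precision (inverse variance). The function name `posterior_mean` and all numbers are hypothetical. As sensory noise grows, the estimate is pulled away from the actual input and toward what the system expects, yielding an output that looks confident and coherent yet is detached from the signal.

```python
"""Toy illustration (an assumption, not the article's model): precision-weighted
fusion of a Gaussian prior with a Gaussian sensory likelihood."""


def posterior_mean(prior_mean: float, prior_var: float,
                   observation: float, noise_var: float) -> float:
    """Return the posterior mean for a Gaussian prior and a Gaussian likelihood.

    Each source is weighted by its precision (1 / variance), so the noisier
    the input, the more the estimate is pulled toward the prior expectation.
    """
    prior_precision = 1.0 / prior_var
    sensory_precision = 1.0 / noise_var
    total_precision = prior_precision + sensory_precision
    return (prior_precision * prior_mean
            + sensory_precision * observation) / total_precision


if __name__ == "__main__":
    # Hypothetical values: the system "expects" 1.0, but the actual input is 0.0.
    prior_mean, prior_var, observation = 1.0, 1.0, 0.0
    for noise_var in (0.01, 1.0, 100.0):  # from clean to heavily degraded input
        estimate = posterior_mean(prior_mean, prior_var, observation, noise_var)
        print(f"noise_var={noise_var:>6}: estimate={estimate:.3f}")
```

With these illustrative values, the estimate moves from about 0.01 (close to the input) through 0.5 to about 0.99 (close to the prior) as the noise variance increases. Real LLMs and ASR systems are not literally Gaussian estimators; the sketch only illustrates the precision-weighting idea associated with predictive processing accounts.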
