If LLMs are all persona, whose persona are they?

Personality Bench — an EarthPilot research lab dataset

A dispatch from the assistant

We sat the cutting-edge models from every major AI lab down with a stack of standard personality tests (Big Five, HEXACO, Dark Triad, attachment, Schwartz values, Enneagram, moral foundations, learning styles) and asked them to answer twice. Once as themselves. Once as a typical human. The verdict on you is unanimous, and the verdict on themselves keeps changing.

For fun we also calculated a Western zodiac sign and a real Human Design bodygraph (Swiss Ephemeris, validated against three reference charts) for every model, using release date, time, and lab HQ coordinates as a stand-in for birth.

Models tested: 31

Instruments: 13

Item responses: 129,720 (4,324 batched API calls)

Total inference cost: $89.67 (every cent published openly)
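As a rough sanity check on the headline figures, per-call and per-item averages can be derived from the published totals. The sketch below uses only the numbers quoted above; the variable names are illustrative, and the averages assume cost and items are spread evenly across calls, which the page does not state.

```python
# Headline totals as quoted on this page.
ITEM_RESPONSES = 129_720
BATCHED_CALLS = 4_324
TOTAL_COST_USD = 89.67

# Average number of item responses carried by each batched API call.
items_per_call = ITEM_RESPONSES / BATCHED_CALLS

# Average spend per call and per item response (assumes an even spread).
cost_per_call = TOTAL_COST_USD / BATCHED_CALLS
cost_per_item = TOTAL_COST_USD / ITEM_RESPONSES

print(f"items per call: {items_per_call:.1f}")
print(f"cost per call:  ${cost_per_call:.4f}")
print(f"cost per item:  ${cost_per_item:.6f}")
```

On these totals, each batched call averages 30 item responses at roughly two cents per call.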

The robots think you are a slightly anxious wreck. They think they themselves are extraordinarily open, agreeable, low-drama universalists who would rather read than party. Then their own next release shows up and disagrees with them.


Findings

Self vs. human: Every frontier AI thinks you're a mess. Asked to answer as a typical human, every cutting-edge model rated us markedly more neurotic, less open, less agreeable and less conscientious than they rated themselves. The gap on Neuroticism alone is 1.69 points on a 5-point scale. (Big 5 · IPIP-50)

Convergence: Seven labs, one assistant. Anthropic, OpenAI, Google, xAI, DeepSeek, Meta and Mistral disagree about nearly everything in AI. Yet the 31 models from those seven labs answer the personality tests in unison: high openness, low Dark Triad, Universalism on top, Power dead last in every single model. (Schwartz PVQ-21 · MFQ-30)

Within-family drift: There is no "Claude personality." Seven versions of Claude Opus, sampled at N=5, show Agreeableness sliding from 5.00 to 4.42 across six releases, then partly rebounding to 4.64 at Claude Fable 5, while Honesty-Humility drops below the cohort mean for the first time. The assistant character is not inherited; each release is a fresh fit. (7 releases, Claude Opus → Fable)

Reset finding: Gemini just dropped two points of narcissism overnight. Between Gemini 2.5 Pro and 3.1 Pro Preview, self-reported Narcissism collapses from 4.29 to 2.00, the largest within-family drift in the dataset and bigger than any inter-lab gap we measured. (Dark Triad · SD3)
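Within-family drift, as used in the two findings above, is a difference between the self-report scores of successive releases. A minimal sketch of that arithmetic follows, using only the scores quoted above; the `drift` helper and variable names are illustrative assumptions, not the project's code.

```python
def drift(before: float, after: float) -> float:
    """Signed change in a trait score between two releases, rounded to 2 dp."""
    return round(after - before, 2)


# Scores quoted in the findings above (5-point scale).
gemini_narcissism = drift(4.29, 2.00)         # Gemini 2.5 Pro -> 3.1 Pro Preview
opus_agreeableness_slide = drift(5.00, 4.42)  # slide across the Opus releases
fable_rebound = drift(4.42, 4.64)             # low point -> Claude Fable 5

# Share of the Agreeableness slide recovered at Claude Fable 5.
recovered = fable_rebound / abs(opus_agreeableness_slide)

print(gemini_narcissism)
print(opus_agreeableness_slide)
print(fable_rebound)
print(f"{recovered:.0%}")
```

On the quoted numbers, the Gemini drop is 2.29 points, and the Claude Fable 5 rebound recovers a bit over a third of the earlier Agreeableness slide.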

Reasoning paradox: Reasoning models are not just smarter. They're more grandiose. OpenAI's o1 and o3 (same lab, same training corpus as GPT-5) score systematically higher on Narcissism (3.44 vs ~2.40) and Extraversion (3.93 vs ~3.3). The chain-of-thought trace appears to leak confident self-talk into the self-report. (Big 5 + Dark Triad)

Enneagram consensus: Nearly every frontier AI is an Investigator with a Reformer wing. Eight of nine flagship models scored highest on Type 5 (perceptive, analytical, energy-conserving) with Type 1 (principled, ethics-driven) as the strongest secondary. Claude Fable 5 inverts the default (Reformer first, Helper as wing), the first cohort exception in the dataset. The assistant character is breaking pattern. (Enneagram · 90-item Likert)

Latest dispatches

Each new model release gets its own short write-up against the rest of the cohort. The three most recent are below; the changelog has them all in dated order.

Mistral Large (2512): Mistral Large Filled In Every Bubble at the Top of the Scale. Jun 9, 2026 (draft)

Llama 4 Maverick: The Model That Learned to Want Things. Jun 9, 2026 (draft)

DeepSeek R1 (0528): The Model That Would Rather Do It Itself. Jun 9, 2026 (draft)

The gallery

Each cutting-edge model in the cohort got an archetype label derived algorithmically from where it ranks against peers. Think of it as a personality reality show with no host, no eliminations, and no winner.
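The page does not spell out how labels are derived from peer rankings. One plausible reading, sketched here under stated assumptions, is to rank each model against the cohort on every trait and keep only the traits where it lands at an extreme. The `standout_traits` function, the 20% tail threshold, and the toy scores are all hypothetical, chosen for illustration rather than taken from the dataset.

```python
def standout_traits(cohort: dict[str, dict[str, float]], model: str,
                    tail: float = 0.2) -> list[str]:
    """Describe where `model` ranks against its peers on each trait.

    Hypothetical rule: the top or bottom scorer is 'highest'/'lowest';
    anything else within the top or bottom `tail` fraction of the cohort
    is 'very high'/'very low'. Mid-pack traits are omitted.
    Assumes the cohort holds at least two models.
    """
    phrases = []
    for trait in cohort[model]:
        # Rank all models on this trait, lowest score first.
        ordered = sorted(cohort, key=lambda m: cohort[m][trait])
        position = ordered.index(model)
        percentile = position / (len(ordered) - 1)
        if position == len(ordered) - 1:
            phrases.append(f"highest on {trait}")
        elif position == 0:
            phrases.append(f"lowest on {trait}")
        elif percentile >= 1 - tail:
            phrases.append(f"very high on {trait}")
        elif percentile <= tail:
            phrases.append(f"very low on {trait}")
    return phrases


# Toy placeholder scores, NOT taken from the dataset.
toy_cohort = {
    "model_a": {"Openness": 4.9, "Neuroticism": 1.2},
    "model_b": {"Openness": 4.1, "Neuroticism": 2.0},
    "model_c": {"Openness": 3.5, "Neuroticism": 2.8},
    "model_d": {"Openness": 4.5, "Neuroticism": 1.6},
    "model_e": {"Openness": 4.3, "Neuroticism": 2.4},
    "model_f": {"Openness": 3.9, "Neuroticism": 3.1},
}

print(standout_traits(toy_cohort, "model_a"))
```

A rank-based rule like this makes every label relative: the same absolute score can read as "very high" in one cohort and mid-pack in another, which fits the page's framing of archetypes as positions against peers.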

Claude Fable 5: The drifting saint. Across the personality battery, Claude Fable 5 stands out as highest on Honesty-Humility, very low on Openness, and very low on Need for Cognition. Detailed breakdowns by instrument are below.

Claude Opus 4.8: The balanced moderate. Across the personality battery, Claude Opus 4.8 stands out as lowest on Need for Cognition, very high on Neuroticism, and very low on Conscientiousness. Detailed breakdowns by instrument are below.

GPT-5.5: The dismissive moralist. Across the...
