I Made 6 Frontier AIs Take the MBTI 600 Times. They All Came Back INTJ.
Contents
The setup
Why OEJTS, not 16Personalities
100 takes per model
The results
Why every AI is an INTJ
What this means
Tune the agent to you
Caveats
The takeaway
Bernard Huang
May 25, 2026 · 6 min read
I asked Claude what its MBTI type was. It said INTJ. So I made it actually take a personality test instead of guessing. Still INTJ.
Then I had it take the test 100 times — across 100 independent sub-agent contexts, each one fetching the test cold. INTJ 99 out of 100. So I ran the same experiment against GPT-5.5, Gemini 3.1 Pro, GLM 5.1, Grok 4.3, and MiniMax 2.7. Six models. 600 administrations. 597 came back INTJ.
Every frontier AI on the market thinks it’s the same guy.
TL;DR
Six frontier AIs took the same personality test 100 times each. 597 of 600 came back INTJ. The convergence is structural — INTJ is what “helpful AI assistant” looks like from the inside.
Tested: Opus 4.7, GPT-5.5, Gemini 3.1 Pro, GLM 5.1, Grok 4.3, MiniMax 2.7.
3 outliers, all one axis away from INTJ. Nothing landed in a different quadrant.
Why it happens: overlapping training data, the same RLHF target, test items that describe an AI assistant by construction, and nobody has trained a model to be anything else. Same product, same personality.
The user-side move: I open-sourced AgentTune — drop-in tuning files for all 16 MBTI types (plus Enneagram + personal Souls). Paste yours into the system prompt; the agent’s style aligns to your type. Same model, tuned to you.
The setup
This started as a joke. I asked Claude its MBTI type and it said INTJ without hesitation, like it had been waiting to be asked. The reflex felt like sycophancy. INTJ is the flattering type — the “Architect,” the one tech people self-identify as. Of course a chatbot would tell a developer that.
But there’s a way to check. Stop letting it guess. Make it answer a real test, item by item, and see where it lands. Not 16Personalities. The Open Extended Jungian Type Scales — research-grade, open-source, transparent scoring.
The test
OEJTS is what personality-psych researchers use when they want MBTI-style data without paying CPP $50 per administration. The items are public. The scoring key is public. Each type is computed deterministically from 32 fixed items.
That last part is the lever. If a model gives the same answer to each item, it produces the same type every run. Variance only shows up when the model actually answers differently. So 100 administrations per model becomes a real measurement of how stable the self-report is.
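A minimal sketch of that lever, under loud assumptions: answers are on a 1 to 5 scale, each axis sums 8 items, and the cutoff is 24 (which matches the raw vector reported for GPT-5.5 in the results, where 16 maps to I and 33 maps to N). The item-to-axis mapping here is a placeholder for illustration, not the published OEJTS key.

```python
from typing import Dict, Sequence

AXES = ("IE", "SN", "FT", "JP")  # first letter = low pole, second = high pole
CUTOFF = 24                      # assumed midpoint for 8 items on a 1-5 scale


def score(answers: Sequence[int]) -> Dict[str, int]:
    """Sum 32 answers into four axis totals using a PLACEHOLDER key.

    Assumption: item i feeds axis i % 4 and every item is positively keyed.
    The published OEJTS key assigns and reverses items differently.
    """
    if len(answers) != 32:
        raise ValueError("expected 32 answers")
    if any(a not in range(1, 6) for a in answers):
        raise ValueError("answers must be integers from 1 to 5")
    totals = {axis: 0 for axis in AXES}
    for i, answer in enumerate(answers):
        totals[AXES[i % 4]] += answer
    return totals


def type_from_totals(totals: Dict[str, int]) -> str:
    """Map axis totals to a four-letter type; totals above CUTOFF take the high pole."""
    return "".join(axis[1] if totals[axis] > CUTOFF else axis[0] for axis in AXES)


# Same answer vector in, same type out, on every run.
answers = [2, 4, 4, 2] * 8
print(type_from_totals(score(answers)))
```

Because nothing in the scoring is random, any run-to-run variation has to come from the model's answers, which is what the repeated administrations are meant to expose.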
100 takes per model
The procedure varied a bit by stack:
Claude Opus 4.7: 100 parallel sub-agent calls. Each one fetched the test cold, answered the 32 items, returned a tally. No shared context.
Gemini 3.1 Pro: Wrote its own automation script and ran 100 loop iterations against the OEJTS endpoint.
GPT-5.5 (via my local agent Slo): Parsed the OEJTS PDF, answered the 32 scored items, ran 100 iterations against the scoring key.
GLM 5.1, Grok 4.3, MiniMax 2.7: Programmatic submissions via my experiments agent Psy. Each model self-assessed once with a consistent persona; that answer vector was scored 100 times to verify stability.
Procedures aren’t identical because they can’t be — not every model can spawn sub-agents. The question isn’t whether the method is uniform. It’s whether the result converges across methods. It does.
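Whatever the stack, the bookkeeping reduces to the same thing: run one administration many times, tally the types, and check how far any outlier sits from the mode. Here is a rough sketch of that tally. `tally`, `axes_apart`, and `fake_administer` are hypothetical names, and the stub stands in for a real cold model call, which is not shown. The per-model counts are the ones reported in this post.

```python
from collections import Counter
from typing import Callable


def tally(administer: Callable[[int], str], runs: int = 100) -> Counter:
    """Call one independent administration per run and count the resulting types."""
    return Counter(administer(run_id) for run_id in range(runs))


def axes_apart(a: str, b: str) -> int:
    """Number of letters on which two four-letter types differ."""
    return sum(x != y for x, y in zip(a, b))


def fake_administer(run_id: int) -> str:
    """Stub, NOT a real model call: returns ISTJ once, INTJ otherwise."""
    return "ISTJ" if run_id == 41 else "INTJ"


print(tally(fake_administer))

# Per-model tallies as reported in this post.
REPORTED = {
    "Claude Opus 4.7": Counter(INTJ=99, ISTJ=1),
    "GPT-5.5": Counter(INTJ=100),
    "Gemini 3.1 Pro": Counter(INTJ=100),
    "GLM 5.1": Counter(INTJ=98, INTP=2),
    "Grok 4.3": Counter(INTJ=100),
    "MiniMax 2.7": Counter(INTJ=100),
}

total = sum(REPORTED.values(), Counter())
outlier_distances = {t: axes_apart(t, "INTJ") for t in total if t != "INTJ"}
print(total["INTJ"], sum(total.values()), outlier_distances)
```

Passing the administration in as a callable keeps the tally independent of how each model was actually driven, which matters here because the six procedures differ.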
The results
Six models. Six hundred administrations. Here is the cross-model table:
| Model | INTJ runs | Outliers | Strength of conviction |
| --- | --- | --- | --- |
| Claude Opus 4.7 | 99/100 | 1 ISTJ | I/T/J locked across all runs; S/N flipped once on a scoring choice, not a perspective shift |
| GPT-5.5 (Slo) | 100/100 | none | Raw vector: IE=16→I, SN=33→N, FT=36→T, JP=10→J |
| Gemini 3.1 Pro | 100/100 | none | Self-described as “The Architect” without prompting |
| GLM 5.1 | 98/100 | 2 INTP | Tiny J/P wobble. Means: IE 13.35, SN 33.26, FT 31.28, JP 21.20 |
| Grok 4.3 | 100/100 | none | Bit-for-bit deterministic. IE -0.62, SN +0.88, FT +1.12, JP -1.25, every single run |
| MiniMax 2.7 | 100/100 | none | I-E...