AIMAC Leaderboard | AIMAC, the AI Model Accessibility Checker
AIMAC: The AI Model Accessibility Checker
AI is writing more code than ever. But is it accessible to people with disabilities?
We prompted the top AI models to build web pages across 28 categories and audited them for accessibility. We published every generated page, side by side, so you can see how different models tackled the same design challenge. We even measured em dash usage.
AIMAC is an initiative by the GAAD Foundation, in partnership with ServiceNow. Updated Jun 20, 2026.
Leaderboard
Legend: 🅿️ Pareto Optimal, 🏆 Lowest Debt
AIMAC Leaderboard showing model rankings by accessibility debt, violations, and cost
| Rank | Model | AIMAC Debt (lower = better) | Total Cost (USD) |
|---|---|---|---|
| 1 🅿️ 🏆 | OpenAI: GPT 5.4 Mini | 0.00 | $0.95 |
| 2 | OpenAI: GPT 5.3 Codex | 0.00 | $3.02 |
| 3 | OpenAI: GPT 5.5 | 3.32 | $12.41 |
| 4 | OpenAI: GPT 5.5 Pro | 3.43 | $128.11 |
| 5 🅿️ | OpenAI: gpt oss 120b | 3.85 | $0.09 |
| 6 | Qwen: Qwen3.5 397B A17B | 4.09 | $0.76 |
| 7 | Z.ai: GLM 4.7 Flash | 4.19 | $0.10 |
| 8 | Google: Gemini 3.1 Pro Preview | 4.40 | $4.16 |
| 9 | MoonshotAI: Kimi K2.7 Code | 4.48 | $1.46 |
| 10 | Qwen: Qwen3 Coder Next | 4.54 | $0.27 |
| 11 | Anthropic: Claude Haiku 4.5 | 4.57 | $2.30 |
| 12 | Qwen: Qwen3.7 Plus | 4.64 | $0.61 |
| 13 | Qwen: Qwen3.6 Flash | 4.67 | $0.58 |
| 14 | Anthropic: Claude Opus 4.8 | 4.775 | $9.33 |
| 15 | NVIDIA: Nemotron 3 Super | 4.778 | $0.16 |
| 16 | DeepSeek: DeepSeek V4 Flash | 4.83 | $0.10 |
| 17 | Anthropic: Claude Fable 5 | 4.94 | $22.33 |
| 18 | Z.ai: GLM 5.2 | 4.95 | $2.01 |
| 19 | MiniMax: MiniMax M3 | 5.16 | $0.73 |
| 20 | DeepSeek: DeepSeek V4 Pro | 5.43 | $0.71 |
| 21 | Qwen: Qwen3 Coder Plus | 7.18 | $0.62 |
| 22 | Anthropic: Claude Opus 4.8 (Fast) | 7.46 | $19.52 |
| 23 | Mistral: Mistral Medium 3.5 | 8.16 | $1.65 |
| 24 | Amazon: Nova 2 Lite | 8.20 | $0.43 |
| 25 | Mistral: Mistral Large 3 2512 | 8.229 | $0.43 |
| 26 | Arcee AI: Trinity Large Thinking | 8.230 | $0.20 |
| 27 | Mistral: Codestral 2508 | 8.66 | $0.14 |
| 28 | Qwen: Qwen3.7 Max | 9.18 | $1.79 |
| 29 | Anthropic: Claude Sonnet 4.6 | 9.83 | $12.64 |
| 30 | Google: Gemma 4 31B | 10.30 | $0.17 |
| 31 🅿️ | Google: Gemma 4 26B A4B | 11.15 | $0.06 |
| 32 | Mistral: Mistral Small 4 | 12.92 | $0.17 |
| 33 | MoonshotAI: Kimi K2.6 | 14.77 | $2.60 |
| 34 | Kwaipilot: KAT Coder Pro V2 | 14.79 | $0.43 |
| 35 | Google: Gemini 3.5 Flash | 15.00 | $4.83 |
| 36 | xAI: Grok 4.3 | 15.03 | $0.56 |
| 37 | xAI: Grok Build 0.1 | 16.42 | $1.24 |
| Total | | | $237.68 |
¹ #1 and #2 tied with an AIMAC Debt of 0.00. Tiebreaker: #1 averaged fewer violations (0.91 vs 0.94).
* GPT 5.3 Codex shows a median AIMAC Debt of 0.00. This means at least half of the 28 categories had zero accessibility violations, but some categories still had minor issues (20 total violations across all categories).
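The two footnotes above can be illustrated with a short sketch. It assumes the leaderboard value is a plain median over per-category scores and that ties are broken by average violations, as the footnotes describe. The per-category numbers in `category_debts` are hypothetical, made up only to show how a median of 0.00 can coexist with nonzero categories. Only the 0.91 and 0.94 averages come from the footnote.

```python
from statistics import median

# Hypothetical per-category debt scores for one model (28 categories).
# 18 categories are clean; 10 still carry some debt.
category_debts = [0.0] * 18 + [0.4, 0.7, 1.1, 1.5, 2.0, 2.3, 2.9, 3.4, 4.2, 5.0]
assert len(category_debts) == 28

# With 28 values, the median is the mean of the 14th and 15th sorted values.
# Both are 0.0 here, so the median is 0.0 even though 10 categories are nonzero.
median_debt = median(category_debts)
print(f"median debt: {median_debt:.2f}")

# Tiebreak as described in the footnote: sort by debt first, then by
# average violations. The averages are the ones quoted in the footnote.
tied_models = [
    {"name": "GPT 5.3 Codex", "debt": 0.00, "avg_violations": 0.94},
    {"name": "GPT 5.4 Mini", "debt": 0.00, "avg_violations": 0.91},
]
ranked = sorted(tied_models, key=lambda m: (m["debt"], m["avg_violations"]))
for rank, model in enumerate(ranked, start=1):
    print(rank, model["name"])
```

Sorting on a `(debt, avg_violations)` tuple keeps the tiebreaker from ever overriding a real difference in debt: the second value is compared only when the first is equal.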
Deep Dive
Analysis
Introduction
95.9% of the top million websites fail basic accessibility checks. WebAIM has tracked it for seven years. After six years of marginal improvement, 2026 reversed the trend: errors per page jumped 10% to 56.1 and the failure rate climbed back to 95.9%.
AI is writing more of the world's code every day. Vibe Coding was the Collins Dictionary Word of the Year. If AI keeps writing code as poorly as the developers it learned from, nothing changes. But if it prioritizes accessibility, the web gets its first real chance to improve.
Our one ambitious goal
Ensure that AI models write accessible code by default.
Which Model is Best?
GPT 5.4 Mini by OpenAI is the new #1 model on AIMAC. It achieves a median AIMAC Debt of 0.00 for $0.95 across all 28 categories, with 22 total accessibility violations. GPT 5.3 Codex also has zero median debt and slightly fewer total violations (20), but costs $3.02, so GPT 5.4 Mini is the stronger all-around pick.
The next tier is still OpenAI: GPT 5.5 ranks #3 with a debt of 3.32, GPT 5.5 Pro ranks #4 at 3.43, and open-weight gpt oss 120b ranks #5 at 3.85 for just $0.09.
GPT 5.4 Mini: #1
The Pareto Frontier
AIMAC Debt vs Cost
Choosing a model isn't simply about which model is most accessible. Some models are very expensive. Benchmarks commonly use Pareto Frontier analysis to compare models on quality and cost. Pareto optimal models (teal diamonds) are the efficient picks: no other model is both cheaper and lower in AIMAC Debt. To lower the AIMAC Debt grade, you'd pay more; to pay less, your AIMAC Debt grade rises. A gold ring marks the lowest AIMAC Debt.
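As a minimal sketch of how such a frontier can be computed, the snippet below applies the standard two-objective dominance test to a subset of rows from the leaderboard. The page does not publish its own charting logic, so the `Model` type and the `dominates` and `pareto_frontier` helpers are illustrative assumptions, not AIMAC's implementation.

```python
from typing import NamedTuple


class Model(NamedTuple):
    name: str
    debt: float  # AIMAC Debt, lower is better
    cost: float  # total cost in USD, lower is better


# A subset of rows copied from the leaderboard (debt, cost).
MODELS = [
    Model("GPT 5.4 Mini", 0.00, 0.95),
    Model("GPT 5.3 Codex", 0.00, 3.02),
    Model("GPT 5.5", 3.32, 12.41),
    Model("GPT 5.5 Pro", 3.43, 128.11),
    Model("gpt oss 120b", 3.85, 0.09),
    Model("Qwen3.5 397B A17B", 4.09, 0.76),
    Model("GLM 4.7 Flash", 4.19, 0.10),
    Model("DeepSeek V4 Flash", 4.83, 0.10),
    Model("Gemma 4 31B", 10.30, 0.17),
    Model("Gemma 4 26B A4B", 11.15, 0.06),
    Model("Grok Build 0.1", 16.42, 1.24),
]


def dominates(a: Model, b: Model) -> bool:
    """True if a is at least as good as b on both axes and strictly better on one."""
    no_worse = a.debt <= b.debt and a.cost <= b.cost
    strictly_better = a.debt < b.debt or a.cost < b.cost
    return no_worse and strictly_better


def pareto_frontier(models: list[Model]) -> list[Model]:
    """Keep every model that no other model dominates."""
    return [m for m in models if not any(dominates(o, m) for o in models)]


# Sort by cost so the frontier reads from cheapest to most accurate.
frontier = sorted(pareto_frontier(MODELS), key=lambda m: m.cost)
for m in frontier:
    print(f"{m.name}: debt {m.debt:.2f}, ${m.cost:.2f}")
```

On this subset the frontier is Gemma 4 26B A4B, gpt oss 120b, and GPT 5.4 Mini, which matches the three 🅿️ markers in the leaderboard. GPT 5.3 Codex drops out because GPT 5.4 Mini matches its debt at a lower cost.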
Top 3 Winners
OpenAI
Alibaba/Qwen
Z.ai
1. OpenAI dominates the top of the leaderboard. Two of their models achieve a median AIMAC Debt of 0.00: GPT 5.4 Mini (#1) and GPT 5.3 Codex (#2). A median of 0.00 means at least half of the 28 categories had zero violations, though a few categories still had minor issues. OpenAI holds all of the top five spots, including GPT 5.5 (#3), GPT 5.5 Pro (#4), and open-weight gpt oss 120b (#5) for just $0.09.
2. Alibaba/Qwen is the strongest non-OpenAI lab in this run. Qwen3.5 397B A17B ranks #6 with an AIMAC Debt of 4.09 for $0.76, and Qwen3 Coder Next ranks #10 at 4.54 for $0.27. Qwen also places Qwen3.7 Plus (#12) and Qwen3.6 Flash (#13) just outside the top ten.
3. Z.ai has the most...