The AI Model Accessibility Checker

AIMAC: The AI Model Accessibility Checker

AI is writing more code than ever. But is that code accessible to people with disabilities?

We prompted the top AI models to build web pages across 28 categories and audited the results for accessibility. We published every generated page, side by side, so you can see how different models tackled the same design challenge. We even measured em dash usage.

AIMAC is an initiative by the GAAD Foundation, in partnership with ServiceNow. Updated Jun 20, 2026.

Leaderboard

Legend: 🅿️ Pareto Optimal 🏆 Lowest Debt

AIMAC Leaderboard showing model rankings by accessibility debt, violations, and cost

| Rank | Model | AIMAC Debt (lower = better) | Total Cost (USD) |
| --- | --- | --- | --- |
| 1 🅿️ 🏆 | OpenAI: GPT 5.4 Mini | 0.00 | $0.95 |
| 2 | OpenAI: GPT 5.3 Codex | 0.00 | $3.02 |
| 3 | OpenAI: GPT 5.5 | 3.32 | $12.41 |
| 4 | OpenAI: GPT 5.5 Pro | 3.43 | $128.11 |
| 5 🅿️ | OpenAI: gpt oss 120b | 3.85 | $0.09 |
| 6 | Qwen: Qwen3.5 397B A17B | 4.09 | $0.76 |
| 7 | Z.ai: GLM 4.7 Flash | 4.19 | $0.10 |
| 8 | Google: Gemini 3.1 Pro Preview | 4.40 | $4.16 |
| 9 | MoonshotAI: Kimi K2.7 Code | 4.48 | $1.46 |
| 10 | Qwen: Qwen3 Coder Next | 4.54 | $0.27 |
| 11 | Anthropic: Claude Haiku 4.5 | 4.57 | $2.30 |
| 12 | Qwen: Qwen3.7 Plus | 4.64 | $0.61 |
| 13 | Qwen: Qwen3.6 Flash | 4.67 | $0.58 |
| 14 | Anthropic: Claude Opus 4.8 | 4.775 | $9.33 |
| 15 | NVIDIA: Nemotron 3 Super | 4.778 | $0.16 |
| 16 | DeepSeek: DeepSeek V4 Flash | 4.83 | $0.10 |
| 17 | Anthropic: Claude Fable 5 | 4.94 | $22.33 |
| 18 | Z.ai: GLM 5.2 | 4.95 | $2.01 |
| 19 | MiniMax: MiniMax M3 | 5.16 | $0.73 |
| 20 | DeepSeek: DeepSeek V4 Pro | 5.43 | $0.71 |
| 21 | Qwen: Qwen3 Coder Plus | 7.18 | $0.62 |
| 22 | Anthropic: Claude Opus 4.8 (Fast) | 7.46 | $19.52 |
| 23 | Mistral: Mistral Medium 3.5 | 8.16 | $1.65 |
| 24 | Amazon: Nova 2 Lite | 8.20 | $0.43 |
| 25 | Mistral: Mistral Large 3 2512 | 8.229 | $0.43 |
| 26 | Arcee AI: Trinity Large Thinking | 8.230 | $0.20 |
| 27 | Mistral: Codestral 2508 | 8.66 | $0.14 |
| 28 | Qwen: Qwen3.7 Max | 9.18 | $1.79 |
| 29 | Anthropic: Claude Sonnet 4.6 | 9.83 | $12.64 |
| 30 | Google: Gemma 4 31B | 10.30 | $0.17 |
| 31 🅿️ | Google: Gemma 4 26B A4B | 11.15 | $0.06 |
| 32 | Mistral: Mistral Small 4 | 12.92 | $0.17 |
| 33 | MoonshotAI: Kimi K2.6 | 14.77 | $2.60 |
| 34 | Kwaipilot: KAT Coder Pro V2 | 14.79 | $0.43 |
| 35 | Google: Gemini 3.5 Flash | 15.00 | $4.83 |
| 36 | xAI: Grok 4.3 | 15.03 | $0.56 |
| 37 | xAI: Grok Build 0.1 | 16.42 | $1.24 |
| | Total | | $237.68 |

1 #1 and #2 tied with an AIMAC Debt of 0.00. Tiebreaker: #1 averaged fewer violations (0.91 vs 0.94).

* GPT 5.3 Codex shows a median AIMAC Debt of 0.00. This means at least half of the 28 categories had zero accessibility violations, but some categories still had minor issues (20 total violations across all categories).
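To see why a median of 0.00 does not imply a perfect run, here is a minimal sketch. The per-category scores below are invented for illustration and are not AIMAC data; how AIMAC derives a per-category debt score is not described here, so the list is only a stand-in for "one debt value per category".

```python
from statistics import median

# Hypothetical per-category debt values for one model across 28 categories.
# These numbers are made up for illustration; they are NOT AIMAC results.
per_category_debt = [0.0] * 20 + [1.5, 2.0, 2.0, 3.0, 3.5, 4.0, 5.0, 6.0]
assert len(per_category_debt) == 28

# With 28 values, the median is the mean of the 14th and 15th sorted values.
median_debt = median(per_category_debt)

# Count how many categories were completely clean.
clean_categories = sum(1 for d in per_category_debt if d == 0.0)

print(f"median debt: {median_debt:.2f}")
print(f"clean categories: {clean_categories} of {len(per_category_debt)}")
```

In this made-up run, eight categories still carry debt, yet the median is zero because the two middle values of the sorted list are both zero.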

Deep Dive

Analysis

Introduction

95.9% of the top million websites fail basic accessibility checks. WebAIM has tracked it for seven years. After six years of marginal improvement, 2026 reversed the trend: errors per page jumped 10% to 56.1 and the failure rate climbed back to 95.9%.

AI is writing more of the world's code every day. Vibe Coding was the Collins Dictionary Word of the Year. If AI keeps writing code as poorly as the developers it learned from, nothing changes. But if it prioritizes accessibility, the web gets its first real chance to improve.

Our one ambitious goal

Ensure that AI models write accessible code by default.

Which Model is Best?

GPT 5.4 Mini by OpenAI is the new #1 model on AIMAC. It achieves a median AIMAC Debt of 0.00 for $0.95 across all 28 categories, with 22 total accessibility violations. GPT 5.3 Codex also has zero median debt and slightly fewer total violations (20), but costs $3.02, so GPT 5.4 Mini is the stronger all-around pick.

The next tier is still OpenAI: GPT 5.5 ranks #3 with a debt of 3.32, GPT 5.5 Pro ranks #4 at 3.43, and open-weight gpt oss 120b ranks #5 at 3.85 for just $0.09.

GPT 5.4 Mini: #1

The Pareto Frontier

AIMAC Debt vs Cost

Choosing a model isn't only about which one is most accessible, because some models are very expensive. Benchmarks commonly use Pareto frontier analysis to compare models on quality versus cost. Pareto optimal models (teal diamonds) are the efficient picks: no other model does at least as well on both AIMAC Debt and cost, so lowering the debt means paying more, and paying less means accepting a higher debt. A gold ring marks the lowest AIMAC Debt.
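The Pareto check can be sketched in a few lines of Python. This is an illustrative sketch, not AIMAC's own code: the `Entry` type and `pareto_frontier` function are hypothetical names, and the data is a subset of the leaderboard rows using the rounded debt and cost values shown on the page.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    model: str
    debt: float  # AIMAC Debt, lower is better
    cost: float  # total cost in USD, lower is better


def dominates(a: Entry, b: Entry) -> bool:
    """True if a is at least as good as b on both axes and strictly better on one."""
    return (a.debt <= b.debt and a.cost <= b.cost
            and (a.debt < b.debt or a.cost < b.cost))


def pareto_frontier(entries: list[Entry]) -> list[Entry]:
    """Keep only the entries that no other entry dominates."""
    return [e for e in entries
            if not any(dominates(other, e) for other in entries)]


# A subset of the leaderboard, copied from the published figures.
leaderboard = [
    Entry("GPT 5.4 Mini", 0.00, 0.95),
    Entry("GPT 5.3 Codex", 0.00, 3.02),
    Entry("GPT 5.5 Pro", 3.43, 128.11),
    Entry("gpt oss 120b", 3.85, 0.09),
    Entry("Qwen3.5 397B A17B", 4.09, 0.76),
    Entry("GLM 4.7 Flash", 4.19, 0.10),
    Entry("Gemma 4 31B", 10.30, 0.17),
    Entry("Gemma 4 26B A4B", 11.15, 0.06),
]

frontier = pareto_frontier(leaderboard)
for entry in frontier:
    print(f"{entry.model}: debt {entry.debt:.2f}, cost ${entry.cost:.2f}")
```

On this subset the frontier is GPT 5.4 Mini, gpt oss 120b, and Gemma 4 26B A4B, which matches the three models flagged 🅿️ on the leaderboard. GPT 5.3 Codex drops out because GPT 5.4 Mini shows the same debt at a lower cost.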

Top 3 Winners

OpenAI

Alibaba/Qwen

Z.ai

1. OpenAI dominates the top of the leaderboard. Two of its models achieve a median AIMAC Debt of 0.00: GPT 5.4 Mini (#1) and GPT 5.3 Codex (#2). A median of 0.00 means at least half of the 28 categories had zero violations, though a few categories still had minor issues. OpenAI holds all of the top five spots, including GPT 5.5 (#3), GPT 5.5 Pro (#4), and open-weight gpt oss 120b (#5) for just $0.09.

2. Alibaba/Qwen is the strongest non-OpenAI lab in this run. Qwen3.5 397B A17B ranks #6 with an AIMAC Debt of 4.09 for $0.76, and Qwen3 Coder Next ranks #10 at 4.54 for $0.27. Qwen also places Qwen3.7 Plus (#12) and Qwen3.6 Flash (#13) just outside the top ten.

3. Z.ai has the most...
