How GLM-5.2 Beat Fable 5 at Website Design



GLM-5.2 ranks 1st overall on Design Arena’s single-turn HTML Web Design (Non-Agentic) evaluation, 5 places higher than its predecessor, GLM-5.1. To get there, it beat Claude Fable 5, Opus 4.6, and Opus 4.7, a model line that has held the top spots on our leaderboards for months and has won more head-to-head matchups than any other model we track. GLM-5.2 is the first model to do so, and it’s MIT licensed. Especially impressive is that Z.ai achieved this result with a model that is the same size as GLM-5.1, at 744 billion parameters, and has no vision capabilities, while its nearest competitors are speculated to be as much as 6.7x larger.

GLM-5.2 also establishes a new Pareto frontier for preference vs. price at $1.40 / $4.40 per 1 million tokens, versus Claude Fable 5’s $10 / $50 per 1 million tokens.

GLM-5.2 doesn’t surpass Fable 5 in all tasks. It ranks second, behind Fable 5, on our Game Dev, Data Visualization, and 3D Design leaderboards, and 4th on our UI Component leaderboard.

What changed in GLM-5.2’s website outputs?

To answer this question, we run a case-by-case analysis of single-turn deployments of GLM-5.2 and observe how its optimizations have improved its performance across frontend coding tasks. This allows us to determine not only which optimizations are most effective, but also which error cases the model avoids. The overarching takeaway is that GLM-5.2 avoids common error cases that most AI models fail to handle, generates more intricate websites, and specializes in designing structures that users prefer over other results.

Model Behavior #1: Outputs seem to indicate a beautiful set of starting templates

We can see why the Web Dev leaderboard result is notable by looking at 1,000 randomly sampled websites generated by both GLM-5.2 and Fable 5. By screenshotting each generated website and grouping the screenshots by similarity, we can see whether a model produces similar designs for different prompts. Below, we can see a visualization of this for GLM-5.2.

If we zoom in, we find that GLM-5.2 has a tendency to produce templated, similar responses even if the prompts are very different.

This templating is normal for frontier-level models, and it’s the result of many factors, from a model’s architecture down to its training data. While the templates aren’t visible in day-to-day work, they come to light when many outputs are aggregated and compared. The difference with GLM-5.2 is that its templates perform much better than those of other frontier models, which lets it outperform many of its peers. Its templates don’t contain antipatterns like the infamous purple gradients that plagued early AI models.

Compare this to Fable 5, whose outputs are more widely spread than GLM-5.2’s. It’s more difficult to find exact templates like the ones in GLM-5.2.

This is indicative of Fable 5 being a more general model that produces a wider, more diverse array of outputs.
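As a rough illustration of the "group by similarity" idea, here is a minimal sketch. It assumes each screenshot has already been reduced to a numeric feature vector by some embedding step; the toy vectors, the function names, and the 0.95 threshold are all hypothetical and are not Design Arena's actual pipeline. It groups vectors greedily by cosine similarity and reports a simple spread measure.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def group_by_similarity(vectors, threshold=0.95):
    """Greedy grouping: each vector joins the first group whose seed
    (the group's first member) is at least `threshold` similar to it."""
    groups = []  # each group is a list of indices into `vectors`
    for i, vec in enumerate(vectors):
        for group in groups:
            if cosine_similarity(vec, vectors[group[0]]) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])  # no close seed found, start a new group
    return groups


def mean_pairwise_distance(vectors):
    """Average cosine distance over all pairs; higher means more spread out."""
    pairs = list(combinations(vectors, 2))
    return sum(1 - cosine_similarity(a, b) for a, b in pairs) / len(pairs)


# Hypothetical screenshot embeddings (made-up numbers for illustration only).
templated = [[1.0, 0.0, 0.0], [0.99, 0.05, 0.0], [0.0, 1.0, 0.0], [0.02, 0.98, 0.0]]
varied = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.6, 0.6, 0.5]]

templated_groups = group_by_similarity(templated)
varied_groups = group_by_similarity(varied)
print(len(templated_groups), len(varied_groups))
print(mean_pairwise_distance(templated) < mean_pairwise_distance(varied))
```

In this toy data, the "templated" set collapses into a few tight groups while the "varied" set does not, which is the kind of pattern the screenshot comparison is looking for. A real analysis would likely use a proper clustering method rather than this greedy pass.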

Fable 5’s more customized approach does not yet seem to perform better on website generation; users favor the "expert template" approach employed by GLM-5.2, which sets a higher bar for average output quality.

Model Behavior #2: Avoids common error cases

Much of GLM-5.2’s improvement can be explained by the fact that it generates code that... just works. This is most clearly seen in GLM-5.2’s use of dependencies such as chart.js and three.js. While other models often fail to use these libraries effectively, GLM-5.2 calls and uses them naturally, resulting in a 6.0 percentage point win rate increase for the 21% of sessions that use them.

These libraries are especially helpful in the Dashboard and 3D Design categories, where using them dramatically improves performance.
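To make the "percentage point win rate increase" concrete, here is a minimal sketch of how such a lift could be computed. It assumes the comparison is between sessions that use a library and sessions that do not, which the figures above do not spell out; the record fields `uses_library` and `won` and the toy data are hypothetical.

```python
def win_rate(sessions):
    """Fraction of sessions that won their head-to-head matchup."""
    return sum(1 for s in sessions if s["won"]) / len(sessions)


def library_lift(sessions):
    """Return (share of sessions using a library, win-rate lift in
    percentage points relative to sessions that do not use one)."""
    with_lib = [s for s in sessions if s["uses_library"]]
    without_lib = [s for s in sessions if not s["uses_library"]]
    share = len(with_lib) / len(sessions)
    lift_pp = (win_rate(with_lib) - win_rate(without_lib)) * 100
    return share, lift_pp


# Toy data (made up): 4 sessions with a library (3 wins),
# 6 sessions without one (3 wins).
sessions = (
    [{"uses_library": True, "won": True}] * 3
    + [{"uses_library": True, "won": False}]
    + [{"uses_library": False, "won": True}] * 3
    + [{"uses_library": False, "won": False}] * 3
)

share, lift_pp = library_lift(sessions)
print(f"{share:.0%} of sessions use a library; lift = {lift_pp:.1f} pp")
```

Note that a lift in percentage points is a difference between two rates (here 75% vs. 50%), not a relative percentage change.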

GLM-5.2 also uses TailwindCSS in 91% of sessions and font-awesome in 51%, which increases win rates by 1.2 percentage points through more intricate design interactions and websites. Compare this to Opus 4.8, which uses TailwindCSS in only 57% of sessions and potentially sees a drop because of it.
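Usage rates like these can be estimated by scanning each generated page for the libraries it loads. The following is a minimal sketch under stated assumptions: it only inspects `<script src>` and `<link href>` URLs, and the substring patterns in `LIBRARY_PATTERNS` are illustrative guesses rather than the detection rules actually used for the numbers above.

```python
from html.parser import HTMLParser

# Assumed URL substrings for each library (illustrative, not exhaustive).
LIBRARY_PATTERNS = {
    "tailwindcss": ("tailwindcss",),
    "font-awesome": ("font-awesome", "fontawesome"),
    "chart.js": ("chart.js", "chart.min.js", "chart.umd"),
    "three.js": ("three.js", "three.min.js", "three.module"),
}


class DependencyCollector(HTMLParser):
    """Collects the URLs referenced by <script src> and <link href> tags."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "script" and attr_map.get("src"):
            self.urls.append(attr_map["src"].lower())
        elif tag == "link" and attr_map.get("href"):
            self.urls.append(attr_map["href"].lower())


def detect_libraries(html):
    """Return the set of known libraries whose pattern appears in a loaded URL."""
    collector = DependencyCollector()
    collector.feed(html)
    found = set()
    for url in collector.urls:
        for name, patterns in LIBRARY_PATTERNS.items():
            if any(pattern in url for pattern in patterns):
                found.add(name)
    return found


# Example page head (URLs are for illustration only).
sample_html = """
<html><head>
<script src="https://cdn.tailwindcss.com"></script>
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.5.0/css/all.min.css">
<script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
</head><body></body></html>
"""

detected = detect_libraries(sample_html)
print(sorted(detected))
```

Running `detect_libraries` over every session's HTML and dividing the count for each library by the number of sessions would give a per-library usage rate. This approach misses libraries that are inlined or imported inside script bodies.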


GLM-5.2 also has improved layout skills, especially in above-the-fold design. It generally uses beautiful images from outside CDNs instead of building its own visuals, and it has a better sense of layout than its competitors.

GLM-5.2’s ability to use outside dependencies is crucial for improving its performance in Design Arena, as it avoids the error cases that cause other models to fall behind.

Model Behavior #3: More intricate, detailed outputs

GLM-5.2 also generates animated, elaborate websites with more variation in typography, visual layout, and animation. These perform especially well for marketing and landing page websites, crafting customized user experiences that feel thoughtful and well-designed.


This strategy comes at the cost of longer generation times as the model outputs more tokens, making it slower but also producing immensely more...
