Comparing GLM 5.2 and Opus 4.8 implementing the same methods for the same repos

Opus vs GLM-5.2 in a coding-agent pipeline: paired-run findings


smellslikeml/v7_paired_analysis_gist.md

Last active June 30, 2026 17:18


GLM Tries, Opus Triages: Behavioral Differences in Research-to-Code Agents

A controlled comparison across 19 paired runs on 19 repository forks (38 individual workflow executions in total), each running an identical paper-implementation pipeline: remyxai/outrider, which uses Claude Code under the hood, with glm-5.2 routed to z.ai's Coding Plan endpoint versus default Opus. The pipeline ran in two modes that probe different parts of the workflow:

Selection-pass mode (n=9): no pin; each provider freely selects its own paper from the candidate pool. Exercises the full pipeline, including the selection and verification gates.

Pin-method mode (n=10): the same paper is pinned on each fork, and both providers run their full chain on identical input. Isolates implementation-side behavior on a forced pick.

The aggregate verdict comes from the n=19 union; the two mode-specific breakdowns below show where the difference comes from.

Reproducing this

The action under test is remyxai/outrider; the remyxai-cli installs the workflow on a target fork and dispatches runs:

# Install Outrider on the target fork (one-time setup).
remyxai outrider init --repo your-fork/repo --interest-id <uuid>

# Drop the alternate provider's API key into the repo's secrets.
remyxai outrider set-provider-secret \
  --repo your-fork/repo --provider zai --key-from ~/zai-key

# Compare the same paper across providers + models.
remyxai outrider trigger --repo your-fork/repo --pin-method 2606.27369v1 \
  --provider anthropic --model claude-opus-4-7
remyxai outrider trigger --repo your-fork/repo --pin-method 2606.27369v1 \
  --provider zai --model glm-5.2

# Or omit --pin-method to let each provider select its own paper.
remyxai outrider trigger --repo your-fork/repo \
  --provider anthropic --model claude-opus-4-7
remyxai outrider trigger --repo your-fork/repo \
  --provider zai --model glm-5.2

--provider picks the company / API endpoint; --model picks the specific model from that provider's catalog.
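For a batch of forks, the paired dispatch can be scripted. The following is a minimal dry-run sketch, not part of Outrider or remyxai-cli: the helper name `paired_trigger_commands` is hypothetical, and it only assembles the `remyxai outrider trigger` invocations shown above (same flags, same provider/model pairs) and prints them rather than executing anything.

```python
import shlex

# Provider/model pairs copied from the trigger commands in this write-up.
PROVIDERS = [("anthropic", "claude-opus-4-7"), ("zai", "glm-5.2")]


def paired_trigger_commands(repo, pin_method=None):
    """Build one `remyxai outrider trigger` argv per provider (hypothetical helper).

    With pin_method set, both providers get the same pinned paper (pin-method mode);
    with pin_method=None, the flag is omitted and each provider selects its own paper.
    """
    commands = []
    for provider, model in PROVIDERS:
        argv = ["remyxai", "outrider", "trigger", "--repo", repo]
        if pin_method is not None:
            argv += ["--pin-method", pin_method]
        argv += ["--provider", provider, "--model", model]
        commands.append(argv)
    return commands


# Dry run: print the two commands of one paired run instead of dispatching them.
for argv in paired_trigger_commands("your-fork/repo", pin_method="2606.27369v1"):
    print(shlex.join(argv))
```

Printing the commands first makes it easy to confirm that both halves of a pair differ only in `--provider` and `--model` before dispatching real runs.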

Headline finding: triage vs attempt

Aggregate outcomes across all 19 paired runs:

| | PR shipped | Issue filed | Skipped (verification) | Failed |
|---|---|---|---|---|
| Opus | 5 / 19 (26%) | 10 / 19 (53%) | 4 / 19 (21%) | |
| GLM-5.2 | 1 / 19 (5%) | 15 / 19 (79%) | 2 / 19 (11%) | 1 / 19 (5%) |
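The arithmetic behind these figures can be checked in a few lines. This is a sketch for sanity-checking the reported counts, not the analysis script used for the gist. It assumes the blank "Failed" cell for Opus means zero, since the other three Opus counts already sum to 19.

```python
TOTAL_PAIRS = 19

# Pairs per mode, as stated in the introduction.
MODE_PAIRS = {"selection-pass": 9, "pin-method": 10}

# Outcome counts per provider from the aggregate table.
# Assumption: Opus "Failed" is 0 (the cell is blank in the source).
OUTCOMES = {
    "Opus": {"PR shipped": 5, "Issue filed": 10, "Skipped (verification)": 4, "Failed": 0},
    "GLM-5.2": {"PR shipped": 1, "Issue filed": 15, "Skipped (verification)": 2, "Failed": 1},
}


def share(count, total=TOTAL_PAIRS):
    """Percentage of paired runs, rounded to the nearest whole percent."""
    return round(100 * count / total)


# The two modes together make up the n=19 union; each pair is two executions.
assert sum(MODE_PAIRS.values()) == TOTAL_PAIRS
print("workflow executions:", 2 * TOTAL_PAIRS)

for provider, counts in OUTCOMES.items():
    # Every paired run ends in exactly one outcome, so each row must sum to 19.
    assert sum(counts.values()) == TOTAL_PAIRS
    print(provider, {name: f"{n} / {TOTAL_PAIRS} ({share(n)}%)" for name, n in counts.items()})

# Ratio of PR-shipped counts between the two providers.
pr_ratio = OUTCOMES["Opus"]["PR shipped"] / OUTCOMES["GLM-5.2"]["PR shipped"]
print("Opus : GLM-5.2 PR ratio =", pr_ratio)
```

With only 19 pairs, each run moves a percentage by about five points, so the shares are best read alongside the raw counts.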

Opus triages. When it can ship a PR cleanly, it does (5× more often than GLM). When it can't find a real call site, it exits at selection-pass verification rather than attempting an implementation. The full range of routing outcomes — PR / Issue / skip — gets used roughly in proportion to what the candidate actually warrants: ship, surface for discussion, or drop.

GLM-5.2 tries. GLM rarely exits early — it attempts implementation, and Outrider's...
