Comparing GLM 5.2 and Opus 4.8 implementing the same methods for the same repos


Opus vs GLM-5.2 in a coding-agent pipeline — paired-run findings · GitHub


smellslikeml/v7_paired_analysis_gist.md

Last active June 30, 2026 17:18



GLM Tries, Opus Triages: Behavioral Differences in Research-to-Code Agents

A controlled comparison across 19 paired runs on 19 repository forks (38 individual workflow executions in total), each running an identical paper-implementation pipeline: remyxai/outrider, which uses Claude Code under the hood, with glm-5.2 routed through z.ai's Coding Plan endpoint versus the default Opus. The pipeline ran in two modes that probe different parts of the workflow:

Selection-pass mode (n=9): no pin; each provider freely selects its own paper from the candidate pool. Exercises the full pipeline, including the selection and verification gates.

Pin-method mode (n=10): the same paper is pinned on each fork, and both providers run their full chain on identical input. Isolates implementation-side behavior on a forced pick.

The aggregate verdict comes from the n=19 union; the two mode-specific breakdowns below show where the difference comes from.

Reproducing this

The action under test is remyxai/outrider; the remyxai-cli installs the workflow on a target fork and dispatches runs:

    # Install Outrider on the target fork (one-time setup).
    remyxai outrider init --repo your-fork/repo --interest-id <uuid>

    # Drop the alternate provider's API key into the repo's secrets.
    remyxai outrider set-provider-secret \
      --repo your-fork/repo --provider zai --key-from ~/zai-key

    # Compare the same paper across providers + models.
    remyxai outrider trigger --repo your-fork/repo --pin-method 2606.27369v1 \
      --provider anthropic --model claude-opus-4-7
    remyxai outrider trigger --repo your-fork/repo --pin-method 2606.27369v1 \
      --provider zai --model glm-5.2

    # Or omit --pin-method to let each provider select its own paper.
    remyxai outrider trigger --repo your-fork/repo \
      --provider anthropic --model claude-opus-4-7
    remyxai outrider trigger --repo your-fork/repo \
      --provider zai --model glm-5.2

--provider picks the company / API endpoint; --model picks the specific model from that provider's catalog.
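For repeated pairs, the two trigger calls can be wrapped in a small script. The following is only a sketch, not part of remyxai-cli: `run` and `trigger_pair` are hypothetical helper names, the provider/model pairs are the ones used in the commands above, and `DRY_RUN=1` (the default assumed here) prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch: dispatch one paired run (both providers) on a single fork.
# REPO and PAPER default to the placeholder values used in the commands above.
REPO="${REPO:-your-fork/repo}"
PAPER="${PAPER:-2606.27369v1}"
DRY_RUN="${DRY_RUN:-1}"   # assumption: print by default; set DRY_RUN=0 to execute

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: trigger both providers on repo $1.
# With a paper ID in $2 this is pin-method mode; with an empty $2,
# --pin-method is omitted and each provider selects its own paper.
trigger_pair() {
  for entry in "anthropic claude-opus-4-7" "zai glm-5.2"; do
    if [ -n "${2:-}" ]; then
      run remyxai outrider trigger --repo "$1" --pin-method "$2" \
        --provider "${entry%% *}" --model "${entry#* }"
    else
      run remyxai outrider trigger --repo "$1" \
        --provider "${entry%% *}" --model "${entry#* }"
    fi
  done
}

trigger_pair "$REPO" "$PAPER"
```

Keeping both calls in one function means the two runs of a pair cannot drift apart in repo or paper ID, which is the property the paired design depends on.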

Headline finding: triage vs attempt

Aggregate outcomes across all 19 paired runs:

| Model | PR shipped | Issue filed | Skipped (verification) | Failed |
| --- | --- | --- | --- | --- |
| Opus | 5 / 19 (26%) | 10 / 19 (53%) | 4 / 19 (21%) | 0 / 19 (0%) |
| GLM-5.2 | 1 / 19 (5%) | 15 / 19 (79%) | 2 / 19 (11%) | 1 / 19 (5%) |
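As a quick sanity check, the percentages in the table can be re-derived from the raw counts. The sketch below assumes Opus's Failed count is 0, since its other three counts already sum to 19; `pct` is a hypothetical helper that rounds to the nearest whole percent using integer arithmetic.

```shell
#!/bin/sh
# Sketch: re-derive the outcome percentages from the raw counts (n = 19 pairs).
N=19

# Hypothetical helper: count $1 as a percentage of N, rounded to nearest integer.
pct() {
  echo $(( (200 * $1 + N) / (2 * N) ))
}

# Columns: PR shipped, Issue filed, Skipped (verification), Failed.
# Assumption: Opus Failed = 0 (its other outcomes already account for all 19 runs).
for row in "Opus 5 10 4 0" "GLM-5.2 1 15 2 1"; do
  set -- $row
  echo "$1: PR $(pct "$2")%  Issue $(pct "$3")%  Skipped $(pct "$4")%  Failed $(pct "$5")%"
done

# Ratio of PRs shipped, Opus to GLM-5.2 (5 vs 1 out of the same 19 pairs).
echo "PR ratio: $(( 5 / 1 ))x"
```

With only 19 pairs, a single run moving between columns shifts a percentage by about five points, so the rounded figures are best read as coarse indicators.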

Opus triages. When it can ship a PR cleanly, it does (5× more often than GLM). When it can't find a real call site, it exits at selection-pass verification rather than attempting an implementation. The full range of routing outcomes — PR / Issue / skip — gets used roughly in proportion to what the candidate actually warrants: ship, surface for discussion, or drop.

GLM-5.2 tries. GLM rarely exits early — it attempts implementation, and Outrider's...
