Opus vs GLM-5.2 in a coding-agent pipeline — paired-run findings · GitHub
smellslikeml/v7_paired_analysis_gist.md
Last active June 30, 2026 17:18
GLM Tries, Opus Triages: Behavioral Differences in Research-to-Code Agents
A controlled comparison across 19 paired runs on 19 repository forks (38 individual workflow executions in total), each running an identical paper-implementation pipeline: remyxai/outrider, which uses Claude Code under the hood, with glm-5.2 routed to z.ai's Coding Plan endpoint on one side and default Opus on the other. The pipeline ran in two modes that probe different parts of the workflow:
- Selection-pass mode (n=9): no pin; each provider freely selects its own paper from the candidate pool. This exercises the full pipeline, including the selection and verification gates.
- Pin-method mode (n=10): the same paper is pinned on each fork, and both providers run their full chain on identical input. This isolates implementation-side behavior on a forced pick.
The aggregate verdict comes from the n=19 union; the two mode-specific breakdowns below show where the difference comes from.
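As a quick sanity check on the design, the pair and execution totals follow directly from the two mode sizes. This is a minimal sketch; the names `PAIRS_PER_MODE` and `design_totals` are illustrative and not part of Outrider.

```python
# Pair counts per mode, as stated in the text above.
PAIRS_PER_MODE = {"selection-pass": 9, "pin-method": 10}

# Each pair consists of one workflow execution per provider.
PROVIDERS = ("Opus", "GLM-5.2")


def design_totals(pairs_per_mode, providers):
    """Return (total paired runs, total workflow executions)."""
    total_pairs = sum(pairs_per_mode.values())
    return total_pairs, total_pairs * len(providers)


if __name__ == "__main__":
    pairs, executions = design_totals(PAIRS_PER_MODE, PROVIDERS)
    print(f"{pairs} paired runs, {executions} workflow executions")
```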
Reproducing this
The action under test is remyxai/outrider; the remyxai-cli installs the workflow on a target fork and dispatches runs:
    # Install Outrider on the target fork (one-time setup).
    remyxai outrider init --repo your-fork/repo --interest-id <uuid>

    # Drop the alternate provider's API key into the repo's secrets.
    remyxai outrider set-provider-secret \
      --repo your-fork/repo --provider zai --key-from ~/zai-key

    # Compare the same paper across providers + models.
    remyxai outrider trigger --repo your-fork/repo --pin-method 2606.27369v1 \
      --provider anthropic --model claude-opus-4-7
    remyxai outrider trigger --repo your-fork/repo --pin-method 2606.27369v1 \
      --provider zai --model glm-5.2

    # Or omit --pin-method to let each provider select its own paper.
    remyxai outrider trigger --repo your-fork/repo \
      --provider anthropic --model claude-opus-4-7
    remyxai outrider trigger --repo your-fork/repo \
      --provider zai --model glm-5.2
--provider picks the company / API endpoint; --model picks the specific model from that provider's catalog.
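When dispatching many pairs, it can help to generate both command lines from one place so the two sides of a pair differ only in `--provider` and `--model`. The sketch below is a hypothetical helper, not part of remyxai-cli: it only formats strings using the flags shown above and never invokes the CLI.

```python
import shlex

# Provider/model pairs taken from the trigger commands above.
PROVIDERS = [
    ("anthropic", "claude-opus-4-7"),
    ("zai", "glm-5.2"),
]


def paired_trigger_commands(repo, pin_method=None):
    """Return one `remyxai outrider trigger` command line per provider.

    Hypothetical helper: builds strings only, so it is safe as a dry run.
    """
    commands = []
    for provider, model in PROVIDERS:
        argv = ["remyxai", "outrider", "trigger", "--repo", repo]
        if pin_method is not None:
            # Pin-method mode: both providers receive the same paper.
            argv += ["--pin-method", pin_method]
        argv += ["--provider", provider, "--model", model]
        commands.append(shlex.join(argv))
    return commands


if __name__ == "__main__":
    for line in paired_trigger_commands("your-fork/repo", pin_method="2606.27369v1"):
        print(line)
```

Calling `paired_trigger_commands("your-fork/repo")` without `pin_method` yields the selection-pass variant of the same pair.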
Headline finding: triage vs attempt
Aggregate outcomes across all 19 paired runs:
| | PR shipped | Issue filed | Skipped (verification) | Failed |
| --- | --- | --- | --- | --- |
| Opus | 5 / 19 (26%) | 10 / 19 (53%) | 4 / 19 (21%) | 0 / 19 (0%) |
| GLM-5.2 | 1 / 19 (5%) | 15 / 19 (79%) | 2 / 19 (11%) | 1 / 19 (5%) |
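The percentages in the table, and the PR-shipped ratio between the two providers, can be recomputed from the raw counts. This is a minimal sketch with illustrative names; it assumes Opus's "Failed" count is 0, since its other three cells already sum to 19.

```python
# Raw outcome counts from the table (19 paired runs per provider).
# Assumption: Opus "failed" is 0 because its other three cells sum to 19.
OUTCOMES = {
    "Opus": {"pr": 5, "issue": 10, "skipped": 4, "failed": 0},
    "GLM-5.2": {"pr": 1, "issue": 15, "skipped": 2, "failed": 1},
}


def shares(counts):
    """Return each outcome as a whole-number percentage of the row total."""
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}


def pr_ratio(outcomes, a="Opus", b="GLM-5.2"):
    """Return how many times as often provider a shipped a PR as provider b."""
    return outcomes[a]["pr"] / outcomes[b]["pr"]


if __name__ == "__main__":
    for provider, counts in OUTCOMES.items():
        print(provider, shares(counts))
    print("PR-shipped ratio (Opus / GLM-5.2):", pr_ratio(OUTCOMES))
```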
Opus triages. When it can ship a PR cleanly, it does (5 PRs to GLM's 1, five times as often). When it can't find a real call site, it exits at selection-pass verification rather than attempting an implementation. The full range of routing outcomes (PR, Issue, skip) gets used roughly in proportion to what the candidate actually warrants: ship, surface for discussion, or drop.
GLM-5.2 tries. GLM rarely exits early — it attempts implementation, and Outrider's...