I made the same AI compete against itself in SEO tasks

The only difference was the live Google data connection.

Adam Gałęcki · Jun, 2026 · 7 min read

The problem with “AI vs SEO tools” comparisons

Most of them change two things at once. They run one model on one data source, a different model on a different source, then declare a winner. When both the model and the inputs change, you learn nothing about either. Was the difference the reasoning, the training data, or the retrieval layer? You can’t tell.

So I changed only one thing.

The method

Both sides were the same model, claude-opus-4-8 (fable5 was my first choice, but…): same weights, same prompts, same niche, same market. The only difference:

Side A (tools) could pull live Google data through NodesHub’s SERP API.
Side B (naked) had only its own memory.

Since both sides are the same model, any gap comes from the data, not from one being smarter.

I ran the naked side first, before pulling any live SERP. The tools side is live Google data, so it’s correct by definition and acts as the answer key. Scoring was strict: a naked keyword counts as verified only if the exact phrase (or a close variant) actually appears in the live data.

The setup

Image: Nodeshub SEO OS with Claude Code in the terminal.

Niche: SERP data APIs.
Seed keyword: "serp data api".
Market: US, English.
Tasks: keyword research, SERP intent, People Also Ask, query fan-out.
Verification: live Nodeshub data, plus manual US/incognito Google autocomplete for fan-out queries the API couldn’t confirm.

[Task 1] Keyword research

Naked model: 40 keywords. Tools side: 411 keywords from live data.

When I graded the naked model’s 40 keywords against the 411 mined from live data with strict matching, only 17 of them held up, which works out to roughly 42%.

The misses weren’t nonsense. Keywords like “featured snippet api” or “local pack api” sound completely reasonable, but nobody actually types them. The model also threw in vendor names like zenserp, valueserp and oxylabs as if they were keywords. Those are real companies, but they aren’t real searches.
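For anyone who wants to reproduce the grading step, here is a minimal Python sketch of strict matching. It is not the script used in this test. The `normalize` and `grade` helpers are hypothetical names, and "close variant" is assumed here to mean the same words in the same order, ignoring case, punctuation and a trailing plural "s". The input lists are illustrative, not the real 40 and 411 keywords.

```python
import re


def normalize(phrase: str) -> tuple[str, ...]:
    """Lowercase, drop punctuation, and fold simple plurals token by token."""
    tokens = re.findall(r"[a-z0-9]+", phrase.lower())
    # Assumption: a "close variant" differs only by a trailing "s" on a word.
    # This is a crude rule and will mis-fold some words (e.g. "news").
    return tuple(t[:-1] if len(t) > 3 and t.endswith("s") else t for t in tokens)


def grade(candidates: list[str], live_keywords: list[str]) -> tuple[list[str], list[str]]:
    """Split candidate keywords into (verified, unverified) against live data."""
    live = {normalize(k) for k in live_keywords}
    verified = [c for c in candidates if normalize(c) in live]
    unverified = [c for c in candidates if normalize(c) not in live]
    return verified, unverified


# Illustrative inputs only, not the real keyword lists from the test.
naked = ["serp api", "SERP APIs pricing", "featured snippet api", "zenserp"]
live = ["serp api", "serp api pricing", "best serp api", "google serp api free"]

verified, unverified = grade(naked, live)
rate = len(verified) / len(naked)
print(f"{len(verified)}/{len(naked)} verified ({rate:.0%})")
print("unverified:", unverified)
```

A looser rule (stemming, word reordering, embeddings) would push the verified count up, so whatever definition of "close variant" you pick should be fixed before looking at the live data.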

The live data also caught something I’d have missed. “SERP” is also a finance term, short for Supplemental Executive Retirement Plan, and 24 of the 411 keywords turned out to be about 401(k)s, pensions and Roth IRAs, with queries like “is serp better than 401k?”. The seed keyword is a homonym that points at a completely different audience, and only the live data showed me that.

[Task 2] SERP intent

I asked both sides to predict the intent, the SERP features and the top domains for three keywords.

On intent, the naked model got all three right, reading them correctly as commercial-investigation queries.

On features, it fell apart. It predicted Google Ads on all three SERPs, but the live Google results showed no ads at all. Instead they displayed AI Overviews and a Perspectives carousel, neither of which the model mentioned.

Image: SERP results without ads.

The domains had the same problem. The model named 9 real ones, but it also listed 6 that did not appear in the live results (zenserp, valueserp, oxylabs, GitHub, G2 and scrapingbee), and it missed the actual leaders completely: nimbleway, trajectdata, scrapingdog, searchapi.io, serper.dev, scrapfly, proxyway and olostep. It did get one thing right, correctly guessing that Reddit ranks for “best serp api”, where it sits at #2.

[Task 3] People Also Ask

Naked model: 15 questions. Tools side: 8 real ones from live PAA boxes.

The exact seed “serp data api” returned no PAA at all, because it’s a sparse term, which is something the naked model had actually flagged itself.

Of the 15 questions, only 3 matched the real 8 in theme, covering the free, legal and what-is angles. The other 12 were made up. The real PAA also had questions the model never came up with, like “What does SERP API stand for?”, “What is a serper API?” and “What is the fastest SERP API?”. So the verification rate here was 3 out of 15, or 20%.

[Task 4] Query fan-out (where the naked model wins)

Naked model: 30 queries. Tools side: 18 queries.

The tools side’s 18 queries were made up of live AI variants plus confirmed Related Searches. When I checked the naked model’s 30 through both channels, against live data and manual autocomplete, it scored 24 of 30, or 80%.
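To put the three scored tasks side by side, the verification rates can be recomputed from the counts reported above. This is a small sketch: the `results` table and the `verification_rate` helper are names introduced here for illustration, and Task 2 is left out because it was not scored as a single verified/generated ratio.

```python
# Counts as reported in the write-up: (verified, generated) per task.
results = {
    "Task 1: keyword research": (17, 40),
    "Task 3: People Also Ask": (3, 15),
    "Task 4: query fan-out": (24, 30),
}


def verification_rate(verified: int, generated: int) -> float:
    """Share of the naked model's outputs that the live data confirmed."""
    return verified / generated


for task, (verified_count, generated_count) in results.items():
    pct = verification_rate(verified_count, generated_count)
    print(f"{task}: {verified_count}/{generated_count} = {pct:.1%}")
```

Task 1 comes out at 42.5%, which the text rounds to 42%.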
