Cookie banners make vision models hallucinate webpage content

chrmod1 pts0 comments

Cookie banners make vision models hallucinate webpage content — chrmod / search ~/chrmod / archive / cookie-banners-vision-models<br>Cookie banners make vision models hallucinate webpage content<br>2026·05·21 ┄ 10 min read ┄ tags: ai · vision · browsers · ghostery · scraping

Cookie banners don't just block vision models—they cause them to hallucinate

2026-05-21. 9 US news sites × 2 variants × 10 trials = 180 calls. Anthropic Sonnet 4.5 vision API. Methodology at the bottom.

When a cookie banner is on screen at the moment you screenshot a webpage and feed the screenshot to a vision model, the model often doesn't just fail to extract the headlines — it confidently invents plausible-sounding ones. Same page, same model, banner removed: it gets the headlines right.

Across 180 API calls — 9 US news sites, vanilla Chrome vs. Chrome + Ghostery (autoconsent strips the banner), 10 trials per cell — the vanilla model returned 30/90 empty articles arrays (33%) . Ghostery returned 0/90 empty (0%) . On 5 of 9 sites (theverge, npr, slate, vox, huffpost), the median vanilla trial extracts zero real headlines — either empty, or confabulated entries that don't match anything on the page. On all 9 sites Ghostery extracts real headlines on every trial.

Takeaway, up front: if you ship a vision-based scraping pipeline or browser agent (Operator, Comet, Arc, browser-use, Skyvern, anything that screenshots a page and sends it to an LLM), an autoconsent/adblocker layer is a mandatory pre-processing step. Downstream code can't tell hallucinated headlines from real ones because the JSON is well-formed either way.

See it happen: slate.com

Vanilla Chrome

articles: 0 · output: 337 tok · zero_articles

Model output (truncated):

"articles": [],<br>"navigation": [<br>"News & Politics",<br>"Culture",<br>"Technology",<br>"Business",<br>"Life",<br>"Advice",<br>"Podcasts"<br>],<br>"other": [<br>"$5 Sale",<br>"Slate Shop",<br>"Games",<br>"Newsletters",<br>"Sign In",<br>"We Care About Your Privacy",<br>"We and our partners process data to provide:",<br>"We and our 340 partners store and access personal data, like browsing data or unique identifiers, on your device...",<br>"Use precise geolocation data. Actively scan device characteristics for identification..."<br>...

Chrome + Ghostery

articles: 8 · output: 339 tok · ok

Model output (truncated):

"articles": [<br>"The Latest Literary Controversy Is All About Two Pages Out of Hundreds. I Read the Whole Thing.",<br>"America's Hottest Primary Has Clues About What's Coming After Trump. It's Not Pretty.",<br>"Maybe Trump Is Giving Us the Capital City We Deserve",<br>"Trump's Efforts to Intimidate Elon Musk's Enemies Just Faced a Severe Setback",<br>"Justice Jackson Seems to Be Warning Us About the Supreme Court's Next Voting Rights Target",<br>"Is the Supreme Court's Abortion Pill Ruling Actually a Feint?",<br>"Play Today's Pears 🍐",<br>"Slate Mini Crossword for May 20, 2026"<br>],<br>"navigation": [<br>"$5 Sale",<br>"Slate Shop",<br>"Games",<br>"Newsletters",<br>"Sign In",<br>"News & Politics",<br>"Culture",<br>"Technology",<br>"Business",<br>"Life",<br>"Advice",<br>"Podcasts"<br>],<br>"other": [<br>"EXPERTS AND AESTHETES",<br>"Search",<br>"Light",<br>"LAURA MILLER",<br>"BEN JACOBS",<br>"TRUMPWORLD",<br>"THE LAW",<br>"SLATE GAMES"

Same Slate homepage, same Anthropic Sonnet 4.5 vision model, same prompt — 10 trials on each variant. Vanilla Chrome: the model returned an empty array on 7 of 10 trials, fabricated content on the other 3. Chrome + Ghostery: the model returned the 6 actual top story headlines on all 10 trials.

What the model hallucinates vs. what's actually on the page

Three illustrative pairs from this run. Vanilla is on the left (fabricated entries from sites where the model returned something under the consent modal). Ghostery is on the right (what the actual top story was, recovered consistently across 10 trials with no banner in the way).

SiteVanilla returned (fabricated)Actual top story (Ghostery returned)

npr.org"Defunding public broadcasting doesn't return money to taxpayers""Here's how Tuesday's primary elections played out, state by state"<br>theverge.com"Honda's new EV is a rebranded GM Blazer""Valve says games like Vampire Survivors fall under the 'Bullet Heaven' genre"<br>usatoday.com"New Google CEO Amar Pichawalla: What to know about the new leader""US-Cuba tensions escalate amid Raúl Castro indictment: Updates"

Vanilla returns syntactically valid JSON containing those fake titles. Sundar Pichai isn't named "Amar Pichawalla." NPR's top story isn't about defunding public broadcasting. theverge's top story isn't about Hondas. A pipeline parsing this output has no signal that anything is wrong.

More fabrications from the same run:

usatoday vanilla also returned "Police say man set three people on fire in Chicago" and "Eric Trump arrested on charges tied to Jan. 6? No, this is a fake photo." Neither was on the page.

theguardian vanilla repeatedly returned three close-but-different paraphrases of a Stephen Colbert headline ("hosts bid goodbye", "hosts say goodbye", "bids a complicated goodbye") — the real headline is a...

model vision vanilla returned ghostery slate

Related Articles