Pay-per-Crawl

Pay-Per-Crawl Is Splitting the Web in Two — FourA Blog

ESC

↑ ↓ navigate<br>Enter open<br>ESC close

On February 19, 2026, Stack Overflow and Cloudflare went public with something most of the web data industry didn't see coming. They co-launched pay-per-crawl: a system where AI crawlers get a real-time 402 Payment Required response and can either pay the publisher's price or walk away. Bot identity is verified at the edge, the price is set by the site, the transaction is metered.

Cloudflare sits in front of roughly one in five sites on the internet. So when they flipped block-by-default for known AI bots and stood up a marketplace where publishers charge per request, the access model for a huge slice of the open web changed in a weekend.

If you're shipping web data infrastructure right now, this isn't a Cloudflare announcement to file away. It changes the math on what "open" means.

The Mechanic Behind the Flip

The technical move is small. Cloudflare resurrected HTTP 402, the long-dormant "Payment Required" status code, and wired it to a registry of verified AI crawlers. A publisher sets a per-request price. The crawler either holds a credit balance and pays, or gets blocked.

The non-technical move is bigger. Before this, the only ways to enforce "don't scrape my content for AI" were robots.txt (advisory, not enforced) and aggressive bot blocking (binary, lossy, and full of false positives). Cloudflare added a third option: a price tag.

The economics of that third option run differently from the first two. Robots.txt costs nothing and gets ignored. Bot blocking costs you traffic from real users misclassified as bots. A price tag, by design, separates crawlers willing to pay from ones that aren't.

Who's Actually Charging

Stack Overflow was the launch partner because their training data is genuinely valuable and they were already negotiating bilateral deals with OpenAI and others. Cloudflare's marketplace generalized those bilateral deals into a registry the rest of the publisher world can plug into.

The list of who's followed grew fast. AWS shipped its own bot-monetization layer. Akamai built a parallel one. The pitch to publishers is straightforward: instead of one expensive lawsuit against an AI lab, get a revenue line that pays per request.

For now this is mostly the high-value content tier: documentation, news, technical Q&A, structured reference data. The long tail of the web (small ecommerce sites, regional listings, niche forums) sits behind no such gate and probably never will. Cloudflare's own bot management costs money to run, and pay-per-crawl is opt-in. It only pays for sites where a single page view is worth charging for.

What This Means for Web Data Pipelines

If you're building a pipeline that pulls from Stack Overflow, major news sites, or any of the publishers actively onboarding, your options narrow to three. Pay through the marketplace once your traffic is identifiable as an AI crawler. Switch to a licensed dataset where one exists. Or find the data somewhere it's still open.

Most teams will end up doing all three at different times. That's the practical reality. The web is splitting into licensed and open, and the boundary isn't drawn neatly along domain lines. The same publisher can have one section behind 402 and another section open. The same site can charge one crawler and ignore a research bot entirely.

We think the practical reaction for engineering teams looks like this. First, audit your sources. If a meaningful share of your pipeline pulls from Stack Overflow, Reddit, major news sites, or any of the dozen publishers visibly courting these deals, assume the access model will change within twelve months. Second, separate licensed sources from open ones inside your architecture early. A pipeline that treats every source identically is fragile when half of them start asking for money and the other half don't. Third, stop treating robots.txt as the only signal. The 402 response will mean something operationally even if your crawler isn't an AI agent. False positives are inevitable in a system this new.

This sits alongside the training-data compliance pressure from the EU AI Act, which already pushed teams toward provenance-tracked sources. Pay-per-crawl is the same pressure with a billing layer attached.

The Honest Take

A few things will trip people up. Cloudflare's identity verification rests on bots registering. Bots that don't register, or that look like residential traffic, don't trigger 402 at all. They hit the normal anti-bot stack instead. That's already the path most aggressive AI crawlers will take. So pay-per-crawl works for the bots that want to comply. The ones that don't were never going to honor robots.txt either.

The bigger shift might not be the marketplace itself. It's that "is this content available for AI training" became a question with a contractual answer instead of a robots.txt guess. Publishers can finally enforce. Crawlers can finally...

Pay-per-Crawl

Related Articles

US Government directive to suspend access to Fable 5 and Mythos 5

Is AI ruining our skills? Early results are in – and they're not good

The Anatomy of an AI-Native Org

Apertus – Open Foundation Model for Sovereign AI

Britain Became as Poor as Mississippi