Two-Thirds of the Web Is Invisible to AI Bots
by Gregory Pellitteri | May 2026
When AI search engines became a meaningful share of how people find businesses online, a quiet assumption traveled with the trend: if your site shows up in Google, it will probably also show up in ChatGPT. They run on the same web. They read the same pages. Surely the underlying access pattern is similar.
It is not.
We analyzed about 1.2 million brand websites in our research dataset. Of those, only around 395,000 have ever been crawled by a major AI bot, a base crawl rate of 33.3 percent. The other 66.7 percent are completely invisible to ChatGPT, Claude, Perplexity, and the other large language model assistants that millions of consumers now use as a primary discovery channel.
Figure: 67% never crawled, 33% crawled at least once. Source: Engagemii research, 1.2M brands, 2026-05-29.
If you assumed your site was being read, the math says there is a two-in-three chance you are wrong.
Why the gap exists
Traditional search engines built their indexes over a 20-year era when the dominant problem was scale: there were billions of pages, and the search engine needed to crawl all of them quickly. Googlebot and Bingbot are aggressive crawlers because their job is to know about everything, then sort it.
AI bots have a different job. They do not have to know about every site; they have to know about the sites worth quoting in an answer. Crawling a low-quality, machine-unreadable site costs as much as crawling a good one, but the value is zero. So AI bot operators are far pickier about what they index. If your site does not present the structured signals that say “I am a real organization with verifiable information,” the AI bot moves on and rarely comes back.
The shorthand: AI bots crawl on quality; Googlebot crawls on coverage. Different incentives produce different access patterns. The 67% of sites that are invisible to AI bots are the ones that clear Googlebot’s much lower bar but fail the AI bots’ higher one.
The structured-data effect
We trained a classifier on our dataset to predict, from structured-data signals alone, whether a site is in the AI-visible third. The model excludes every other variable (no AEO, or answer engine optimization, score; no domain popularity; no business category) and toggles 16 structured-data signal flags on and off.
With none of those signals, sites are crawled at 27.2 percent. With all of them, the rate rises to 57.0 percent. The multiplier between the two ends is 2.09 times.
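To make the "flags on and off" comparison concrete, here is a minimal Python sketch, assuming a logistic-regression-style classifier (the article does not say which model family it uses). The intercept and weights are hypothetical placeholders, not fitted values from our dataset, so the printed rates will not reproduce 27.2 and 57.0 percent. What it shows is the mechanics: score the same model once with all 16 flags off and once with all 16 on, then take the ratio of the two predicted crawl probabilities.

```python
import math

# HYPOTHETICAL model: a logistic regression over 16 binary structured-data flags.
# The intercept and weights are made-up placeholders, NOT the article's fitted model.
N_FLAGS = 16
INTERCEPT = -1.0               # hypothetical log-odds of being crawled with every flag off
WEIGHTS = [0.08] * N_FLAGS     # hypothetical per-flag contribution to the log-odds


def predicted_crawl_rate(flags: list[int]) -> float:
    """Return the model's P(crawled) for a vector of 0/1 structured-data flags."""
    if len(flags) != N_FLAGS:
        raise ValueError(f"expected {N_FLAGS} flags, got {len(flags)}")
    log_odds = INTERCEPT + sum(w * f for w, f in zip(WEIGHTS, flags))
    return 1.0 / (1.0 + math.exp(-log_odds))  # logistic (sigmoid) link


def on_off_contrast() -> tuple[float, float, float]:
    """Score the model with every flag off and every flag on; return both and their ratio."""
    all_off = predicted_crawl_rate([0] * N_FLAGS)
    all_on = predicted_crawl_rate([1] * N_FLAGS)
    return all_off, all_on, all_on / all_off


if __name__ == "__main__":
    off, on, multiplier = on_off_contrast()
    print(f"all off: {off:.1%}  all on: {on:.1%}  multiplier: {multiplier:.2f}x")
```

Because a logistic model is additive in log-odds, the all-on prediction depends only on the intercept plus the sum of the 16 weights. With a real fitted model you would substitute its coefficients, or call its own prediction method, in place of the hypothetical constants.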
The Stage A model’s AUC is 0.612, meaningfully better than chance (an AUC of 0.5) and consistent with the descriptive cuts in the data.
What this says in plain English: if your site has the structured-data signals AI bots look for, you are roughly twice as likely to be in the visible third as a site that does not. The signals are not exotic. They are the same JSON-LD Organization, FAQ, Article, and Product schemas that have been standard practice in technical SEO for a decade. The difference is that AI bots actually use them as a gate, while traditional search engines treat them as a tiebreaker.
The 67% problem in practical terms
If you are a local business owner, the invisible majority is your competitive set. Most of your competitors are in it. That is good news: it means your AEO investment buys outsized differentiation cheaply.
If you are a marketing leader at a larger company, the invisible majority is harder to think about. You probably assumed your site was visible because it ranks well in Google. The two are now decoupled. You can rank in position 1 for your category and still not appear in a Perplexity answer for the same query. Your AI search visibility is governed by a different set of signals than your blue-link visibility, and those signals require explicit attention.
The hardest version of the 67% problem is brands whose robots.txt explicitly blocks AI crawlers. We see this constantly in audits. Years ago, well-meaning developers added User-agent blocks for GPTBot, CCBot, and Google-Extended out of caution about content scraping. Those blocks were rational in 2023. They now actively prevent the brand from appearing in AI answers, because AI assistants will not cite a source they cannot reach.
If you have not audited your robots.txt in the last 12 months, it is likely that someone on your team made a defensive decision that has since become an offensive disadvantage.
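One way to run this audit yourself is with Python's standard-library `urllib.robotparser`. The sketch below is a rough self-check, not Engagemii's audit: it parses a robots.txt body and reports whether each AI user agent named in this article may fetch a given URL. The agent list and the example URL are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens named in this article; illustrative, not an exhaustive list.
AI_USER_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "CCBot", "Google-Extended"]


def ai_crawler_access(robots_txt: str, url: str) -> dict[str, bool]:
    """Report, per AI user agent, whether this robots.txt allows fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text directly; no network access
    return {agent: parser.can_fetch(agent, url) for agent in AI_USER_AGENTS}


if __name__ == "__main__":
    # A robots.txt of the kind described above: defensive blocks left over from 2023.
    legacy_robots = (
        "User-agent: GPTBot\nDisallow: /\n\n"
        "User-agent: CCBot\nDisallow: /\n\n"
        "User-agent: *\nAllow: /\n"
    )
    for agent, allowed in ai_crawler_access(legacy_robots, "https://example.com/").items():
        print(f"{agent:16} {'allowed' if allowed else 'BLOCKED'}")
```

To check a live site, `RobotFileParser.set_url()` followed by `read()` fetches the file instead of passing text to `parse()`. Two caveats: robots.txt is a request that well-behaved bots honor, not an access control, and Python's parser applies the first matching rule in file order rather than the longest-match rule in RFC 9309, so results can differ for files that mix Allow and Disallow lines.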
Auditing robots.txt is the single fastest thing to check.
How to find out which side you’re on
The free AEO score at engagemii.com/aeo includes a direct check on AI crawler access. The audit returns a yes-or-no answer on whether GPTBot, ClaudeBot, PerplexityBot,...
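For the structured-data side of the same question, a rough self-check is to list which schema.org types a page declares in JSON-LD. This sketch uses only Python's standard library and is not the 16-flag feature set behind our classifier, which this article does not enumerate. The four types in the hypothetical `SIGNAL_TYPES` set are the ones named earlier (with FAQ schema mapped to schema.org's `FAQPage`), and malformed JSON-LD is simply skipped.

```python
import json
from html.parser import HTMLParser

# HYPOTHETICAL subset of signals: the schema.org types named in this article.
SIGNAL_TYPES = {"Organization", "FAQPage", "Article", "Product"}


class JsonLdCollector(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> block."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self.blocks: list[str] = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data  # script text may arrive in several chunks


def _collect_types(node, found: set[str]) -> None:
    """Walk parsed JSON-LD (including @graph arrays) and record every @type value."""
    if isinstance(node, dict):
        node_type = node.get("@type")
        if isinstance(node_type, str):
            found.add(node_type)
        elif isinstance(node_type, list):
            found.update(t for t in node_type if isinstance(t, str))
        for value in node.values():
            _collect_types(value, found)
    elif isinstance(node, list):
        for item in node:
            _collect_types(item, found)


def structured_data_flags(html: str) -> dict[str, bool]:
    """Return one True/False flag per type in SIGNAL_TYPES for the given page HTML."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    found: set[str] = set()
    for raw in collector.blocks:
        try:
            _collect_types(json.loads(raw), found)
        except json.JSONDecodeError:
            continue  # malformed JSON-LD contributes no signal
    return {t: t in found for t in sorted(SIGNAL_TYPES)}
```

Feed it the raw HTML of your homepage. A page where every flag comes back `False` is missing even the basic Organization markup discussed above.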