llms.txt vs sitemap.xml: What Each File Does for AI Discovery
sitemap.xml tells crawlers what exists. llms.txt tells AI agents what matters. If you run docs in 2026, you probably want both.
Faizan Khan
2026-06-01 • 8 min read
If you're asking whether llms.txt replaces sitemap.xml, you're asking the wrong question. They solve different problems.
sitemap.xml is about completeness. It tells crawlers which URLs exist.
llms.txt is about judgment. It tells AI agents which pages are worth reading first.
That distinction matters because documentation discovery is no longer just "can Google find this page?" It is also "if an agent needs to solve a task, where should it start?"
For most docs teams, the right answer is simple: keep your sitemap, add llms.txt, and do not try to make one impersonate the other.
If you need the basics first, start with "What Is llms.txt? A Practical Guide for SaaS Docs Teams." If you want examples of good llms.txt structure, read "llms.txt Examples: Real Patterns for API Docs, Help Centers, and Developer Docs."
What sitemap.xml Does
sitemap.xml is a crawler inventory.
Its job is straightforward:
list URLs
optionally include lastmod
help search engines discover pages
help search engines prioritize crawling
A typical sitemap entry looks like this:
```xml
<url>
  <loc>https://docs.example.com/authentication</loc>
  <lastmod>2026-05-30</lastmod>
</url>
```
There is no editorial intent here. The sitemap is not trying to say "read this page first" or "these are the three pages that matter most for API onboarding." It is just telling a crawler what exists.
That is exactly what it should do.
For Google and traditional search engines, this is useful. For AI agents trying to answer a question or complete a task, it is often not enough.
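To show how mechanical a sitemap is, here is a minimal Python sketch of a docs build emitting one from a list of pages. The `build_sitemap` function and its `(url, lastmod)` input shape are hypothetical. Only the `urlset`/`url`/`loc`/`lastmod` elements and the namespace come from the sitemaps.org protocol. Notice that nothing in it decides which page matters more.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages: list[tuple[str, Optional[str]]]) -> str:
    """Render hypothetical (url, lastmod) pairs as a sitemap.xml string.

    Every page goes in, in the order given. Nothing is ranked or grouped.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod is not None:  # lastmod is optional in the protocol
            ET.SubElement(url, "lastmod").text = lastmod
    declaration = '<?xml version="1.0" encoding="UTF-8"?>\n'
    return declaration + ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical page list; a real build would pull this from the docs tree.
    print(build_sitemap([
        ("https://docs.example.com/authentication", "2026-05-30"),
        ("https://docs.example.com/changelog", None),
    ]))
```

Feed it every published page and it will list all of them. That is the point of a sitemap, and also why it carries no editorial signal.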
What llms.txt Does
llms.txt is a curated docs guide for agents.
Its job is different:
explain what the docs set covers
point to the most important pages
group links by real tasks or concepts
reduce ambiguity about where an agent should start
A typical llms.txt section looks like this:
```markdown
# Acme API Docs

Developer documentation for Acme's REST API. Covers authentication,
webhooks, rate limits, and SDK setup.

## Start Here

- [Quickstart](https://docs.acme.com/quickstart): Make your first request
- [Authentication](https://docs.acme.com/authentication): API keys and OAuth
- [Errors](https://docs.acme.com/errors): Error codes and retry guidance
```
This is not a full site inventory. It is a small, opinionated map.
That makes it useful in the exact places where a sitemap is weak:
"what page should I read first?"
"which auth path is canonical?"
"where are the webhook docs?"
"what should I look at before writing code?"
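Here is a rough sketch of the agent side in Python. It reads an llms.txt file, groups its links by `## ` section, and pulls the "Start Here" list first. The `parse_llms_txt` function and its regex are hypothetical and only handle the `- [Title](URL): note` link style used in the example above. Real agents and tools may read the file differently, and nothing guarantees they give a section named "Start Here" special treatment.

```python
import re

# Matches list items in the "- [Title](URL): optional note" style.
LINK_RE = re.compile(
    r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<note>.*))?$"
)


def parse_llms_txt(text: str) -> dict[str, list[dict[str, str]]]:
    """Group llms.txt links under the "## " heading they appear beneath.

    Lines before the first "## " heading (title, summary) are ignored.
    """
    sections: dict[str, list[dict[str, str]]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = line[3:].strip()
            sections.setdefault(current, [])
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                sections[current].append({
                    "title": match["title"],
                    "url": match["url"],
                    "note": match["note"] or "",
                })
    return sections


if __name__ == "__main__":
    sample = """# Acme API Docs

## Start Here

- [Quickstart](https://docs.acme.com/quickstart): Make your first request
- [Authentication](https://docs.acme.com/authentication): API keys and OAuth
"""
    # Hypothetical agent policy: read the "Start Here" pages before anything else.
    for link in parse_llms_txt(sample).get("Start Here", []):
        print(link["url"], "-", link["note"])
```

The useful part is not the parsing. It is that the file already encodes an answer to "where do I start?", which a sitemap cannot.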
The Real Difference: Completeness vs Curation
The difference is not XML versus Markdown.
The real difference is this:
sitemap.xml optimizes for completeness
llms.txt optimizes for usefulness
That one distinction explains most of the confusion.
If you generate llms.txt straight from your sitemap, you usually lose the thing that makes llms.txt valuable. You get a second, worse sitemap.
If you try to use a sitemap as a task guide, you get a giant URL dump with no editorial signal.
They overlap in the broad sense that both help discovery. But they help different kinds of discovery.
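One practical way to keep the two files in their lanes is to check that every link in llms.txt points at a page the sitemap already lists. The llms.txt stays a small, hand-picked subset, and the sitemap stays the complete inventory. This is a minimal Python sketch with hypothetical function names. It assumes your llms.txt links use the same canonical URLs as your sitemap. If you link to other variants (for example, Markdown copies of pages), you would need to normalize URLs before comparing.

```python
import re
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
# Captures the URL part of Markdown links such as [Title](https://...).
MD_LINK_RE = re.compile(r"\[[^\]]+\]\(([^)\s]+)\)")


def sitemap_urls(sitemap_xml: str) -> set[str]:
    """Return every <loc> URL listed in a sitemap.xml document."""
    root = ET.fromstring(sitemap_xml)
    return {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }


def curated_links_missing_from_sitemap(llms_txt: str, sitemap_xml: str) -> list[str]:
    """List llms.txt links that the sitemap does not know about.

    Under the same-canonical-URL assumption, a curated link outside the
    sitemap is worth a look: the page may have moved or the URL may be mistyped.
    """
    known = sitemap_urls(sitemap_xml)
    return [url for url in MD_LINK_RE.findall(llms_txt) if url not in known]
```

Note the direction of the check. The sitemap is never expected to be a subset of llms.txt, only the other way around.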
Side-by-Side
Here is the simplest way to think about them:
| | sitemap.xml | llms.txt |
|---|---|---|
| Primary audience | search engine crawlers | AI agents and tools |
| Format | XML | plain text / Markdown |
| Main purpose | list what exists | point to what matters |
| Coverage | exhaustive | selective |
| Maintenance style | generated | curated |
| Best for | crawl discovery | task-oriented docs guidance |
| Bad at | telling agents where to start | representing every page on the site |
That table is more useful than arguing about whether one is "better."
They are not substitutes. They sit at different layers.
When a Sitemap Is Enough
For some jobs, sitemap.xml is enough.
If your goal is:
making sure Google can discover your pages
exposing a large docs surface to traditional crawling
tracking freshness through lastmod
helping search engines notice newly published docs
then the sitemap is doing exactly what you need.
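As a rough illustration of the freshness use case, here is a minimal Python sketch that lists pages whose `lastmod` falls on or after a cutoff date. The `updated_since` helper is hypothetical. It assumes every `lastmod` value starts with a full `YYYY-MM-DD` date. The protocol's W3C Datetime format also allows coarser values, such as a bare year, which this sketch does not handle.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def updated_since(sitemap_xml: str, cutoff: date) -> list[str]:
    """Return URLs whose <lastmod> date is on or after `cutoff`.

    Entries without lastmod are skipped. Only the leading YYYY-MM-DD part
    is compared, so full timestamps like 2026-05-30T10:00:00+00:00 also work.
    """
    root = ET.fromstring(sitemap_xml)
    fresh = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", namespaces=SITEMAP_NS)
        lastmod = url.findtext("sm:lastmod", namespaces=SITEMAP_NS)
        if loc and lastmod and date.fromisoformat(lastmod.strip()[:10]) >= cutoff:
            fresh.append(loc.strip())
    return fresh
```

This only works because the sitemap is exhaustive. A curated llms.txt would miss most of the pages you want to track.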
A sitemap is also better whenever completeness matters more than editorial guidance.
For example:
versioned docs with lots of pages
large reference surfaces
generated API docs
In those cases, you still want the sitemap even if you also publish llms.txt.
When llms.txt Changes the Outcome
llms.txt matters when an agent needs to do more than just discover pages.
It matters when the agent needs help choosing.
Examples:
your docs have both OAuth and API key auth, but one is the recommended default
your product has three SDKs, but most users should start with one
your help center has 200 pages, but only 8 solve most support tasks
your developer docs have architecture pages that matter before implementation
These...