What is llms.txt? The new file AI crawlers actually read
A markdown file at your domain root that tells AI engines which pages matter. Why it exists, who proposed it, and whether to bother.
The short answer
llms.txt is a single markdown file you place at the root of your domain, like robots.txt or sitemap.xml. The contents are a human-readable guide to your most important content, structured for AI crawlers to consume. The specification is at llmstxt.org.
Where robots.txt tells crawlers which pages they may or may not access, and sitemap.xml lists every page on the site, llms.txt curates. It tells AI engines which pages on your site you consider authoritative, what each page covers, and how the pages relate to each other. The format is markdown (humans can read it) but optimized for the way LLMs actually parse content (clear headings, descriptive link text, brief context for each entry).
The file was proposed by Jeremy Howard, co-founder of Answer.AI and author of the fast.ai courses, in September 2024. The full specification lives at llmstxt.org. Adoption through 2025 was steady but quiet. By 2026, several major frameworks (Mintlify, Webflow, Next.js documentation generators) ship support out of the box, and a growing share of major SaaS sites publish one. Empirical studies, including a 2025 SE Ranking analysis of 2,500 sites, have shown mixed citation impact, suggesting llms.txt is one of many signals rather than a single lever.
Why llms.txt exists
AI engines have a content-discovery problem that traditional crawlers do not.
A traditional crawler reads HTML, follows links, and indexes pages. The signal that a page matters comes from inbound links and on-page authority markers. The crawler does not need a curated map; it can infer importance from the link graph.
AI engines retrieve content differently. They embed pages, retrieve them by semantic similarity to a query, then synthesize an answer. The retrieval step works better when the content is clean, factually dense, and clearly attributed. Pages buried under three layers of nav, JavaScript-rendered content, and aggressive cookie banners are noisier to ingest than pages handed over in clean markdown.
llms.txt addresses the noise. It hands the AI engine a clean, curated index of what you consider authoritative, with brief descriptions and direct links. The crawler can then prioritize those pages for retrieval and embed them with less ambient noise.
The proposal also includes an optional convention: each linked page in llms.txt may have a markdown copy at the same URL with a .md suffix. Engines that want the cleanest version can fetch the .md version directly, skipping the HTML rendering layer.
Who is using it
Adoption through 2025 and 2026 has spread in waves: first SaaS documentation sites, then AI-native tools, and more recently content-heavy small businesses.
Major SaaS docs publishing llms.txt by mid-2026 include Anthropic's docs, Cloudflare's docs, Mintlify (which now generates llms.txt automatically for documentation sites it hosts), and a growing list of API-first companies.
AI tooling sites have adopted aggressively. Perplexity, OpenAI's developer docs, and most of the AI agent frameworks publish llms.txt. The mutual incentive is direct: AI engines build models partly from each other's docs, and llms.txt makes the ingestion cleaner.
For service businesses, adoption is still rare. Most of the open opportunity sits here. The cost of publishing one is low (one afternoon) and the upside is non-trivial as AI search continues to grow.
What the file actually contains
An llms.txt file is markdown. The structure is loose but the proposal recommends a specific shape.
A single H1 with the project or business name. A short blockquote summarizing what the project does. Optional context paragraphs. One or more H2 sections that group related links. Each link is a markdown bullet with a descriptive title and a one-line note about what the page covers.
The file is meant to be read top-to-bottom by an LLM during retrieval. Clear headings, descriptive titles, and brief context help the engine understand which pages to prioritize for which kinds of queries.
A typical service-business llms.txt would have an H1 with the business name, a short summary, then sections for Services, Guides, Case Studies, and About. Each section lists 5 to 15 links with brief descriptions. The whole file fits comfortably in 1,000 to 3,000 words.
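To make the shape concrete, here is a minimal sketch of what that service-business file could look like. The business name, URLs, and descriptions are placeholders, not taken from any real site:

```markdown
# Acme Plumbing

> Licensed residential plumbing in Portland, OR. Emergency repairs,
> repiping, and water heater installation since 2009.

## Services

- [Emergency repairs](https://example.com/services/emergency): 24/7 response for burst pipes and major leaks.
- [Water heaters](https://example.com/services/water-heaters): Tank and tankless installation, repair, and sizing guidance.

## Guides

- [How to shut off your water main](https://example.com/guides/shut-off-water): Step-by-step instructions with photos for common valve types.

## About

- [Our team](https://example.com/about): Licensing, service area, and company history.
```

Note that every bullet pairs a descriptive link title with a one-line note; that note is what tells the retrieval step which queries the page answers.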
llms.txt vs robots.txt vs sitemap.xml
The three files serve different functions and coexist.
robots.txt controls access. It tells crawlers which paths they may or may not request. It is advisory rather than enforced: well-behaved crawlers honor it, but nothing technically stops a rogue crawler from ignoring it.
sitemap.xml is a complete list of pages on the site, machine-readable, used by search engines for crawl efficiency. It does not curate; it includes everything.
llms.txt is curation. It tells AI crawlers which pages you consider important and what each one covers. It does not enforce access (robots.txt does that), and it does not aim for completeness (sitemap.xml does that). It is a quality signal for AI retrieval.
A complete service-business setup in 2026 has all three: robots.txt for access control, sitemap.xml for crawl coverage, and llms.txt for AI-curated importance signals.
Does it actually work
Empirical evidence on llms.txt impact through 2026 is mixed but trending positive.
OpenAI has confirmed that ChatGPT's search retrieval reads llms.txt where present. Anthropic has not formally confirmed but Claude appears to weight it. Perplexity has acknowledged using it as a signal. Google has been quieter but its AI Overviews retrieval appears to use it as part of broader content quality scoring.
Direct measurement is hard because the retrieval logic is private. The closest signal is citation tracking: sites that publish llms.txt and track citations through tools like Profound or Otterly tend to see modest gains in citation share within 60 to 120 days, particularly for queries about their core business focus.
The honest version: llms.txt is one of many signals. It is not a silver bullet and a thin site with weak content will not rank because of it. For a site with strong factual content already, it is a small but real boost in retrieval prioritization, and the cost of adding it is low enough that the calculation is straightforward.
When to add it
Now, if your site is reasonably content-rich. The cost is one afternoon to draft and ship the file. The downside is zero. The upside is a small edge in AI retrieval that compounds over time.
Skip it if your site has fewer than 10 indexed pages, has no published guides or content sections, and operates only as a service-page brochure. There is not enough on the site to curate.
For everyone else, the calculation is straightforward. Read the setup guide, draft your file, validate the format, ship it, and check that AI engines can fetch it.
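The "validate the format" step can be automated. The sketch below is a hypothetical checker, not an official validator; it only tests the recommended shape described above (an opening H1, a blockquote summary, H2 sections, and link bullets with one-line descriptions):

```python
import re

def validate_llms_txt(text: str) -> list[str]:
    """Check an llms.txt body against the recommended shape.

    Returns a list of problems; an empty list means the file passes.
    """
    problems = []
    lines = [line for line in text.strip().splitlines() if line.strip()]

    # The file should open with a single H1: the project or business name.
    if not lines or not lines[0].startswith("# "):
        problems.append("file should open with an H1 (the business name)")

    # A short blockquote summary should appear near the top.
    if not any(line.startswith("> ") for line in lines[:5]):
        problems.append("missing the blockquote summary near the top")

    # At least one H2 section grouping related links.
    if not any(line.startswith("## ") for line in lines):
        problems.append("no H2 sections grouping related links")

    # Each link bullet should carry a description after the URL.
    for line in lines:
        if line.startswith("- [") and not re.match(
            r"- \[[^\]]+\]\([^)]+\):?\s+\S", line
        ):
            problems.append(f"bullet lacks a description: {line[:60]}")

    return problems
```

Run it against your draft before shipping; a file that passes these checks is at least structurally sound, even though no checker can judge whether the descriptions themselves are useful.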
People also ask
What is the purpose of llms.txt?
llms.txt is a markdown file at your domain root that gives AI crawlers (ChatGPT, Perplexity, Claude, Google AI Overviews) a curated map of your most important content. Where robots.txt controls access and sitemap.xml lists every page, llms.txt tells AI engines which pages you consider authoritative.
Who created llms.txt?
Jeremy Howard, co-founder of Answer.AI and author of the fast.ai courses, proposed the format in September 2024. The proposal lives at llmstxt.org. Through 2025 and 2026, adoption has spread first to SaaS documentation, then to AI-native sites, and is now reaching the small-business and service-business layer.
Where do I put llms.txt?
At the root of your domain, the same location as robots.txt and sitemap.xml. The full URL is yourdomain.com/llms.txt. Some sites also publish llms-full.txt with extended content and per-page .md alternatives at the same URL plus a .md suffix.
Does llms.txt help SEO?
Indirectly. llms.txt does not affect traditional Google rankings, which use sitemap.xml and the link graph. It affects retrieval in AI search engines (ChatGPT, Perplexity, Claude, AI Overviews). For sites that care about being cited in AI answers, it is a small but real positive signal.
Is llms.txt the same as robots.txt?
No. robots.txt controls which paths crawlers may access. llms.txt tells AI crawlers which pages you consider important and what each covers. They serve different functions and a complete setup includes both.
Do AI engines actually read llms.txt?
Yes, with caveats. OpenAI has confirmed that ChatGPT's search retrieval reads it where present. Anthropic and Perplexity weight it as a signal. Google's behavior is less explicit but its AI Overviews appear to use it as part of broader retrieval scoring. The signal is one of many; it is not a silver bullet.