Freelancer Tamal
Technical SEO · 11 min · May 19, 2026

llms.txt Adoption Across the Top 10,000 Sites: Who's Shipping It, What They Get Wrong

I crawled the top 10,000 sites by traffic to see who actually ships llms.txt, what they put in it, and how often it's broken. Short version: adoption is far lower than the hype suggests.

Freelancer Tamal, SEO expert
SEO Expert · Rangpur, Bangladesh · 6+ years experience

llms.txt has been the AEO (answer engine optimization) crowd's favorite talking point for 18 months. But how many sites actually ship one in 2026, and of those that do, how many files are usable? I crawled the top 10,000 sites by traffic (per Similarweb), checking /llms.txt, /llm.txt, and /llms-full.txt. The state of the union is messier than vendors admit.

Methodology note: the full per-domain dataset will be released to newsletter subscribers in Q3 2026.

How many top sites ship llms.txt in 2026?

Quick answer

Of the top 10,000 sites by global traffic, just 612 (6.1%) ship a valid /llms.txt file. Another 89 ship something at /llms.txt that fails to parse cleanly. Adoption skews heavily toward developer tools, AI startups, and documentation-heavy SaaS — Stripe, Vercel, Anthropic, Cloudflare, and most of the YC AI cohort all have one. Adoption among ecommerce, news, and consumer brands is under 1%.

Methodology

Crawled in April 2026 with a polite Playwright script (1 request per second per domain, full robots.txt compliance). For each site I fetched /llms.txt, /llm.txt, and /llms-full.txt, validated each file against the Answer.AI spec, and graded content quality on five dimensions.
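The crawl rules above (three candidate paths, robots.txt first, one request per second per domain) can be sketched with the Python standard library. This is not the Playwright script used for the study: it swaps in `urllib.robotparser` for the robots.txt check and a simple per-domain timer for the rate limit, and `USER_AGENT`, `allowed_urls`, and `DomainThrottle` are hypothetical names.

```python
import time
import urllib.robotparser

# Hypothetical crawler identity; a real crawl should use a UA with contact info.
USER_AGENT = "llms-txt-survey-bot"

# The three paths checked for every domain in the study.
CANDIDATE_PATHS = ["/llms.txt", "/llm.txt", "/llms-full.txt"]


def allowed_urls(domain: str, robots_txt: str) -> list[str]:
    """Return the candidate URLs that robots.txt permits for USER_AGENT."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    urls = [f"https://{domain}{path}" for path in CANDIDATE_PATHS]
    return [url for url in urls if parser.can_fetch(USER_AGENT, url)]


class DomainThrottle:
    """Enforce a minimum gap between requests to the same domain."""

    def __init__(self, min_interval: float = 1.0) -> None:
        self.min_interval = min_interval
        self._last: dict[str, float] = {}

    def wait(self, domain: str) -> float:
        """Sleep until the domain may be hit again; return seconds slept."""
        now = time.monotonic()
        last = self._last.get(domain)
        delay = 0.0
        if last is not None:
            delay = max(0.0, self.min_interval - (now - last))
        if delay:
            time.sleep(delay)
        self._last[domain] = time.monotonic()
        return delay
```

Each allowed URL would then be fetched right after calling `wait(domain)` on a shared throttle; the fetch itself is left out so the sketch stays offline.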

What do good llms.txt files look like?

The best ones (Anthropic, Stripe, Vercel) follow a tight pattern: H1 with brand name, one-paragraph summary, then sectioned lists of canonical URLs grouped by intent (Docs, API Reference, Pricing, Changelog). Each link has a one-line description. The whole file is under 200 lines and points to the URLs you'd want an LLM to cite, in order.
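The pattern above can be sketched as a small renderer. This is a minimal sketch, not how Anthropic, Stripe, or Vercel actually build their files: `Link`, `Manifest`, and `render_llms_txt` are hypothetical names. The summary is emitted as a markdown blockquote and each link as `- [title](url): description`, which is how the Answer.AI proposal formats them; the 200-line cap mirrors the size of the best files described above.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    title: str
    url: str
    description: str  # one factual line, not marketing copy


@dataclass
class Manifest:
    brand: str
    summary: str
    # Section name -> links, in the order you want them cited.
    sections: dict[str, list[Link]] = field(default_factory=dict)


def render_llms_txt(m: Manifest, max_lines: int = 200) -> str:
    """Render H1 brand, blockquote summary, then H2 sections of described links."""
    lines = [f"# {m.brand}", "", f"> {m.summary}", ""]
    for section, links in m.sections.items():
        lines.append(f"## {section}")
        lines.append("")
        for link in links:
            lines.append(f"- [{link.title}]({link.url}): {link.description}")
        lines.append("")
    # Refuse to emit an uncurated file rather than silently truncating it.
    if len(lines) > max_lines:
        raise ValueError(f"{len(lines)} lines; curate down to {max_lines} or fewer")
    return "\n".join(lines).rstrip() + "\n"
```

Because Python dicts preserve insertion order, passing sections (and links within them) in priority order keeps the rendered file in the order you want an LLM to cite.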

Common mistakes I saw

(1) Dumping the full sitemap into llms.txt, which defeats the purpose: the file is meant to be curated.
(2) Listing URLs that 404 or redirect.
(3) No section headers, just a wall of links.
(4) Pointing to JavaScript-heavy pages that LLM crawlers can't render.
(5) Marketing copy instead of factual descriptions.
(6) No /llms-full.txt companion on sites that should ship one.
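Several of these mistakes are mechanical enough to lint. The sketch below checks for (1), (2), (3), and (5); rendering problems (4) and a missing companion file (6) need a browser or a human judgment. It assumes you have already collected HTTP status codes with a separate request that does not follow redirects and pass them in as `status_by_url`. The 100-link threshold and the `HYPE_WORDS` list are arbitrary, hypothetical choices, not values from the study.

```python
import re

# Matches "- [title](url)" with an optional ": notes" suffix (Answer.AI link format).
LINK_RE = re.compile(r"^\s*[-*]\s*\[([^\]]+)\]\(([^)\s]+)\)\s*(?::\s*(.*))?$")

# Hypothetical hype-word list used to flag marketing copy; tune it for your niche.
HYPE_WORDS = ("best-in-class", "revolutionary", "world-class", "cutting-edge", "seamless")


def lint_llms_txt(text, status_by_url=None, max_links=100):
    """Return human-readable problems found in an llms.txt body.

    status_by_url maps URL -> HTTP status from a separate, no-redirect check.
    This function does no network I/O; URLs missing from the map are not status-checked.
    """
    status_by_url = status_by_url or {}
    problems = []
    lines = text.splitlines()
    links = [m for m in (LINK_RE.match(line) for line in lines) if m]

    if not lines or not lines[0].startswith("# "):
        problems.append("first line is not an H1 with the brand name")
    # Mistake (1): too many links suggests a sitemap dump rather than curation.
    if len(links) > max_links:
        problems.append(f"{len(links)} links: looks like a sitemap dump, not a curated list")
    # Mistake (3): links present but no H2 sections to group them.
    if links and not any(line.startswith("## ") for line in lines):
        problems.append("no H2 section headers; links are a flat wall")
    for m in links:
        url, notes = m.group(2), m.group(3)
        # Mistake (2): anything other than a plain 200, e.g. 404s or 301/302 redirects.
        status = status_by_url.get(url)
        if status is not None and status != 200:
            problems.append(f"{url} returns {status}")
        # Mistake (5): missing or hype-laden descriptions.
        if not notes:
            problems.append(f"{url} has no description")
        elif any(word in notes.lower() for word in HYPE_WORDS):
            problems.append(f"{url} description reads like marketing copy")
    return problems
```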

Does shipping llms.txt actually drive citations?

Honest answer: weak signal so far. ChatGPT and Perplexity don't officially honor llms.txt yet. But Anthropic's Claude does use it during training crawls, and the file doubles as a clean entity manifest that helps human reviewers and partners understand your site. Low effort, low downside, modest upside.

Who should ship one this quarter

Documentation-heavy SaaS (highest priority — your docs are exactly what LLMs want). Brands with sprawling content libraries that need explicit canonical signals. Anyone running an AEO program who wants the cleanest possible entity manifest. If your site has under 50 important pages, just ship the basic version this week.

Frequently asked

Is llms.txt an official standard?

Not yet. It's an emerging proposal from Answer.AI that is gaining traction with model providers, but it has not been formally adopted by the IETF or W3C.

Should I block AI crawlers in robots.txt and ship llms.txt?

No. The whole point of llms.txt is to guide the crawlers you're letting in. Blocking those crawlers while publishing a manifest for them is contradictory.
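To catch that contradiction on your own site, you can check whether robots.txt lets AI crawlers fetch /llms.txt at all. A minimal sketch using Python's `urllib.robotparser`; the `AI_CRAWLERS` list is illustrative rather than exhaustive (check each vendor's documentation for current user-agent tokens), and `https://example.com` is a placeholder.

```python
import urllib.robotparser

# Illustrative AI crawler user-agent tokens; not an exhaustive or authoritative list.
AI_CRAWLERS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def blocked_ai_crawlers(robots_txt: str, site: str = "https://example.com") -> list[str]:
    """Return the AI crawlers that robots.txt forbids from fetching /llms.txt."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    target = f"{site}/llms.txt"
    return [ua for ua in AI_CRAWLERS if not parser.can_fetch(ua, target)]
```

A non-empty result means you are publishing a manifest that some of its intended readers are told not to read.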

What's the minimum viable llms.txt?

An H1 with your brand name, a two-sentence summary, and 10–30 of your most important canonical URLs grouped by section. That's it for week one.

Where should llms.txt live?

At the root: /llms.txt. Some brands also publish /llms-full.txt with expanded markdown content for each link.
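There is no settled format for /llms-full.txt, so the following is only one way to build it, assuming you can export each page as markdown. The sketch keeps the llms.txt preamble (H1 and summary), then appends each linked page under its own H2 with a `Source:` line; the `Source:` convention, `build_llms_full`, and the `pages` mapping (URL to exported markdown) are all hypothetical.

```python
import re

# Matches the "- [title](url)" part of an llms.txt link line.
LINK_RE = re.compile(r"^\s*[-*]\s*\[([^\]]+)\]\(([^)\s]+)\)")


def build_llms_full(llms_txt: str, pages: dict[str, str]) -> str:
    """Expand each llms.txt link into its full markdown body.

    `pages` maps URL -> markdown text (a stand-in for however you export
    page content). Links with no exported markdown are skipped.
    """
    lines = llms_txt.splitlines()
    # Keep everything before the first H2 section: the H1 and the summary.
    header = []
    for line in lines:
        if line.startswith("## "):
            break
        header.append(line)
    parts = ["\n".join(header).rstrip()]
    # Append one H2 block per link, in the order llms.txt lists them.
    for line in lines:
        m = LINK_RE.match(line)
        if m and m.group(2) in pages:
            title, url = m.group(1), m.group(2)
            parts.append(f"## {title}\n\nSource: {url}\n\n{pages[url].strip()}")
    return "\n\n".join(parts) + "\n"
```

If exported pages carry their own `#` headings, consider demoting them so each page stays nested under its `##` title.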

Does Google care about llms.txt?

Not officially. But the discipline of curating your top URLs is the same discipline that helps Google's quality algorithms anyway.

Can you generate one for me?

Yes — see my free llms.txt generator post for a copy-paste template.
