We crawled 1,460 of the world's top websites and checked every one for AI Discovery Files. Only 95 had any. Zero scored full marks. The average AI readiness score across all sites was 2.2 out of 5, which translates to "Passive" on our five-tier scale.
That means the typical top-ranking website has made no active preparation for AI-driven discovery. None at all.
The data comes from our Q1 2026 AI Discovery File Adoption Research, published yesterday. It's the first large-scale study to check not just whether these files exist, but whether they actually work. The results tell a clear story: the gap between how businesses think about SEO and how they think about AI visibility is enormous. Most haven't started thinking about it at all.
The Headline Numbers
The study examined the top 1,000 global and top 1,000 UK domains from the Tranco List, a research-grade ranking that merges multiple popularity lists. After filtering for accessibility, 1,460 domains were crawled for all ten types of AI Discovery File, from llms.txt and ai.txt to identity.json and brand.txt.
93.5% of those websites had nothing. No llms.txt. No ai.json. No identity file of any kind. When an AI system visits these sites, it gets raw HTML and whatever it can scrape from the page. No structured guidance about who the business is, what it does, or how it should be represented.
Across the full crawl, the five readiness tiers broke down like this:
| Tier | Status | Percentage | Domains |
|---|---|---|---|
| Tier 5 | AI-Optimised | 0% | 0 |
| Tier 4 | AI-Ready | 1.5% | 22 |
| Tier 3 | Partially Ready | 19.5% | 284 |
| Tier 2 | Passive | 76.5% | 1,117 |
| Tier 1 | Actively Blocking | 1.3% | 19 |
That zero at the top matters. Not a single website out of 1,460 achieved full AI visibility. The peak is empty. The first business to get there would be in a category of one.
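The percentages in the table appear to be shares of all 1,460 crawled domains, not of the 95 sites that had files. That is easy to confirm from the domain counts alone. The sketch below uses only the table's own numbers; the dictionary and function names are ours. Note that the five rows account for 1,442 of the 1,460 domains, so the percentage column sums to 98.8% rather than 100%.

```python
# Domain counts per tier, taken from the readiness table.
TIER_COUNTS = {
    "Tier 5 (AI-Optimised)": 0,
    "Tier 4 (AI-Ready)": 22,
    "Tier 3 (Partially Ready)": 284,
    "Tier 2 (Passive)": 1117,
    "Tier 1 (Actively Blocking)": 19,
}
CRAWLED_DOMAINS = 1460  # every domain that was successfully crawled


def tier_shares(counts: dict[str, int], total: int) -> dict[str, float]:
    """Each tier's share of all crawled domains, as a percentage to one decimal place."""
    return {tier: round(100 * n / total, 1) for tier, n in counts.items()}


if __name__ == "__main__":
    shares = tier_shares(TIER_COUNTS, CRAWLED_DOMAINS)
    for tier, share in shares.items():
        print(f"{tier}: {share}%")
    # Share of sites not at tier 4 or above: the "ahead of 98.5%" figure.
    below_tier_4 = 100 - shares["Tier 4 (AI-Ready)"] - shares["Tier 5 (AI-Optimised)"]
    print(f"Not at tier 4 or above: {below_tier_4:.1f}%")
```

Dividing by 1,460 reproduces every row of the percentage column, which is why the tiers should be read as a breakdown of the whole sample.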
Where AI Discovery Files Sit on the Adoption Curve
These numbers hit differently with context. Look at where AI Discovery Files sit compared to other web standards that businesses use (or ignore) today:
| Standard | Adoption Rate | Origin |
|---|---|---|
| robots.txt | 45.3% | Created 1994 |
| Schema.org | 25.6% | Launched 2011 |
| ads.txt | 15.3% | Introduced 2017 |
| security.txt | 12.7% | RFC published 2022 |
| AI Discovery Files | 6.5% | Specs published 2024 |
| humans.txt | 2.5% | Proposed 2011 |
AI Discovery Files have already overtaken humans.txt and are closing on security.txt, despite being the newest entry on this list. That's not slow adoption. It's faster than most web standards at the same stage of their lifecycle.
But here's the part that matters for businesses: robots.txt took roughly fifteen years to reach 25% adoption. Schema.org took about eight. The businesses that adopted those standards early got durable advantages that late adopters spent years trying to close. We covered how AI Overviews are already stealing 58% of website clicks. Structured AI visibility is following the same adoption curve, just compressed. The window to lead is measured in months, not years.
Most Files That Exist Don't Work
Having a file and having it work are two different things. The study validated every file against the published specifications, and the quality gap is brutal.
llms.txt is the most widely adopted AI Discovery File: 3.8% of sites (55 domains) serve one. But only 2.4% (35 domains) had files that actually met the specification. That means more than a third of the sites that bothered to create an llms.txt (20 of 55) got it wrong.
llms.html told an even worse story: 41 sites served the file, but only 7 were valid. Just 1 was complete.
The overall quality breakdown across all AI Discovery Files found:
- Complete: 36%
- Minimal: 8%
- Invalid: 56%
More than half the AI Discovery Files on the web's top sites fail validation entirely. We saw this pattern coming. Our analysis of two earlier studies (OtterlyAI and SE Ranking) showed that llms.txt had zero measurable impact on AI visibility, but neither study checked the quality of the files they tested. They counted file existence and stopped there. When most files are auto-generated URL dumps or placeholder content, of course they don't work.
This study took a different approach. Every file was validated against its published specification and classified as Complete, Minimal, Invalid, or Not Found. The crawler also detects soft 404s, where a server returns HTTP 200 with a custom error page instead of the actual file. Other studies would count those as "present." Ours doesn't. The full validation methodology is published openly, so anyone can scrutinise or reproduce the approach.
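The study's own crawler code isn't reproduced here, so the following is only a minimal sketch of one common soft-404 heuristic, assuming you already have the response's status code, Content-Type header and body. The function name, marker list and "not found" phrases are our own assumptions, not the study's rules. The core idea: a plain-text or JSON file that comes back as an HTML document, or a page that announces itself as an error, is very likely a fallback page rather than the file you asked for.

```python
# Hypothetical heuristics: these markers and phrases are illustrative, not the study's rules.
HTML_MARKERS = ("<!doctype html", "<html", "<head", "<body")
NOT_FOUND_PHRASES = ("page not found", "404 not found", "page could not be found")


def looks_like_soft_404(path: str, status: int, content_type: str, body: str) -> bool:
    """Guess whether an HTTP 200 response is an error page rather than the requested file."""
    if status != 200:
        return False  # a real 404 or 410 is a hard miss, not a soft 404

    text = body.lstrip().lower()
    if any(phrase in text for phrase in NOT_FOUND_PHRASES):
        return True  # the page describes itself as an error page

    if path.lower().endswith((".html", ".htm")):
        return False  # HTML is the expected format for files like llms.html

    # A .txt or .json file served as an HTML document is very likely a fallback page.
    mime = content_type.split(";", 1)[0].strip().lower()
    return mime == "text/html" or text.startswith(HTML_MARKERS)


if __name__ == "__main__":
    print(looks_like_soft_404("/llms.txt", 200, "text/html; charset=utf-8", "<!DOCTYPE html><html></html>"))
    print(looks_like_soft_404("/llms.txt", 200, "text/plain", "# Example Co\n\n> We make widgets.\n"))
```

Heuristics like this will misfire occasionally, which is one reason a published, reproducible methodology matters more than any single rule.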
Jeremy Howard, who created the llms.txt specification at Answer.AI, framed the original need clearly:
"Today websites are not just used to provide information to people, but they are also used to provide information to large language models."
Jeremy Howard, Answer.AI
He was right about the need. The data now shows that most sites attempting to meet that need are doing it badly.
The Robots.txt Blind Spot
87.5% of the web's top sites have no AI policy in their robots.txt file at all. They haven't blocked AI crawlers. They haven't explicitly allowed them. They just haven't thought about it.
Of the 12.5% that have made a decision:
- 10.3% selectively block some AI crawlers while allowing others
- 1.3% block all AI crawlers entirely
- 0.8% explicitly allow AI access
- 0.1% rate-limit AI crawlers
The selective blocking is telling. CCBot (Common Crawl) is blocked by 9.6% of sites. ClaudeBot (Anthropic) by 8.6%. GPTBot (OpenAI) by 8.5%. Bytespider (ByteDance) by 8.1%. Most of these blocks were probably added reactively, one crawler at a time, as each AI company launched its bot. It's not a strategy. It's whack-a-mole.
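To make those policy categories concrete, here is a simplified classifier sketch built on Python's standard-library `urllib.robotparser`. Its rules are our own assumptions, not the study's methodology: it only considers groups that name one of the four crawlers listed above (a hypothetical shortlist; real AI crawler lists are longer), treats a crawler as blocked if it cannot fetch `/`, and counts a `Crawl-delay` or `Request-rate` as rate-limiting.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical shortlist: the four AI crawlers named in this study, not a complete list.
AI_CRAWLERS = ("CCBot", "ClaudeBot", "GPTBot", "Bytespider")


def named_user_agents(robots_txt: str) -> set[str]:
    """Collect every User-agent value in the file, lowercased, ignoring comments."""
    agents = set()
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "user-agent":
            agents.add(value.strip().lower())
    return agents


def classify_ai_policy(robots_txt: str) -> str:
    """Sort a robots.txt into one of five AI-policy buckets (simplified rules)."""
    agents = named_user_agents(robots_txt)
    named = [bot for bot in AI_CRAWLERS if bot.lower() in agents]
    if not named:
        return "no AI policy"  # no group targets an AI crawler by name

    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # A crawler counts as blocked if it may not fetch the site root.
    blocked = [bot for bot in AI_CRAWLERS if not parser.can_fetch(bot, "/")]

    if len(blocked) == len(AI_CRAWLERS):
        return "blocks all AI crawlers"
    if blocked:
        return "selectively blocks AI crawlers"
    if any(parser.crawl_delay(bot) is not None or parser.request_rate(bot) is not None
           for bot in named):
        return "rate-limits AI crawlers"
    return "explicitly allows AI crawlers"


if __name__ == "__main__":
    print(classify_ai_policy("User-agent: GPTBot\nDisallow: /\n"))
```

A site-wide `User-agent: *` block is deliberately reported as "no AI policy" here, because it doesn't single out AI crawlers; a real audit might reasonably treat it differently.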
The businesses that haven't touched robots.txt at all are in an odd position. They're passively open to every AI crawler on the planet, but providing no structured information about themselves in return. They're giving away access without getting representation. If you haven't set up robots.txt, you're behind nearly half of the web's top sites. If you haven't thought about AI visibility, you're behind the curve, but it's still early enough to lead.
Who's Already Moving
The 22 domains that reached tier 4 (AI-Ready) include names you'd recognise.
Shopify (rank 164), Stripe (rank 261), and Opera (rank 87) are global tech leaders who consistently adopt web standards before everyone else. UK brands are represented too: Reed.co.uk (rank 291), ScotRail (rank 957), and English Heritage (rank 227) all scored AI-Ready. So did platforms like SourceForge (rank 214) and Mailchimp (rank 694), plus enterprise technology brands including Dynatrace, OneTrust, Qualtrics, and Netgear.
The pattern is consistent with every previous web standard adoption cycle. First movers are a mix of tech-forward companies and brands with strong digital teams. They see the direction of travel and act before it becomes urgent.
We covered why AI Discovery Files help businesses get recommended by AI systems. The research now shows that brands known for investing in their digital presence are among the first to implement them. If Shopify and Stripe are preparing for AI visibility, what's your business waiting for?
What This Means for Your Business
The headline number is 93.5%. But the actionable number is 1.5%.
Only 22 of the world's top 1,460 websites are AI-Ready. Getting your business to tier 4 puts you ahead of 98.5% of the top sites on the internet. That's not a marginal advantage. It's a structural one, available to anyone willing to act while competitors sleep.
We ran a similar test on 100 UK small businesses earlier this year. Three out of a hundred had any AI Discovery Files. The pattern at the top of the web mirrors what's happening at every level: almost nobody is doing this yet.
Here's what you can do today:
- Check where you stand. Run your site through the AI Visibility Checker. It reads your files, scores your implementation, and shows a live ChatGPT snapshot of how AI talks about your business right now.
- Read the specifications. The full AI Discovery File specifications are published openly. Start with llms.txt and identity.json. These two files give AI systems the most useful structured data about your business. If you're running WordPress, the AI Discovery Files plugin generates all 10 files automatically.
- Get verified. Submit your site to the AI Discovery Files Directory for a public, verified listing with a dofollow backlink.
- Set an AI policy in robots.txt. Even if you decide to allow all AI crawlers, make that an active choice rather than a passive default.
Our AI Visibility service handles the full implementation for businesses that want it done properly. But the research data is free, the specifications are open, and the checker costs nothing to run. The window is open. The data proves it's early. And the brands you compete with are already moving.
See the full interactive research with charts, breakdowns, and downloadable data (CC BY 4.0) at ai-visibility.org.uk.
Frequently Asked Questions
How many websites were included in this study?
1,460 domains from the Tranco List, drawn from the top 1,000 global and top 1,000 UK websites. The Tranco List is a research-grade ranking that combines multiple popularity sources, so the sample is less exposed to the biases of any single ranking methodology. After filtering inaccessible domains, 1,460 were successfully crawled and analysed.
What is an AI Discovery File?
An AI Discovery File is a structured document placed on your website that tells AI systems who your business is, what you do, and how you want to be represented. There are ten types, including llms.txt (core identity for language models), ai.txt (AI usage permissions), identity.json (canonical business data), and brand.txt (brand representation rules). The full specifications are published at ai-visibility.org.uk.
What does AI-Ready (tier 4) mean?
A site scores AI-Ready when it has at least one valid AI Discovery File plus an active AI crawler policy in robots.txt, demonstrating a conscious strategy for AI engagement. Only 22 of the 1,460 sites studied reached this level. Tier 5 (AI-Optimised) would require multiple valid, complete files across several categories. No site achieved that.
Which brands are already AI-Ready?
Shopify, Stripe, Opera, Reed.co.uk, ScotRail, English Heritage, SourceForge, Mailchimp, Optimizely, Dynatrace, OneTrust, Qualtrics, and Netgear are among the 22 sites that scored AI-Ready. They span global tech, UK public services, and enterprise SaaS, but they share a pattern: strong digital teams and early adoption of web standards.
Why do 87.5% of websites have no AI policy in robots.txt?
Most website owners don't know AI crawlers exist. Unlike Googlebot (which everyone has heard of), bots like ClaudeBot, GPTBot, and Bytespider operate with low visibility. Many businesses haven't updated their robots.txt file in years, and AI-specific directives aren't part of standard SEO advice yet. The result is passive openness: AI can crawl freely, but gets no guidance in return.
Is AI Discovery File adoption growing?
Yes. At 6.5%, AI Discovery Files have already surpassed humans.txt (2.5%, proposed 2011) and are approaching security.txt (12.7%, RFC published 2022). Major platforms like Yoast, Wix, and Cloudflare have built native AI Discovery File support into their products. This is a Q1 2026 baseline; follow-up studies will track the trajectory.
How do I check my own website's AI readiness?
Use the AI Visibility Checker. It scans your site for all ten AI Discovery File types, validates each one against the specification, checks your robots.txt AI policy, and runs a live ChatGPT snapshot showing exactly what AI says about your business. It's free and takes under a minute.
What's the difference between having an llms.txt and having a valid one?
3.8% of sites in the study served an llms.txt file, but only 2.4% passed validation against the specification. Invalid files typically contain nothing but URL lists, placeholder text, or formatting that AI systems can't parse reliably. A valid file follows the specification structure: a clear title, description of the business, and organised sections that a language model can use as reference material.
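As a rough illustration of those three elements, here is a minimal structural check. It borrows the markdown conventions from Answer.AI's llms.txt proposal (an H1 title, a `>` blockquote summary, and H2 sections containing lists of `[name](url)` links). The function name and exact rules are our own simplification, not the study's validator, and they are stricter than the proposal, which only strictly requires the H1.

```python
import re

# A markdown list item that starts with a link, e.g. "- [Pricing](https://example.com/pricing): plans".
LINK_ITEM = re.compile(r"^[-*]\s+\[[^\]]+\]\([^)\s]+\)")


def check_llms_txt(text: str) -> list[str]:
    """Return structural problems found in an llms.txt; an empty list means none were found."""
    lines = [line.rstrip() for line in text.splitlines() if line.strip()]
    problems = []
    if not lines or not lines[0].startswith("# "):
        problems.append("missing H1 title on the first line")
    if not any(line.startswith(">") for line in lines):
        problems.append("missing blockquote summary")
    if not any(line.startswith("## ") for line in lines):
        problems.append("no H2 sections")
    if not any(LINK_ITEM.match(line) for line in lines):
        problems.append("no markdown link lists")
    return problems


if __name__ == "__main__":
    sample = (
        "# Example Co\n\n"
        "> Example Co makes widgets for small bakeries.\n\n"
        "## Products\n\n"
        "- [Widgets](https://example.com/widgets): full product range\n"
    )
    print(check_llms_txt(sample))  # []
    print(check_llms_txt("https://example.com/\nhttps://example.com/about\n"))
```

A bare list of URLs, the most common failure mode mentioned above, trips every one of these checks.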
Where Does Your Website Stand?
We crawled 1,460 of the world's top websites. The AI Visibility Checker runs the same analysis on yours, free, in under a minute.
Check Your AI Visibility
Sources
- AI Discovery File Adoption Research Q1 2026 - AI Visibility
- AI Discovery File Specifications - AI Visibility
- Research Methodology: AI Discovery File Adoption Study - AI Visibility
- llms.txt Specification - AI Visibility
- llms.txt: A Proposal to Provide Information to Help LLMs Use Websites - Answer.AI
- Tranco: A Research-Oriented Top Sites Ranking - Tranco List