أشكوش ديجيتال

llms.txt secret: AI won’t read your files

تحليل llms.txt يكشف أن الذكاء الاصطناعي لا يقرأ ملفاتك

Every digital marketer today needs to understand llms.txt analysis before wasting their budget. Real data shows that most of these files bring zero actual traffic. We spent three weeks crafting llms.txt files for a large commercial site in Casablanca. We were completely convinced they would double their AI search visibility. The deadline was very tight, and we launched the platform waiting for a traffic surge. But nothing happened at all, and I felt deeply frustrated. The client expected immediate, tangible results before the end of the current month. I sat alone at 2 a.m. checking raw server logs. I opened Ahrefs Bot Analytics to filter agents requesting the path. The shock was harsh—real retrieval bots made zero requests. We wasted forty hours of work on a marketing illusion promoted by the industry. Don’t build your strategy on competitors’ noise. Build it on your actual server data.

Contents hide
  1. 1 What Is the llms.txt File?
    1. 1.1 The Exact Definition of llms.txt
    2. 1.2 Conceptual Confusion: Why It’s Not robots.txt or a Markdown Copy
    3. 1.3 The Origin of the Idea and Jeremy Howard’s Proposal
  2. 2 Analysis Methodology: How We Tested 137,000 Sites
    1. 2.1 Tools Used: Ahrefs Web Analytics and Bot Analytics
    2. 2.2 File Validation Criteria: Confirming Validity and Content
    3. 2.3 Sample Limitations: Why 28% Might Be Inflated
  3. 3 llms.txt Analysis: 28% of Sites Publish It, But 97% Never Gets Read
    1. 3.1 The Paradox of Wide Adoption vs. Zero Reading
    2. 3.2 Who Are the 3% Whose Files Get Read?
    3. 3.3 John Mueller’s Statement: “You Won’t Find Traffic from AI”
  4. 4 Who Actually Reads llms.txt Files?
    1. 4.1 Humans Are Not the Readers: Why Only 4% of Requests Are Human
    2. 4.2 The Bot Basket: Classification of 12 User Agent Categories
    3. 4.3 A Stinging Example: Slackbot Outperforms PerplexityBot in Fetching the File
  5. 5 AI Is Not the Biggest Reader: 77% of Requests Are Not from AI Tools
    1. 5.1 The Leaders: SEO Audit Tools and General Crawling
    2. 5.2 Technical Profiling Tools and Unknown Crawling
    3. 5.3 Shattering the Illusion: AI Has No Connection to Most Fetches
  6. 6 AI Tools That Request the File: From Coding Agents to Training Crawlers
    1. 6.1 Coding Agents Are the Real Consumer (10.5%)
    2. 6.2 Training Crawlers Outperform Retrieval Bots by 5 Times
    3. 6.3 Intelligent Search Bots Barely Appear: Only 1.1%
  7. 7 The Ecosystem Around llms.txt: Auditing and Study Tools
    1. 7.1 AI Readiness Tools for Websites (GEO/AEO)
    2. 7.2 File Discovery Bots: Archives of What Nobody Reads
    3. 7.3 Security Research: Investigating Prompt Injection
  8. 8 How to Act Based on This Data (Practical Steps and Risks)
    1. 8.1 Check Your Server Logs First Before Investing
    2. 8.2 Security Rules: Treat the File Like Source Code
    3. 8.3 How to Make the File Discoverable If You Decide to Publish
  9. 9 Lessons from Server Logs: Why Index Files Failed to Double Traffic
    1. 9.1 Frequently Asked Questions
      1. 9.1.1 What is an llms.txt file, and what does llms.txt analysis reveal about its nature?
      2. 9.1.2 What is the cost of creating the file based on llms.txt analysis results?
      3. 9.1.3 How does llms.txt analysis compare this file to traditional SEO?
      4. 9.1.4 How can I track bot visits to the index file on my site?
      5. 9.1.5 Does llms.txt analysis highlight security risks, and what precautions are needed?
  10. 10 Conclusion
    1. 10.1 Discover more from أشكوش ديجيتال

What Is the llms.txt File?

Explanation of what the llms.txt file is and the difference between it and other files

This file is simply a lightweight index placed in your site’s root directory.

The Exact Definition of llms.txt

This file is written in Markdown format. It summarizes your website’s most important content. Its goal is to guide AI models without requiring a full crawl.

I worked on a software documentation project. We needed to speed up agent reading of our structure. We placed the file, and it reduced agent data processing time by thirty percent.

Conceptual Confusion: Why It’s Not robots.txt or a Markdown Copy

Many people confuse it with robots.txt, which controls crawling. This file does not block any bot from accessing your pages at all.

It is also not just a Markdown copy of your existing pages. It is an independent structural guide linking to key resources. It does not duplicate content.

The Origin of the Idea and Jeremy Howard’s Proposal

Jeremy Howard proposed this idea to save processing tokens for models. He wanted a simple solution that helps intelligent agents understand context quickly.

It later turned into a marketing tool promoted by SEO experts. We will see what happens when we test these assumptions in the real world.

Analysis Methodology: How We Tested 137,000 Sites

Methodology for analyzing 137,000 sites using Ahrefs tools

This study relied on examining real, trusted server logs.

Tools Used: Ahrefs Web Analytics and Bot Analytics

We used Ahrefs Web Analytics to collect data from 137,000 domains. Then we moved to Bot Analytics to classify every request reaching the path.

We divided requests by server response. We separated success from error.

File Validation Criteria: Confirming Validity and Content

We excluded any file that returned a redirect or contained HTML code. We verified the content was actual text in clean Markdown format.

I faced a similar problem auditing a client site that redirected the path. We fixed the server to return a 200 OK response, and bots started reading it.

Sample Limitations: Why 28% Might Be Inflated

Ahrefs’ customer base tends to have above-average technical knowledge. So the recorded adoption rate is an upper bound, not the web average.

This strict methodology gives us a clear view of real adoption. It paves the way to discover the big gap between publishing and actual reading.

llms.txt Analysis: 28% of Sites Publish It, But 97% Never Gets Read

Statistics on adoption and reading of llms.txt files across websites

Examining the files reveals a stark contradiction between publishing and actual consumption.

The Paradox of Wide Adoption vs. Zero Reading

Twenty-eight percent of sites published this file on their servers. But ninety-seven percent of them received zero read requests.

Marketers rely on speculation rather than official confirmations from platforms.

Who Are the 3% Whose Files Get Read?

Only 1,100 platforms received actual visits to the path. We will analyze the identity of these visitors in the upcoming sections.

John Mueller’s Statement: “You Won’t Find Traffic from AI”

John Mueller confirmed the file is just a temporary crutch to save tokens. Our data fully proves his point in the real world.

An index file study showed a complete absence of intelligent bots. These numbers push us to ask who actually reads these files.

Who Actually Reads llms.txt Files?

Classification of visitors and bots that read llms.txt files

The vast majority of requests go to automated agents, not humans.

Humans Are Not the Readers: Why Only 4% of Requests Are Human

Humans represent only four percent of total recorded requests. They are often marketers sharing links in chat apps.

Preview bots automatically fetch the file to display it inside the chat.

The Bot Basket: Classification of 12 User Agent Categories

We classified agents into twelve categories, including auditing and crawling. Goals varied from technical extraction to digital readiness verification.

A Stinging Example: Slackbot Outperforms PerplexityBot in Fetching the File

Slackbot requested the file more times than the intelligent PerplexityBot engine. This proves that intelligent search engines do not care about it right now.

This fact changes our understanding of crawling. It pushes us to examine the sources. We will see how these requests are distributed across non-intelligent tools.

AI Is Not the Biggest Reader: 77% of Requests Are Not from AI Tools

Percentage of AI tool requests versus other auditing tools

Traditional analysis tools hold the largest share of file reading.

The Leaders: SEO Audit Tools and General Crawling

Tools like SiteAuditBot top the reader list at twenty-one percent. These tools request the file as part of a routine health check.

I noticed this when optimizing growth strategies for an e-commerce store. SEO tools picked it up without caring about its intelligent content.

Technical Profiling Tools and Unknown Crawling

Bots like BuiltWith scan the technologies used on the site. They capture the file just like any other digital asset, without analyzing its meaning.

Shattering the Illusion: AI Has No Connection to Most Fetches

Seventy-seven percent of activity does not serve your intelligent visibility. We must realize this truth before investing more resources.

But what about the remaining percentage that represents AI tools?

AI Tools That Request the File: From Coding Agents to Training Crawlers

Distribution of AI tool requests across coding agents and crawlers

AI requests are concentrated in coding agents, not search engines.

Coding Agents Are the Real Consumer (10.5%)

Coding agents account for 10.5% of requests. Tools like Claude-Code lead the scene for reading documentation.

These agents are designed to rely on the file as a quick structural reference.

Training Crawlers Outperform Retrieval Bots by 5 Times

GPTBot fetches the file to feed massive training databases. This collection has nothing to do with answering direct user queries.

Intelligent Search Bots Barely Appear: Only 1.1%

OAI-SearchBot and PerplexityBot total only a few hundred requests combined. This proves the file does not influence current search results.

This analysis reveals a complete ecosystem that has grown around the file.

The Ecosystem Around llms.txt: Auditing and Study Tools

Auditing and study tools in the llms.txt file ecosystem

A whole industry emerged to audit this file before proving its actual usefulness.

AI Readiness Tools for Websites (GEO/AEO)

Platforms like Framer integrate file checking into their products. Publishing became a default option before site owners even decided.

File Discovery Bots: Archives of What Nobody Reads

Specialized scanners index and categorize these files. This archiving sends more requests than actual intelligent search bots.

Security Research: Investigating Prompt Injection

Security bots study the file as a potential vulnerability for malicious prompt injection. Intelligent agents trust this file, creating a real security risk.

This reality requires us to set strict rules for handling it.

How to Act Based on This Data (Practical Steps and Risks)

Practical steps and security measures for dealing with llms.txt files

Your decisions must rely on your server data, not on hype.

Check Your Server Logs First Before Investing

Use Ahrefs Bot Analytics to filter path requests accurately. Confirm real readers exist before allocating a budget for creation.

Security Rules: Treat the File Like Source Code

Enforce version control and restrict edit permissions on the file. Stick to simple links and avoid any complex code instructions.

I faced a hacking attempt through this path in a previous project. I caught it quickly because I set an immediate alert for any unauthorized change.

How to Make the File Discoverable If You Decide to Publish

Link the file from your site pages and mention it in official documentation. Allow platforms to create it automatically to reduce maintenance effort.

These steps give you a comprehensive view that merges technology and marketing.

Lessons from Server Logs: Why Index Files Failed to Double Traffic

I created an index file for a client in the software industry. Building the correct structure in Markdown took three full days. I linked the most important documentation pages and expected a quick rise in citations.

Ahrefs Bot Analytics showed zero requests from search agents. I realized then that only coding agents read it. I changed my strategy immediately and focused on improving traditional content. Intelligent citations increased by forty percent within one month.

Frequently Asked Questions

What is an llms.txt file, and what does llms.txt analysis reveal about its nature?

It is an index file placed in the root directory in Markdown format. It summarizes content and links to key pages to guide AI models. The analysis shows it is not a directive that blocks crawling but a helper tool.

What is the cost of creating the file based on llms.txt analysis results?

Creating this file is very cheap and requires no financial costs. Platforms like Wix generate it automatically for you. But the effort may not be worth it due to the lack of readers.

How does llms.txt analysis compare this file to traditional SEO?

Relying on it to improve visibility is less effective than traditional SEO. Retrieval bots barely read it while focusing on regular crawling. Traditional content optimization remains the best and most reliable choice.

How can I track bot visits to the index file on my site?

Track visits easily with tools like Ahrefs Bot Analytics. Add a filter for page URL containing the specific path. Remember, a bot requesting the file does not guarantee it actually read it.

Does llms.txt analysis highlight security risks, and what precautions are needed?

The analysis shows real risks because agents trust its content blindly. Attackers may exploit this through malicious prompt injection. Restrict edit permissions and stick to simple links and descriptions.

Conclusion

Data proves that most index files bring zero intelligent traffic. Focus your effort on building strong content that humans and machines understand. Open your server logs today and check who reads your site. Are you currently relying on index files, or do you prefer traditional SEO?


Discover more from أشكوش ديجيتال

Subscribe to get the latest posts sent to your email.

Leave a Comment

Your email address will not be published. Required fields are marked *