Prooflytics
SEO · 9 min read

llms.txt: How to Make Your Marketing Site Readable by AI Agents

Google added llms.txt to Lighthouse agentic audits in May 2026, making it a measurable signal for how well sites expose their content to AI crawlers. llms.txt is a plain-text file at the root of your domain that tells AI agents what content is available, which pages matter most, and how the content may be used. Think of it as a companion to robots.txt for the agentic web: a guide rather than an access control. Here is what to put in it.

AI agents -- tools like ChatGPT Browsing, Perplexity, Claude, and automated research agents -- are increasingly crawling websites to answer user queries and complete tasks. Unlike traditional search crawlers, AI agents need to understand not just page content but the structure, purpose, and intended use of site content. The llms.txt protocol is an emerging standard that addresses this: a plain-text file at yourdomain.com/llms.txt that describes your site to AI agents in machine-readable form. In May 2026, Google added llms.txt to Lighthouse agentic audits, signaling that the protocol is moving from experimental to expected infrastructure for sites that want visibility in AI-mediated search.

Key takeaways

  1. llms.txt is a plain-text file placed at the root of your domain (yourdomain.com/llms.txt) that describes your site's content structure and key pages to AI language model agents.
  2. Google added llms.txt to Chrome Lighthouse agentic audits on May 5, 2026 -- this means it is now a measurable crawlability signal, similar to how robots.txt became a baseline SEO hygiene check.
  3. llms.txt differs from robots.txt: robots.txt controls which pages crawlers can access; llms.txt guides AI agents on how to understand and use the content they find.
  4. For marketing sites, llms.txt should prioritize: product pages, pricing pages, feature comparisons, case studies, and high-quality blog posts -- the content most likely to satisfy AI-mediated commercial queries.
  5. A well-structured llms.txt increases the probability that AI answer engines cite your content accurately rather than inferring from partial page reads or paraphrasing incorrectly.

What llms.txt is and why it matters now

llms.txt: a plain-text file located at the root path of a domain (yourdomain.com/llms.txt) that describes site content, key URLs, and content use guidelines in a format optimized for large language models and AI agents.

The protocol was proposed to address a specific problem: AI agents do not browse websites the way human users or even traditional crawlers do. A traditional crawler indexes every page it finds via links. An AI agent tasked with answering "what does Prooflytics do?" needs to find and read the most relevant pages quickly, understand the site's structure, and determine which content is authoritative. Without a guidance file, the agent guesses -- following links heuristically, sometimes reading outdated blog posts or low-authority pages rather than the canonical product description.

llms.txt provides a deliberate signal:

  • Here is what this site is about (brief description)
  • Here are the most important pages (links with descriptions)
  • Here is how the content may be used (citations, training data opt-in/out)
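
How an agent might read these three signals can be sketched in a few lines of Python. This is a minimal sketch, not a reference parser: it assumes the description sits directly under the top heading, that page entries use the markdown link form `- [Title](URL): note`, and that free text under a heading (such as a content policy) should be kept as-is. The names `parse_llms_txt`, `LlmsTxt`, and `Section`, along with the example.com content, are hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass, field

# One list entry: "- [Title](https://example.com/page): optional note"
LINK_RE = re.compile(
    r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<note>.*))?$"
)


@dataclass
class Section:
    links: list = field(default_factory=list)  # (title, url, note) tuples
    text: list = field(default_factory=list)   # free-text lines, e.g. a content policy


@dataclass
class LlmsTxt:
    name: str = ""
    description: str = ""
    sections: dict = field(default_factory=dict)  # heading -> Section


def parse_llms_txt(raw: str) -> LlmsTxt:
    doc = LlmsTxt()
    current = None  # Section being filled; None until the first "## " heading
    description = []
    for line in (l.strip() for l in raw.splitlines()):
        if not line:
            continue
        if line.startswith("# ") and not doc.name:
            doc.name = line[2:].strip()
        elif line.startswith("## "):
            current = doc.sections.setdefault(line[3:].strip(), Section())
        elif current is None:
            description.append(line.lstrip("> "))  # tolerate a "> " blockquote prefix
        else:
            match = LINK_RE.match(line)
            if match:
                current.links.append((match["title"], match["url"], match["note"] or ""))
            else:
                current.text.append(line)
    doc.description = " ".join(description)
    return doc


SAMPLE = """\
# Example Co

> Example Co is a hypothetical analytics product for small marketing teams.

## Key pages

- [Pricing](https://example.com/pricing): Plans and prices
- [Product](https://example.com/product): What the product does

## Content policy

Content may be cited in AI-generated answers.
"""

parsed = parse_llms_txt(SAMPLE)
print(parsed.name)
for title, url, note in parsed.sections["Key pages"].links:
    print(f"{title}: {url} ({note})")
print(parsed.sections["Content policy"].text)
```

The point of the sketch is that each signal maps to something an agent can act on directly: a name and description to summarize the site, an ordered list of URLs to fetch first, and a policy statement to respect.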

Why the Lighthouse addition matters: Google adding llms.txt to Lighthouse agentic audits is a meaningful signal. Lighthouse is used by SEO professionals and developers, and it supplies the lab data shown in Google's PageSpeed Insights. When a check appears in Lighthouse, it moves from "interesting experiment" to "measurable production standard." Sites that implement llms.txt will now show a pass; sites that do not will show a recommendation to add one.

How llms.txt differs from robots.txt and sitemap.xml

The three files serve different purposes:

| File | Purpose | Audience |
| --- | --- | --- |
| robots.txt | Controls which pages crawlers can access | All crawlers |
| sitemap.xml | Lists all URLs for indexation | Search engine crawlers |
| llms.txt | Describes content structure and priority for AI agents | LLM agents and AI crawlers |

robots.txt is a directive: it blocks or allows access. sitemap.xml is a complete inventory: every URL the site wants indexed. llms.txt is a guide: it does not control access but explains what the AI agent should pay attention to and how to understand the site's purpose.

A site can have all three without conflict. robots.txt blocks certain sections from all crawlers; sitemap.xml lists all public URLs; llms.txt highlights the 10-20 pages an AI agent should read first to understand the site's core value proposition and key content.

The marketing case for llms.txt

The operational problem for marketing and SEO teams is that AI answer engines (Google AI Overviews, ChatGPT, Perplexity) are increasingly the first point of contact for commercial queries. A user asking "what is the best marketing analytics platform for a 20-person team" is likely to see an AI-generated answer before any organic SERP results. Those answers are built from content the AI agents crawled.

If the AI agent that crawled your site read a 2-year-old blog post instead of your current product page, the answer it generates about your product may be stale, inaccurate, or missing key differentiators. llms.txt allows you to direct the agent to the pages that represent your current product accurately.

For marketing sites specifically, the most valuable content to surface in llms.txt includes:

  • Product and feature pages (what the product actually does today)
  • Pricing page (pricing is one of the most common commercial-intent queries)
  • Comparison and alternative pages ("Prooflytics vs Supermetrics" type queries)
  • Case studies or social proof pages
  • A curated selection of recent blog posts on core topics

What to include in a llms.txt file

A functional llms.txt for a marketing SaaS site follows this structure:

# [Company Name]

[2-3 sentence description of what the company does, written for an AI agent reading it to understand the site's purpose]

## Key pages

- [Home](https://yourdomain.com/): [One sentence on what this page covers]
- [Product](https://yourdomain.com/product): [Description of the core product]
- [Pricing](https://yourdomain.com/pricing): [Pricing and plan structure]
- [Features](https://yourdomain.com/features): [Key capabilities]
- [Compare](https://yourdomain.com/compare): [Positioning vs alternatives]

## Blog: selected posts

- [Post title](https://yourdomain.com/blog/post-slug): [Topic]
- [Post title](https://yourdomain.com/blog/post-slug): [Topic]
[10-20 most authoritative or recent posts]

## Content policy

[Statement on whether content may be used for AI training, cited in answers, or has other use restrictions]

The description at the top is the most important element. It should be written as if you are briefing an AI agent that has never seen your site: company name, category, what the product does, who it is for. Use concrete terms, not marketing superlatives.

Example description for a marketing analytics SaaS:

# Prooflytics

Prooflytics is a marketing intelligence platform for in-house marketing teams
at B2B SaaS companies and e-commerce brands. It connects to 140+ data sources
including Meta Ads, Google Ads, GA4, LinkedIn Ads, HubSpot, and Stripe to
generate daily AI briefings, performance reports, and hypothesis-driven
marketing recommendations. The product includes campaign intelligence,
weekly performance reporting, HADI hypothesis tracking, and competitor
intelligence modules.
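
If the page list lives in a CMS export or a spreadsheet, the file can be generated rather than hand-edited. The following is a minimal sketch under stated assumptions: `Page` and `render_llms_txt` are hypothetical names, the example.com URLs and descriptions are placeholders, and the output layout simply mirrors the template above with entries written as markdown links.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    title: str
    url: str
    note: str  # one-sentence description of what the page covers


def render_llms_txt(name, description, sections, policy):
    """Render llms.txt text: H1, description, one H2 list per section, then the policy."""
    lines = [f"# {name}", "", description.strip(), ""]
    for heading, pages in sections.items():
        lines += [f"## {heading}", ""]
        lines += [f"- [{p.title}]({p.url}): {p.note}" for p in pages]
        lines.append("")
    lines += ["## Content policy", "", policy.strip(), ""]
    return "\n".join(lines)


# Placeholder content for illustration only.
text = render_llms_txt(
    name="Example Co",
    description=(
        "Example Co is a hypothetical marketing analytics product for small "
        "in-house teams. It combines ad, web, and revenue data into one report."
    ),
    sections={
        "Key pages": [
            Page("Product", "https://example.com/product", "What the product does today"),
            Page("Pricing", "https://example.com/pricing", "Plans and per-seat prices"),
        ],
        "Blog: selected posts": [
            Page("Attribution basics", "https://example.com/blog/attribution", "Intro to attribution models"),
        ],
    },
    policy="Content may be cited in AI-generated answers.",
)
print(text)
```

Generating the file from one source of truth keeps the listed URLs and descriptions in step with the site, which matters because a stale llms.txt recreates the outdated-content problem the file is meant to solve.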

Implementation checklist

Step 1: Create the file

Create a plain text file named llms.txt and place it at the root of your domain. It must be accessible at https://yourdomain.com/llms.txt without authentication. Use UTF-8 encoding.

Step 2: Write the site description

Write 2-3 sentences describing the company, product category, target audience, and key capabilities. Avoid jargon, marketing superlatives, and vague claims. Write for an AI agent that needs to categorize and summarize your site accurately.

Step 3: List key pages with descriptions

Add the 10-20 most important pages. For each, include: the full URL and a one-sentence description of what the page covers. Prioritize pages that answer the queries your ICP is most likely to ask AI agents: product capabilities, pricing, comparisons, integration partners.

Step 4: Add a curated blog section

List 10-20 high-quality recent blog posts. Select posts that cover your core topics authoritatively -- not all posts, but the ones you would want an AI agent to cite if asked about your area of expertise.
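
Steps 2 through 4 lend themselves to a quick automated check before publishing. The sketch below is one possible lint, not an official validator: `lint_llms_entries` is a hypothetical helper, the 2-3 sentence and 10-20 entry thresholds are taken from the steps above, and the sentence count is a rough punctuation-based estimate that abbreviations can throw off.

```python
import re
from urllib.parse import urlsplit


def lint_llms_entries(description, entries, min_entries=10, max_entries=20):
    """Return human-readable warnings; an empty list means no issue was found."""
    warnings = []
    # Rough sentence count: split on ., ! or ? followed by whitespace or end of text.
    sentences = [s for s in re.split(r"[.!?]+(?:\s+|$)", description.strip()) if s]
    if not 2 <= len(sentences) <= 3:
        warnings.append(f"description has {len(sentences)} sentence(s), expected 2-3")
    if not min_entries <= len(entries) <= max_entries:
        warnings.append(f"{len(entries)} entries listed, expected {min_entries}-{max_entries}")
    seen = set()
    for url, note in entries:
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            warnings.append(f"not a full URL: {url}")
        if not note.strip():
            warnings.append(f"missing description: {url}")
        if url in seen:
            warnings.append(f"duplicate URL: {url}")
        seen.add(url)
    return warnings


# Placeholder entries for illustration: (url, one-sentence description).
issues = lint_llms_entries(
    "Example Co sells analytics software. It serves small marketing teams.",
    [
        ("https://example.com/pricing", "Plans and prices"),
        ("/product", ""),
        ("https://example.com/pricing", "Listed twice by mistake"),
    ],
)
for issue in issues:
    print(issue)
```

Run against each section separately (key pages, then blog posts) so the entry-count check applies to the list it was written for.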

Step 5: Define content policy

State clearly whether your content may be used for AI training data, cited in AI-generated answers, or has any use restrictions. Most marketing sites will allow citation for AI answers (it drives brand awareness) but may want to opt out of training data use. Check your legal team's guidance on this before publishing.

Step 6: Verify accessibility

Confirm the file returns HTTP 200 and is not blocked by robots.txt. Check using curl or a browser: curl -I https://yourdomain.com/llms.txt should return 200 OK. If blocked by auth or returning 404, fix before submitting to any AI crawler directory.
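
The robots.txt half of this check can be scripted as well. The sketch below uses Python's standard-library `urllib.robotparser` to test whether a given robots.txt body disallows /llms.txt; `blocked_by_robots` and `fetch_status` are hypothetical helper names, and example.com stands in for your domain. `fetch_status` is defined but not called here because it needs network access.

```python
import urllib.error
import urllib.request
from urllib.robotparser import RobotFileParser


def blocked_by_robots(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """True if the given robots.txt text disallows `url` for `user_agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status of a HEAD request (requires network access)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # 4xx/5xx responses arrive as exceptions


LLMS_URL = "https://example.com/llms.txt"

# Two sample robots.txt bodies: one leaves llms.txt reachable, one blocks it.
open_rules = "User-agent: *\nDisallow: /private/\n"
closed_rules = "User-agent: *\nDisallow: /llms.txt\n"

print(blocked_by_robots(open_rules, LLMS_URL))    # False: only /private/ is disallowed
print(blocked_by_robots(closed_rules, LLMS_URL))  # True: /llms.txt is disallowed
```

In a deployment check, you would pass the live robots.txt body to `blocked_by_robots` and confirm that `fetch_status` returns 200 for the llms.txt URL.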

Step 7: Validate with Lighthouse

Run a Lighthouse audit via Chrome DevTools or the PageSpeed Insights tool. In the "Best Practices" or "SEO" category, check for any llms.txt-related audit results. Fix any issues flagged.

Bottom line

  • llms.txt is now a Lighthouse audit item (added May 2026) -- for AI-visible sites, it is moving from experimental to expected infrastructure.
  • The file guides AI agents to your most important pages; without it, agents browse heuristically and may cite outdated or low-authority content about your product.
  • For marketing SaaS sites, prioritize product, pricing, comparison, and case study pages in llms.txt -- these answer the commercial queries AI agents are most likely to process.
  • Write the site description as a briefing for an AI agent: concrete, factual, category-clear, jargon-free.
  • Verify the file returns HTTP 200 and is not blocked by robots.txt before publishing. Run Lighthouse to confirm the audit passes.
  • You can read independent reviews of Prooflytics on G2 and compare it to alternatives in the marketing analytics category.

Frequently asked questions

What is llms.txt?

llms.txt is a plain-text file placed at the root of a website (yourdomain.com/llms.txt) that describes the site's content structure, key pages, and content use guidelines in a format readable by AI language model agents. It is analogous to robots.txt (which directs traditional search crawlers) but is designed for AI agents that need to understand the purpose and priorities of site content, not just which pages to crawl. Google added llms.txt to Chrome Lighthouse agentic audits in May 2026.

Is llms.txt required for SEO?

Not required in the traditional sense -- there is no penalty for not having one. However, since Google added llms.txt to Lighthouse audits, sites without it will show a recommendation to add one, similar to how missing meta descriptions or broken sitemap links appear as recommendations. For sites that depend on AI-mediated search visibility (AI Overviews, Perplexity, ChatGPT citations), llms.txt is increasingly important hygiene infrastructure for controlling which pages AI agents read first.

How is llms.txt different from robots.txt?

robots.txt controls access: it tells crawlers which pages they are and are not allowed to fetch. llms.txt is a guide, not a directive: it does not block or allow pages but tells AI agents which pages are most important and how to understand the site's purpose. A page can be accessible (not blocked by robots.txt) but not mentioned in llms.txt, in which case AI agents may still read it but will not know it is a priority. llms.txt supplements robots.txt rather than replacing it.

What pages should I include in llms.txt?

For a marketing site, prioritize pages that answer commercial intent queries: product pages, pricing pages, feature and comparison pages, integration partner pages, and case studies. For content sites, include the 10-20 most authoritative blog posts on core topics. Do not list every page -- the purpose of llms.txt is to help AI agents identify the most important pages quickly, not to replicate the sitemap. Aim for 15-30 total entries covering the full product scope.

Does llms.txt affect AI Overview rankings?

Google has not confirmed that llms.txt directly affects AI Overview rankings. However, by providing AI agents with accurate, current information about your site's key pages, you increase the probability that AI-generated answers about your site or category cite your content accurately rather than making inferences from less authoritative pages. The indirect effect -- more accurate AI citations leading to more qualified traffic -- is the primary motivation for implementation rather than a direct ranking mechanism.
