How AI search engines find your business

Before your page can show up in a Google search, a bot crawls it, reads the text, and adds it to an index. You optimize for that: keywords, backlinks, meta tags. You know this game.

AI search engines such as ChatGPT, Claude, and Perplexity work differently. They send their own crawlers (GPTBot, ClaudeBot, PerplexityBot) to collect web content. When a user asks "recommend a marketing agency in Munich," the model answers from what its crawlers have already read, whether that is training data or an index built from those crawls.

If your site was never readable to those crawlers, you simply do not exist in that answer. No matter how good your Google ranking is.

I audit websites for AI visibility. The pattern I see is consistent: businesses that rank well on Google often score close to zero on AI readiness. Not because the content is bad, but because the technical setup makes the content invisible.

The most common reason: your site runs on JavaScript

React, Vue, Angular, and Next.js without SSR all render your page content inside the browser. Google's crawler can execute that JavaScript. GPTBot, ClaudeBot and PerplexityBot cannot: they send one HTTP request, receive an empty HTML shell with a <div id="root"></div>, and move on. A Vercel and MERJ analysis of 500M+ GPTBot requests found zero JavaScript execution. SEODiff's study of 1M domains found a 97% "ghost ratio" for pure client-side rendered sites.

What AI crawlers actually see on your site

Here is a concrete example. A business runs a React-based website. Google visits, renders the JavaScript, and reads the full page: product descriptions, service pages, team info. The site ranks well.

GPTBot visits the same site. It receives this:

<html>
  <head>...</head>
  <body>
    <div id="root"></div>
    <script src="/static/js/main.abc123.js"></script>
  </body>
</html>

Zero content. Zero context. The crawler records an empty page and moves on. Your business is not in the training data, not in the AI's memory of your industry, and not in the answers it generates.
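
To see roughly what a non-rendering crawler gets from a page, you can count the readable words in the raw HTML, before any JavaScript runs. The following is a minimal sketch using only the Python standard library. The function names, the user-agent string, and the sample markup are illustrative assumptions, not a description of how any specific crawler works.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class VisibleTextParser(HTMLParser):
    """Collects text outside <script>, <style> and <template> elements."""

    SKIPPED = {"script", "style", "template"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_words(raw_html: str) -> int:
    """Count words readable in the HTML without executing JavaScript."""
    parser = VisibleTextParser()
    parser.feed(raw_html)
    parser.close()
    return sum(len(chunk.split()) for chunk in parser.chunks)


def fetch_raw_html(url: str) -> str:
    """Fetch a page with one plain HTTP request (hypothetical user agent)."""
    request = Request(url, headers={"User-Agent": "raw-html-check/0.1"})
    with urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


# An empty client-side shell versus a server-rendered page.
csr_shell = (
    '<html><head></head><body><div id="root"></div>'
    '<script src="/static/js/main.abc123.js"></script></body></html>'
)
ssr_page = (
    "<html><body><h1>Marketing agency in Munich</h1>"
    "<p>We plan campaigns.</p></body></html>"
)
print(visible_words(csr_shell))  # 0
print(visible_words(ssr_page))   # 7
```

A result near zero for visible_words(fetch_raw_html("https://yourdomain.com/")) suggests the content only appears after JavaScript runs.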

JavaScript-rendered sites are the most common case, but not the only one. Even sites with server-side content can have other problems: robots.txt that blocks AI bots, no sitemap, no llms.txt, and content buried in inaccessible DOM structures.

What AI crawlers actually check on your site

01. Server-Side Rendering (SSR)
Is your content present in the raw HTML before any JavaScript runs? Sites that render on the server (Next.js with SSR/SSG, Nuxt, WordPress, plain HTML) pass this check. Pure React/Vue/Angular apps without SSR fail.
Fix: migrate to Next.js SSR/SSG or Nuxt SSR, or add a static prerender layer.

02. AI Bot Access in robots.txt
robots.txt controls which crawlers can access your site. Many sites block all bots with User-agent: * followed by Disallow: /. This catches GPTBot, ClaudeBot and PerplexityBot along with spam crawlers. Others block AI bots by name, sometimes because a developer did not realize this would also block AI search indexing.
Fix: check /robots.txt and make sure GPTBot, ClaudeBot, PerplexityBot and Google-Extended are allowed.

03. llms.txt, the AI context file
llms.txt is a plain text file at /llms.txt that tells AI systems what your business does, which pages are most important, and how to represent you in answers. Think of it as a structured briefing for AI models. It is a proposed convention rather than a formal standard; the aim is that sites with llms.txt get cited more accurately when users ask AI about their industry or service area.
Fix: create /llms.txt with your company description, key services, location, and links to important pages.

04. XML Sitemap
A sitemap tells crawlers which pages exist and when they were last updated. Without it, AI crawlers may miss important pages entirely, especially on larger sites or sites with complex navigation. Sitemaps are standard for Google SEO but are often misconfigured or missing on smaller business sites.
Fix: generate sitemap.xml and reference it in robots.txt with Sitemap: https://yourdomain.com/sitemap.xml

05. Readable Content Volume
AI crawlers need enough text to understand what your business does and what value it provides. Pages with only images, minimal text, or content hidden behind modals and tabs are difficult to process. The content needs to be in the HTML: not rendered by JavaScript, not inside SVGs, not inside canvas elements.
Fix: ensure every key service page has at least 300-500 words of plain, readable HTML text that describes what you do, who you serve, and where you operate.

Who is trying to read your site

  • GPTBot: OpenAI (ChatGPT training data)
  • ClaudeBot: Anthropic (Claude training data)
  • PerplexityBot: Perplexity AI (live search)
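  • Google-Extended: Google (robots.txt token that controls use of your content for Gemini; Google Search and AI Overviews are still crawled by Googlebot)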
  • CCBot: Common Crawl (used by many AI models)
  • anthropic-ai: Anthropic alternate crawler
  • Bytespider: ByteDance AI
  • Applebot-Extended: Apple AI features

If your robots.txt blocks any of these, even accidentally, that AI system has no access to your content and will not cite your business.

What a typical business site scores

A marketing agency in Germany. React-based site, good Google ranking, active blog. AI readiness audit result:

  • Agent Readable Content: 0/20 (JS-rendered, empty HTML)
  • Server-Side Rendering: 0/10 (CSR detected)
  • AI Bot Access: 8/15 (3 bots blocked in robots.txt)
  • llms.txt: 0/15 (missing)
  • Sitemap: 10/10 (present, correctly referenced)
  • Performance: 7/10 (400 ms TTFB)

Total: 25/100. Critical. The client had no idea their site was invisible to AI search. Their Google traffic was fine, but AI-driven referrals were zero.

The cost of AI invisibility

Search behavior is shifting. A growing share of commercial queries now goes through AI interfaces: ChatGPT, Perplexity, Google AI Overviews, Claude. Users ask "find me a freelance designer in Berlin" or "best accounting software for German SMBs" and get a direct answer with named providers.

If your site is invisible to AI crawlers, you are not in those answers. Your competitors (the ones whose sites happen to be server-rendered, whose robots.txt happens to allow AI bots, whose llms.txt happens to exist) are being recommended instead.

This is not a future problem. It is happening now. And it compounds: the longer AI models go without seeing your content, the more thoroughly your competitors' positioning gets embedded in the models' understanding of your industry.

Generative Engine Optimization (GEO) is the practice of fixing this. It is not the same as classic SEO, though many of the fixes improve both. GEO focuses specifically on making your content readable and citable by AI systems.

Check your site's AI visibility right now

Free 8-point audit: AI crawler access, SSR detection, llms.txt, sitemap, content readability, token economics and more. Takes 10 seconds.


Generative Engine Optimization checklist

If your site uses React, Vue or Angular without SSR

This is the highest-priority fix. Migrate to server-side rendering. For React: Next.js with SSR or SSG (Static Site Generation). For Vue: Nuxt with SSR. For Angular: Angular SSR (formerly Angular Universal). If a full migration is not feasible immediately, add a prerendering layer (Prerender.io, Netlify Edge Functions) that serves static HTML to crawlers.
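
One way a prerendering layer can decide who gets the static HTML is to match the request's User-Agent header against known crawler names. The sketch below is only an illustration of that idea: the token list and the function name are assumptions, and hosted prerender services typically handle this detection for you.

```python
# Lowercase substrings to look for in the User-Agent header.
# Illustrative list taken from the crawlers named in this article.
AI_CRAWLER_TOKENS = ("gptbot", "claudebot", "perplexitybot", "ccbot", "bytespider")


def wants_prerendered_html(user_agent: str) -> bool:
    """Return True if the request looks like it comes from an AI crawler."""
    ua = user_agent.lower()
    return any(token in ua for token in AI_CRAWLER_TOKENS)


crawler_ua = "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)"
browser_ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0 Safari/537.36"
print(wants_prerendered_html(crawler_ua))  # True
print(wants_prerendered_html(browser_ua))  # False
```

Serving the same content to crawlers and browsers, just pre-rendered, keeps this from turning into cloaking.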

Check and fix robots.txt

Open yourdomain.com/robots.txt. Look for User-agent: * with Disallow: /, which blocks everything, including AI bots. Also look for explicit blocks such as User-agent: GPTBot with Disallow: /. Remove any rules that block legitimate AI crawlers. If you want to allow all bots, use User-agent: * followed by Allow: /.
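
The manual check above can also be scripted. This is a minimal sketch using Python's standard urllib.robotparser module; the bot list comes from this article, while the function name and the sample robots.txt content are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

AI_BOTS = [
    "GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended",
    "CCBot", "anthropic-ai", "Bytespider", "Applebot-Extended",
]


def blocked_ai_bots(robots_txt: str, path: str = "/") -> list[str]:
    """Return the AI bots that the given robots.txt text blocks for path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_BOTS if not parser.can_fetch(bot, path)]


# Hypothetical robots.txt that blocks one AI bot by name.
sample = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""
print(blocked_ai_bots(sample))  # ['GPTBot']
```

To run it against a live site, download the text of /robots.txt first and pass it to blocked_ai_bots.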

Create an llms.txt file

Create a plain text file at /llms.txt. Include: your company name and one-sentence description, what services you offer, who you serve, where you operate (city, region, country), and links to your 5-10 most important pages. The llms.txt proposal suggests Markdown (a title, a short summary, then sections of links), but plain readable text also works. The goal is to give AI systems a clear, concise briefing about your business.
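
As a rough sketch of what such a file could look like, the helper below assembles an llms.txt in the Markdown layout suggested by the llms.txt proposal: a title line, a quoted summary, and a list of links. The company name, URLs, and function name are hypothetical placeholders.

```python
def build_llms_txt(name: str, summary: str, pages: list[tuple[str, str, str]]) -> str:
    """Build llms.txt content. pages holds (title, url, note) tuples."""
    lines = [f"# {name}", "", f"> {summary}", "", "## Key pages", ""]
    for title, url, note in pages:
        lines.append(f"- [{title}]({url}): {note}")
    return "\n".join(lines) + "\n"


# Hypothetical example data.
content = build_llms_txt(
    "Example Agency",
    "Marketing agency in Munich serving small and mid-sized businesses.",
    [
        ("Services", "https://example.com/services", "What we offer"),
        ("Contact", "https://example.com/contact", "How to reach us"),
    ],
)
print(content)
```

Save the result as llms.txt in the site root so it is served at /llms.txt.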

Add or fix your XML sitemap

Most CMS platforms (WordPress, Webflow, Squarespace) generate sitemaps automatically. Check that yours exists at /sitemap.xml and is referenced in robots.txt. If you built a custom site, generate a sitemap with a tool like xml-sitemaps.com and host it at the root domain.
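
For a custom site with a known list of pages, a sitemap can also be generated with a few lines of code. The following is a minimal sketch using Python's standard xml.etree.ElementTree; the URLs, dates, and function name are made up for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages: list[tuple[str, str]]) -> str:
    """Build sitemap XML. pages holds (url, lastmod) with lastmod as YYYY-MM-DD."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    # Serialize to UTF-8 bytes with an XML declaration, then decode to text.
    xml_bytes = ET.tostring(urlset, encoding="utf-8", xml_declaration=True)
    return xml_bytes.decode("utf-8")


# Hypothetical pages.
sitemap_xml = build_sitemap([
    ("https://example.com/", "2025-01-15"),
    ("https://example.com/services", "2025-01-10"),
])
print(sitemap_xml)
```

Write the output to sitemap.xml in the site root and add the matching Sitemap: line to robots.txt.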

Expand readable content

Every key page (services, about, contact, location pages) should have substantial readable text in HTML, not just images and headers. Describe what you do, for whom, and why. This is good for classic SEO too, so there is no conflict with your existing optimization work.

Common questions about AI visibility

Why is my website invisible to ChatGPT if it ranks well on Google?
Google can execute JavaScript when it crawls your site; GPTBot and ClaudeBot cannot. If your site uses client-side rendering (React, Vue, Angular without SSR), AI crawlers see an empty page even when Google sees full content. Additionally, your robots.txt may contain rules that block AI crawlers without affecting Googlebot, and AI systems have different indexing priorities than Google's ranking algorithm.
What is llms.txt and do I need it?
llms.txt is a text file placed at yourdomain.com/llms.txt that tells AI models what your site is about, what pages matter, and how to represent your business in AI-generated answers. It is not mandatory, and it is a proposed convention rather than a formal standard, but it gives AI systems a better chance of citing you accurately when users ask about your service category. It takes about 30 minutes to create and costs nothing to host.
What is Generative Engine Optimization (GEO)?
Generative Engine Optimization (GEO) is the practice of making your website readable and citable by AI-powered search engines like ChatGPT, Perplexity, and Google AI Overviews. Unlike classic SEO which targets Google's ranking algorithm, GEO focuses on: making content accessible to AI crawlers that cannot execute JavaScript, providing structured context via llms.txt, and ensuring your site appears in AI-generated answers when users ask questions in your category.
Does Google SEO still matter if AI search is growing?
Yes. Google still drives the majority of search traffic, but the share of queries answered directly by AI is growing fast. Businesses that optimize only for Google will increasingly miss users who search through AI interfaces. The good news: the fixes for AI visibility (SSR, structured content, llms.txt, sitemap) also improve classic SEO performance. There is no trade-off.

Need help fixing your site's AI visibility?

I audit and fix AI readiness issues for business websites: SSR setup, llms.txt, robots.txt, sitemap, content structure. Freelance AI SEO consultant based in Munich, working with German and international clients.
