Optimizing Your Content for AI Search and Agents: Lessons from Building Andi

Jan 28, 2025
TL;DR: To optimize for AI search and agents: 1) Make content accessible as clean HTML/markdown with good structure. 2) Allow AI crawlers in robots.txt and firewall rules. 3) Return content fast, with key info high up. 4) Use semantic markup, metadata, and schemas. 5) Create an llms.txt file. Test with andisearch.com and firecrawl.dev to see how AI sees your site.
I've seen a few posts recently from founders asking how to optimize their websites for AI search and agents, versus traditional SEO. As we've learned building Andi, an AI search engine, there are some key differences in how to approach this. I wanted to share what we've learned to help other YC companies make their content more accessible to AI.
At Andi, we face this challenge from the other side - trying to quickly find and meaningfully access the best content to use for AI question answering, summarization, search and agents. We're ingesting 30-50 million pages a day, and constantly run into issues accessing and extracting useful content. Here's what we've learned about making your content AI-friendly:
The Big Differences from Traditional SEO
Speed and simplicity are critical. Many AI systems have tight timeouts (1-5 seconds) for retrieving content. Assume long content may be truncated or dropped completely after the timeout.
Clean, structured text content is king. Many AI crawlers don't handle JavaScript well, if at all. Logical content structure in plain HTML or markdown is ideal.
Metadata and semantic markup are even more important. Clear titles, descriptions, dates, and schema.org markup help AI systems quickly understand your content.
Blocking crawlers can make you invisible. In a world of AI agents, overly aggressive bot protection can cut you off entirely.
There's a big difference between allowing access for training data collection versus AI search results/agent use. You may want different policies for each.
Checking Your Content's AI Visibility
To quickly test how visible your content is:
AI search engine test: Paste a URL into andisearch.com to see if it's accessible and has enough "meat" to be useful for AI answers. If you see options like Summarize and Explain, the page is accessible and considered useful.
AI agent test: Use the awesome Playground from our friends at Firecrawl (S22) to see what content an AI agent can access and how it's perceived.
Key Optimizations
Add a robots.txt with fairly open access. Allow or disallow crawlers on a case-by-case basis. Here's an example that allows access for AI search/agents but disallows training data collection:
```
# Allow AI search and agent use
User-agent: OAI-SearchBot
User-agent: ChatGPT-User
User-agent: PerplexityBot
User-agent: FirecrawlAgent
User-agent: AndiBot
User-agent: ExaBot
User-agent: PhindBot
User-agent: YouBot
Allow: /

# Disallow AI training data collection
User-agent: GPTBot
User-agent: CCBot
User-agent: Google-Extended
Disallow: /

# Allow traditional search indexing
User-agent: Googlebot
User-agent: Bingbot
Allow: /

# Disallow access to admin areas for all bots
User-agent: *
Disallow: /admin/
Disallow: /internal/

Sitemap: https://www.example.com/sitemap.xml
```
Don't use aggressive bot protection on Cloudflare/AWS WAF. This stops AI crawlers and agents from accessing your content. Allow major US datacenter IP ranges.
Optimize for speed. Return content as fast as possible, ideally under 1 second. Keep key content high up in the HTML.
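As a quick sanity check, here's a minimal sketch (standard library only; the URL is a placeholder) that approximates time-to-first-byte under the kind of 5-second cap strict AI crawlers use:

```python
import time
import urllib.request

URL = "https://www.example.com/"  # placeholder: your page

start = time.monotonic()
with urllib.request.urlopen(URL, timeout=5) as resp:  # 5s cap, like strict AI crawlers
    resp.read(1)                                      # first byte of the body
    ttfb = time.monotonic() - start                   # approx: DNS + connect + headers + 1 byte
    resp.read()                                       # rest of the body
total = time.monotonic() - start

print(f"TTFB: {ttfb:.2f}s  total: {total:.2f}s  (aim for well under 1s)")
```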
Use clear metadata and semantic markup:
Include <title>, <meta name="description">, and <meta name="keywords"> tags
Add OpenGraph tags for better content previews
Use schema.org markup with JSON-LD
Use proper heading structure (h1-h6) and semantic elements like <article>, <section>, <nav>
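Putting those together, here's a rough sketch of what a page head and body might look like (all titles, URLs, and dates are placeholders):

```html
<head>
  <title>How We Built Fast AI Search | Example Co</title>
  <meta name="description" content="A practical guide to making content fast and readable for AI search engines.">
  <!-- OpenGraph tags for better content previews -->
  <meta property="og:title" content="How We Built Fast AI Search">
  <meta property="og:image" content="https://www.example.com/lead-image.png">
  <!-- Freshness signal -->
  <meta property="article:published_time" content="2025-01-28T00:00:00Z">
  <!-- schema.org markup with JSON-LD -->
  <script type="application/ld+json">
  {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "How We Built Fast AI Search",
    "datePublished": "2025-01-28",
    "author": {"@type": "Organization", "name": "Example Co"}
  }
  </script>
</head>
<body>
  <article>
    <h1>How We Built Fast AI Search</h1>
    <section>
      <h2>Key takeaways</h2>
      <p>Lead with the facts you want AI systems to capture, in case the fetch is truncated.</p>
    </section>
  </article>
</body>
```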
Keep content on a single page where possible. Avoid "read more" buttons or multi-page articles.
Provide programmatic access via APIs (with OpenAPI specs) or RSS feeds. This allows faster, more structured access for AI tools.
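As one example, a bare-bones RSS 2.0 feed (placeholder titles and URLs) gives AI tools a fast, structured view of your latest content:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Example Co Blog</title>
    <link>https://www.example.com/blog</link>
    <description>Guides and updates from Example Co</description>
    <item>
      <title>How We Built Fast AI Search</title>
      <link>https://www.example.com/blog/fast-ai-search</link>
      <pubDate>Tue, 28 Jan 2025 00:00:00 GMT</pubDate>
      <description>A practical guide to building low-latency AI search.</description>
    </item>
  </channel>
</rss>
```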
Clearly indicate content freshness with visible dates and <meta> tags.
For documentation or reference content, create an llms.txt file (minimal example below). Firecrawl's generator is a handy tool for creating one quickly.
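A minimal llms.txt, following the llmstxt.org convention of a project name, a short blockquote summary, and sections of annotated links (everything here is a placeholder), might look like:

```markdown
# Example Co

> Example Co builds low-latency search APIs. This file points AI tools
> at clean markdown versions of our key docs.

## Docs

- [Quickstart](https://www.example.com/docs/quickstart.md): Set up in 5 minutes
- [API Reference](https://www.example.com/docs/api.md): Endpoints and auth

## Optional

- [Blog](https://www.example.com/blog): Guides and announcements
```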
Use sitemap.xml to guide crawlers to important content.
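A minimal sitemap.xml sketch (placeholder URL), with lastmod doubling as a freshness signal:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/blog/fast-ai-search</loc>
    <lastmod>2025-01-28</lastmod>
  </url>
</urlset>
```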
Include a simple favicon.ico and clear lead images for better visual representation in AI search.
Major AI Crawler User-Agents
Here's a list of major AI crawler user-agents to consider in your robots.txt:
OpenAI:
GPTBot (training data)
ChatGPT-User (user actions in ChatGPT)
OAI-SearchBot (AI search results)
Google:
Google-Extended (AI training)
GoogleOther (various AI uses)
Anthropic: ClaudeBot (consolidated bot for various uses)
Andi: AndiBot
Perplexity: PerplexityBot
You.com: YouBot
Phind: PhindBot
Exa: ExaBot
Firecrawl: FirecrawlAgent
Common Crawl: CCBot (used by many AI companies for training data)
For a full, up-to-date list, check Dark Visitors.
Optimizing for AI Agent Computer Use
AI agents that can use computers, like the super cool Browser Use (W25) or OpenAI's Operator, are a new frontier. Some tips:
Implement "agent-responsive design" - structure your site so AI can easily interpret and interact with it.
Ensure interactive elements like buttons and text fields are clearly defined and accessible.
Use consistent navigation patterns to help AI predict and understand site flow.
Minimize unnecessary interactions like login prompts or pop-ups that can disrupt AI task completion.
Incorporate web accessibility features like ARIA labels, which also help AI understand page elements (see the snippet after this list).
Regularly test your site with AI agents and iterate based on the results.
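To illustrate the ARIA point, here's a hypothetical snippet where interactive elements carry explicit labels an agent can target:

```html
<!-- An icon-only button an agent can still identify by its label -->
<button type="submit" aria-label="Search">
  <svg aria-hidden="true"><!-- magnifying-glass icon --></svg>
</button>

<!-- A labeled text field instead of a bare placeholder -->
<label for="email">Work email</label>
<input id="email" name="email" type="email" autocomplete="email">
```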
Some Interesting Stats
34% of AI crawler requests result in 404 or other errors (Vercel analysis)
Among major AI crawlers, only Google's Gemini and AppleBot currently render JavaScript
AI crawlers are roughly 47x less efficient than traditional crawlers like Googlebot
AI crawlers represent about 28% of Googlebot's volume in recent traffic analysis
Resources for Dev Tools Startups
If you're building developer tools:
Maintain an up-to-date llms.txt file.
Provide easy access to clean HTML or markdown versions of your docs.
Consider using documentation tools like Theneo or Mintlify to optimize for AI accessibility.
Conclusion
Things are moving fast with AI search and agents. By following these practices, you can help make your content visible and useful for the AI age. Remember, it's a balance - you want to be accessible to helpful AI tools while still protecting against bad actors. Start by using andisearch.com and firecrawl.dev to see how your content appears to AI, then iterate from there.
The old world of blocking all bots is gone. You want AI agents and crawlers to see your content and navigate your sites. Good luck optimizing for our new AI overlords!