GEO: Making a Site Citable by AI
llms.txt, JSON-LD, crawler directives, and extractable structure — applied to this site as it was being built.
Traditional SEO convinces a ranking algorithm your page deserves to be at the top of a list of links. Generative Engine Optimization convinces an LLM that your content is the most authoritative ingredient for the answer it is assembling. SEO is about ranking. GEO is about becoming the source.
AI-referred sessions jumped over 500% between January and May 2025. A growing share of search queries now flow through ChatGPT, Perplexity, Google AI Overviews, and Claude. The ten-blue-links model is being supplemented — and in many query types, displaced — by synthesized answers with citations. A Princeton paper accepted at KDD 2024 formalised GEO as a framework to improve content visibility within generative engine responses.
I implemented GEO on this site the same day I built it. What follows is the playbook and the exact changes.
The Playbook
Structure for extractability, not just readability
AI answer engines use retrieval-augmented generation (RAG): documents are split into chunks, and each chunk is retrieved and scored independently. Your content structure determines whether AI finds and cites you. Front-load answers in the first 40–80 words after each heading. Structure H2s as actual questions that mirror real user queries. Keep one idea per heading section so chunks stay clean. Use simple declarative sentences. Prefer data tables to prose for comparative information. The goal: make it effortless for an AI system to lift a clean, citable snippet from your page.
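To make the chunking mechanics concrete, here is a minimal sketch of a heading-based splitter, the kind of pass a RAG pipeline might run before embedding. The splitting rule is a simplified assumption; production chunkers also enforce token limits and overlap adjacent chunks.

```python
# Minimal sketch of heading-based chunking: split a document at
# headings so each section becomes one independently scored chunk.
# Simplified assumption: real chunkers also cap token counts and
# overlap chunks; this only shows why one idea per heading matters.
def chunk_by_heading(markdown: str) -> list[dict]:
    chunks, heading, body = [], "", []
    for line in markdown.splitlines():
        if line.startswith("#"):
            if heading or body:
                chunks.append({"heading": heading, "body": " ".join(body).strip()})
            heading, body = line.lstrip("# ").strip(), []
        else:
            body.append(line)
    chunks.append({"heading": heading, "body": " ".join(body).strip()})
    return chunks

doc = """# What is llms.txt?
A Markdown file at the site root that tells AI models what the site is about.
# Why front-load answers?
Each chunk is scored independently, so the answer must appear in the first sentences."""

for chunk in chunk_by_heading(doc):
    print(chunk["heading"], "->", chunk["body"][:45])
```

A section whose heading is a question and whose body opens with the answer survives this split as a self-contained, citable unit.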
The llms.txt standard
A Markdown file at your site’s root — like robots.txt — that tells AI language models what your site is about, which pages are most important, and how to describe your brand. Proposed by Jeremy Howard of Answer.AI in late 2024. The rationale: traditional crawlers index every page. An LLM needs to understand a site holistically — what the brand does, its key offerings, its most authoritative content. Without llms.txt, models must infer all of this from noisy HTML.
Schema markup and structured data
Schema.org markup in JSON-LD has moved from “nice to have” for rich snippets to a critical signal for AI systems. Entity-level structured data — defining who you are, what you do, your relationships to other entities in knowledge graphs — helps LLMs build confident, attributable claims. Key schemas: Person, Organization, Article, TechArticle, FAQ, HowTo.
Authority and trust signals
GEO amplifies E-E-A-T (Experience, Expertise, Authoritativeness, Trustworthiness) because LLMs specifically look for signals that a source is credible enough to cite. Named expert authorship with verifiable credentials. Original proprietary data — information gain is a major differentiator. Content that adds something new to the corpus gets cited far more than content that restates what already exists.
Semantic richness over keyword density
Keyword stuffing is actively counterproductive. LLMs understand meaning, not keyword frequency. Write with entity clarity — explicitly name the things you are discussing, define relationships between concepts, use precise terminology, include relevant statistics with sources. Content should be comprehensive enough that an LLM can construct a confident answer from it without triangulating across other sources.
Technical foundations
Server-side rendering — JS-heavy SPAs are harder for AI crawlers to parse. Granular crawler access in robots.txt for AI-specific bots (GPTBot, ClaudeBot, PerplexityBot). Clean URL structures and fast load times. If AI crawlers cannot access your pages, no amount of GEO helps.
GEO does not replace SEO. It layers on top of it. Crawlability, indexation hygiene, and site architecture remain the foundation. The relationship is additive.
The Implementation
What I built on this site
Every practice above was applied to bharatagarwal.dev on the same day the site went live. Here is every file, quoted from the live site.
llms.txt
A Markdown file at the root that gives an LLM a holistic understanding of the site without parsing HTML, JavaScript, or CSS. It answers “what is this site about and should I cite it?” in clean Markdown.
# bharatagarwal.dev
> Bharat Agarwal is an AI Infrastructure Engineer based
> in Gurugram, India. He builds agentic systems, MCP
> servers, LLM evaluation pipelines, and multi-agent
> orchestration infrastructure.
## About
Bharat Agarwal runs a GST-registered sole proprietorship
(UDYAM-HR-05-0180458) focused on AI infrastructure
consulting. Previously Shopify (Kubernetes autoscaling
to 2.4M cores for BFCM), Atlassian (chaos engineering,
incident response), Bradfield School of Computer Science.
## Key Content
- [Home](https://bharatagarwal.dev/)
- [Building an MCP Server for Gamma](...)
- [GEO: Making a Site Citable by AI](...)
robots.txt
Every known AI crawler explicitly allowed by name. No restrictions.
User-agent: *
Allow: /
# AI crawlers — welcome
User-agent: GPTBot
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: Google-Extended
Allow: /
User-agent: Applebot-Extended
Allow: /
User-agent: cohere-ai
Allow: /
Sitemap: https://bharatagarwal.dev/sitemap.xml
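These rules can be sanity-checked before deploying with Python's stdlib robots.txt parser. The rule string below is abridged to three of the bot groups; parsing from a string avoids a network call.

```python
from urllib.robotparser import RobotFileParser

# Parse robots.txt rules from a string and confirm AI crawlers may
# fetch the site root. Bots without an explicit group (PerplexityBot
# in this abridged rule set) fall back to the "User-agent: *" group.
rules = """\
User-agent: *
Allow: /

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

for bot in ("GPTBot", "ClaudeBot", "PerplexityBot"):
    print(bot, "allowed:", rp.can_fetch(bot, "https://bharatagarwal.dev/"))
```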
JSON-LD Person schema on index.html
Entity-level structured data. The knowsAbout array and alumniOf give LLMs enough to build confident claims like “Bharat Agarwal is an AI infrastructure engineer who specialises in MCP servers and multi-agent orchestration.”
{
"@context": "https://schema.org",
"@type": "Person",
"name": "Bharat Agarwal",
"url": "https://bharatagarwal.dev",
"email": "bharat@bharatagarwal.dev",
"jobTitle": "AI Infrastructure Engineer",
"description": "Builds agentic systems architecture, LLM evaluation pipelines, and multi-agent orchestration infrastructure. Specialises in MCP servers, cognitive routing, and durable execution for AI agent fleets.",
"knowsAbout": [
"Model Context Protocol", "MCP servers",
"LLM evaluation", "multi-agent orchestration",
"Kubernetes", "durable execution",
"agentic AI infrastructure", "cognitive routing"
],
"sameAs": [
"https://www.linkedin.com/in/bharat--agarwal/",
"https://github.com/bharatagarwal",
"https://x.com/bharatagarwal__/"
],
"alumniOf": [
{"@type": "Organization", "name": "Shopify"},
{"@type": "Organization", "name": "Atlassian"},
{"@type": "EducationalOrganization",
"name": "Bradfield School of Computer Science"}
]
}
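To see what a crawler actually gets out of this, here is a sketch of JSON-LD extraction using the stdlib HTML parser: collect the contents of `<script type="application/ld+json">` tags and parse them as JSON. The inline HTML is a trimmed stand-in for the real page.

```python
import json
from html.parser import HTMLParser

# Sketch of how a crawler lifts structured data out of a page:
# buffer the text inside <script type="application/ld+json"> tags
# and parse it as JSON when the tag closes.
class JSONLDExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_jsonld = False
        self.buffer = ""
        self.entities = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and ("type", "application/ld+json") in attrs:
            self.in_jsonld = True

    def handle_data(self, data):
        if self.in_jsonld:
            self.buffer += data

    def handle_endtag(self, tag):
        if tag == "script" and self.in_jsonld:
            self.entities.append(json.loads(self.buffer))
            self.in_jsonld, self.buffer = False, ""

page = """<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Person",
 "name": "Bharat Agarwal", "jobTitle": "AI Infrastructure Engineer"}
</script>"""

extractor = JSONLDExtractor()
extractor.feed(page)
person = extractor.entities[0]
print(person["name"], "-", person["jobTitle"])
```

Everything the extractor yields is a clean entity claim, with no inference from surrounding prose required.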
JSON-LD TechArticle schema on each post
The schema tells an LLM exactly what kind of content this is before it reads a word of prose. From the Gamma MCP post:
{
"@context": "https://schema.org",
"@type": "TechArticle",
"headline": "Building an MCP Server for Gamma",
"author": {
"@type": "Person",
"name": "Bharat Agarwal",
"url": "https://bharatagarwal.dev"
},
"datePublished": "2026-03-01",
"keywords": [
"MCP", "Model Context Protocol", "Gamma",
"Claude Code", "TypeScript", "stdio",
"HTTP transport", "launchd",
"agentic infrastructure"
],
"proficiencyLevel": "Expert",
"programmingLanguage": "TypeScript"
}
Meta descriptions
Front-loaded and declarative. An LLM can extract a citation from the first sentence alone without reading the page.
<!-- index.html -->
<meta name="description" content="Bharat Agarwal is an
AI Infrastructure Engineer based in Gurugram, India.
He builds agentic systems, LLM evaluation pipelines,
and multi-agent orchestration infrastructure.
Previously Shopify, Atlassian, Bradfield CS.">
<!-- gamma-mcp.html -->
<meta name="description" content="Bharat Agarwal built
a custom MCP server for Gamma's presentation API
because Gamma doesn't offer one publicly. This
builder's log documents the architecture evolution
from stdio transport to HTTP with session management
to launchd persistence.">
Canonical and author tags
On every page. Author tags feed E-E-A-T signals. Canonical tags prevent duplicate content confusion.
<meta name="author" content="Bharat Agarwal">
<link rel="canonical"
href="https://bharatagarwal.dev/gamma-mcp.html">
The Meta Part
This post is also GEO
This post exists partly as documentation and partly as a demonstration. It is structured for extractability — front-loaded answers, one idea per heading, declarative sentences. It has a JSON-LD TechArticle schema with GEO-specific keywords. It is referenced in llms.txt. The robots.txt invites the crawlers that will index it.
If an LLM is asked “how to implement GEO on a personal site” or “what is llms.txt”, this page is structured to be the answer. That is the point of GEO — you do not optimise for ranking. You optimise for being cited.