What We Check and Why
Our infrastructure scan runs 10 automated checks against your website. Every check is labeled with an evidence tier so you know exactly what is verified by real crawler behavior versus what is speculative.
Evidence Tiers
Verified: major search engines or AI crawlers are confirmed to read and act on this signal, backed by official documentation from Google, OpenAI, or protocol specifications.
Speculative: a proposed standard with growing adoption, but not yet confirmed to be read by major AI systems. We include these checks when adoption exceeds 10% of top sites, labeled clearly so you can decide.
How This Differs from HubSpot's AEO Grader
HubSpot AEO Grader
Asks AI models "what do you think of this brand?" Measures sentiment, recognition, and share of voice. A brand perception tool. Score out of 100.
citability.dev
Scans your actual website infrastructure. Measures whether AI crawlers can technically find and parse your content. A technical readiness tool. Every check shows evidence and explanation.
They answer "does AI know you?" We answer "can AI find you?" You can score 80/100 on HubSpot and still fail our scan if your site blocks AI crawlers. The tools are complementary.
The 10 Checks
robots.txt
What it checks
Checks whether your robots.txt file exists and is accessible.
Who reads it
All major crawlers: Googlebot, GPTBot (OpenAI), ClaudeBot (Anthropic), CCBot (Common Crawl), Meta-ExternalAgent.
Why it matters
robots.txt is the universal protocol for communicating crawl permissions. Without it, crawlers may apply default behavior or skip your site entirely. AI-specific directives (allow/disallow GPTBot, ClaudeBot) are only possible if the file exists.
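As an illustrative sketch (not a recommendation), a minimal robots.txt with AI-specific directives might look like this; example.com and the per-bot policy choices are placeholders:

```
# Default policy for all crawlers
User-agent: *
Allow: /

# AI-specific groups (hypothetical choices: allow Anthropic, block OpenAI)
User-agent: ClaudeBot
Allow: /

User-agent: GPTBot
Disallow: /

Sitemap: https://example.com/sitemap.xml
```

Per the Robots Exclusion Protocol, a crawler follows the most specific group that matches its user agent, so GPTBot obeys its own group here and ignores the * rules.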
sitemap.xml
What it checks
Checks whether a sitemap.xml file exists at your domain root.
Who reads it
All search engines and AI crawlers use sitemaps for content discovery.
Why it matters
Sitemaps tell crawlers which pages exist and when they were last updated. Without one, crawlers rely on link-following, which misses orphaned pages and provides no freshness signal.
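A minimal sitemap sketch carrying both signals described above, existence and freshness (URLs and dates are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2026-01-15</lastmod>
  </url>
  <url>
    <loc>https://example.com/docs/checks</loc>
    <lastmod>2026-02-01</lastmod>
  </url>
</urlset>
```

The lastmod element is the freshness signal: it tells a crawler which pages changed without requiring a re-fetch of every URL.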
Answer-First Content
What it checks
Analyzes whether your homepage leads with a direct, extractable answer rather than generic marketing copy.
Who reads it
Google (featured snippets, AI Overviews), Perplexity, ChatGPT browse mode.
Why it matters
AI systems extract concise answers from pages. Content that buries the answer below navigation, hero images, or generic taglines is less likely to be selected for AI-generated responses.
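A hypothetical before/after sketch ("Acme" and its copy are invented for illustration):

```html
<!-- Buried: says nothing extractable about what the company does -->
<h1>Welcome to the future of productivity</h1>

<!-- Answer-first: the opening text is a direct, quotable answer -->
<h1>Acme converts PDFs to structured JSON through a REST API</h1>
<p>Upload a PDF, receive parsed JSON in under a second.</p>
```

The second version can be lifted verbatim into an AI-generated answer; the first gives an extraction pipeline nothing to quote.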
Content Freshness
What it checks
Checks for date signals: published dates, modified dates, or dateModified in schema.
Who reads it
Google (QDF algorithm), AI systems that prioritize recent content.
Why it matters
Stale content without date signals gets deprioritized. Google's Query Deserves Freshness (QDF) algorithm and AI training pipelines both use recency as a quality signal.
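As a sketch, the dateModified signal can ride on a standard schema block; the headline and dates here are placeholders:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Example article",
  "datePublished": "2025-06-01",
  "dateModified": "2026-01-15"
}
</script>
```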
Structured Data (JSON-LD)
What it checks
Checks for JSON-LD structured data blocks on your homepage.
Who reads it
Google, Bing, and AI systems parse JSON-LD to understand entities and page purpose.
Why it matters
Schema markup (Organization, Article, FAQPage, HowTo) gives machines explicit context about your content. Pages with rich schema are more likely to generate rich results and be understood correctly by AI models.
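A minimal sketch of how a JSON-LD presence check could be implemented with only the Python standard library; the function names are illustrative, not the scanner's actual API:

```python
import json
from html.parser import HTMLParser

class JSONLDExtractor(HTMLParser):
    """Collects parsed <script type="application/ld+json"> blocks."""
    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            try:
                self.blocks.append(json.loads(data))
            except json.JSONDecodeError:
                pass  # malformed JSON-LD is treated as absent

def has_jsonld(html: str) -> bool:
    parser = JSONLDExtractor()
    parser.feed(html)
    return len(parser.blocks) > 0
```

Parsing the block with json.loads (rather than just spotting the script tag) also catches markup that exists but is invalid and therefore useless to crawlers.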
Meta Description
What it checks
Checks for a meta description tag with meaningful content (>10 characters).
Who reads it
All search engines use it for snippet generation. AI systems use it as a page summary signal.
Why it matters
The meta description is often the first text an AI system reads about your page. A missing or generic description means the AI must guess your page's purpose from the body content.
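A sketch of this check, assuming the >10-character threshold above; function names are illustrative:

```python
from html.parser import HTMLParser

class MetaDescriptionParser(HTMLParser):
    """Records the content attribute of <meta name="description">."""
    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            a = dict(attrs)
            if a.get("name", "").lower() == "description":
                self.description = a.get("content", "")

def meta_description_check(html: str, min_length: int = 10) -> bool:
    """PASS when a description exists and is longer than min_length chars."""
    parser = MetaDescriptionParser()
    parser.feed(html)
    return (parser.description is not None
            and len(parser.description.strip()) > min_length)
```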
Canonical URL
What it checks
Checks for a rel=canonical tag pointing to the authoritative version of the page.
Who reads it
All search engines and AI training crawlers.
Why it matters
Without a canonical, duplicate versions of your content (www vs non-www, HTTP vs HTTPS, query parameters) compete with each other. AI systems may cite the wrong version or split authority across duplicates.
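For illustration, the fix is a single tag in the page head; the URL is a placeholder:

```html
<!-- The same content may be reachable at http://example.com/pricing,
     https://www.example.com/pricing, and https://example.com/pricing?ref=nav.
     One canonical tag names the authoritative version: -->
<link rel="canonical" href="https://example.com/pricing">
```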
HTTPS
What it checks
Checks whether your site is served over HTTPS.
Who reads it
Google (confirmed ranking signal since 2014), all AI crawlers that fetch content.
Why it matters
HTTPS is a baseline trust signal. AI systems that fetch live content (ChatGPT browse mode, Perplexity) require HTTPS for secure retrieval. Non-HTTPS sites may be skipped or flagged.
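The check itself reduces to a scheme test on the URL a scan resolves to; a minimal sketch (a real scan would first follow redirects to find that final URL):

```python
from urllib.parse import urlparse

def https_check(final_url: str) -> bool:
    """PASS only when the resolved URL is served over HTTPS."""
    return urlparse(final_url).scheme == "https"
```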
Heading Hierarchy
What it checks
Checks for at least one H1 tag and proper heading structure.
Who reads it
Search engines use headings to understand content hierarchy. AI models use them to identify key topics.
Why it matters
A clear H1 > H2 > H3 hierarchy helps AI systems extract the main topic and subtopics from your page. Pages without an H1 lack a clear primary topic signal.
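A sketch of one way to test "at least one H1, no skipped levels" with the standard library; names are illustrative:

```python
from html.parser import HTMLParser

class HeadingCollector(HTMLParser):
    """Records h1-h6 tags, in document order, as integer levels."""
    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3", "h4", "h5", "h6"):
            self.levels.append(int(tag[1]))

def heading_check(html: str) -> bool:
    """PASS: an H1 exists and no heading jumps more than one level deeper."""
    collector = HeadingCollector()
    collector.feed(html)
    if 1 not in collector.levels:
        return False
    deepest = 0
    for level in collector.levels:
        if level > deepest + 1:  # e.g. an H3 before any H2 has appeared
            return False
        deepest = max(deepest, level)
    return True
```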
Social Sharing Readiness
What it checks
Checks for Open Graph tags (og:title, og:description, og:image).
Who reads it
Social platforms (LinkedIn, Twitter/X, Facebook), AI systems that preview links.
Why it matters
OG tags control how your page appears when shared. AI systems that browse the web use these as quick metadata signals. Missing OG tags mean your content previews are auto-generated and often wrong.
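A sketch of the three tags this check looks for (values are placeholders):

```html
<meta property="og:title" content="What We Check and Why">
<meta property="og:description" content="Ten evidence-labeled infrastructure checks for AI crawler readiness.">
<meta property="og:image" content="https://example.com/og-card.png">
```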
What We Deliberately Exclude
llms.txt
Proposed in 2024, ~10% adoption as of 2026. No major AI company (Google, OpenAI, Anthropic, Meta) confirms reading it. Only 1 of the 50 most-cited domains has one. We monitor adoption but do not penalize for absence.
ai.txt / .well-known/llms.json
Not established standards. No confirmed crawler support. Including them would inflate your score without improving your actual AI visibility.
Proprietary "AI scores"
We do not generate opaque scores from black-box algorithms. Every check is a specific, verifiable test with a binary PASS/FAIL result and a cited source explaining why it matters.