I Audited 6 Major Websites for AI Citability. Here Is What Actually Predicts Citations.

Chudi Nnorukam · Apr 6, 2026 · 8 min read

Original audit data from 6 websites reveals that domain authority does not predict AI citations. Answer-first content, structured data, and freshness matter more.

Why this matters

I ran AI visibility audits on 6 websites including Ahrefs (DA 92), Reddit, Medium, and my own sites. Domain authority had zero correlation with AI citation rates. The three strongest predictors were answer-first content structure, content freshness with dateModified schema, and structured data coverage.

Domain authority does not predict whether AI will cite your website. I audited 6 major websites for AI citability, and the results challenge nearly everything the SEO industry assumes about AI search visibility.

The site with the highest domain authority (DA 92) was cited by AI only 5% of the time. Sites with millions of daily visitors failed basic infrastructure checks. And the factors that actually predicted citations had nothing to do with backlinks or traffic.

Here is what the data showed.

TL;DR

AI citability is whether AI answer engines include your URL as a source, not just mention your brand.

  • Domain authority has zero correlation with AI citation rates
  • Ahrefs (DA 92) is 100% AI-visible but only 5% cited
  • Reddit, Medium, and X all failed basic AI infrastructure checks
  • The three strongest predictors: answer-first content, dateModified schema, structured data coverage
  • Only 12% of URLs cited by LLMs appear in Google’s top 10 results

The Audit: 6 Sites, 3 AI Platforms, 10 Infrastructure Checks

I used the AI Visibility Readiness (AVR) framework to run infrastructure audits on 6 websites. Each site was checked for 10 signals that AI crawlers use to discover and parse content: robots.txt, sitemap.xml, answer-first content, content freshness, structured data (JSON-LD), meta descriptions, canonical URLs, HTTPS, heading hierarchy, and social sharing readiness.
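The on-page portion of those checks is easy to automate. Here is a minimal sketch of a few of them; the signal names and pass criteria are my own simplifications for illustration, not the AVR framework's exact rules:

```python
import re

# Hypothetical sketch of a few on-page AVR-style checks.
# Pass criteria are simplified: real audits should parse the HTML properly.
def check_page_signals(url, html):
    return {
        "https": url.startswith("https://"),
        "meta_description": '<meta name="description"' in html,
        "canonical": '<link rel="canonical"' in html,
        "json_ld": '<script type="application/ld+json">' in html,
        "single_h1": len(re.findall(r"<h1[\s>]", html)) == 1,
    }

sample_html = (
    '<html><head>'
    '<meta name="description" content="An answer-first page.">'
    '<link rel="canonical" href="https://example.com/post">'
    '<script type="application/ld+json">{"@type": "Article"}</script>'
    '</head><body><h1>One heading</h1></body></html>'
)
signals = check_page_signals("https://example.com/post", sample_html)
print(f"{sum(signals.values())} of {len(signals)} checks passed")
```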

Then I queried ChatGPT, Perplexity, and Claude with questions each site should be able to answer. I tracked two metrics:

  • AI Visibility: Does the AI mention the brand when asked?
  • AI Citability: Does the AI include a URL from the site as a cited source?
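Tallying the two metrics is simple bookkeeping once the manual queries are logged. A minimal version, with invented log entries, might look like this:

```python
# Invented query log: one entry per (platform, question) pair, recording
# whether the brand was mentioned and whether a site URL was cited.
query_log = [
    {"platform": "chatgpt",    "mentioned": True,  "cited": False},
    {"platform": "perplexity", "mentioned": True,  "cited": True},
    {"platform": "claude",     "mentioned": True,  "cited": False},
    {"platform": "chatgpt",    "mentioned": False, "cited": False},
]

visibility = sum(q["mentioned"] for q in query_log) / len(query_log)
citability = sum(q["cited"] for q in query_log) / len(query_log)
print(f"visibility {visibility:.0%}, citability {citability:.0%}")
```

Note that citability can never exceed visibility: a cited URL implies the brand surfaced in the answer.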

The Results

Site          DA  AI Infrastructure  AI Visibility  AI Citability
ahrefs.com    92  Foundation-ready   100%           5%
semrush.com   91  Foundation-ready   Partial        Partial
chudi.dev     28  Foundation-strong  29%            0%
reddit.com    97  Not ready          Untested       Untested
medium.com    95  Not ready          Untested       Untested
x.com         96  Not ready          Untested       Untested

The three highest-DA sites (Reddit 97, X 96, Medium 95) all failed basic infrastructure readiness. They are missing structured data, answer-first content, or proper AI crawler permissions. These sites get cited constantly by AI, but not because of their infrastructure. They get cited because AI training data includes their content at massive scale.

For everyone else, infrastructure is the gate.

Does High Domain Authority Mean AI Will Cite You?

No. The data is clear: DA has zero predictive power for AI citations.

Ahrefs has a DA of 92, one of the highest in the SEO industry. Every AI platform recognizes the brand instantly. Ask ChatGPT “what is Ahrefs?” and you get a detailed, accurate answer. That is 100% AI visibility.

But ask ChatGPT “what tools should I use for keyword research?” and Ahrefs gets mentioned but rarely linked. The AI knows the brand exists. It does not need to cite the source. That is the visibility-citation gap, and it exists because AI systems already have the information internalized from training data.

Citation happens when AI needs your content as a source for a specific claim. That requires your content to be structured in a way the AI can extract and attribute.

What Infrastructure Do AI Crawlers Actually Need?

The 10-check audit revealed a clear pattern. Sites that passed 8+ infrastructure checks had measurably higher visibility scores. Sites that failed basic checks were invisible regardless of their authority.

The Baseline Signals

robots.txt and sitemap.xml are table stakes. Every site in the audit had these, but the content of each matters. Reddit’s robots.txt blocks several AI crawlers. Medium’s sitemap is auto-generated but does not include all content pages. Simply having the files is not enough.
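You can verify which AI crawlers a robots.txt actually admits with Python's standard library. The bot names below are the crawlers' published user agents; the robots.txt content is a made-up example that blocks one bot and allows the rest:

```python
from urllib.robotparser import RobotFileParser

# Published user agents of common AI crawlers.
AI_BOTS = ["GPTBot", "PerplexityBot", "ClaudeBot", "Google-Extended"]

# Example robots.txt: blocks GPTBot, allows everyone else.
robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())
access = {bot: rp.can_fetch(bot, "https://example.com/post") for bot in AI_BOTS}
print(access)
```

In practice you would fetch the live file with `rp.set_url(...)` and `rp.read()` instead of parsing a string.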

HTTPS and canonical URLs are similarly baseline. Every audited site passed these. They are necessary but not differentiating.

The Differentiating Signals

Three signals separated the visible sites from the invisible ones:

Answer-first content. Pages that led with a direct answer in the first 100 words scored dramatically higher on AI extractability. This matches research showing AI systems extract the first clear, unqualified statement they find on a page. Generic marketing copy, hero images, and navigation-heavy layouts all push the answer down, making it harder for AI to extract.

Structured data (JSON-LD). Sites with Article, FAQPage, and HowTo schema gave AI systems explicit context about content purpose and structure. The chudi.dev audit showed 9 schema types across pages, including TechArticle with dateModified, FAQPage with 5+ questions per article, and Person schema with expertise signals. This machine-readable layer is what lets AI systems understand your content without parsing ambiguous HTML.
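A minimal version of that machine-readable layer is a single JSON-LD block in the page head. The `@type` and property names below are schema.org's; the values are placeholders:

```python
import json

# Minimal TechArticle JSON-LD with the freshness and author fields
# discussed above. Values are placeholders.
article_schema = {
    "@context": "https://schema.org",
    "@type": "TechArticle",
    "headline": "I Audited 6 Major Websites for AI Citability",
    "datePublished": "2026-04-06",
    "dateModified": "2026-04-06",
    "author": {"@type": "Person", "name": "Chudi Nnorukam"},
}

snippet = (
    '<script type="application/ld+json">'
    + json.dumps(article_schema, indent=2)
    + "</script>"
)
print(snippet)
```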

Content freshness. Pages with dateModified in their schema received 1.8x more AI citations than pages without, according to Semrush research. This aligns with another finding: 95% of ChatGPT citations come from recently published or updated content. Stale content without date signals gets deprioritized.

Which Sites Get Cited vs Just Mentioned?

The gap between being mentioned and being cited is the central problem in AI visibility.

Platform-Specific Citation Behavior

Each AI platform has different citation preferences:

  • Perplexity cites approximately 6.6 sources per answer and heavily favors Reddit (46.7% of its top cited sources)
  • ChatGPT cites only about 2.6 sources per answer and shows strong Wikipedia preference (7.8% of all citations)
  • Google Gemini cites about 6.1 sources per answer with 76% overlap with Google’s traditional top 10

This means the optimization strategy differs by platform. Perplexity rewards breadth of presence across forums and communities. ChatGPT rewards being on established reference sources. Google AI Overviews still correlates heavily with traditional SEO rankings.

The 12% Divergence

Only 12% of URLs cited by LLMs appear in Google’s top 10 search results for the same queries. This is the statistic that should reframe how you think about AI search: ranking on Google and getting cited by AI are largely separate problems.

The exception is Google AI Overviews, which shows 76% overlap with traditional rankings. ChatGPT and Perplexity operate on fundamentally different source-selection algorithms.
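You can measure this divergence for your own queries by comparing the set of LLM-cited URLs against Google's top 10 for the same query. The URL lists below are invented for illustration:

```python
# Fraction of LLM-cited URLs that also appear in Google's top 10
# for the same query. URL lists are invented for illustration.
def citation_overlap(llm_cited, google_top10):
    if not llm_cited:
        return 0.0
    return len(set(llm_cited) & set(google_top10)) / len(llm_cited)

llm_cited = ["a.com/guide", "b.com/post", "c.com/data", "d.com/faq"]
google_top10 = ["a.com/guide", "e.com", "f.com", "g.com"]
print(f"{citation_overlap(llm_cited, google_top10):.0%}")
```

Averaged across a query set, this gives you your own version of the 12% statistic.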

The Three Factors That Actually Predict AI Citations

Based on the audit data and corroborating research, three factors had the strongest predictive power:

1. Answer-First Content Structure

Pages where the direct answer appears in the first 100 words get extracted more often. This means:

  • Lead with the answer, not the question
  • Keep opening paragraphs to 25-40 words
  • Use clear, factual statements without qualifying language
  • Structure H2 headings as questions the reader would ask AI

The qualifying language point is critical. Phrases like “it depends,” “in many cases,” or “it can be argued” signal uncertainty. AI systems prefer definitive statements they can extract as answers.
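Screening an opening paragraph for hedging is mechanical enough to script. The phrase list below is illustrative, not an exhaustive catalogue of qualifying language:

```python
# Flags qualifying phrases in a page's first 100 words.
# The HEDGES list is an illustrative sample, not a complete catalogue.
HEDGES = ["it depends", "in many cases", "it can be argued", "arguably"]

def hedges_in_opening(text, first_n_words=100):
    opening = " ".join(text.split()[:first_n_words]).lower()
    return [h for h in HEDGES if h in opening]

weak = "It depends on your goals, but in many cases a keyword tool helps."
strong = "Ahrefs is a keyword research tool used for backlink analysis."
print(hedges_in_opening(weak))
print(hedges_in_opening(strong))
```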

2. dateModified Schema with Substantive Updates

The 1.8x citation lift from dateModified schema is real, but only when paired with actual content updates. Google penalizes fake freshness signals, meaning you cannot just bump the date without changing anything. The safe approach:

  • Update content quarterly with new data and statistics
  • Add at least 100 words of substantive new content per refresh
  • Reference current-year sources and data points
  • Only update dateModified when the refresh is genuine
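A quarterly cadence is easy to enforce with a small staleness check. The 90-day threshold below mirrors the cadence suggested above; adjust it to your own refresh schedule:

```python
from datetime import date

# Flags pages due for a refresh. The 90-day threshold is an assumption
# matching a quarterly update cadence.
def due_for_refresh(date_modified, today, max_age_days=90):
    return (today - date.fromisoformat(date_modified)).days > max_age_days

print(due_for_refresh("2026-01-02", date(2026, 4, 6)))  # 94 days old
```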

3. Inline Statistics and Original Data

Pages with inline statistics get 40%+ more AI citations. This makes sense: AI systems need claims they can attribute, and specific numbers are the easiest claims to attribute to a source.

Original data is even more powerful. If your page contains data that does not exist elsewhere, AI has no choice but to cite you when referencing it. This is why I publish audit results and benchmark data publicly. The comparison table at the top of this article is data that exists nowhere else.
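If you want a rough density metric for inline statistics, a regex pass is enough for a first cut. The pattern below is a sketch, not a formal definition of "statistic":

```python
import re

# Counts inline statistics (percentages, multipliers like "1.8x", and
# plain numbers) in a sentence. Rough sketch, not a formal definition.
STAT = re.compile(r"\d+(?:\.\d+)?\s*%|\b\d+(?:\.\d+)?x\b|\b\d+(?:\.\d+)?\b")

def count_stats(text):
    return len(STAT.findall(text))

claim = "Only 12% of URLs cited by LLMs appear in Google's top 10 results."
print(count_stats(claim))
```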

What This Means for Your Site

The path from invisible to cited is not about building more backlinks or increasing your DA. It is about making your content technically extractable by AI systems.

The checklist is short:

  1. Check your infrastructure. Run a free scan to verify the 10 baseline signals.
  2. Restructure your content. Lead with answers. Use question-based headings. Add FAQ and HowTo schema.
  3. Publish original data. Give AI systems something they can only get from you.
  4. Keep content fresh. Update quarterly with substantive changes and current statistics.
  5. Test across platforms. Query ChatGPT, Perplexity, and Claude with questions your site should answer. Track citation rates over time.

The sites that get cited in 2026 will not be the ones with the highest DA. They will be the ones whose content is structured so AI systems can extract, trust, and attribute it.

FAQ

Does high domain authority help with AI citations?

No. In our audit of 6 websites, domain authority showed zero correlation with AI citation rates. Ahrefs (DA 92) was 100% AI-visible but only 5% cited. Sites with lower DA but better content structure can outperform high-DA sites in AI citations.

What is the difference between AI visibility and AI citability?

AI visibility means the AI recognizes your brand when asked directly. AI citability is the higher bar where the AI includes your URL as a source in its response. Even major sites can be fully visible but rarely cited, and the two require different optimization approaches.

Which AI platforms cite the most sources per answer?

Perplexity cites approximately 6.6 sources per answer, Google Gemini cites about 6.1, and ChatGPT cites only about 2.6. This means Perplexity gives you more opportunities to be cited per query, while ChatGPT is more selective about its sources.

How do I check if AI can cite my website?

Run a free infrastructure scan at citability.dev to check 10 technical readiness signals. Then manually query ChatGPT, Perplexity, and Claude with questions your site should answer. Check whether your URL appears as a cited source in the AI responses.

What content structure gets the most AI citations?

Answer-first structure where the direct answer appears in the first 100 words, followed by supporting evidence. Use question-based H2 headings, keep paragraphs to 25-40 words, include inline statistics, and add FAQ and HowTo structured data markup.


What do you think?

I post about this stuff on LinkedIn every day and the conversations there are great. If this post sparked a thought, I'd love to hear it.

Discuss on LinkedIn